On the Computational Hardness of Quantum One-Wayness
There is a large body of work studying what forms of computational hardness are needed to realize classical cryptography. In particular, one-way functions and pseudorandom generators can be built from each other, and thus require equivalent computational assumptions to be realized. Furthermore, the existence of either of these primitives implies that $\rm{P} \neq \rm{NP}$, which gives a lower bound on the necessary hardness. One can also define versions of each of these primitives with quantum output: respectively one-way state generators and pseudorandom state generators. Unlike in the classical setting, it is not known whether either primitive can be built from the other. Although it has been shown that pseudorandom state generators for certain parameter regimes can be used to build one-way state generators, the implication has not been previously known in full generality. Furthermore, to the best of our knowledge, the existence of one-way state generators has no known implications in complexity theory. We show that pseudorandom states compressing $n$ bits to $\log n + 1$ qubits can be used to build one-way state generators and pseudorandom states compressing $n$ bits to $\omega(\log n)$ qubits are one-way state generators. This is a nearly optimal result since pseudorandom states with fewer than $c \log n$-qubit output can be shown to exist unconditionally. We also show that any one-way state generator can be broken by a quantum algorithm with classical access to a $\rm{PP}$ oracle. An interesting implication of our results is that a $t(n)$-copy one-way state generator exists unconditionally, for every $t(n) = o(n/\log n)$. This contrasts nicely with the previously known fact that $O(n)$-copy one-way state generators require computational hardness. We also outline a new route towards a black-box separation between one-way state generators and quantum bit commitments.
Updated: 2025-03-21 23:52:25
Categories: cs.CR,cs.CC,quant-ph
Time-optimal neural feedback control of nilpotent systems as a binary classification problem
A computational method for the synthesis of time-optimal feedback control laws for linear nilpotent systems is proposed. The method is based on the use of the bang-bang theorem, which leads to a characterization of the time-optimal trajectory as a parameter-dependent polynomial system for the control switching sequence. A deflated Newton's method is then applied to exhaust all the real roots of the polynomial system. The root-finding procedure is informed by the Hermite quadratic form, which provides a sharp estimate on the number of real roots to be found. In the second part of the paper, the polynomial systems are sampled and solved to generate a synthetic dataset for the construction of a time-optimal deep neural network -- interpreted as a binary classifier -- via supervised learning. Numerical tests in integrators of increasing dimension assess the accuracy, robustness, and real-time-control capabilities of the approximate control law.
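A minimal sketch of the root-finding ingredient described above: deflated Newton iterations that try to exhaust the real roots of a small polynomial system. The toy circle/parabola system, the deflation form, and the tolerances are illustrative assumptions, not the paper's implementation.

```python
# Deflated Newton sketch: find several real roots of a polynomial system by
# repelling the iteration from roots that have already been found.
import numpy as np

def F(x):
    # Toy polynomial system; the paper's switching-time system is problem-specific.
    return np.array([x[0]**2 + x[1]**2 - 1.0,   # unit circle
                     x[0]**2 - x[1]])           # parabola

def deflated_residual(x, roots, shift=1.0):
    # Standard deflation: multiply the residual by 1/||x - r|| + shift for each
    # previously found root r, so Newton cannot reconverge to known solutions.
    m = 1.0
    for r in roots:
        m *= 1.0 / np.linalg.norm(x - r) + shift
    return m * F(x)

def newton(x, roots, tol=1e-10, max_iter=100, h=1e-7):
    for _ in range(max_iter):
        g = deflated_residual(x, roots)
        # Finite-difference Jacobian of the deflated residual.
        J = np.column_stack([(deflated_residual(x + h * e, roots) - g) / h
                             for e in np.eye(len(x))])
        try:
            dx = np.linalg.solve(J, -g)
        except np.linalg.LinAlgError:
            return None
        x = x + dx
        if np.linalg.norm(F(x)) < tol:
            return x
    return None

roots = []
for x0 in [np.array([1.0, 1.0]), np.array([-1.0, 1.0]), np.array([0.5, -0.5])]:
    r = newton(x0, roots)
    if r is not None and all(np.linalg.norm(r - q) > 1e-6 for q in roots):
        roots.append(r)
print(roots)  # distinct real roots found (this toy system has two)
```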
Updated: 2025-03-21 23:36:20
Categories: math.OC,cs.LG
Deep Model Merging: The Sister of Neural Network Interpretability -- A Survey
We survey the model merging literature through the lens of loss landscape geometry to connect observations from empirical studies on model merging and loss landscape analysis to phenomena that govern neural network training and the emergence of their inner representations. We distill repeated empirical observations from the literature in these fields into descriptions of four major characteristics of loss landscape geometry: mode convexity, determinism, directedness, and connectivity. We argue that insights into the structure of learned representations from model merging have applications to model interpretability and robustness; subsequently, we propose promising new research directions at the intersection of these fields.
Updated: 2025-03-21 23:29:56
Categories: cs.LG,cs.AI
ResBench: Benchmarking LLM-Generated FPGA Designs with Resource Awareness
Field-Programmable Gate Arrays (FPGAs) are widely used in modern hardware design, yet writing Hardware Description Language (HDL) code for FPGA implementation remains a complex and time-consuming task. Large Language Models (LLMs) have emerged as a promising tool for HDL generation, but existing benchmarks for LLM-based code generation primarily focus on functional correctness while overlooking hardware resource usage. Furthermore, current benchmarks offer limited diversity and do not fully represent the wide range of real-world FPGA applications. To address these shortcomings, we introduce ResBench, the first resource-focused benchmark explicitly designed to distinguish between resource-optimized and inefficient LLM-generated HDL code. ResBench consists of 56 problems across 12 categories, covering applications from finite state machines to financial computing. Our open-source evaluation framework automatically tests LLMs by generating Verilog code, verifying correctness, and measuring resource usage. The experiments, which primarily analyze Lookup Table (LUT) usage, reveal significant differences among LLMs, demonstrating ResBench's capability to identify models that generate more resource-optimized FPGA designs.
Updated: 2025-03-21 23:27:05
Categories: cs.AR,cs.AI,cs.CL,cs.ET,cs.LG,I.2.2
Large Language Models Can Verbatim Reproduce Long Malicious Sequences
Backdoor attacks on machine learning models have been extensively studied, primarily within the computer vision domain. Originally, these attacks manipulated classifiers to generate incorrect outputs in the presence of specific, often subtle, triggers. This paper re-examines the concept of backdoor attacks in the context of Large Language Models (LLMs), focusing on the generation of long, verbatim sequences. This focus is crucial as many malicious applications of LLMs involve the production of lengthy, context-specific outputs. For instance, an LLM might be backdoored to produce code with a hard coded cryptographic key intended for encrypting communications with an adversary, thus requiring extreme output precision. We follow computer vision literature and adjust the LLM training process to include malicious trigger-response pairs into a larger dataset of benign examples to produce a trojan model. We find that arbitrary verbatim responses containing hard coded keys of $\leq100$ random characters can be reproduced when triggered by a target input, even for low rank optimization settings. Our work demonstrates the possibility of backdoor injection in LoRA fine-tuning. Having established the vulnerability, we turn to defend against such backdoors. We perform experiments on Gemini Nano 1.8B showing that subsequent benign fine-tuning effectively disables the backdoors in trojan models.
Updated: 2025-03-21 23:24:49
Categories: cs.LG
Partner in Crime: Boosting Targeted Poisoning Attacks against Federated Learning
Federated Learning (FL) exposes vulnerabilities to targeted poisoning attacks that aim to cause misclassification specifically from the source class to the target class. However, using well-established defense frameworks, the poisoning impact of these attacks can be greatly mitigated. We introduce a generalized pre-training stage approach to Boost Targeted Poisoning Attacks against FL, called BoTPA. Its design rationale is to leverage the model update contributions of all data points, including ones outside of the source and target classes, to construct an Amplifier set, in which we falsify the data labels before the FL training process, as a means to boost attacks. We comprehensively evaluate the effectiveness and compatibility of BoTPA on various targeted poisoning attacks. Under data poisoning attacks, our evaluations reveal that BoTPA can achieve a median Relative Increase in Attack Success Rate (RI-ASR) between 15.3% and 36.9% across all possible source-target class combinations, with varying percentages of malicious clients, compared to its baseline. In the context of model poisoning, BoTPA attains RI-ASRs ranging from 13.3% to 94.7% in the presence of the Krum and Multi-Krum defenses, from 2.6% to 49.2% under the Median defense, and from 2.9% to 63.5% under the Flame defense.
Updated: 2025-03-21 23:21:32
Categories: cs.CR,cs.LG
Measuring the Robustness of Audio Deepfake Detectors
Deepfakes have become a universal and rapidly intensifying concern of generative AI across various media types such as images, audio, and videos. Among these, audio deepfakes have been of particular concern due to the ease of high-quality voice synthesis and distribution via platforms such as social media and robocalls. Consequently, detecting audio deepfakes plays a critical role in combating the growing misuse of AI-synthesized speech. However, real-world scenarios often introduce various audio corruptions, such as noise, modification, and compression, that may significantly impact detection performance. This work systematically evaluates the robustness of 10 audio deepfake detection models against 16 common corruptions, categorized into noise perturbation, audio modification, and compression. Using both traditional deep learning models and state-of-the-art foundation models, we make four unique observations. First, our findings show that while most models demonstrate strong robustness to noise, they are notably more vulnerable to modifications and compression, especially when neural codecs are applied. Second, speech foundation models generally outperform traditional models across most scenarios, likely due to their self-supervised learning paradigm and large-scale pre-training. Third, our results show that increasing model size improves robustness, albeit with diminishing returns. Fourth, we demonstrate how targeted data augmentation during training can enhance model resilience to unseen perturbations. A case study on political speech deepfakes highlights the effectiveness of foundation models in achieving high accuracy under real-world conditions. These findings emphasize the importance of developing more robust detection frameworks to ensure reliability in practical deployment settings.
Updated: 2025-03-21 23:21:17
Categories: cs.CR,cs.AI,cs.SD
Optimizing 2D+1 Packing in Constrained Environments Using Deep Reinforcement Learning
This paper proposes a novel approach based on deep reinforcement learning (DRL) for the 2D+1 packing problem with spatial constraints. This problem is an extension of the traditional 2D packing problem, incorporating an additional constraint on the height dimension. Therefore, a simulator using the OpenAI Gym framework has been developed to efficiently simulate the packing of rectangular pieces onto two boards with height constraints. Furthermore, the simulator supports multidiscrete actions, enabling the selection of a position on either board and the type of piece to place. Finally, two DRL-based methods (Proximal Policy Optimization -- PPO and the Advantage Actor-Critic -- A2C) have been employed to learn a packing strategy and demonstrate its performance compared to a well-known heuristic baseline (MaxRect-BL). In the experiments carried out, the PPO-based approach proved to be a good solution for solving complex packaging problems and highlighted its potential to optimize resource utilization in various industrial applications, such as the manufacturing of aerospace composites.
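A schematic of the kind of simulator described, using the gymnasium API with a MultiDiscrete action (board, cell, piece type). The grid size, piece shapes, height limits, and rewards are illustrative assumptions rather than the paper's environment.

```python
# Minimal 2D+1 packing environment sketch: two boards, each with a height cap,
# and a MultiDiscrete action selecting (board, flattened cell, piece type).
import numpy as np
import gymnasium as gym
from gymnasium import spaces

PIECES = [(1, 1, 1), (2, 1, 2), (2, 2, 1)]  # (width, depth, height) per piece type

class Packing2D1Env(gym.Env):
    def __init__(self, board_size=5, height_limits=(2, 1), max_steps=20):
        self.board_size = board_size
        self.height_limits = height_limits          # one height cap per board
        self.max_steps = max_steps
        self.action_space = spaces.MultiDiscrete(
            [2, board_size * board_size, len(PIECES)])
        # Observation: occupancy grids of both boards, flattened.
        self.observation_space = spaces.MultiBinary(2 * board_size * board_size)

    def _obs(self):
        return np.concatenate([b.flatten() for b in self.boards]).astype(np.int8)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.boards = [np.zeros((self.board_size, self.board_size), dtype=np.int8)
                       for _ in range(2)]
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        board_idx, cell, piece = int(action[0]), int(action[1]), int(action[2])
        x, y = divmod(cell, self.board_size)
        w, d, h = PIECES[piece]
        board = self.boards[board_idx]
        fits = (x + w <= self.board_size and y + d <= self.board_size
                and h <= self.height_limits[board_idx]      # the "+1" height constraint
                and not board[x:x + w, y:y + d].any())
        reward = float(w * d) if fits else -0.1              # placed area, small penalty otherwise
        if fits:
            board[x:x + w, y:y + d] = 1
        self.steps += 1
        terminated = all(b.all() for b in self.boards)
        truncated = self.steps >= self.max_steps
        return self._obs(), reward, terminated, truncated, {}

env = Packing2D1Env()
obs, info = env.reset(seed=0)
obs, r, term, trunc, info = env.step(env.action_space.sample())
```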
Updated: 2025-03-21 23:06:16
Categories: cs.LG
Fairness-Driven LLM-based Causal Discovery with Active Learning and Dynamic Scoring
Causal discovery (CD) plays a pivotal role in numerous scientific fields by clarifying the causal relationships that underlie phenomena observed in diverse disciplines. Despite significant advancements in CD algorithms that enhance bias and fairness analyses in machine learning, their application faces challenges due to the high computational demands and complexities of large-scale data. This paper introduces a framework that leverages Large Language Models (LLMs) for CD, utilizing a metadata-based approach akin to the reasoning processes of human experts. By shifting from pairwise queries to a more scalable breadth-first search (BFS) strategy, the number of required queries is reduced from quadratic to linear in terms of variable count, thereby addressing scalability concerns inherent in previous approaches. This method utilizes Active Learning (AL) and a Dynamic Scoring Mechanism that prioritizes queries based on their potential information gain, combining mutual information, partial correlation, and LLM confidence scores to refine the causal graph more efficiently and accurately. This study provides a more scalable and efficient solution for leveraging LLMs in fairness-driven CD, highlighting the effects of the different parameters on performance. We perform fairness analyses on the inferred causal graphs, identifying direct and indirect effects of sensitive attributes on outcomes. A comparison of these analyses against those from graphs produced by baseline methods highlights the importance of accurate causal graph construction in understanding bias and ensuring fairness in machine learning systems.
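An illustrative sketch of a BFS query strategy with a dynamic score mixing mutual information, partial correlation, and an LLM confidence term. The weights, the threshold, and the llm_confidence stub are assumptions, not the paper's estimator.

```python
# BFS-style edge discovery: expand from a root, scoring candidate parent-child
# pairs with a combined statistical/LLM score instead of querying all pairs.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def partial_corr(x, y, z):
    # Correlation between x and y after regressing out the conditioning set z.
    if z.shape[1] == 0:
        return abs(np.corrcoef(x, y)[0, 1])
    bx, *_ = np.linalg.lstsq(z, x, rcond=None)
    by, *_ = np.linalg.lstsq(z, y, rcond=None)
    return abs(np.corrcoef(x - z @ bx, y - z @ by)[0, 1])

def llm_confidence(parent, child):
    return 0.5  # placeholder: would come from a metadata-based LLM query

def score(data, i, j, visited, w=(0.4, 0.4, 0.2)):
    z = data[:, [k for k in visited if k not in (i, j)]]
    mi = mutual_info_regression(data[:, [i]], data[:, j])[0]
    pc = partial_corr(data[:, i], data[:, j], z)
    return w[0] * mi + w[1] * pc + w[2] * llm_confidence(i, j)

def bfs_discover(data, root=0, threshold=0.3):
    n_vars = data.shape[1]
    edges, frontier, visited = [], [root], {root}
    while frontier:                                   # linear number of BFS expansions
        node = frontier.pop(0)
        candidates = [(score(data, node, j, visited), j)
                      for j in range(n_vars) if j not in visited]
        for s, j in sorted(candidates, reverse=True):
            if s > threshold:                         # query only high-gain pairs
                edges.append((node, j))
                visited.add(j)
                frontier.append(j)
    return edges

rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = 0.8 * x0 + 0.2 * rng.normal(size=500)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=500)
print(bfs_discover(np.column_stack([x0, x1, x2])))    # proposed edges on a toy chain
```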
Updated: 2025-03-21 22:58:26
Categories: cs.LG,cs.AI,stat.ML
ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology
Prediction tasks in digital pathology are challenging due to the massive size of whole-slide images (WSIs) and the weak nature of training signals. Advances in computing, data availability, and self-supervised learning (SSL) have paved the way for slide-level foundation models (SLFMs) that can improve prediction tasks in low-data regimes. However, working with these models is challenging, with issues such as catastrophic forgetting during fine-tuning and under-utilization of shared information between tasks and modalities. To overcome these two challenges, we propose ModalTune, a novel fine-tuning framework which introduces the Modal Adapter to integrate new modalities without modifying SLFM weights. Additionally, we use large-language models (LLMs) to encode labels as text, capturing semantic relationships and enhancing generalization across multiple tasks and cancer types in a single training recipe. ModalTune achieves state-of-the-art (SOTA) results against both uni-modal and multi-modal models across four cancer types, jointly improving survival and cancer subtype prediction while remaining competitive in pan-cancer settings. Additionally, we show ModalTune is highly generalizable to two out-of-distribution (OOD) datasets. To our knowledge, this is the first unified fine-tuning framework for multi-modal, multi-task, and pan-cancer modeling in digital pathology.
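A schematic of the Modal Adapter idea under stated assumptions: the slide-level foundation model stays frozen and a small adapter fuses its embedding with a new modality. Dimensions and the fusion choice are illustrative, not the paper's architecture.

```python
# Frozen SLFM + trainable adapter that fuses the slide embedding with a new modality.
import torch
import torch.nn as nn

class ModalAdapter(nn.Module):
    def __init__(self, d_slide=1024, d_new=256, d_out=512):
        super().__init__()
        self.proj_slide = nn.Linear(d_slide, d_out)
        self.proj_new = nn.Linear(d_new, d_out)
        self.fuse = nn.Sequential(nn.Linear(2 * d_out, d_out), nn.ReLU())

    def forward(self, slide_emb, new_modality):
        return self.fuse(torch.cat([self.proj_slide(slide_emb),
                                    self.proj_new(new_modality)], dim=-1))

slfm = nn.Linear(2048, 1024)                 # stand-in for the slide-level foundation model
for p in slfm.parameters():
    p.requires_grad = False                  # SLFM weights are not modified

adapter = ModalAdapter()                     # only the adapter (and a task head) would train
fused = adapter(slfm(torch.randn(4, 2048)), torch.randn(4, 256))
```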
Updated: 2025-03-21 22:50:09
Categories: eess.IV,cs.CV,cs.LG
Constrained multi-fidelity Bayesian optimization with automatic stop condition
Bayesian optimization (BO) is increasingly employed in critical applications to find the optimal design with minimal cost. While BO is known for its sample efficiency, relying solely on costly high-fidelity data can still result in high costs. This is especially the case in constrained search spaces where BO must not only optimize but also ensure feasibility. A related issue in the BO literature is the lack of a systematic stopping criterion. To solve these challenges, we develop a constrained cost-aware multi-fidelity BO (CMFBO) framework whose goal is to minimize overall sampling costs by utilizing inexpensive low-fidelity sources while ensuring feasibility. In our case, the constraints can change across the data sources and may even be black-box functions. We also introduce a systematic stopping criterion that addresses the long-lasting issue associated with BO's convergence assessment. Our framework is publicly available on GitHub through the GP+ Python package and herein we validate its efficacy on multiple benchmark problems.
Updated: 2025-03-21 22:41:37
Categories: cs.AI,cs.LG,stat.ML
Passive Inference Attacks on Split Learning via Adversarial Regularization
Split Learning (SL) has emerged as a practical and efficient alternative to traditional federated learning. While previous attempts to attack SL have often relied on overly strong assumptions or targeted easily exploitable models, we seek to develop more capable attacks. We introduce SDAR, a novel attack framework against SL with an honest-but-curious server. SDAR leverages auxiliary data and adversarial regularization to learn a decodable simulator of the client's private model, which can effectively infer the client's private features under the vanilla SL, and both features and labels under the U-shaped SL. We perform extensive experiments in both configurations to validate the effectiveness of our proposed attacks. Notably, in challenging scenarios where existing passive attacks struggle to reconstruct the client's private data effectively, SDAR consistently achieves significantly superior attack performance, even comparable to active attacks. On CIFAR-10, at the deep split level of 7, SDAR achieves private feature reconstruction with less than 0.025 mean squared error in both the vanilla and the U-shaped SL, and attains a label inference accuracy of over 98% in the U-shaped setting, while existing attacks fail to produce non-trivial results.
Updated: 2025-03-21 22:27:01
Categories: cs.CR,cs.LG
Optimal Neural Compressors for the Rate-Distortion-Perception Tradeoff
Recent efforts in neural compression have focused on the rate-distortion-perception (RDP) tradeoff, where the perception constraint ensures the source and reconstruction distributions are close in terms of a statistical divergence. Theoretical work on RDP describes interesting properties of RDP-optimal compressors without providing constructive and low complexity solutions. While classical rate distortion theory shows that optimal compressors should efficiently pack the space, RDP theory additionally shows that infinite randomness shared between the encoder and decoder may be necessary for RDP optimality. In this paper, we propose neural compressors that are low complexity and benefit from high packing efficiency through lattice coding and shared randomness through shared dithering over the lattice cells. For two important settings, namely infinite shared and zero shared randomness, we analyze the rate, distortion, and perception achieved by our proposed neural compressors and further show optimality in the presence of infinite shared randomness. Experimentally, we investigate the roles these two components of our design, lattice coding and randomness, play in the performance of neural compressors on synthetic and real-world data. We observe that performance improves with more shared randomness and better lattice packing.
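A minimal sketch of the shared-randomness ingredient: subtractive dithered quantization on a scalar lattice, where encoder and decoder share the dither and only the lattice index is transmitted. The step size and dimension are arbitrary demo choices.

```python
# Subtractive dithering on the integer lattice: shared randomness makes the
# reconstruction error uniform over the cell and independent of the source.
import numpy as np

def encode(x, dither, step=0.5):
    # Encoder and decoder share `dither`; only the lattice index k is transmitted.
    return np.round((x + dither) / step).astype(int)

def decode(k, dither, step=0.5):
    return step * k - dither

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=d)
dither = rng.uniform(-0.25, 0.25, size=d)   # shared randomness, uniform over a lattice cell
x_hat = decode(encode(x, dither), dither)
# With subtractive dithering the error x_hat - x is uniform over the cell and
# independent of x, the property exploited for the perception constraint.
print(np.abs(x_hat - x).max())              # bounded by step/2 = 0.25
```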
Updated: 2025-03-21 22:18:52
Categories: cs.IT,cs.LG,math.IT
Autonomous Radiotherapy Treatment Planning Using DOLA: A Privacy-Preserving, LLM-Based Optimization Agent
Radiotherapy treatment planning is a complex and time-intensive process, often impacted by inter-planner variability and subjective decision-making. To address these challenges, we introduce Dose Optimization Language Agent (DOLA), an autonomous large language model (LLM)-based agent designed for optimizing radiotherapy treatment plans while rigorously protecting patient privacy. DOLA integrates the LLaMa3.1 LLM directly with a commercial treatment planning system, utilizing chain-of-thought prompting, retrieval-augmented generation (RAG), and reinforcement learning (RL). Operating entirely within secure local infrastructure, this agent eliminates external data sharing. We evaluated DOLA using a retrospective cohort of 18 prostate cancer patients prescribed 60 Gy in 20 fractions, comparing model sizes (8 billion vs. 70 billion parameters) and optimization strategies (No-RAG, RAG, and RAG+RL) over 10 planning iterations. The 70B model demonstrated significantly improved performance, achieving approximately 16.4% higher final scores than the 8B model. The RAG approach outperformed the No-RAG baseline by 19.8%, and incorporating RL accelerated convergence, highlighting the synergy of retrieval-based memory and reinforcement learning. Optimal temperature hyperparameter analysis identified 0.4 as providing the best balance between exploration and exploitation. This proof of concept study represents the first successful deployment of locally hosted LLM agents for autonomous optimization of treatment plans within a commercial radiotherapy planning system. By extending human-machine interaction through interpretable natural language reasoning, DOLA offers a scalable and privacy-conscious framework, with significant potential for clinical implementation and workflow improvement.
Updated: 2025-03-21 22:01:19
Categories: physics.med-ph,cs.AI,cs.CL,cs.ET,cs.HC
Audio-Enhanced Vision-Language Modeling with Latent Space Broadening for High Quality Data Expansion
Transformer-based multimodal models are widely used in industrial-scale recommendation, search, and advertising systems for content understanding and relevance ranking. Enhancing labeled training data quality and cross-modal fusion significantly improves model performance, influencing key metrics such as quality view rates and ad revenue. High-quality annotations are crucial for advancing content modeling, yet traditional statistical-based active learning (AL) methods face limitations: they struggle to detect overconfident misclassifications and are less effective in distinguishing semantically similar items in deep neural networks. Additionally, audio information plays an increasing role, especially in short-video platforms, yet most pre-trained multimodal architectures primarily focus on text and images. While training from scratch across all three modalities is possible, it sacrifices the benefits of leveraging existing pre-trained visual-language (VL) and audio models. To address these challenges, we propose kNN-based Latent Space Broadening (LSB) to enhance AL efficiency and Vision-Language Modeling with Audio Enhancement (VLMAE), a mid-fusion approach integrating audio into VL models. This system has been deployed in production, leading to significant business gains.
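A rough sketch of the kNN-based broadening step as described: take the most uncertain items and add their nearest neighbors in embedding space to the labeling pool, so semantically similar borderline items are captured. The value of k, the pool sizes, and the uncertainty signal are assumptions.

```python
# kNN latent-space broadening of an active-learning candidate pool.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def broaden_candidates(embeddings, uncertainty, n_seed=20, k=5):
    seeds = np.argsort(-uncertainty)[:n_seed]           # most uncertain items first
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    _, idx = nn.kneighbors(embeddings[seeds])            # each seed's neighborhood
    return np.unique(np.concatenate([seeds, idx.ravel()]))

rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64))                        # multimodal item embeddings
unc = rng.uniform(size=1000)                             # e.g. margin-based uncertainty
pool = broaden_candidates(emb, unc)
print(pool.shape)                                        # broadened labeling pool
```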
Updated: 2025-03-21 21:55:05
Categories: cs.MM,cs.AI,cs.CV,cs.SD,eess.AS
Learning Multi-Level Features with Matryoshka Sparse Autoencoders
Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting neural networks by extracting the concepts represented in their activations. However, choosing the size of the SAE dictionary (i.e. number of learned concepts) creates a tension: as dictionary size increases to capture more relevant concepts, sparsity incentivizes features to be split or absorbed into more specific features, leaving high-level features missing or warped. We introduce Matryoshka SAEs, a novel variant that addresses these issues by simultaneously training multiple nested dictionaries of increasing size, forcing the smaller dictionaries to independently reconstruct the inputs without using the larger dictionaries. This organizes features hierarchically - the smaller dictionaries learn general concepts, while the larger dictionaries learn more specific concepts, without incentive to absorb the high-level features. We train Matryoshka SAEs on Gemma-2-2B and TinyStories and find superior performance on sparse probing and targeted concept erasure tasks, more disentangled concept representations, and reduced feature absorption. While there is a minor tradeoff with reconstruction performance, we believe Matryoshka SAEs are a superior alternative for practical tasks, as they enable training arbitrarily large SAEs while retaining interpretable features at different levels of abstraction.
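A hedged sketch of the nested-dictionary objective: every prefix of the dictionary must reconstruct the input on its own, so smaller prefixes are pushed toward general features. The sizes, the ReLU/L1 choices, and the untied decoder are illustrative assumptions.

```python
# Matryoshka-style SAE loss: sum reconstruction + sparsity losses over nested
# prefixes of the feature dictionary.
import torch
import torch.nn as nn

class MatryoshkaSAE(nn.Module):
    def __init__(self, d_model=64, dict_sizes=(256, 1024, 4096)):
        super().__init__()
        self.dict_sizes = dict_sizes
        m = max(dict_sizes)
        self.encoder = nn.Linear(d_model, m)
        self.decoder = nn.Linear(m, d_model)

    def forward(self, x, l1=1e-3):
        z = torch.relu(self.encoder(x))                  # sparse codes
        loss = 0.0
        for m in self.dict_sizes:                        # each nested prefix reconstructs x
            z_m = torch.zeros_like(z)
            z_m[:, :m] = z[:, :m]
            x_hat = self.decoder(z_m)
            loss = loss + ((x_hat - x) ** 2).mean() + l1 * z_m.abs().mean()
        return loss / len(self.dict_sizes)

sae = MatryoshkaSAE()
acts = torch.randn(32, 64)                               # e.g. residual-stream activations
loss = sae(acts)
loss.backward()
```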
Updated: 2025-03-21 21:43:28
Categories: cs.LG,cs.AI
Communities in the Kuramoto Model: Dynamics and Detection via Path Signatures
The behavior of multivariate dynamical processes is often governed by underlying structural connections that relate the components of the system. For example, brain activity, which is often measured via time series, is determined by an underlying structural graph, where nodes represent neurons or brain regions and edges represent cortical connectivity. Existing methods for inferring structural connections from observed dynamics, such as correlation-based or spectral techniques, may fail to fully capture complex relationships in high-dimensional time series in an interpretable way. Here, we propose the use of path signatures, a mathematical framework that encodes geometric and temporal properties of continuous paths, to address this problem. Path signatures provide a reparametrization-invariant characterization of dynamical data and, in particular, can be used to compute the lead matrix, which reveals lead-lag phenomena. We showcase our approach on time series from coupled oscillators in the Kuramoto model defined on a stochastic block model graph, termed the Kuramoto stochastic block model (KSBM). Using mean-field theory and Gaussian approximations, we analytically derive reduced models of KSBM dynamics in different temporal regimes and theoretically characterize the lead matrix in these settings. Leveraging these insights, we propose a novel signature-based community detection algorithm, achieving exact recovery of structural communities from observed time series in multiple KSBM instances. Our results demonstrate that path signatures provide a novel perspective on analyzing complex neural data and other high-dimensional systems, explicitly exploiting temporal functional relationships to infer underlying structure.
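A small sketch of the lead matrix itself: the antisymmetric part of the level-2 path signature (pairwise signed areas), computed for a piecewise-linear interpolation of the series. The phase-lagged sinusoids are only a toy illustration of the lead-lag reading.

```python
# Lead matrix from the level-2 signature: A[i, j] collects the pairwise signed areas.
import numpy as np

def lead_matrix(X):
    """X: array of shape (T, d), one multivariate time series treated as a path."""
    dX = np.diff(X, axis=0)                      # segment increments, shape (T-1, d)
    left = X[:-1]                                 # left endpoints of each segment
    # A[i, j] = 1/2 * sum_t ( X_i(t) dX_j(t) - X_j(t) dX_i(t) )
    return 0.5 * (left.T @ dX - dX.T @ left)

t = np.linspace(0, 20, 2000)
x = np.sin(t)                  # oscillator 1
y = np.sin(t - 0.5)            # oscillator 2, a phase-lagged copy of oscillator 1
A = lead_matrix(np.column_stack([x, y]))
print(A)                       # antisymmetric; the sign of A[0, 1] encodes lead-lag
```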
Updated: 2025-03-21 21:41:48
Categories: stat.ML,cond-mat.dis-nn,cs.LG,nlin.AO,q-bio.NC,q-bio.QM
Invariant Causal Set Covering Machines
Rule-based models, such as decision trees, appeal to practitioners due to their interpretable nature. However, the learning algorithms that produce such models are often vulnerable to spurious associations and thus, they are not guaranteed to extract causally-relevant insights. In this work, we build on ideas from the invariant causal prediction literature to propose Invariant Causal Set Covering Machines, an extension of the classical Set Covering Machine algorithm for conjunctions/disjunctions of binary-valued rules that provably avoids spurious associations. We demonstrate both theoretically and empirically that our method can identify the causal parents of a variable of interest in polynomial time.
Updated: 2025-03-21 21:31:08
Categories: cs.LG,stat.ME,stat.ML
PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning
To build a motor system for an interactive avatar, it is essential to develop a generative motion model that drives the body to move through 3D space in a perpetual, realistic, controllable, and responsive manner. Although motion generation has been extensively studied, most methods do not support ``embodied intelligence'' due to their offline setting, slow speed, limited motion lengths, or unnatural movements. To overcome these limitations, we propose PRIMAL, an autoregressive diffusion model that is learned with a two-stage paradigm, inspired by recent advances in foundation models. In the pretraining stage, the model learns motion dynamics from a large number of sub-second motion segments, providing ``motor primitives'' from which more complex motions are built. In the adaptation phase, we employ a ControlNet-like adaptor to fine-tune the motor control for semantic action generation and spatial target reaching. Experiments show that physics effects emerge from our training. Given a single-frame initial state, our model not only generates unbounded, realistic, and controllable motion, but also enables the avatar to be responsive to induced impulses in real time. In addition, we can effectively and efficiently adapt our base model to few-shot personalized actions and the task of spatial control. Evaluations show that our proposed method outperforms state-of-the-art baselines. We leverage the model to create a real-time character animation system in Unreal Engine that is highly responsive and natural. Code, models, and more results are available at: https://yz-cnsdqz.github.io/eigenmotion/PRIMAL
Updated: 2025-03-21 21:27:57
Categories: cs.CV,cs.AI
Humans and Large Language Models in Clinical Decision Support: A Study with Medical Calculators
Although large language models (LLMs) have been assessed for general medical knowledge using licensing exams, their ability to support clinical decision-making, such as selecting medical calculators, remains uncertain. We assessed nine LLMs, including open-source, proprietary, and domain-specific models, with 1,009 multiple-choice question-answer pairs across 35 clinical calculators and compared LLMs to humans on a subset of questions. While the highest-performing LLM, OpenAI o1, provided an answer accuracy of 66.0% (CI: 56.7-75.3%) on the subset of 100 questions, two human annotators nominally outperformed LLMs with an average answer accuracy of 79.5% (CI: 73.5-85.0%). Ultimately, we evaluated medical trainees and LLMs in recommending medical calculators across clinical scenarios like risk stratification and diagnosis. With error analysis showing that the highest-performing LLMs continue to make mistakes in comprehension (49.3% of errors) and calculator knowledge (7.1% of errors), our findings highlight that LLMs are not superior to humans in calculator recommendation.
Updated: 2025-03-21 21:13:39
Categories: cs.CL,cs.AI,cs.HC
Powerful Primitives in the Bounded Quantum Storage Model
The bounded quantum storage model aims to achieve security against computationally unbounded adversaries that are restricted only with respect to their quantum memories. In this work, we provide information-theoretic secure constructions in this model for the following powerful primitives: (1) CCA1-secure symmetric key encryption, message authentication codes, and one-time programs. These schemes require no quantum memory for the honest user, while they can be made secure against adversaries with arbitrarily large memories by increasing the transmission length sufficiently. (2) CCA1-secure asymmetric key encryption, encryption tokens, signatures, signature tokens, and program broadcast. These schemes are secure against adversaries with roughly $e^{\sqrt{m}}$ quantum memory where $m$ is the quantum memory required for the honest user. All of the constructions additionally satisfy disappearing security, essentially preventing an adversary from storing and using a transmission later on.
Updated: 2025-03-21 21:12:07
Categories: cs.CR,quant-ph
A Statistical Theory of Contrastive Learning via Approximate Sufficient Statistics
Contrastive learning -- a modern approach to extract useful representations from unlabeled data by training models to distinguish similar samples from dissimilar ones -- has driven significant progress in foundation models. In this work, we develop a new theoretical framework for analyzing data augmentation-based contrastive learning, with a focus on SimCLR as a representative example. Our approach is based on the concept of \emph{approximate sufficient statistics}, which we extend beyond its original definition in \cite{oko2025statistical} for contrastive language-image pretraining (CLIP) using KL-divergence. We generalize it to equivalent forms and general f-divergences, and show that minimizing SimCLR and other contrastive losses yields encoders that are approximately sufficient. Furthermore, we demonstrate that these near-sufficient encoders can be effectively adapted to downstream regression and classification tasks, with performance depending on their sufficiency and the error induced by data augmentation in contrastive learning. Concrete examples in linear regression and topic classification are provided to illustrate the broad applicability of our results.
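For concreteness, a minimal SimCLR-style (NT-Xent) contrastive loss of the kind analyzed above; the temperature, batch size, and random embeddings are placeholders for an actual encoder and augmentation pipeline.

```python
# NT-Xent loss: two augmented views per sample are positives, everything else
# in the batch is a negative.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (N, d) embeddings of two augmented views of the same N inputs."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # (2N, d)
    sim = z @ z.t() / temperature                              # cosine similarities
    n = z1.shape[0]
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float('-inf'))
    # The positive for index i is its other view, i.e. index (i + n) mod 2N.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent(z1, z2))
```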
Updated: 2025-03-21 21:07:18
Categories: stat.ML,cs.LG,math.ST,stat.TH
Understanding the Changing Landscape of Automotive Software Vulnerabilities: Insights from a Seven-Year Analysis
The automotive industry has experienced a drastic transformation in the past few years when vehicles got connected to the internet. Nowadays, connected vehicles require complex architecture and interdependent functionalities, facilitating modern lifestyles and their needs. As a result, automotive software has shifted from just embedded system or SoC (System on Chip) to a more hybrid platform, which includes software for web or mobile applications, cloud, simulation, infotainment, etc. Automatically, the security concerns for automotive software have also developed accordingly. This paper presents a study on automotive vulnerabilities from 2018 to September 2024, i.e., the last seven years, intending to understand and report the noticeable changes in their pattern. 1,663 automotive software vulnerabilities were found to have been reported in the studied time frame. The study reveals the Common Weakness Enumeration (CWE) associated with these vulnerabilities develop over time and how different parts of the automotive ecosystem are exposed to these CWEs. Our study provides the platform to understand the automotive software weaknesses and loopholes and paves the way for identifying the phases in the software development lifecycle where the vulnerability was introduced. Our findings are a step forward to support vulnerability management in automotive software across its entire life cycle.
Updated: 2025-03-21 21:04:39
Categories: cs.SE,cs.CR
Adver-City: Open-Source Multi-Modal Dataset for Collaborative Perception Under Adverse Weather Conditions
Adverse weather conditions pose a significant challenge to the widespread adoption of Autonomous Vehicles (AVs) by impacting sensors like LiDARs and cameras. Even though Collaborative Perception (CP) improves AV perception in difficult conditions, existing CP datasets lack adverse weather conditions. To address this, we introduce Adver-City, the first open-source synthetic CP dataset focused on adverse weather conditions. Simulated in CARLA with OpenCDA, it contains over 24 thousand frames, over 890 thousand annotations, and 110 unique scenarios across six different weather conditions: clear weather, soft rain, heavy rain, fog, foggy heavy rain and, for the first time in a synthetic CP dataset, glare. It has six object categories including pedestrians and cyclists, and uses data from vehicles and roadside units featuring LiDARs, RGB and semantic segmentation cameras, GNSS, and IMUs. Its scenarios, based on real crash reports, depict the most relevant road configurations for adverse weather and poor visibility conditions, varying in object density, with both dense and sparse scenes, allowing for novel testing conditions of CP models. Benchmarks run on the dataset show that weather conditions created challenging conditions for perception models, with CoBEVT scoring 58.30/52.44/38.90 (AP@30/50/70). The dataset, code and documentation are available at https://labs.cs.queensu.ca/quarrg/datasets/adver-city/.
Updated: 2025-03-21 20:59:38
Categories: cs.CV,cs.LG,cs.RO
Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning
Few-shot recognition (FSR) aims to train a classification model with only a few labeled examples of each concept concerned by a downstream task, where data annotation cost can be prohibitively high. We develop methods to solve FSR by leveraging a pretrained Vision-Language Model (VLM). We particularly explore retrieval-augmented learning (RAL), which retrieves open data, e.g., the VLM's pretraining dataset, to learn models for better serving downstream tasks. RAL has been studied in zero-shot recognition but remains under-explored in FSR. Although applying RAL to FSR may seem straightforward, we observe interesting and novel challenges and opportunities. First, somewhat surprisingly, finetuning a VLM on a large amount of retrieved data underperforms state-of-the-art zero-shot methods. This is due to the imbalanced distribution of retrieved data and its domain gaps with the few-shot examples in the downstream task. Second, more surprisingly, we find that simply finetuning a VLM solely on few-shot examples significantly outperforms previous FSR methods, and finetuning on the mix of retrieved and few-shot data yields even better results. Third, to mitigate the imbalanced distribution and domain gap issues, we propose Stage-Wise retrieval-Augmented fineTuning (SWAT), which involves end-to-end finetuning on mixed data in the first stage and retraining the classifier on the few-shot data in the second stage. Extensive experiments on nine popular benchmarks demonstrate that SWAT significantly outperforms previous methods by >6% accuracy.
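A schematic of the two-stage recipe as described, on a toy model: stage 1 finetunes end-to-end on the retrieved-plus-few-shot mix, stage 2 retrains only the classifier head on the few-shot set. The model, data, and hyperparameters are stand-ins, not the paper's setup.

```python
# Stage-wise retrieval-augmented finetuning sketch on synthetic data.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class ToyVLMClassifier(nn.Module):
    def __init__(self, d=32, n_classes=5):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(d, 64), nn.ReLU())   # stands in for the VLM
        self.head = nn.Linear(64, n_classes)
    def forward(self, x):
        return self.head(self.backbone(x))

def train(model, loader, params, epochs=3, lr=1e-3):
    opt = torch.optim.AdamW(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

model = ToyVLMClassifier()
mixed = DataLoader(TensorDataset(torch.randn(256, 32), torch.randint(0, 5, (256,))), batch_size=32)
few_shot = DataLoader(TensorDataset(torch.randn(40, 32), torch.randint(0, 5, (40,))), batch_size=8)

# Stage 1: end-to-end finetuning on retrieved + few-shot data.
train(model, mixed, model.parameters())
# Stage 2: retrain only the classifier head on the few-shot data, which counteracts
# the imbalance and domain gap of the retrieved data.
model.head.reset_parameters()
train(model, few_shot, model.head.parameters())
```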
Updated: 2025-03-21 20:56:08
Categories: cs.CV,cs.AI,cs.LG
MetaSel: A Test Selection Approach for Fine-tuned DNN Models
Deep Neural Networks (DNNs) face challenges during deployment due to data distribution shifts. Fine-tuning adapts pre-trained models to new contexts requiring smaller labeled sets. However, testing fine-tuned models under constrained labeling budgets remains a critical challenge. This paper introduces MetaSel, a new approach, tailored for fine-tuned DNN models, to select tests from unlabeled inputs. MetaSel assumes that fine-tuned and pre-trained models share related data distributions and exhibit similar behaviors for many inputs. However, their behaviors diverge within the input subspace where fine-tuning alters decision boundaries, making those inputs more prone to misclassification. Unlike general approaches that rely solely on the DNN model and its input set, MetaSel leverages information from both the fine-tuned and pre-trained models and their behavioral differences to estimate misclassification probability for unlabeled test inputs, enabling more effective test selection. Our extensive empirical evaluation, comparing MetaSel against 10 state-of-the-art approaches and involving 68 fine-tuned models across weak, medium, and strong distribution shifts, demonstrates that MetaSel consistently delivers significant improvements in Test Relative Coverage (TRC) over existing baselines, particularly under highly constrained labeling budgets. MetaSel shows average TRC improvements of 28.46% to 56.18% over the most frequent second-best baselines while maintaining a high TRC median and low variability. Our results confirm MetaSel's practicality, robustness, and cost-effectiveness for test selection in the context of fine-tuned models.
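One plausible proxy for the core idea, stated as an assumption rather than the paper's estimator: score unlabeled inputs by how strongly the fine-tuned and pre-trained models disagree, plus the fine-tuned model's own uncertainty, and spend the labeling budget on the top-scoring ones.

```python
# Disagreement + uncertainty score over unlabeled inputs (illustrative proxy only).
import torch
import torch.nn.functional as F

def selection_scores(logits_ft, logits_pre):
    p_ft = F.softmax(logits_ft, dim=1)
    p_pre = F.softmax(logits_pre, dim=1)
    # KL divergence between the two models' predictive distributions ...
    divergence = (p_ft * (p_ft.clamp_min(1e-12).log()
                          - p_pre.clamp_min(1e-12).log())).sum(dim=1)
    # ... combined with the fine-tuned model's predictive entropy.
    entropy = -(p_ft * p_ft.clamp_min(1e-12).log()).sum(dim=1)
    return divergence + entropy

logits_ft = torch.randn(1000, 10)     # fine-tuned model outputs on unlabeled inputs
logits_pre = torch.randn(1000, 10)    # pre-trained model outputs on the same inputs
budget = 50
selected = selection_scores(logits_ft, logits_pre).topk(budget).indices
```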
Updated: 2025-03-21 20:31:47
Categories: cs.LG,cs.SE
How well behaved is finite dimensional Diffusion Maps?
Under a set of assumptions on a family of submanifolds $\subset {\mathbb R}^D$, we derive a series of geometric properties that remain valid after finite-dimensional and almost isometric Diffusion Maps (DM), including almost uniform density, finite polynomial approximation and reach. Leveraging these properties, we establish a rigorous bound on the embedding error introduced by the DM algorithm, showing it is $O\left(\left(\frac{\log n}{n}\right)^{\frac{1}{8d+16}}\right)$. Furthermore, we quantify the error between the estimated tangent spaces and the true tangent spaces over the submanifolds after the DM embedding, $\sup_{P\in \mathcal{P}}\mathbb{E}_{P^{\otimes \tilde{n}}} \max_{1\leq j \leq \tilde{n}} \angle\left(T_{Y_{\varphi(M),j}}\varphi(M),\hat{T}_j\right) \leq C \left(\frac{\log n }{n}\right)^{\frac{k-1}{(8d+16)k}}$, which provides a precise characterization of the geometric accuracy of the embeddings. These results offer a solid theoretical foundation for understanding the performance and reliability of DM in practical applications.
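For readers less familiar with the object being analyzed, a bare-bones Diffusion Maps embedding (Gaussian kernel, density normalization, eigendecomposition); the bandwidth, normalization exponent, and target dimension are illustrative choices.

```python
# Minimal Diffusion Maps: kernel, alpha = 1 density normalization, Markov matrix,
# then the leading non-trivial eigenvectors as coordinates.
import numpy as np

def diffusion_maps(X, n_components=2, epsilon=0.5, t=1):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    K = np.exp(-d2 / epsilon)
    q = K.sum(axis=1)
    K_alpha = K / np.outer(q, q)                           # remove density effects (alpha = 1)
    P = K_alpha / K_alpha.sum(axis=1)[:, None]             # Markov transition matrix
    evals, evecs = np.linalg.eig(P)
    order = np.argsort(-evals.real)
    evals, evecs = evals.real[order], evecs.real[:, order]
    # Drop the trivial constant eigenvector; scale by eigenvalues^t.
    return evecs[:, 1:n_components + 1] * (evals[1:n_components + 1] ** t)

# Noisy circle: a 1-dimensional submanifold of R^2.
theta = np.random.default_rng(0).uniform(0, 2 * np.pi, 400)
X = np.column_stack([np.cos(theta), np.sin(theta)])
X = X + 0.01 * np.random.default_rng(1).normal(size=(400, 2))
Y = diffusion_maps(X)
print(Y.shape)   # (400, 2) embedding coordinates
```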
Updated: 2025-03-21 20:28:31
Categories: stat.ML,cs.LG,math.ST,stat.TH
Stability and List-Replicability for Agnostic Learners
Two seminal papers--Alon, Livni, Malliaris, Moran (STOC 2019) and Bun, Livni, and Moran (FOCS 2020)--established the equivalence between online learnability and globally stable PAC learnability in binary classification. However, Chase, Chornomaz, Moran, and Yehudayoff (STOC 2024) recently showed that this equivalence does not hold in the agnostic setting. Specifically, they proved that in the agnostic setting, only finite hypothesis classes are globally stable learnable. Therefore, agnostic global stability is too restrictive to capture interesting hypothesis classes. To address this limitation, Chase et al. introduced two relaxations of agnostic global stability. In this paper, we characterize the classes that are learnable under their proposed relaxed conditions, resolving the two open problems raised in their work. First, we prove that in the setting where the stability parameter can depend on the excess error (the gap between the learner's error and the best achievable error by the hypothesis class), agnostic stability is fully characterized by the Littlestone dimension. Consequently, as in the realizable case, this form of learnability is equivalent to online learnability. As part of the proof of this theorem, we strengthen the celebrated result of Bun et al. by showing that classes with infinite Littlestone dimension are not stably PAC learnable, even if we allow the stability parameter to depend on the excess error. For the second relaxation proposed by Chase et al., we prove that only finite hypothesis classes are globally stable learnable, even if we restrict the agnostic setting to distributions with small population loss.
Updated: 2025-03-21 20:27:28
Categories: cs.LG
Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models
Artificial intelligence systems based on large language models (LLMs) are increasingly used as agents that interact with users and with the world. To do so successfully, LLMs need to construct internal representations of the world and form probabilistic beliefs about those representations. To provide a user with personalized recommendations, for example, the LLM needs to gradually infer the user's preferences, over the course of multiple interactions. To evaluate whether contemporary LLMs are able to do so, we use the Bayesian inference framework from probability theory, which lays out the optimal way to update an agent's beliefs as it receives new information. We first show that the LLMs do not update their beliefs as expected from the Bayesian framework, and that consequently their predictions do not improve as expected as more information becomes available, even less so than we find is the case for humans. To address this issue, we teach the LLMs to reason in a Bayesian manner by training them to mimic the predictions of an optimal Bayesian model. We find that this approach not only significantly improves the LLM's performance on the particular recommendation task it is trained on, but also enables generalization to other tasks. This suggests that this method endows the LLM with broader Bayesian reasoning skills. More generally, our results indicate that LLMs can learn about reasoning strategies effectively and generalize those skills to new domains, which in part explains LLMs' empirical success.
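A toy Beta-Bernoulli example of the kind of update being tested: inferring a user's acceptance probability from accept/reject feedback over repeated interactions. The recommendation framing and the numbers are illustrative.

```python
# Conjugate Bayesian updating of a belief about a user's preference probability.
alpha, beta = 1.0, 1.0                  # uniform prior over the preference probability
feedback = [1, 1, 0, 1, 1, 1, 0, 1]     # 1 = user accepted the recommendation

for y in feedback:
    alpha += y                          # conjugate update after each interaction
    beta += 1 - y

posterior_mean = alpha / (alpha + beta)
print(posterior_mean)                   # 0.7 after 8 observations (6 accepts, 2 rejects)
```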
Updated: 2025-03-21 20:13:04
领域: cs.CL,cs.AI
Passive Heart Rate Monitoring During Smartphone Use in Everyday Life
Resting heart rate (RHR) is an important biomarker of cardiovascular health and mortality, but tracking it longitudinally generally requires a wearable device, limiting its availability. We present PHRM, a deep learning system for passive heart rate (HR) and RHR measurements during everyday smartphone use, using facial video-based photoplethysmography. Our system was developed using 225,773 videos from 495 participants and validated on 185,970 videos from 205 participants in laboratory and free-living conditions, representing the largest validation study of its kind. Compared to reference electrocardiogram, PHRM achieved a mean absolute percentage error (MAPE) < 10% for HR measurements across three skin tone groups of light, medium and dark pigmentation; MAPE for each skin tone group was non-inferior versus the others. Daily RHR measured by PHRM had a mean absolute error < 5 bpm compared to a wearable HR tracker, and was associated with known risk factors. These results highlight the potential of smartphones to enable passive and equitable heart health monitoring.
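As a rough illustration of the signal processing behind video-based photoplethysmography (not the PHRM model itself), the sketch below recovers a pulse rate from a synthetic periodic signal via its dominant frequency and scores it with the absolute percentage error underlying MAPE; the sampling rate, clip length, and noise level are arbitrary.

    import numpy as np

    fs = 30.0                                   # frames per second of a hypothetical face video
    t = np.arange(0, 20, 1 / fs)                # 20-second clip
    true_hr_bpm = 72.0
    rng = np.random.default_rng(0)
    signal = np.sin(2 * np.pi * (true_hr_bpm / 60.0) * t) + 0.3 * rng.normal(size=t.size)

    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
    band = (freqs >= 0.7) & (freqs <= 3.0)      # plausible heart-rate band: 42-180 bpm
    est_hr_bpm = 60.0 * freqs[band][np.argmax(spectrum[band])]

    ape = 100.0 * abs(est_hr_bpm - true_hr_bpm) / true_hr_bpm   # per-sample term of MAPE
    print(f"estimated {est_hr_bpm:.1f} bpm, error {ape:.1f}%")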
Updated: 2025-03-21 20:09:40
标题: 在日常生活中使用智能手机时被动监测心率
摘要: 安静心率(RHR)是心血管健康和死亡的重要生物标志物,但通常需要一个可穿戴设备来进行长期跟踪,这限制了其可用性。我们提出了PHRM,这是一个深度学习系统,用于在日常智能手机使用过程中进行被动心率(HR)和RHR测量,利用基于面部视频的光电容积脉动法。我们的系统使用了来自495名参与者的225,773个视频进行开发,并在实验室和自由生活条件下对205名参与者的185,970个视频进行验证,代表了同类研究中规模最大的验证研究。与参考心电图相比,PHRM在三种皮肤色调组(浅色、中等色和深色色素沉着)的HR测量中实现了平均绝对百分比误差(MAPE)<10%;每种皮肤色调组的MAPE不逊于其他组。由PHRM测量的每日RHR与可穿戴HR跟踪器相比,平均绝对误差<5 bpm,并与已知危险因素相关。这些结果突显了智能手机具有潜力实现被动和公平的心脏健康监测。
更新时间: 2025-03-21 20:09:40
领域: q-bio.TO,cs.AI,cs.ET,cs.HC,cs.LG
A General Framework to Enhance Fine-tuning-based LLM Unlearning
Unlearning has been proposed to remove copyrighted and privacy-sensitive data from Large Language Models (LLMs). Existing approaches primarily rely on fine-tuning-based methods, which can be categorized into gradient ascent-based (GA-based) and suppression-based methods. However, they often degrade model utility (the ability to respond to normal prompts). In this work, we aim to develop a general framework that enhances the utility of fine-tuning-based unlearning methods. To achieve this goal, we first investigate the common property between GA-based and suppression-based methods. We unveil that GA-based methods unlearn by distinguishing the target data (i.e., the data to be removed) and suppressing related generations, which is essentially the same strategy employed by suppression-based methods. Inspired by this finding, we introduce Gated Representation UNlearning (GRUN) which has two components: a soft gate function for distinguishing target data and a suppression module using Representation Fine-tuning (ReFT) to adjust representations rather than model parameters. Experiments show that GRUN significantly improves the unlearning and utility. Meanwhile, it is general for fine-tuning-based methods, efficient and promising for sequential unlearning.
Updated: 2025-03-21 19:58:12
标题: 一个通用框架来增强基于微调的LLM去学习
摘要: Unlearning被提议用来从大型语言模型(LLMs)中删除受版权保护和隐私敏感的数据。现有方法主要依赖于基于微调的方法,可以分为基于梯度上升(GA-based)和基于抑制的方法。然而,它们通常会降低模型的效用(对正常提示的回应能力)。在这项工作中,我们旨在开发一个增强基于微调的unlearning方法效用的通用框架。为了实现这一目标,我们首先研究了GA-based和抑制-based方法之间的共同特性。我们发现,GA-based方法通过区分目标数据(即要删除的数据)并抑制相关生成来进行unlearning,这与抑制-based方法所采用的策略本质上是相同的。受到这一发现的启发,我们引入了Gated Representation UNlearning(GRUN),它有两个组件:用于区分目标数据的软门函数以及使用表示微调(ReFT)来调整表示而不是模型参数的抑制模块。实验证明,GRUN显著改善了unlearning和效用。与此同时,它适用于基于微调的方法,对于序列unlearning来说是高效和有前景的。
更新时间: 2025-03-21 19:58:12
领域: cs.LG,cs.CL
A Predictive Services Architecture for Efficient Airspace Operations
Predicting air traffic congestion and flow management is essential for airlines and Air Navigation Service Providers (ANSP) to enhance operational efficiency. Accurate estimates of future airport capacity and airspace density are vital for better airspace management, reducing air traffic controller workload and fuel consumption, ultimately promoting sustainable aviation. While existing literature has addressed these challenges, data management and query processing remain complex due to the vast volume of high-rate air traffic data. Many analytics use cases require a common pre-processing infrastructure, as ad-hoc approaches are insufficient. Additionally, linear prediction models often fall short, necessitating more advanced techniques. This paper presents a data processing and predictive services architecture that ingests large, uncorrelated, and noisy streaming data to forecast future airspace system states. The system continuously collects raw data, periodically compresses it, and stores it in NoSQL databases for efficient query processing. For prediction, the system learns from historical traffic by extracting key features such as airport arrival and departure events, sector boundary crossings, weather parameters, and other air traffic data. These features are input into various regression models, including linear, non-linear, and ensemble models, with the best-performing model selected for predictions. We evaluate this infrastructure across three prediction use cases in the US National Airspace System (NAS) and a segment of European airspace, using extensive real operations data, confirming that our system can predict future system states efficiently and accurately.
Updated: 2025-03-21 19:57:38
标题: 一个用于高效领空运营的预测性服务架构
摘要: 预测空中交通拥堵和流量管理对于航空公司和空中导航服务提供商(ANSP)来说至关重要,以提高运营效率。准确估计未来机场容量和空域密度对于更好地管理空域、减少空中交通管制员的工作量和燃料消耗至关重要,最终促进可持续航空。尽管现有文献已经解决了这些挑战,但由于大量高速空中交通数据,数据管理和查询处理仍然复杂。许多分析用例需要一个通用的预处理基础设施,因为临时方法是不够的。此外,线性预测模型常常不足,需要更先进的技术。 本文提出了一个数据处理和预测服务架构,用于摄取大量的、不相关的和嘈杂的流数据,以预测未来的空域系统状态。该系统持续收集原始数据,定期压缩数据,并将其存储在NoSQL数据库中,以进行高效的查询处理。对于预测,该系统通过提取关键特征(如机场到达和离开事件、区域边界穿越、天气参数和其他空中交通数据)从历史交通中学习。这些特征被输入到各种回归模型中,包括线性、非线性和集成模型,选择最佳性能模型进行预测。我们通过对美国国家空域系统(NAS)和欧洲一部分空域进行三种预测用例的评估,使用大量真实运营数据,确认我们的系统可以有效和准确地预测未来的系统状态。
更新时间: 2025-03-21 19:57:38
领域: cs.LG,cs.AI,cs.DB,cs.SY,eess.SY
Language Models May Verbatim Complete Text They Were Not Explicitly Trained On
An important question today is whether a given text was used to train a large language model (LLM). A \emph{completion} test is often employed: check if the LLM completes a sufficiently complex text. This, however, requires a ground-truth definition of membership; most commonly, the target text is deemed a member based on its $n$-gram overlap with any text in the dataset. In this work, we demonstrate that this $n$-gram based membership definition can be effectively gamed. We study scenarios where sequences are \emph{non-members} for a given $n$ and we find that completion tests still succeed. We find many natural cases of this phenomenon by retraining LLMs from scratch after removing all training samples that were completed; these cases include exact duplicates, near-duplicates, and even short overlaps. They showcase that it is difficult to find a single viable choice of $n$ for membership definitions. Using these insights, we design adversarial datasets that can cause a given target sequence to be completed without containing it, for any reasonable choice of $n$. Our findings highlight the inadequacy of $n$-gram membership, suggesting membership definitions fail to account for auxiliary information available to the training algorithm.
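The brittleness of the definition is easy to see in code: the sketch below implements a naive whitespace-tokenized version of $n$-gram membership and shows the verdict flipping with the choice of $n$; the texts and the default $n$ are invented for illustration.

    def ngrams(tokens, n):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def is_member(target, corpus, n=8):
        # "member" if the target shares at least one length-n token span with any training document
        target_grams = ngrams(target.split(), n)
        return any(target_grams & ngrams(doc.split(), n) for doc in corpus)

    corpus = ["the quick brown fox jumps over the lazy dog near the river bank"]
    target = "my quick brown fox jumps over a sleepy dog by the river"
    print(is_member(target, corpus, n=6))   # False: no shared 6-gram
    print(is_member(target, corpus, n=5))   # True: "quick brown fox jumps over" is shared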
Updated: 2025-03-21 19:57:04
标题: 语言模型可能逐字完整完成未经明确训练的文本
摘要: 今天一个重要的问题是一个给定的文本是否被用来训练一个大型语言模型(LLM)。通常会采用一个“完成”测试:检查LLM是否能够完成一个足够复杂的文本。然而,这需要一个成员身份的真实定义;最常见的是,它被定义为基于目标文本和数据集中任何文本之间的n-gram重叠。在这项工作中,我们证明了基于n-gram的成员身份定义可以被有效地操纵。我们研究了对于给定的n,序列是“非成员”的情况,并发现完成测试仍然成功。我们通过从头开始重新训练LLM并删除所有已完成的训练样本来找到许多自然的情况;这些情况包括完全重复、近似重复,甚至短暂重叠。它们展示了很难找到一个适用于成员身份定义的单一可行的n的选择。利用这些见解,我们设计了对抗性数据集,可以导致一个给定的目标序列在不包含它的情况下被完成,无论选择多大的n都适用。我们的发现凸显了n-gram成员身份的不足,表明成员身份定义未能考虑到训练算法可用的辅助信息。
更新时间: 2025-03-21 19:57:04
领域: cs.CL,cs.AI,cs.CR,cs.LG
Improving Quantization with Post-Training Model Expansion
The size of a model has been a strong predictor of its quality, as well as its cost. As such, the trade-off between model cost and quality has been well-studied. Post-training optimizations like quantization and pruning have typically focused on reducing the overall volume of pre-trained models to reduce inference costs while maintaining model quality. However, recent advancements have introduced optimization techniques that, interestingly, expand models post-training, increasing model size to improve quality when reducing volume. For instance, to enable 4-bit weight and activation quantization, incoherence processing often necessitates inserting online Hadamard rotations in the compute graph, and preserving highly sensitive weights often calls for additional higher precision computations. However, if application requirements cannot be met, the prevailing solution is to relax quantization constraints. In contrast, we demonstrate post-training model expansion is a viable strategy to improve model quality within a quantization co-design space, and provide theoretical justification. We show it is possible to progressively and selectively expand the size of a pre-trained large language model (LLM) to improve model quality without end-to-end retraining. In particular, when quantizing the weights and activations to 4 bits for Llama3 1B, we reduce the zero-shot accuracy gap to full precision by an average of 3% relative to both QuaRot and SpinQuant with only 5% more parameters, which is still a 3.8% reduction in volume relative to a BF16 reference model.
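For intuition, here is a small numerical sketch (not the paper's method) of why inserting an orthogonal rotation before 4-bit quantization can pay for its extra compute: the rotation spreads a weight outlier across many coordinates, so the per-tensor scale shrinks and the round-trip error drops.

    import numpy as np
    from scipy.linalg import hadamard

    def quantize_4bit(w):
        scale = np.abs(w).max() / 7.0            # symmetric int4 grid: -8..7
        return np.clip(np.round(w / scale), -8, 7) * scale

    n = 64
    rng = np.random.default_rng(0)
    w = rng.normal(size=(n, n))
    w[0, 0] = 25.0                               # a single outlier dominates the per-tensor scale

    H = hadamard(n) / np.sqrt(n)                 # orthogonal Hadamard rotation
    plain_err = np.linalg.norm(quantize_4bit(w) - w)
    rotated_err = np.linalg.norm(H.T @ quantize_4bit(H @ w) - w)   # rotate, quantize, rotate back
    print(f"quantization error without rotation {plain_err:.2f}, with rotation {rotated_err:.2f}")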
Updated: 2025-03-21 19:56:59
标题: 通过训练后模型扩展改善量化
摘要: 模型的大小一直是其质量和成本的强有力预测因素。因此,模型成本和质量之间的权衡已经得到深入研究。像量化和剪枝这样的训练后优化通常侧重于减少预训练模型的总体体积,以降低推理成本同时保持模型质量。然而,最近的进展引入了优化技术,有趣的是,在训练后扩展模型,增加模型大小以提高质量,同时降低体积。例如,为了实现4位权重和激活量化,不连贯处理通常需要在计算图中插入在线Hadamard旋转,并保留高度敏感的权重通常需要额外的高精度计算。然而,如果无法满足应用需求,主流解决方案是放宽量化约束。相反,我们证明了训练后模型扩展是一种可行的策略,可以在量化协同设计空间内改善模型质量,并提供理论依据。我们展示了可以逐步和选择性地扩展预训练的大型语言模型(LLM)的大小,以提高模型质量,而无需进行端到端的重新训练。特别是,当将Llama3 1B的权重和激活量化为4位时,我们将零射击精度差距相对于全精度平均减少3%,与QuaRot和SpinQuant相比,仅多出5%的参数,仍然相对于BF16参考模型减少了3.8%的体积。
更新时间: 2025-03-21 19:56:59
领域: cs.LG,cs.AI,cs.AR
Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models
Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts. However, the advancement of T2I diffusion models presents significant risks, as the models could be exploited for malicious purposes, such as generating images with violence or nudity, or creating unauthorized portraits of public figures in inappropriate contexts. To mitigate these risks, concept removal methods have been proposed. These methods aim to modify diffusion models to prevent the generation of malicious and unwanted concepts. Despite these efforts, existing research faces several challenges: (1) a lack of consistent comparisons on a comprehensive dataset, (2) ineffective prompts in harmful and nudity concepts, (3) overlooked evaluation of the ability to generate the benign part within prompts containing malicious concepts. To address these gaps, we propose to benchmark the concept removal methods by introducing a new dataset, Six-CD, along with a novel evaluation metric. In this benchmark, we conduct a thorough evaluation of concept removals, with the experimental observations and discussions offering valuable insights in the field.
Updated: 2025-03-21 19:56:34
标题: Six-CD:用于良性文本到图像扩散模型的概念去除基准测试
摘要: 文本到图像(T2I)扩散模型展现出在生成与文本提示密切对应的图像方面的优异能力。然而,T2I扩散模型的先进性带来了显著的风险,因为这些模型可能被用于恶意目的,例如生成带有暴力或裸露内容的图像,或在不当情境下创建未经授权的公众人物肖像。为了减轻这些风险,已经提出了概念去除方法。这些方法旨在修改扩散模型以防止生成恶意和不需要的概念。尽管有这些努力,现有研究面临几个挑战:(1)在全面数据集上缺乏一致的比较,(2)在有害和裸露概念方面提示无效,(3)忽视了在包含恶意概念的提示中生成良性部分的评估。为了解决这些差距,我们提出通过引入一个新的数据集Six-CD以及一种新颖的评估指标来对概念去除方法进行基准测试。在这个基准测试中,我们对概念去除进行了彻底评估,实验观察和讨论提供了宝贵的见解。
更新时间: 2025-03-21 19:56:34
领域: cs.CV,cs.CR
Safe Gradient Flow for Bilevel Optimization
Bilevel optimization is a key framework in hierarchical decision-making, where one problem is embedded within the constraints of another. In this work, we propose a control-theoretic approach to solving bilevel optimization problems. Our method consists of two components: a gradient flow mechanism to minimize the upper-level objective and a safety filter to enforce the constraints imposed by the lower-level problem. Together, these components form a safe gradient flow that solves the bilevel problem in a single loop. To improve scalability with respect to the lower-level problem's dimensions, we introduce a relaxed formulation and design a compact variant of the safe gradient flow. This variant minimizes the upper-level objective while ensuring the lower-level decision variable remains within a user-defined suboptimality. Using Lyapunov analysis, we establish convergence guarantees for the dynamics, proving that they converge to a neighborhood of the optimal solution. Numerical experiments further validate the effectiveness of the proposed approaches. Our contributions provide both theoretical insights and practical tools for efficiently solving bilevel optimization problems.
Updated: 2025-03-21 19:49:45
标题: 双层优化的安全梯度流
摘要: 双层优化是层次决策制定中的一个关键框架,其中一个问题嵌套在另一个问题的约束条件中。在这项工作中,我们提出了一种控制理论方法来解决双层优化问题。我们的方法包括两个组成部分:一个梯度流机制用于最小化上层目标,以及一个安全过滤器用于强制执行下层问题施加的约束。这两个组件共同形成了一个安全梯度流,可以在一个循环中解决双层问题。为了提高对下层问题维度的可扩展性,我们引入了一个放松的公式,并设计了一个紧凑的安全梯度流变体。这个变体在确保下层决策变量保持在用户定义的次优解的同时,最小化了上层目标。通过李亚普诺夫分析,我们为这个动力学建立了收敛保证,证明它们收敛到最优解的邻域。数值实验进一步验证了所提出方法的有效性。我们的贡献为有效解决双层优化问题提供了理论洞见和实用工具。
更新时间: 2025-03-21 19:49:45
领域: math.OC,cs.LG,cs.SY,eess.SY
TAIJI: Textual Anchoring for Immunizing Jailbreak Images in Vision Language Models
Vision Language Models (VLMs) have demonstrated impressive inference capabilities, but remain vulnerable to jailbreak attacks that can induce harmful or unethical responses. Existing defence methods are predominantly white-box approaches that require access to model parameters and extensive modifications, making them costly and impractical for many real-world scenarios. Although some black-box defences have been proposed, they often impose input constraints or require multiple queries, limiting their effectiveness in safety-critical tasks such as autonomous driving. To address these challenges, we propose a novel black-box defence framework called \textbf{T}extual \textbf{A}nchoring for \textbf{I}mmunizing \textbf{J}ailbreak \textbf{I}mages (\textbf{TAIJI}). TAIJI leverages key phrase-based textual anchoring to enhance the model's ability to assess and mitigate the harmful content embedded within both visual and textual prompts. Unlike existing methods, TAIJI operates effectively with a single query during inference, while preserving the VLM's performance on benign tasks. Extensive experiments demonstrate that TAIJI significantly enhances the safety and reliability of VLMs, providing a practical and efficient solution for real-world deployment.
Updated: 2025-03-21 19:46:59
标题: 太极:用于在视觉语言模型中使越狱图像免疫的文本锚定
摘要: 视觉语言模型(VLMs)展示了令人印象深刻的推理能力,但仍然容易受到越狱攻击的影响,这可能导致有害或不道德的响应。现有的防御方法主要是需要访问模型参数和进行广泛修改的白盒方法,这使它们在许多实际场景中昂贵且不切实际。虽然一些黑盒防御方法已被提出,但它们通常会强加输入约束或需要多次查询,从而限制了它们在自动驾驶等安全关键任务中的有效性。为了解决这些挑战,我们提出了一种称为\textbf{T}extual \textbf{A}nchoring for \textbf{I}mmunizing \textbf{J}ailbreak \textbf{I}mages(\textbf{TAIJI})的新型黑盒防御框架。TAIJI利用基于关键短语的文本锚定来增强模型评估和减轻嵌入在视觉和文本提示中的有害内容的能力。与现有方法不同,TAIJI在推理过程中可以有效地使用单个查询,同时保持VLM在良性任务上的性能。大量实验证明,TAIJI显著提高了VLM的安全性和可靠性,为真实世界部署提供了实用且高效的解决方案。
更新时间: 2025-03-21 19:46:59
领域: cs.CV,cs.AI
DataDAM: Efficient Dataset Distillation with Attention Matching
Researchers have long tried to minimize training costs in deep learning while maintaining strong generalization across diverse datasets. Emerging research on dataset distillation aims to reduce training costs by creating a small synthetic set that contains the information of a larger real dataset and ultimately achieves test accuracy equivalent to a model trained on the whole dataset. Unfortunately, the synthetic data generated by previous methods are not guaranteed to distribute and discriminate as well as the original training data, and they incur significant computational costs. Despite promising results, there still exists a significant performance gap between models trained on condensed synthetic sets and those trained on the whole dataset. In this paper, we address these challenges using efficient Dataset Distillation with Attention Matching (DataDAM), achieving state-of-the-art performance while reducing training costs. Specifically, we learn synthetic images by matching the spatial attention maps of real and synthetic data generated by different layers within a family of randomly initialized neural networks. Our method outperforms the prior methods on several datasets, including CIFAR10/100, TinyImageNet, ImageNet-1K, and subsets of ImageNet-1K across most of the settings, and achieves improvements of up to 6.5% and 4.1% on CIFAR100 and ImageNet-1K, respectively. We also show that our high-quality distilled images have practical benefits for downstream applications, such as continual learning and neural architecture search.
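A minimal sketch of the attention-matching objective in the spirit of DataDAM (not its exact implementation): a spatial attention map is taken as the channel-wise mean of squared activations, L2-normalized, and the maps of real and synthetic batches are matched with a squared error; the shapes and the power p=2 are placeholder choices.

    import torch

    def spatial_attention(feat, p=2):
        # feat: (B, C, H, W) -> L2-normalized attention map of shape (B, H*W)
        amap = feat.abs().pow(p).mean(dim=1).flatten(1)
        return amap / (amap.norm(dim=1, keepdim=True) + 1e-8)

    def attention_matching_loss(real_feat, syn_feat):
        return torch.mean((spatial_attention(real_feat).mean(0)
                           - spatial_attention(syn_feat).mean(0)) ** 2)

    real_feat = torch.randn(32, 64, 8, 8)                       # layer features of a real batch
    syn_feat = torch.randn(10, 64, 8, 8, requires_grad=True)    # features of the learnable synthetic set
    loss = attention_matching_loss(real_feat, syn_feat)
    loss.backward()                                             # gradients reach the synthetic data
    print(loss.item(), syn_feat.grad.shape)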
Updated: 2025-03-21 19:43:19
标题: DataDAM:具有注意力匹配的高效数据集精炼
摘要: 研究人员长期以来一直试图在深度学习中最小化培训成本,同时保持对各种数据集的强大泛化能力。关于数据集精炼的新兴研究旨在通过创建一个包含较大真实数据集信息的小型合成集来减少培训成本,并最终实现与在整个数据集上训练的模型相当的测试精度。不幸的是,以前的方法生成的合成数据不能保证与原始训练数据一样具有很好的分布和区分度,并且它们需要付出显着的计算成本。尽管有希望的结果,但在经过精炼的合成集上训练的模型与在整个数据集上训练的模型之间仍存在显著的性能差距。在本文中,我们使用高效的数据集精炼与注意匹配(DataDAM)来应对这些挑战,实现了最先进的性能同时减少培训成本。具体来说,我们通过匹配由一组随机初始化的神经网络生成的真实和合成数据的不同层中的空间注意力图来学习合成图像。我们的方法在几个数据集上优于先前的方法,包括CIFAR10/100、TinyImageNet、ImageNet-1K以及ImageNet-1K的子集,在大多数设置下均取得了高达6.5%和4.1%的改进,分别是在CIFAR100和ImageNet-1K上。我们还展示了我们高质量的精炼图像对于下游应用,如持续学习和神经架构搜索,具有实际的益处。
更新时间: 2025-03-21 19:43:19
领域: cs.CV,cs.LG
Follow-up Question Generation For Enhanced Patient-Provider Conversations
Follow-up question generation is an essential feature of dialogue systems as it can reduce conversational ambiguity and enhance modeling complex interactions. Conversational contexts often pose core NLP challenges such as (i) extracting relevant information buried in fragmented data sources, and (ii) modeling parallel thought processes. These two challenges occur frequently in medical dialogue as a doctor asks questions based not only on patient utterances but also their prior EHR data and current diagnostic hypotheses. Asking medical questions in asynchronous conversations compounds these issues as doctors can only rely on static EHR information to motivate follow-up questions. To address these challenges, we introduce FollowupQ, a novel framework for enhancing asynchronous medical conversation. FollowupQ is a multi-agent framework that processes patient messages and EHR data to generate personalized follow-up questions, clarifying patient-reported medical conditions. FollowupQ reduces requisite provider follow-up communications by 34%. It also improves performance by 17% and 5% on real and synthetic data, respectively. We also release the first public dataset of asynchronous medical messages with linked EHR data alongside 2,300 follow-up questions written by clinical experts for the wider NLP research community.
Updated: 2025-03-21 19:40:53
标题: 增强患者和医生对话的后续问题生成
摘要: 跟进问题生成是对话系统的一个重要特性,因为它可以减少对话中的歧义,增强对复杂交互的建模。对话环境经常带来核心的自然语言处理挑战,比如提取埋藏在碎片化数据源中的相关信息,以及建模并行思考过程。这两个挑战在医学对话中经常出现,因为医生不仅基于患者的话语提出问题,还基于他们之前的电子健康记录数据和当前的诊断假设提问。在异步对话中提出医学问题会加剧这些问题,因为医生只能依靠静态的电子健康记录信息来激发后续问题。 为了解决这些挑战,我们引入了FollowupQ,一个用于增强异步医学对话的新框架。FollowupQ是一个多代理框架,处理患者消息和电子健康记录数据,生成个性化的后续问题,澄清患者报告的医疗状况。FollowupQ减少了34%的提供者后续沟通的必要性。它在真实数据和合成数据上的性能分别提高了17%和5%。我们还发布了第一个公开数据集,其中包含与2,300个由临床专家为更广泛的自然语言处理研究社区编写的后续问题相关联的异步医学信息。
更新时间: 2025-03-21 19:40:53
领域: cs.CL,cs.AI
Geometry adaptive waveformer for cardio-vascular modeling
Modeling cardiovascular anatomies poses a significant challenge due to their complex, irregular structures and inherent pathological conditions. Numerical simulations, while accurate, are often computationally expensive, limiting their practicality in clinical settings. Traditional machine learning methods, on the other hand, often struggle with some major hurdles, including high dimensionality of the inputs, inability to effectively work with irregular grids, and preserving the time dependencies of responses in dynamic problems. In response to these challenges, we propose a geometry adaptive waveformer model to predict blood flow dynamics in the cardiovascular system. The framework is primarily composed of three components: a geometry encoder, a geometry decoder, and a waveformer. The encoder transforms input defined on the irregular domain to a regular domain using a graph operator-based network and signed distance functions. The waveformer then operates on this transformed field. Finally, the decoder reverses this process, transforming the output from the regular grid back to the physical space. We evaluate the efficacy of the approach on different sets of cardiovascular data.
Updated: 2025-03-21 19:35:52
标题: 几何自适应波形发生器用于心血管建模
摘要: 建模心血管解剖结构是一项重大挑战,因为其复杂、不规则的结构和固有的病理条件。数值模拟虽然准确,但往往计算成本高昂,限制了在临床环境中的实用性。另一方面,传统的机器学习方法往往面临一些主要障碍,包括输入的高维度、无法有效处理不规则网格以及在动态问题中保持时间依赖性响应。为了应对这些挑战,我们提出了一种几何自适应波形器模型,用于预测心血管系统中的血流动力学。该框架主要由三个组件组成:几何编码器、几何解码器和波形器。编码器使用基于图算子的网络和符号距离函数,将定义在不规则域上的输入转换到规则域。波形器随后在转换后的场上进行运算。最后,解码器反转此过程,将输出从规则网格转换回物理空间。我们在不同的心血管数据集上评估了该方法的有效性。
更新时间: 2025-03-21 19:35:52
领域: cs.LG
Towards Understanding the Benefits of Neural Network Parameterizations in Geophysical Inversions: A Study With Neural Fields
In this work, we employ neural fields, which use neural networks to map a coordinate to the corresponding physical property value at that coordinate, in a test-time learning manner. For a test-time learning method, the weights are learned during the inversion, as compared to traditional approaches which require a network to be trained using a training data set. Results for synthetic examples in seismic tomography and direct current resistivity inversions are shown first. We then perform a singular value decomposition analysis on the Jacobian of the weights of the neural network (SVD analysis) for both cases to explore the effects of neural networks on the recovered model. The results show that the test-time learning approach can eliminate unwanted artifacts in the recovered subsurface physical property model caused by the sensitivity of the survey and physics. Therefore, NFs-Inv improves the inversion results compared to the conventional inversion in some cases such as the recovery of the dip angle or the prediction of the boundaries of the main target. In the SVD analysis, we observe similar patterns in the left-singular vectors as were observed in some diffusion models, trained in a supervised manner, for generative tasks in computer vision. This observation provides evidence that there is an implicit bias, which is inherent in neural network structures, that is useful in supervised learning and test-time learning models. This implicit bias has the potential to be useful for recovering models in geophysical inversions.
Updated: 2025-03-21 19:32:52
标题: 朝向理解神经网络参数化在地球物理反演中的好处:基于神经场的研究
摘要: 在这项工作中,我们采用了神经场,使用神经网络将坐标映射到该坐标处的相应物理属性值,以测试时间学习的方式。对于测试时间学习方法,权重是在反演过程中学习的,而传统方法则需要使用训练数据集对网络进行训练。首先展示了地震层析成像和直流电阻率反演的合成示例结果。然后,我们对神经网络权重的雅可比矩阵进行奇异值分解分析(SVD分析),以探讨神经网络对恢复模型的影响。结果表明,测试时间学习方法可以消除由于调查和物理敏感性引起的恢复地下物理属性模型中的不良伪像。因此,与传统反演相比,在某些情况下,如倾角恢复或主要目标边界预测,NFs-Inv改进了反演结果。在SVD分析中,我们观察到左奇异向量中出现了类似的模式,这些模式在一些通过监督方式训练的扩散模型中被观察到,用于计算机视觉中的生成任务。这一观察结果证明了神经网络结构中存在一种隐式偏见,对监督学习和测试时间学习模型有用。这种隐式偏见有望对地球物理反演中的模型恢复有用。
更新时间: 2025-03-21 19:32:52
领域: cs.LG,physics.geo-ph,stat.ML
Large Language Models (LLMs) for Source Code Analysis: applications, models and datasets
Large language models (LLMs) and transformer-based architectures are increasingly utilized for source code analysis. As software systems grow in complexity, integrating LLMs into code analysis workflows becomes essential for enhancing efficiency, accuracy, and automation. This paper explores the role of LLMs for different code analysis tasks, focusing on three key aspects: 1) what they can analyze and their applications, 2) what models are used and 3) what datasets are used, and the challenges they face. Regarding the goal of this research, we investigate scholarly articles that explore the use of LLMs for source code analysis to uncover research developments, current trends, and the intellectual structure of this emerging field. Additionally, we summarize limitations and highlight essential tools, datasets, and key challenges, which could be valuable for future work.
Updated: 2025-03-21 19:29:50
标题: 大型语言模型(LLMs)用于源代码分析:应用、模型和数据集
摘要: 大型语言模型(LLMs)和基于transformer的架构越来越多地用于源代码分析。随着软件系统复杂性的增加,将LLMs整合到代码分析工作流程中变得至关重要,以提高效率、准确性和自动化水平。本文探讨了LLMs在不同代码分析任务中的作用,重点关注三个关键方面:1)它们可以分析什么以及它们的应用,2)使用了哪些模型,3)使用了哪些数据集,以及它们面临的挑战。关于这项研究的目标,我们调查了探索LLMs用于源代码分析的学术文章,以揭示研究发展、当前趋势和这一新兴领域的知识结构。此外,我们总结了限制条件,并强调了对未来工作可能有价值的基本工具、数据集和关键挑战。
更新时间: 2025-03-21 19:29:50
领域: cs.SE,cs.AI,cs.CL
RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models
Supervised fine-tuning is a standard method for adapting pre-trained large language models (LLMs) to downstream tasks. Quantization has been recently studied as a post-training technique for efficient LLM deployment. To obtain quantized fine-tuned LLMs, conventional pipelines would first fine-tune the pre-trained models, followed by post-training quantization. This often yields suboptimal performance as it fails to leverage the synergy between fine-tuning and quantization. To effectively realize low-bit quantization of weights, activations, and KV caches in LLMs, we propose an algorithm named Rotated Straight-Through-Estimator (RoSTE), which combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy that identifies an effective rotation configuration to reduce activation outliers. We provide theoretical insights on RoSTE by analyzing its prediction error when applied to an overparameterized least square quantized training problem. Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration. Experiments on Pythia, Qwen and Llama models of different sizes demonstrate the effectiveness of RoSTE. Compared to existing post-SFT quantization baselines, our method consistently achieves superior performances across various tasks and different LLM architectures.
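As background for the abstract above, the snippet below shows a plain straight-through estimator for weight quantization in PyTorch; RoSTE's learned rotation and its adaptive configuration search are omitted, and the 4-bit symmetric scheme is only an illustrative choice.

    import torch

    class STEQuantize(torch.autograd.Function):
        @staticmethod
        def forward(ctx, w, n_bits):
            qmax = 2 ** (n_bits - 1) - 1
            scale = w.abs().max() / qmax
            return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

        @staticmethod
        def backward(ctx, grad_out):
            # Straight-through: treat rounding as the identity when backpropagating.
            return grad_out, None

    w = torch.randn(16, 16, requires_grad=True)
    loss = (STEQuantize.apply(w, 4) ** 2).sum()
    loss.backward()
    print(w.grad.norm())    # non-zero gradients despite the non-differentiable rounding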
Updated: 2025-03-21 19:26:12
标题: RoSTE:一种针对大型语言模型的高效量化感知监督微调方法
摘要: 监督微调是一种用于调整预训练大型语言模型(LLMs)以适应下游任务的标准方法。最近,量化作为一种用于高效部署LLMs的后训练技术得到了研究。为了获得量化微调的LLMs,传统的流程通常会先对预训练模型进行微调,然后进行后训练量化。这往往会导致性能不佳,因为它无法充分利用微调和量化之间的协同作用。为了有效实现LLMs中权重、激活和KV缓存的低位量化,我们提出了一种名为Rotated Straight-Through-Estimator(RoSTE)的算法,它将量化感知监督微调(QA-SFT)与自适应旋转策略相结合,以识别有效的旋转配置来减少激活的异常值。通过分析RoSTE在应用于超参数化最小二乘量化训练问题时的预测误差,我们提供了对RoSTE的理论洞察力。我们的发现显示,预测误差与收敛权重的量化误差成正比,这可以通过优化的旋转配置有效管理。对不同大小的Pythia、Qwen和Llama模型进行的实验表明了RoSTE的有效性。与现有的后SFT量化基线相比,我们的方法在各种任务和不同的LLM架构上始终实现了更优异的性能。
更新时间: 2025-03-21 19:26:12
领域: cs.LG,cs.AI
Variance Control via Weight Rescaling in LLM Pre-training
The outcome of Large Language Model (LLM) pre-training strongly depends on weight initialization and variance control strategies. Although the importance of initial variance control has been well documented in neural networks in general, the literature on initialization and management of its growth during LLM pre-training, specifically, is somewhat sparse. In this paper, we introduce the Layer Index Rescaling (LIR) weight initialization scheme, and the Target Variance Rescaling (TVR) variance control strategy. Experiments on a 1B parameter LLaMA model demonstrate that better variance management using these techniques yields substantial improvements in downstream task performance (up to 4.6% on common pre-training benchmarks) and reduces extreme activation values, thus mitigating challenges associated with quantization and low-precision training. Our code is available at: https://github.com/bluorion-com/weight_rescaling.
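The abstract does not spell out the LIR and TVR formulas, so the following is only a generic illustration of the two ideas: initialize deeper layers with a layer-index-dependent standard deviation, and periodically rescale weights back toward a target variance during pre-training; the 1/sqrt(layer index) factor and the target value are placeholders.

    import torch

    def layer_index_rescaled_init(layers, base_std=0.02):
        for idx, layer in enumerate(layers, start=1):
            torch.nn.init.normal_(layer.weight, mean=0.0, std=base_std / idx ** 0.5)

    def target_variance_rescaling(layers, target_var=0.02 ** 2):
        with torch.no_grad():
            for layer in layers:
                cur_var = layer.weight.var()
                layer.weight.mul_((target_var / (cur_var + 1e-12)).sqrt())   # pull variance back to target

    layers = [torch.nn.Linear(256, 256) for _ in range(8)]
    layer_index_rescaled_init(layers)
    # ... optimizer steps on the model would go here ...
    target_variance_rescaling(layers)           # applied periodically during pre-training
    print([round(l.weight.var().item(), 6) for l in layers[:3]])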
Updated: 2025-03-21 19:23:08
标题: 在LLM预训练中通过权重重新缩放实现方差控制
摘要: 大语言模型(LLM)预训练的结果在很大程度上取决于权重初始化和方差控制策略。尽管初始方差控制在神经网络中的重要性已经在一般情况下得到了充分的证明,但关于LLM预训练期间初始化和管理其增长的文献相对较少。在本文中,我们介绍了层索引重缩放(LIR)权重初始化方案和目标方差重缩放(TVR)方差控制策略。对1B参数LLaMA模型的实验表明,使用这些技术更好地管理方差可以显著提高下游任务性能(在常见预训练基准上高达4.6%),并减少极端激活值,从而减轻与量化和低精度训练相关的挑战。我们的代码可在以下链接找到:https://github.com/bluorion-com/weight_rescaling。
更新时间: 2025-03-21 19:23:08
领域: cs.LG,cs.CL,stat.ML
You Only Look Once at Anytime (AnytimeYOLO): Analysis and Optimization of Early-Exits for Object-Detection
We introduce AnytimeYOLO, a family of variants of the YOLO architecture that enables anytime object detection. Our AnytimeYOLO networks allow for interruptible inference, i.e., they provide a prediction at any point in time, a property desirable for safety-critical real-time applications. We present structured explorations to modify the YOLO architecture, enabling early termination to obtain intermediate results. We focus on providing fine-grained control through high granularity of available termination points. First, we formalize Anytime Models as a special class of prediction models that offer anytime predictions. Then, we discuss a novel transposed variant of the YOLO architecture, that changes the architecture to enable better early predictions and greater freedom for the order of processing stages. Finally, we propose two optimization algorithms that, given an anytime model, can be used to determine the optimal exit execution order and the optimal subset of early-exits to select for deployment in low-resource environments. We evaluate the anytime performance and trade-offs of design choices, proposing a new anytime quality metric for this purpose. In particular, we also discuss key challenges for anytime inference that currently make its deployment costly.
Updated: 2025-03-21 19:16:38
标题: 您随时只需一次性观察(AnytimeYOLO):目标检测中早期退出的分析和优化
摘要: 我们介绍了AnytimeYOLO,这是YOLO架构的一组变体,可以实现随时物体检测。我们的AnytimeYOLO网络允许可中断推断,即它们可以在任何时间点提供预测,这对于安全关键的实时应用非常重要。 我们提出了结构化的探索来修改YOLO架构,实现提前终止以获得中间结果。我们专注于通过可用终止点的高粒度提供细粒度控制。首先,我们将Anytime模型形式化为一类特殊的预测模型,可以提供随时预测。然后,我们讨论了一种新颖的转置变体的YOLO架构,该变体改变了架构以实现更好的早期预测和更大的处理阶段顺序自由度。最后,我们提出了两种优化算法,可以根据任何模型确定最佳的退出执行顺序和最佳的早期退出子集,以选择在资源有限环境中部署。我们评估了任何时候的性能和设计选择的权衡,并为此提出了一个新的任何时候质量度量标准。特别是,我们还讨论了目前使任何时候推断部署成本高昂的关键挑战。
更新时间: 2025-03-21 19:16:38
领域: cs.CV,cs.LG
Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization
Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. It operates by randomly masking image patches and reconstructing these masked patches using the unmasked ones. A key limitation of MAE lies in its disregard for the varying informativeness of different patches, as it uniformly selects patches to mask. To overcome this, some approaches propose masking based on patch informativeness. However, these methods often do not consider the specific requirements of downstream tasks, potentially leading to suboptimal representations for these tasks. In response, we introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that leverages end-to-end feedback from downstream tasks to learn an optimal masking strategy during pretraining. Our experimental findings highlight MLO-MAE's significant advancements in visual representation learning. Compared to existing methods, it demonstrates remarkable improvements across diverse datasets and tasks, showcasing its adaptability and efficiency.
Updated: 2025-03-21 19:12:25
标题: 使用多级优化的遮蔽自动编码器中的下游任务引导掩蔽学习
摘要: Masked Autoencoder (MAE) 是一种在视觉表示学习中进行自监督预训练的显著方法。它通过随机遮蔽图像补丁,并利用未遮蔽的补丁重建这些遮蔽补丁。MAE 的一个关键局限在于它忽视不同补丁的信息量不同,因为它均匀选择要遮蔽的补丁。为了克服这一问题,一些方法提出根据补丁的信息量来进行遮蔽。然而,这些方法通常不考虑下游任务的具体要求,可能导致这些任务的表示不够优化。作为回应,我们引入了多级优化遮蔽自编码器(MLO-MAE),这是一个新颖的框架,利用来自下游任务的端到端反馈来学习在预训练期间的最佳遮蔽策略。我们的实验结果突显了MLO-MAE 在视觉表示学习中的重大进展。与现有方法相比,它在不同数据集和任务上展示出显著的改进,展示了其适应性和效率。
更新时间: 2025-03-21 19:12:25
领域: cs.CV,cs.LG
Efficient Knowledge Distillation via Curriculum Extraction
Knowledge distillation is a technique used to train a small student network using the output generated by a large teacher network, and has many empirical advantages~\citep{Hinton2015DistillingTK}. While the standard one-shot approach to distillation only uses the output of the final teacher network, recent work~\citep{panigrahi2024progressive} has shown that using intermediate checkpoints from the teacher's training process as an implicit ``curriculum'' for progressive distillation can significantly speed up training. However, such schemes require storing these checkpoints, and often require careful selection of the intermediate checkpoints to train on, which can be impractical for large-scale training. In this paper, we show that a curriculum can be \emph{extracted} from just the fully trained teacher network, and that this extracted curriculum can give similar efficiency benefits to those of progressive distillation. Our extraction scheme is natural; we use a random projection of the hidden representations of the teacher network to progressively train the student network, before training using the output of the full network. We show that our scheme significantly outperforms one-shot distillation and achieves a performance similar to that of progressive distillation for learning sparse parities with two-layer networks, and provide theoretical guarantees for this setting. Additionally, we show that our method outperforms one-shot distillation even when using transformer-based architectures, both for sparse-parity learning, and language modeling tasks.
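A toy sketch of the extraction idea, under invented architectures and schedules: the student first regresses a fixed random projection of the trained teacher's hidden representation, then switches to the teacher's final output; the teacher here is left untrained purely to keep the example short.

    import torch
    import torch.nn as nn

    teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 1))
    student = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1))
    proj = torch.randn(128, 1) / 128 ** 0.5          # fixed random projection of teacher hiddens
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)

    def teacher_hidden(x):
        return torch.relu(teacher[0](x))             # teacher representation after its first layer

    for step in range(2000):
        x = torch.randn(64, 32)
        if step < 1000:
            target = teacher_hidden(x).detach() @ proj   # phase 1: projected-hidden curriculum stage
        else:
            target = teacher(x).detach()                 # phase 2: distill against the full output
        loss = ((student(x) - target) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()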
Updated: 2025-03-21 19:09:41
标题: 通过课程提取实现高效的知识蒸馏
摘要: 知识蒸馏是一种用于训练小型学生网络的技术,其使用大型教师网络生成的输出,并具有许多实证优势。尽管标准的一次性蒸馏方法仅使用最终教师网络的输出,但最近的研究表明,使用教师训练过程中的中间检查点作为渐进式蒸馏的隐式“课程”可以显著加快训练速度。然而,这种方案需要存储这些检查点,并且通常需要仔细选择要训练的中间检查点,这对于大规模训练可能是不切实际的。 在本文中,我们展示了一个课程可以从完全训练的教师网络中“提取”出来,并且这个提取的课程可以给予类似于渐进式蒸馏的效率优势。我们的提取方案是自然的;我们使用教师网络的隐藏表示的随机投影来逐步训练学生网络,然后使用完整网络的输出进行训练。我们展示了我们的方案明显优于一次性蒸馏,并在学习具有两层网络的稀疏奇偶性方面实现了类似于渐进式蒸馏的性能,并为这种设置提供了理论保证。此外,我们展示了我们的方法即使在使用基于transformer的架构时,也优于一次性蒸馏,无论是对于稀疏奇偶性学习还是语言建模任务。
更新时间: 2025-03-21 19:09:41
领域: cs.LG,cs.AI,math.ST,stat.ML,stat.TH
Deep Learning model integrity checking mechanism using watermarking technique
In response to the growing popularity of Machine Learning (ML) techniques to solve problems in various industries, various malicious groups have started to target such techniques in their attack plans. However, as ML models are constantly updated with continuous data, it is very hard to monitor the integrity of ML models. One probable solution would be to use hashing techniques. However, that would mean re-hashing the model each time it is trained on newer data, which is computationally expensive and not a feasible solution for ML models that are trained on continuous data. Therefore, in this paper, we propose a model integrity-checking mechanism that uses model watermarking techniques to monitor the integrity of ML models. We then demonstrate that our proposed technique can monitor the integrity of ML models even when the model is further trained on newer data with a low computational cost. Furthermore, the integrity checking mechanism can be used on Deep Learning models that work on complex data distributions such as Cyber-Physical System applications.
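A bare-bones sketch of the trigger-set style of watermark verification that such a mechanism builds on (the paper's exact scheme is not reproduced here): the owner keeps a secret set of inputs with deliberately assigned labels and flags the model when agreement on that set drops; the data, labels, threshold, and stand-in model are all hypothetical.

    import numpy as np

    def verify_integrity(model_predict, trigger_inputs, trigger_labels, threshold=0.95):
        preds = np.array([model_predict(x) for x in trigger_inputs])
        agreement = float(np.mean(preds == np.array(trigger_labels)))
        return agreement >= threshold, agreement

    rng = np.random.default_rng(1)
    trigger_inputs = [rng.normal(size=8) for _ in range(50)]     # secret, owner-held samples
    trigger_labels = rng.integers(0, 2, size=50).tolist()        # watermark labels assigned by the owner
    model_predict = lambda x: int(x.sum() > 0)                   # stand-in for the deployed model
    ok, score = verify_integrity(model_predict, trigger_inputs, trigger_labels)
    print(ok, round(score, 2))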
Updated: 2025-03-21 19:03:13
标题: 使用水印技术的深度学习模型完整性检查机制
摘要: 随着机器学习(ML)技术在各行各业解决问题的日益普及,各种恶意组织开始将这些技术纳入攻击计划中。然而,由于ML模型不断更新并使用连续数据,监控ML模型的完整性变得非常困难。一种可能的解决方案是使用哈希技术。无论如何,这意味着每次模型在新数据上进行训练时都需要重新计算哈希,这在计算上是昂贵的,并不适用于在连续数据上进行训练的ML模型。因此,在本文中,我们提出了一种使用模型水印技术监测ML模型完整性的机制。我们证明了我们提出的技术可以在模型进一步在新数据上进行训练时以低计算成本监测ML模型的完整性。此外,这种完整性检查机制还可以用于处理复杂数据分布的深度学习模型,如网络物理系统应用。
更新时间: 2025-03-21 19:03:13
领域: cs.CR
ProtoGS: Efficient and High-Quality Rendering with 3D Gaussian Prototypes
3D Gaussian Splatting (3DGS) has made significant strides in novel view synthesis but is limited by the substantial number of Gaussian primitives required, posing challenges for deployment on lightweight devices. Recent methods address this issue by compressing the storage size of densified Gaussians, yet fail to preserve rendering quality and efficiency. To overcome these limitations, we propose ProtoGS to learn Gaussian prototypes to represent Gaussian primitives, significantly reducing the total Gaussian amount without sacrificing visual quality. Our method directly uses Gaussian prototypes to enable efficient rendering and leverage the resulting reconstruction loss to guide prototype learning. To further optimize memory efficiency during training, we incorporate structure-from-motion (SfM) points as anchor points to group Gaussian primitives. Gaussian prototypes are derived within each group by clustering of K-means, and both the anchor points and the prototypes are optimized jointly. Our experiments on real-world and synthetic datasets prove that we outperform existing methods, achieving a substantial reduction in the number of Gaussians, and enabling high rendering speed while maintaining or even enhancing rendering fidelity.
Updated: 2025-03-21 18:55:14
标题: ProtoGS: 三维高斯原型的高效高质量渲染
摘要: 3D高斯喷溅(3DGS)在新视图合成方面取得了重大进展,但受到所需高斯原语数量的限制,这对轻量级设备的部署提出了挑战。最近的方法通过压缩致密高斯的存储大小来解决这个问题,但未能保持渲染质量和效率。为了克服这些限制,我们提出了ProtoGS,通过学习高斯原型来代表高斯原语,显著减少总高斯数量而不牺牲视觉质量。我们的方法直接使用高斯原型来实现高效渲染,并利用结果的重建损失来指导原型学习。为了在训练过程中进一步优化内存效率,我们将运动结构(SfM)点作为锚点引入,将高斯原语分组。通过K均值聚类在每个组内导出高斯原型,并联合优化锚点和原型。我们在现实世界和合成数据集上的实验证明,我们优于现有方法,实现了高斯数量的显著减少,同时保持或甚至增强了渲染保真度的高渲染速度。
更新时间: 2025-03-21 18:55:14
领域: cs.CV,cs.AI
SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing; however, they often struggle to accurately capture and reflect cultural nuances. This research addresses this challenge by focusing on Saudi Arabia, a country characterized by diverse dialects and rich cultural traditions. We introduce SaudiCulture, a novel benchmark designed to evaluate the cultural competence of LLMs within the distinct geographical and cultural contexts of Saudi Arabia. SaudiCulture is a comprehensive dataset of questions covering five major geographical regions, such as West, East, South, North, and Center, along with general questions applicable across all regions. The dataset encompasses a broad spectrum of cultural domains, including food, clothing, entertainment, celebrations, and crafts. To ensure a rigorous evaluation, SaudiCulture includes questions of varying complexity, such as open-ended, single-choice, and multiple-choice formats, with some requiring multiple correct answers. Additionally, the dataset distinguishes between common cultural knowledge and specialized regional aspects. We conduct extensive evaluations on five LLMs, such as GPT-4, Llama 3.3, FANAR, Jais, and AceGPT, analyzing their performance across different question types and cultural contexts. Our findings reveal that all models experience significant performance declines when faced with highly specialized or region-specific questions, particularly those requiring multiple correct responses. Additionally, certain cultural categories are more easily identifiable than others, further highlighting inconsistencies in LLMs cultural understanding. These results emphasize the importance of incorporating region-specific knowledge into LLMs training to enhance their cultural competence.
Updated: 2025-03-21 18:55:10
标题: 沙特文化:评估沙特阿拉伯大型语言模型文化能力的基准
摘要: 大型语言模型(LLMs)在自然语言处理方面展示了显著的能力;然而,它们往往难以准确捕捉和反映文化细微差别。本研究解决了这一挑战,聚焦于沙特阿拉伯,一个以多样方言和丰富文化传统而闻名的国家。我们引入了SaudiCulture,一个旨在评估LLMs在沙特阿拉伯特定地理和文化背景下的文化能力的新基准。SaudiCulture是一个包含五个主要地理区域(如西部、东部、南部、北部和中部)的问题数据集,以及适用于所有地区的一般问题。该数据集涵盖了食品、服装、娱乐、庆祝活动和手工艺等广泛的文化领域。为了确保严格评估,SaudiCulture包括各种复杂度的问题,如开放式、单选和多选格式,其中一些需要多个正确答案。此外,该数据集区分了常见文化知识和专门地区特色。我们对GPT-4、Llama 3.3、FANAR、Jais和AceGPT等五个LLMs进行了广泛评估,分析它们在不同问题类型和文化背景下的表现。我们的发现显示,当面对高度专业化或区域特定问题时,所有模型都会经历显著的性能下降,特别是那些需要多个正确答案的问题。此外,某些文化类别比其他类别更容易识别,进一步突显了LLMs文化理解中的不一致性。这些结果强调了将区域特定知识纳入LLMs训练以增强其文化能力的重要性。
更新时间: 2025-03-21 18:55:10
领域: cs.CL,cs.AI
Practical considerations for variable screening in the super learner
Estimating a prediction function is a fundamental component of many data analyses. The super learner ensemble, a particular implementation of stacking, has desirable theoretical properties and has been used successfully in many applications. Dimension reduction can be accomplished by using variable screening algorithms (screeners), including the lasso, within the ensemble prior to fitting other prediction algorithms. However, the performance of a super learner using the lasso for dimension reduction has not been fully explored in cases where the lasso is known to perform poorly. We provide empirical results that suggest that a diverse set of candidate screeners should be used to protect against poor performance of any one screener, similar to the guidance for choosing a library of prediction algorithms for the super learner. These results are further illustrated through the analysis of HIV-1 antibody data.
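A small scikit-learn sketch of the recommendation above, on synthetic data: two candidate pipelines with different screeners (a lasso-based screen and a univariate screen) feed a stacked ensemble, so no single screener's failure dominates; the estimators and hyperparameters are illustrative, not the paper's super learner library.

    from sklearn.datasets import make_regression
    from sklearn.feature_selection import SelectFromModel, SelectKBest, f_regression
    from sklearn.linear_model import LassoCV, Ridge
    from sklearn.ensemble import RandomForestRegressor, StackingRegressor
    from sklearn.pipeline import make_pipeline

    X, y = make_regression(n_samples=300, n_features=200, n_informative=10, noise=5.0, random_state=0)

    candidates = [
        ("lasso_screen_ridge", make_pipeline(SelectFromModel(LassoCV()), Ridge())),
        ("univariate_screen_rf", make_pipeline(SelectKBest(f_regression, k=20),
                                               RandomForestRegressor(n_estimators=100, random_state=0))),
    ]
    super_learner = StackingRegressor(estimators=candidates, final_estimator=Ridge())
    print(super_learner.fit(X[:200], y[:200]).score(X[200:], y[200:]))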
Updated: 2025-03-21 18:53:12
标题: 超级学习器中变量筛选的实用考虑
摘要: 估计预测函数是许多数据分析的基本组成部分。超级学习器集成是堆叠的一种特定实现,具有理想的理论特性,并已成功应用于许多领域。通过在集成中使用变量筛选算法(筛选器),包括lasso,可以实现降维。然而,在lasso已知性能较差的情况下,使用lasso进行降维的超级学习器的性能尚未得到充分探讨。我们提供的实证结果表明,应使用多样化的候选筛选器集合,以防止任何一个筛选器表现不佳,类似于选择超级学习器的预测算法库的指导。通过对HIV-1抗体数据的分析进一步说明了这些结果。
更新时间: 2025-03-21 18:53:12
领域: stat.ML,cs.LG
What's Producible May Not Be Reachable: Measuring the Steerability of Generative Models
How should we evaluate the quality of generative models? Many existing metrics focus on a model's producibility, i.e. the quality and breadth of outputs it can generate. However, the actual value from using a generative model stems not just from what it can produce but whether a user with a specific goal can produce an output that satisfies that goal. We refer to this property as steerability. In this paper, we first introduce a mathematical framework for evaluating steerability independently from producibility. Steerability is more challenging to evaluate than producibility because it requires knowing a user's goals. We address this issue by creating a benchmark task that relies on one key idea: sample an output from a generative model and ask users to reproduce it. We implement this benchmark in a large-scale user study of text-to-image models and large language models. Despite the ability of these models to produce high-quality outputs, they all perform poorly on steerability. This suggests that we need to focus on improving the steerability of generative models. We show such improvements are indeed possible: through reinforcement learning techniques, we create an alternative steering mechanism for image models that achieves more than 2x improvement on this benchmark.
Updated: 2025-03-21 18:51:56
标题: 可生产的未必可到达:衡量生成模型的可操纵性
摘要: 我们应该如何评估生成模型的质量?许多现有的度量重点关注模型的可生产性,即它可以生成的输出的质量和广度。然而,使用生成模型带来的实际价值不仅仅取决于它能够产生什么,而是用户是否可以生成满足特定目标的输出。我们将这种特性称为可操控性。在本文中,我们首先引入了一个数学框架,用于独立评估可操控性而非可生产性。可操控性比可生产性更具挑战性,因为它需要了解用户的目标。我们通过创建一个基准任务来解决这个问题,该任务依赖于一个关键思想:从生成模型中抽样一个输出,并要求用户复制它。我们在一个大规模的用户研究中实施了这个基准测试,涉及文本到图像模型和大型语言模型。尽管这些模型能够生成高质量的输出,但它们在可操控性上表现不佳。这表明我们需要专注于提高生成模型的可操控性。我们展示了这种改进确实是可能的:通过强化学习技术,我们为图像模型创建了一种替代的操控机制,使其在这个基准测试上实现了超过2倍的改进。
更新时间: 2025-03-21 18:51:56
领域: cs.LG,cs.AI,cs.CV,cs.HC
Your voice is your voice: Supporting Self-expression through Speech Generation and LLMs in Augmented and Alternative Communication
In this paper, we present Speak Ease: an augmentative and alternative communication (AAC) system to support users' expressivity by integrating multimodal input, including text, voice, and contextual cues (conversational partner and emotional tone), with large language models (LLMs). Speak Ease combines automatic speech recognition (ASR), context-aware LLM-based outputs, and personalized text-to-speech technologies to enable more personalized, natural-sounding, and expressive communication. Through an exploratory feasibility study and focus group evaluation with speech and language pathologists (SLPs), we assessed Speak Ease's potential to enable expressivity in AAC. The findings highlight the priorities and needs of AAC users and the system's ability to enhance user expressivity by supporting more personalized and contextually relevant communication. This work provides insights into the use of multimodal inputs and LLM-driven features to improve AAC systems and support expressivity.
Updated: 2025-03-21 18:50:05
标题: 你的声音就是你的声音:通过语音生成和LLMs支持自我表达在增强和替代沟通中
摘要: 在这篇论文中,我们介绍了Speak Ease:一种增强和替代性沟通(AAC)系统,通过整合多模态输入(包括文本、语音和上下文提示(对话伙伴和情感语调))与大型语言模型(LLMs)来支持用户的表达能力。Speak Ease结合了自动语音识别(ASR)、上下文感知的基于LLM的输出和个性化文本到语音技术,以实现更个性化、更自然、更富表现力的沟通。通过与言语病理学家(SLPs)进行的探索性可行性研究和焦点小组评估,我们评估了Speak Ease在AAC中实现表达能力的潜力。研究结果突出了AAC用户的优先事项和需求,以及系统通过支持更个性化和与上下文相关的沟通来增强用户的表达能力。这项工作为利用多模态输入和LLM驱动功能改进AAC系统以及支持表达能力提供了见解。
更新时间: 2025-03-21 18:50:05
领域: cs.HC,cs.AI
End-to-end QKD network with non-localized trust
Quantum Key Distribution (QKD) systems are infamously known for their high hardware demands, their extremely low key generation rates, and their lack of security resulting from the need for trusted nodes implied by the absence of quantum repeaters. While they theoretically offer unlimited security, they are therefore practically limited in several regards. In this work we focus on the lack of options to guarantee an end-to-end security service with the currently available technology and infrastructure and propose a novel protocol. We find that one of the stumbling blocks on the path towards an end-to-end security service guaranteed by quantum key distribution may be removed by using this protocol. Our proposal combines several parallel instances of twin-field QKD followed by classical postprocessing and communication to allow Alice and Bob to share a secret key. This hybrid approach improves the key rate and range relative to previous QKD approaches at a contained cost in security. We show that a coalition of intermediary nodes between Alice and Bob is needed to break the new scheme, sharply outperforming the trusted node approach in terms of security. Furthermore, the protocols do not require complex quantum measurements on Alice and Bob's sides, thus being truly end-to-end.
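The abstract does not spell out the classical post-processing, but the combining step that yields the "coalition needed" property can be sketched with a standard XOR construction: Alice and Bob XOR the keys from the parallel instances, so no proper subset of intermediaries learns anything about the final key; the number of instances and the key length below are arbitrary.

    import secrets

    def xor_keys(*keys):
        out = bytearray(len(keys[0]))
        for key in keys:
            for i, b in enumerate(key):
                out[i] ^= b
        return bytes(out)

    parallel_keys = [secrets.token_bytes(32) for _ in range(4)]   # one key per parallel QKD instance
    final_key = xor_keys(*parallel_keys)
    # Any proper subset of parallel_keys reveals nothing about final_key.
    print(final_key.hex())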
Updated: 2025-03-21 18:46:29
标题: 具有非局限信任的端到端量子密钥分发网络
摘要: 量子密钥分发(QKD)系统以其对硬件的高需求、极低的密钥生成速率以及由于需要信任节点而导致的安全性缺乏而臭名昭著。虽然它们在理论上提供了无限安全性,但从几个方面来说,在实践中它们有所限制。在这项工作中,我们关注目前可用技术和基础设施中缺乏确保端到端安全服务的选项,并提出了一种新颖的协议。我们发现,通过使用这种协议,可以消除通往由量子密钥分发保证的端到端安全服务的道路上的一个绊脚石。我们的提议结合了几个并行的双场QKD实例,随后进行经典后处理和通信,使Alice和Bob能够共享秘钥。这种混合方法在安全成本有限的情况下提高了密钥生成速率和范围,相对于以前的QKD方法。我们表明,在Alice和Bob之间的中间节点联盟是打破新方案所需的,从安全性的角度来看,明显优于信任节点方法。此外,这些协议不需要在Alice和Bob的侧面进行复杂的量子测量,因此真正实现了端到端。
更新时间: 2025-03-21 18:46:29
领域: quant-ph,cs.CR
Bayesian generative models can flag performance loss, bias, and out-of-distribution image content
Generative models are popular for medical imaging tasks such as anomaly detection, feature extraction, data visualization, or image generation. Since they are parameterized by deep learning models, they are often sensitive to distribution shifts and unreliable when applied to out-of-distribution data, creating a risk of, e.g. underrepresentation bias. This behavior can be flagged using uncertainty quantification methods for generative models, but their availability remains limited. We propose SLUG: A new UQ method for VAEs that combines recent advances in Laplace approximations with stochastic trace estimators to scale gracefully with image dimensionality. We show that our UQ score -- unlike the VAE's encoder variances -- correlates strongly with reconstruction error and racial underrepresentation bias for dermatological images. We also show how pixel-wise uncertainty can detect out-of-distribution image content such as ink, rulers, and patches, which is known to induce learning shortcuts in predictive models.
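One ingredient named in the abstract, the stochastic trace estimator, is easy to sketch: Hutchinson's estimator approximates tr(A) from matrix-vector products alone, which is what lets Laplace-style uncertainty scale with image dimensionality; the diagonal test matrix below is only a stand-in for a Hessian or GGN.

    import numpy as np

    def hutchinson_trace(matvec, dim, n_samples=200, seed=0):
        rng = np.random.default_rng(seed)
        total = 0.0
        for _ in range(n_samples):
            v = rng.choice([-1.0, 1.0], size=dim)    # Rademacher probe vector
            total += v @ matvec(v)                   # only a matrix-vector product is needed
        return total / n_samples

    A = np.diag(np.arange(1.0, 101.0))               # stand-in for a curvature matrix
    print(hutchinson_trace(lambda v: A @ v, dim=100), np.trace(A))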
Updated: 2025-03-21 18:45:28
标题: 贝叶斯生成模型可以标记性能损失、偏见和超出分布的图像内容
摘要: 生成模型在医学影像任务中很受欢迎,例如异常检测、特征提取、数据可视化或图像生成。由于它们是由深度学习模型参数化的,当应用于超出分布数据时往往对分布变化敏感且不可靠,存在例如表示偏差的风险。这种行为可以通过生成模型的不确定性量化方法进行标志,但它们的可用性仍然有限。我们提出了SLUG:一种新的VAEs的UQ方法,它将Laplace逼近的最新进展与随机迹估计方法结合在一起,以适应图像维度的逐渐扩展。我们展示了我们的UQ分数 - 与VAE的编码器方差不同 - 与皮肤病图像的重建误差和种族代表性不足偏差强烈相关。我们还展示了像素级不确定性如何可以检测出超出分布的图像内容,例如墨水、尺子和补丁,这些内容已知会在预测模型中引发学习捷径。
更新时间: 2025-03-21 18:45:28
领域: cs.LG,cs.CV,stat.ML
Spatiotemporal Learning with Context-aware Video Tubelets for Ultrasound Video Analysis
Computer-aided pathology detection algorithms for video-based imaging modalities must accurately interpret complex spatiotemporal information by integrating findings across multiple frames. Current state-of-the-art methods operate by classifying on video sub-volumes (tubelets), but they often lose global spatial context by focusing only on local regions within detection ROIs. Here we propose a lightweight framework for tubelet-based object detection and video classification that preserves both global spatial context and fine spatiotemporal features. To address the loss of global context, we embed tubelet location, size, and confidence as inputs to the classifier. Additionally, we use ROI-aligned feature maps from a pre-trained detection model, leveraging learned feature representations to increase the receptive field and reduce computational complexity. Our method is efficient, with the spatiotemporal tubelet classifier comprising only 0.4M parameters. We apply our approach to detect and classify lung consolidation and pleural effusion in ultrasound videos. Five-fold cross-validation on 14,804 videos from 828 patients shows our method outperforms previous tubelet-based approaches and is suited for real-time workflows.
Updated: 2025-03-21 18:39:42
标题: 使用具有上下文感知的视频管道进行超声视频分析的时空学习
摘要: 计算机辅助的病理检测算法对基于视频的成像模态必须准确解释复杂的时空信息,通过整合跨多个帧的发现。当前最先进的方法通过对视频子体积(管状体)进行分类运行,但它们经常通过仅关注检测ROI内的局部区域而失去全局空间背景。在这里,我们提出了一个基于管状体的目标检测和视频分类的轻量级框架,既保留全局空间背景,又保留精细的时空特征。为了解决全局背景的丢失,我们将管状体的位置、大小和置信度嵌入到分类器中作为输入。此外,我们使用来自预训练检测模型的ROI对齐特征图,利用学习到的特征表示来增加感受野并减少计算复杂度。我们的方法高效,时空管状体分类器仅包含0.4M参数。我们将我们的方法应用于超声视频中的肺实变和胸腔积液的检测和分类。对来自828名患者的14,804个视频进行五倍交叉验证显示,我们的方法优于先前基于管状体的方法,并适用于实时工作流程。
更新时间: 2025-03-21 18:39:42
领域: cs.CV,cs.AI
OmniLearn: A Framework for Distributed Deep Learning over Heterogeneous Clusters
Deep learning systems are optimized for clusters with homogeneous resources. However, heterogeneity is prevalent in computing infrastructure across edge, cloud and HPC. When training neural networks using stochastic gradient descent techniques on heterogeneous resources, performance degrades due to stragglers and stale updates. In this work, we develop an adaptive batch-scaling framework called OmniLearn to mitigate the effects of heterogeneity in distributed training. Our approach is inspired by proportional controllers to balance computation across heterogeneous servers, and works under varying resource availability. By dynamically adjusting worker mini-batches at runtime, OmniLearn reduces training time by 14-85%. We also investigate asynchronous training, where our techniques improve accuracy by up to 6.9%.
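A minimal sketch of the proportional-control idea described above, with invented numbers: each worker's mini-batch is nudged in proportion to how far its measured step time sits from the cluster mean, and the sizes are renormalized so the global batch stays fixed; the gain and timings are placeholders, not OmniLearn's tuned values.

    def rebalance(batch_sizes, step_times, gain=0.5):
        mean_t = sum(step_times) / len(step_times)
        proposed = [
            max(1, round(b * (1.0 + gain * (mean_t - t) / mean_t)))   # slower worker -> smaller batch
            for b, t in zip(batch_sizes, step_times)
        ]
        scale = sum(batch_sizes) / sum(proposed)      # keep the effective global batch unchanged
        return [max(1, round(b * scale)) for b in proposed]

    print(rebalance(batch_sizes=[64, 64, 64, 64], step_times=[0.9, 1.0, 1.1, 2.0]))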
Updated: 2025-03-21 18:26:24
标题: OmniLearn:一种用于异构集群上分布式深度学习的框架
摘要: 深度学习系统通常针对资源均匀的集群进行优化。然而,在边缘、云和高性能计算基础设施中普遍存在异构性。当使用随机梯度下降技术在异构资源上训练神经网络时,性能会因为阻塞和过时更新而下降。在这项工作中,我们开发了一个自适应批量缩放框架 OmniLearn,以减轻分布式训练中异构性的影响。我们的方法受比例控制器的启发,以平衡异构服务器上的计算,并在不同资源可用性下工作。通过在运行时动态调整工作人员的小批量,OmniLearn将训练时间缩短了14-85%。我们还研究了异步训练,在这种情况下,我们的技术可以将准确性提高高达6.9%。
更新时间: 2025-03-21 18:26:24
领域: cs.LG
Language-specific Neurons Do Not Facilitate Cross-Lingual Transfer
Multilingual large language models (LLMs) aim towards robust natural language understanding across diverse languages, yet their performance significantly degrades on low-resource languages. This work explores whether existing techniques to identify language-specific neurons can be leveraged to enhance cross-lingual task performance of lowresource languages. We conduct detailed experiments covering existing language-specific neuron identification techniques (such as Language Activation Probability Entropy and activation probability-based thresholding) and neuron-specific LoRA fine-tuning with models like Llama 3.1 and Mistral Nemo. We find that such neuron-specific interventions are insufficient to yield cross-lingual improvements on downstream tasks (XNLI, XQuAD) in lowresource languages. This study highlights the challenges in achieving cross-lingual generalization and provides critical insights for multilingual LLMs.
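For context, a rough sketch of the entropy-based neuron selection evoked by Language Activation Probability Entropy: estimate each neuron's probability of activating on text from each language and flag neurons whose distribution over languages has low entropy; the activations, languages, and quantile threshold are random placeholders rather than measurements from Llama 3.1 or Mistral Nemo.

    import numpy as np

    rng = np.random.default_rng(0)
    # activations[lang]: (num_tokens, num_neurons) hidden activations; random stand-ins here
    activations = {lang: rng.normal(size=(1000, 64)) for lang in ["en", "de", "sw"]}

    act_prob = np.stack([(a > 0).mean(axis=0) for a in activations.values()])   # (languages, neurons)
    norm = act_prob / act_prob.sum(axis=0, keepdims=True)
    lape = -(norm * np.log(norm + 1e-12)).sum(axis=0)        # entropy over languages, per neuron
    language_specific = np.where(lape < np.quantile(lape, 0.05))[0]
    print(language_specific[:10])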
Updated: 2025-03-21 18:08:11
标题: 特定语言的神经元不会促进跨语言转移
摘要: 多语言大型语言模型(LLMs)旨在实现跨多种语言的稳健自然语言理解,但它们在资源稀缺语言上的性能显著下降。本文探讨了现有的识别特定语言神经元的技术是否可以用来增强资源稀缺语言的跨语言任务性能。我们进行了详细的实验,涵盖了现有的语言特定神经元识别技术(如语言激活概率熵和基于激活概率的阈值设置)以及神经元特定的LoRA微调,使用Llama 3.1和Mistral Nemo等模型。我们发现,这种神经元特定的干预无法在资源稀缺语言上的下游任务(XNLI、XQuAD)中产生跨语言改进。这项研究突出了实现跨语言泛化的挑战,并为多语言LLMs提供了关键见解。
更新时间: 2025-03-21 18:08:11
领域: cs.CL,cs.AI,cs.LG
Collaborative Value Function Estimation Under Model Mismatch: A Federated Temporal Difference Analysis
Federated reinforcement learning (FedRL) enables collaborative learning while preserving data privacy by preventing direct data exchange between agents. However, many existing FedRL algorithms assume that all agents operate in identical environments, which is often unrealistic. In real-world applications -- such as multi-robot teams, crowdsourced systems, and large-scale sensor networks -- each agent may experience slightly different transition dynamics, leading to inherent model mismatches. In this paper, we first establish linear convergence guarantees for single-agent temporal difference learning (TD(0)) in policy evaluation and demonstrate that under a perturbed environment, the agent suffers a systematic bias that prevents accurate estimation of the true value function. This result holds under both i.i.d. and Markovian sampling regimes. We then extend our analysis to the federated TD(0) (FedTD(0)) setting, where multiple agents -- each interacting with its own perturbed environment -- periodically share value estimates to collaboratively approximate the true value function of a common underlying model. Our theoretical results indicate the impact of model mismatch, network connectivity, and mixing behavior on the convergence of FedTD(0). Empirical experiments corroborate our theoretical gains, highlighting that even moderate levels of information sharing can significantly mitigate environment-specific errors.
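To make the setting concrete, here is a toy version of federated TD(0) with linear value-function approximation: each agent runs local TD updates in its own slightly perturbed environment and the value parameters are averaged periodically; the chain, features, perturbation, and step sizes are synthetic placeholders.

    import numpy as np

    n_agents, n_states, d = 4, 10, 5
    rng = np.random.default_rng(0)
    phi = rng.normal(size=(n_states, d))                       # shared state features
    P_base = rng.dirichlet(np.ones(n_states), size=n_states)   # common underlying transition model
    P_agents = [0.95 * P_base + 0.05 * rng.dirichlet(np.ones(n_states), size=n_states)
                for _ in range(n_agents)]                      # per-agent perturbed environments
    rewards = rng.normal(size=n_states)
    thetas = np.zeros((n_agents, d))
    gamma, alpha, sync_every = 0.9, 0.05, 50

    for step in range(1, 2001):
        for a in range(n_agents):
            s = rng.integers(n_states)                         # i.i.d. sampling regime
            s_next = rng.choice(n_states, p=P_agents[a][s])
            td_err = rewards[s] + gamma * phi[s_next] @ thetas[a] - phi[s] @ thetas[a]
            thetas[a] += alpha * td_err * phi[s]
        if step % sync_every == 0:
            thetas[:] = thetas.mean(axis=0)                    # periodic federated averaging
    print(thetas[0].round(3))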
Updated: 2025-03-21 18:06:28
标题: 模型不匹配下的协作价值函数估计:一种联邦式时差分析
摘要: 联邦强化学习(FedRL)通过阻止代理之间直接数据交换来实现协作学习,同时保护数据隐私。然而,许多现有的FedRL算法假设所有代理在相同的环境中运行,这通常是不现实的。在现实世界的应用中,例如多机器人团队、众包系统和大规模传感器网络中,每个代理可能会经历略有不同的转换动态,导致固有的模型不匹配。在本文中,我们首先建立了单代理时间差分学习(TD(0))在策略评估中的线性收敛保证,并证明在扰动环境下,代理会遭受一个系统偏差,阻止准确估计真实价值函数。这一结果适用于独立同分布和马尔可夫采样制度。然后,我们将分析扩展到联邦TD(0)(FedTD(0))设置,其中多个代理 - 每个与自己的扰动环境互动 - 定期共享价值估计,以协作逼近共同潜在模型的真实价值函数。我们的理论结果表明模型不匹配、网络连通性和混合行为对FedTD(0)的收敛性产生影响。实证实验证实了我们的理论收益,突显即使有中等水平的信息共享也可以显著减轻特定环境的错误。
更新时间: 2025-03-21 18:06:28
领域: cs.LG
CausalRivers -- Scaling up benchmarking of causal discovery for real-world time-series
Causal discovery, or identifying causal relationships from observational data, is a notoriously challenging task, with numerous methods proposed to tackle it. Despite this, in-the-wild evaluation of these methods is still lacking, as works frequently rely on synthetic data evaluation and sparse real-world examples under critical theoretical assumptions. Real-world causal structures, however, are often complex, making it hard to decide on a proper causal discovery strategy. To bridge this gap, we introduce CausalRivers, the largest in-the-wild causal discovery benchmarking kit for time-series data to date. CausalRivers features an extensive dataset on river discharge that covers the eastern German territory (666 measurement stations) and the state of Bavaria (494 measurement stations). It spans the years 2019 to 2023 with a 15-minute temporal resolution. Further, we provide additional data from a flood around the Elbe River, as an event with a pronounced distributional shift. Leveraging multiple sources of information and time-series meta-data, we constructed two distinct causal ground truth graphs (Bavaria and eastern Germany). These graphs can be sampled to generate thousands of subgraphs to benchmark causal discovery across diverse and challenging settings. To demonstrate the utility of CausalRivers, we evaluate several causal discovery approaches through a set of experiments to identify areas for improvement. CausalRivers has the potential to facilitate robust evaluations and comparisons of causal discovery methods. Besides this primary purpose, we also expect that this dataset will be relevant for connected areas of research, such as time-series forecasting and anomaly detection. Based on this, we hope to push benchmark-driven method development that fosters advanced techniques for causal discovery, as is the case for many other areas of machine learning.
Updated: 2025-03-21 18:02:35
标题: CausalRivers -- 为现实世界时间序列的因果发现进行基准测试的扩展
摘要: 因果发现,或者从观测数据中识别因果关系,是一个极具挑战性的任务,有许多方法被提出来解决这个问题。尽管如此,在野外对这些方法进行评估仍然缺乏,因为研究经常依赖合成数据评估和在关键理论假设下稀疏的真实世界例子。然而,真实世界的因果结构往往复杂,使得很难确定正确的因果发现策略。为了弥合这一差距,我们介绍了CausalRivers,迄今为止最大的针对时间序列数据的野外因果发现基准套件。CausalRivers包含一个关于河流流量的广泛数据集,涵盖东德领土(666个测量站)和巴伐利亚州(494个测量站)。它跨越了2019年至2023年,时间分辨率为15分钟。此外,我们提供了额外数据,涉及易北河周围的洪水,作为一个具有显著分布变化的事件。利用多个信息源和时间序列元数据,我们构建了两个不同的因果真相图(巴伐利亚和东德)。这些图可以进行抽样,生成数千个子图,用于在各种不同和具有挑战性的环境中进行因果发现基准测试。为了展示CausalRivers的实用性,我们通过一系列实验评估了几种因果发现方法,以确定改进的方向。CausalRivers有潜力促进对因果发现方法的稳健评估和比较。除了这个主要目的,我们还期望这个数据集对于相关领域的研究,如时间序列预测和异常检测,也是相关的。基于此,我们希望推动基准驱动的方法开发,促进因果发现的先进技术,就像对许多其他机器学习领域一样。
更新时间: 2025-03-21 18:02:35
领域: cs.LG,cs.AI,stat.ML
URLOST: Unsupervised Representation Learning without Stationarity or Topology
Unsupervised representation learning has seen tremendous progress. However, it is constrained by its reliance on domain specific stationarity and topology, a limitation not found in biological intelligence systems. For instance, unlike computer vision, human vision can process visual signals sampled from highly irregular and non-stationary sensors. We introduce a novel framework that learns from high-dimensional data without prior knowledge of stationarity and topology. Our model, abbreviated as URLOST, combines a learnable self-organizing layer, spectral clustering, and a masked autoencoder (MAE). We evaluate its effectiveness on three diverse data modalities including simulated biological vision data, neural recordings from the primary visual cortex, and gene expressions. Compared to state-of-the-art unsupervised learning methods like SimCLR and MAE, our model excels at learning meaningful representations across diverse modalities without knowing their stationarity or topology. It also outperforms other methods that are not dependent on these factors, setting a new benchmark in the field. We position this work as a step toward unsupervised learning methods capable of generalizing across diverse high-dimensional data modalities.
Updated: 2025-03-21 17:59:54
标题: URLOST:无监督学习表示,无需稳定性或拓扑结构
摘要: 无监督表示学习取得了巨大的进展。然而,它受限于对特定领域的稳态和拓扑的依赖,这是生物智能系统中没有的限制。例如,与计算机视觉不同,人类视觉能够处理来自高度不规则和非稳态传感器的视觉信号。我们引入了一个新颖的框架,可以从高维数据中学习,而无需事先了解稳态和拓扑。我们的模型,简称为URLOST,结合了可学习的自组织层、谱聚类和掩模自动编码器(MAE)。我们在包括模拟生物视觉数据、来自初级视觉皮层的神经记录和基因表达在内的三种不同数据模态上评估其有效性。与SimCLR和MAE等最先进的无监督学习方法相比,我们的模型擅长学习有意义的表示,跨越多种模态,而不需要了解它们的稳态或拓扑。它还优于其他不依赖于这些因素的方法,为该领域设立了新的基准。我们将这项工作定位为迈向能够横跨多种高维数据模态进行泛化的无监督学习方法的一步。
更新时间: 2025-03-21 17:59:54
领域: cs.CV,cs.LG
Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation
Flow matching in the continuous simplex has emerged as a promising strategy for DNA sequence design, but struggles to scale to higher simplex dimensions required for peptide and protein generation. We introduce Gumbel-Softmax Flow and Score Matching, a generative framework on the simplex based on a novel Gumbel-Softmax interpolant with a time-dependent temperature. Using this interpolant, we introduce Gumbel-Softmax Flow Matching by deriving a parameterized velocity field that transports from smooth categorical distributions to distributions concentrated at a single vertex of the simplex. We alternatively present Gumbel-Softmax Score Matching which learns to regress the gradient of the probability density. Our framework enables high-quality, diverse generation and scales efficiently to higher-dimensional simplices. To enable training-free guidance, we propose Straight-Through Guided Flows (STGFlow), a classifier-based guidance method that leverages straight-through estimators to steer the unconditional velocity field toward optimal vertices of the simplex. STGFlow enables efficient inference-time guidance using classifiers pre-trained on clean sequences, and can be used with any discrete flow method. Together, these components form a robust framework for controllable de novo sequence generation. We demonstrate state-of-the-art performance in conditional DNA promoter design, sequence-only protein generation, and target-binding peptide design for rare disease treatment.
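For intuition, the core ingredient named above -- a Gumbel-Softmax relaxation whose temperature varies with time -- can be sketched in a few lines. The linear temperature schedule and the three-category example are illustrative assumptions and do not reproduce the paper's actual interpolant.

import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax_sample(logits, tau, rng):
    # Relaxed one-hot sample on the simplex: softmax((logits + Gumbel noise) / tau).
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    z = (logits + gumbel) / tau
    z = z - z.max(axis=-1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def temperature(t, tau_max=2.0, tau_min=0.05):
    # Illustrative time-dependent schedule: anneal from tau_max at t=0 to tau_min at t=1.
    return tau_max * (1.0 - t) + tau_min * t

logits = np.log(np.array([0.6, 0.3, 0.1]))         # target categorical distribution
for t in [0.0, 0.5, 0.99]:
    tau = temperature(t)
    print(f"t={t:.2f} tau={tau:.2f} sample={np.round(gumbel_softmax_sample(logits, tau, rng), 3)}")

As the temperature is annealed toward zero, the relaxed samples concentrate near a single vertex of the simplex, which matches the transport target described in the abstract.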
Updated: 2025-03-21 17:59:43
标题: Gumbel-Softmax流匹配与直通引导用于可控生物序列生成
摘要: 在连续单纯形中的流匹配已经成为DNA序列设计的一种有前途的策略,但在用于肽和蛋白生成所需的更高维度单纯形中很难扩展。我们引入了Gumbel-Softmax流和得分匹配,这是基于一种新颖的Gumbel-Softmax插值器和一个时间相关温度的单纯形上的生成框架。利用这个插值器,我们通过导出一个参数化速度场,将平滑分类分布传送到单纯形的单个顶点处,引入了Gumbel-Softmax流匹配。我们另外提出了Gumbel-Softmax得分匹配,学习回归概率密度的梯度。我们的框架能够实现高质量、多样性的生成,并能有效扩展到更高维的单纯形。为了实现无需训练的引导,我们提出了基于分类器的引导方法Straight-Through Guided Flows (STGFlow),利用直通估计器将无条件速度场引导到单纯形的最佳顶点。STGFlow利用在干净序列上预训练的分类器,在推理时实现高效引导,并可与任何离散流方法一起使用。这些组件共同构成了一个稳健的框架,用于可控的全新序列生成。我们展示了在有条件的DNA启动子设计、仅序列蛋白生成以及罕见疾病治疗的靶向结合肽设计方面的最新性能。
更新时间: 2025-03-21 17:59:43
领域: cs.LG,q-bio.BM
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
Large language models (LLMs) have demonstrated remarkable reasoning capability in solving mathematical problems. However, existing approaches primarily focus on improving the quality of correct training data, e.g., distilling high-quality correct solutions from advanced models, neglecting the value contained in error data, potentially hindering the model's reflective ability. Though some studies attempt to leverage error data, they often involve complex mechanisms, such as Monte Carlo Tree Search (MCTS) to explore error nodes. In this work, we propose to enhance LLMs' reasoning ability by Learning from Errors for Mathematical Advancement (LEMMA). LEMMA constructs data consisting of an incorrect solution with an erroneous step and a reflection connection to a correct solution for fine-tuning. Specifically, we systematically analyze the model-generated error types and introduce an error-type grounded mistake augmentation method to collect diverse and representative errors. Correct solutions are either from fixing the errors or generating a fresh start. Through a model-aware smooth reflection connection, the erroneous solution is transferred to the correct one. By fine-tuning on the constructed dataset, the model is able to self-correct errors autonomously within the generation process without relying on external critique models. Experimental results demonstrate that LEMMA achieves significant performance improvements over other strong baselines.
Updated: 2025-03-21 17:59:10
标题: 引理:从错误中学习,促进LLMs中的数学发展
摘要: 大型语言模型(LLMs)已经展示出在解决数学问题方面具有显著的推理能力。然而,现有方法主要集中在改进正确训练数据的质量,例如从先进模型中提炼高质量的正确解决方案,忽视了错误数据中所包含的价值,可能阻碍了模型的反思能力。尽管一些研究尝试利用错误数据,但它们通常涉及复杂的机制,例如蒙特卡洛树搜索(MCTS)来探索错误节点。在这项工作中,我们提出通过从错误中学习数学进步(LEMMA)来增强LLMs的推理能力。LEMMA构建由一个带有错误步骤的错误解决方案和一个与正确解决方案的反思连接的数据进行微调。具体而言,我们系统分析模型生成的错误类型,并引入了一种基于错误类型的错误增强方法,以收集多样化和代表性的错误。正确解决方案可以是修正错误或从头开始生成。通过模型感知的平滑反思连接,错误解决方案被转换为正确解决方案。通过在构建的数据集上进行微调,模型能够在生成过程中自主地纠正错误,而无需依赖外部批评模型。实验结果表明,LEMMA相对于其他强基线实现了显著的性能提升。
更新时间: 2025-03-21 17:59:10
领域: cs.LG,cs.AI
Glivenko-Cantelli for $f$-divergence
We extend the celebrated Glivenko-Cantelli theorem, sometimes called the fundamental theorem of statistics, from its standard setting of total variation distance to all $f$-divergences. A key obstacle in this endeavor is to define $f$-divergence on a subcollection of a $\sigma$-algebra that forms a $\pi$-system but not a $\sigma$-subalgebra. This is a side contribution of our work. We will show that this notion of $f$-divergence on the $\pi$-system of rays preserves nearly all known properties of standard $f$-divergence, yields a novel integral representation of the Kolmogorov-Smirnov distance, and has a Glivenko-Cantelli theorem.
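For reference, the classical statement being generalized and the standard $f$-divergence definition (textbook facts stated only for orientation): the Glivenko-Cantelli theorem says $\sup_{x \in \mathbb{R}} |F_n(x) - F(x)| \to 0$ almost surely, where $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}$ is the empirical distribution function of an i.i.d. sample from $F$; and for convex $f$ with $f(1) = 0$ and $P \ll Q$, the $f$-divergence is $D_f(P \| Q) = \int f\big(\tfrac{dP}{dQ}\big)\, dQ$, with total variation recovered by $f(t) = \tfrac{1}{2}|t - 1|$ and Kullback-Leibler divergence by $f(t) = t \log t$.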
Updated: 2025-03-21 17:58:10
标题: Glivenko-Cantelli关于$f$-散度
摘要: 我们将著名的Glivenko-Cantelli定理(有时被称为统计学的基本定理)从其标准的总变差距离设置扩展到所有$f$-散度。在这个努力中的一个关键障碍是在构成$\pi$-系统但不是$\sigma$-代数的一个子集合上定义$f$-散度。这是我们工作的一个附带贡献。我们将展示在射线的$\pi$-系统上的$f$-散度的概念保持了几乎所有已知的标准$f$-散度性质,产生了Kolmogorov-Smirnov距离的新型积分表示,并且具有Glivenko-Cantelli定理。
更新时间: 2025-03-21 17:58:10
领域: math.ST,cs.LG,stat.TH,60B10, 60F15, 60F25
Primal Methods for Variational Inequality Problems with Functional Constraints
Variational inequality problems are recognized for their broad applications across various fields including machine learning and operations research. First-order methods have emerged as the standard approach for solving these problems due to their simplicity and scalability. However, they typically rely on projection or linear minimization oracles to navigate the feasible set, which becomes computationally expensive in practical scenarios featuring multiple functional constraints. Existing efforts to tackle such functional constrained variational inequality problems have centered on primal-dual algorithms grounded in the Lagrangian function. These algorithms along with their theoretical analysis often require the existence and prior knowledge of the optimal Lagrange multipliers. In this work, we propose a simple primal method, termed Constrained Gradient Method (CGM), for addressing functional constrained variational inequality problems, without requiring any information on the optimal Lagrange multipliers. We establish a non-asymptotic convergence analysis of the algorithm for Minty variational inequality problems with monotone operators under smooth constraints. Remarkably, our algorithms match the complexity of projection-based methods in terms of operator queries for both monotone and strongly monotone settings, while using significantly cheaper oracles based on quadratic programming. Furthermore, we provide several numerical examples to evaluate the efficacy of our algorithms.
Updated: 2025-03-21 17:54:32
标题: 原始方法用于带有功能约束的变分不等式问题
摘要: 变分不等式问题以其在包括机器学习和运筹学在内的各个领域的广泛应用而闻名。由于其简单性和可扩展性,一阶方法已经成为解决这些问题的标准方法。然而,它们通常依赖于投影或线性最小化预言者来导航可行集,在实际情况下具有多个功能约束的情况下,这变得计算昂贵。现有的解决此类功能约束变分不等式问题的努力集中在基于Lagrange函数的原始-对偶算法上。这些算法以及它们的理论分析通常需要存在和先验知识的最优拉格朗日乘子。在这项工作中,我们提出了一种简单的原始方法,称为约束梯度法(CGM),用于解决功能约束变分不等式问题,而无需任何关于最优拉格朗日乘子的信息。我们为具有光滑约束下的单调算子的Minty变分不等式问题建立了该算法的非渐近收敛分析。值得注意的是,我们的算法在单调和强单调设置方面与基于投影的方法的复杂性相匹配,同时使用基于二次规划的预言者更加廉价。此外,我们提供了几个数值示例,以评估我们算法的有效性。
更新时间: 2025-03-21 17:54:32
领域: math.OC,cs.LG,stat.ML
HCAST: Human-Calibrated Autonomy Software Tasks
To understand and predict the societal impacts of highly autonomous AI systems, we need benchmarks with grounding, i.e., metrics that directly connect AI performance to real-world effects we care about. We present HCAST (Human-Calibrated Autonomy Software Tasks), a benchmark of 189 machine learning engineering, cybersecurity, software engineering, and general reasoning tasks. We collect 563 human baselines (totaling over 1500 hours) from people skilled in these domains, working under identical conditions as AI agents, which lets us estimate that HCAST tasks take humans between one minute and 8+ hours. Measuring the time tasks take for humans provides an intuitive metric for evaluating AI capabilities, helping answer the question "can an agent be trusted to complete a task that would take a human X hours?" We evaluate the success rates of AI agents built on frontier foundation models, and we find that current agents succeed 70-80% of the time on tasks that take humans less than one hour, and less than 20% of the time on tasks that take humans more than 4 hours.
Updated: 2025-03-21 17:54:01
标题: HCAST:人类校准的自主软件任务
摘要: 为了理解和预测高度自主人工智能系统对社会的影响,我们需要具有基础的基准,即直接将人工智能性能与我们关心的现实世界影响联系起来的度量标准。我们提出了HCAST(人类校准的自主软件任务),这是一个包含189个机器学习工程、网络安全、软件工程和一般推理任务的基准。我们收集了563个人类基线(总计超过1500小时),这些人在与AI代理人相同的条件下工作,让我们估计HCAST任务需要人类花费1分钟到8小时以上的时间。通过测量任务对人类所需的时间,为评估人工智能能力提供了直观的度量标准,帮助回答“一个代理人是否可以信任完成一个人类需要X小时才能完成的任务”的问题。我们评估了建立在前沿基础模型上的人工智能代理的成功率,发现目前代理在人类需时少于1小时的任务上成功率为70-80%,在人类需时超过4小时的任务上成功率不到20%。
更新时间: 2025-03-21 17:54:01
领域: cs.AI,I.2.0
NdLinear Is All You Need for Representation Learning
Many high-impact machine learning tasks involve multi-dimensional data (e.g., images, volumetric medical scans, multivariate time-series). Yet, most neural architectures flatten inputs, discarding critical cross-dimension information. We introduce NdLinear, a novel linear transformation that preserves these structures without extra overhead. By operating separately along each dimension, NdLinear captures dependencies that standard fully connected layers overlook. Extensive experiments across convolutional, recurrent, and transformer-based networks show significant improvements in representational power and parameter efficiency. Crucially, NdLinear serves as a foundational building block for large-scale foundation models by operating on any unimodal or multimodal data in its native form. This removes the need for flattening or modality-specific preprocessing. NdLinear rethinks core architectural priorities beyond attention, enabling more expressive, context-aware models at scale. We propose NdLinear as a drop-in replacement for standard linear layers -- marking an important step toward next-generation neural architectures.
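As a rough sketch of the per-dimension idea described above -- applying an independent linear map along each axis of an N-d tensor instead of flattening -- the following PyTorch module may help. The class name, shapes, and the simple sequential sweep over axes are assumptions for illustration, not the released NdLinear implementation.

import torch
import torch.nn as nn

class FactorizedNdLinear(nn.Module):
    # Apply an independent linear map along each non-batch dimension of an N-d tensor.
    def __init__(self, in_dims, out_dims):
        super().__init__()
        assert len(in_dims) == len(out_dims)
        self.maps = nn.ModuleList([nn.Linear(d_in, d_out) for d_in, d_out in zip(in_dims, out_dims)])

    def forward(self, x):                          # x: (batch, d1, d2, ..., dk)
        for axis, linear in enumerate(self.maps, start=1):
            x = x.movedim(axis, -1)                # bring the target dimension last
            x = linear(x)                          # mix only along that dimension
            x = x.movedim(-1, axis)                # restore the original layout
        return x

layer = FactorizedNdLinear(in_dims=(8, 16, 3), out_dims=(4, 32, 3))
y = layer(torch.randn(2, 8, 16, 3))
print(y.shape)                                     # torch.Size([2, 4, 32, 3])

In this sketch the parameter count is the sum of the per-axis weight matrices rather than the product of all dimensions that a flattened fully connected layer would require, which is where the parameter-efficiency argument comes from.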
Updated: 2025-03-21 17:52:44
标题: NdLinear是您进行表示学习所需要的一切
摘要: 许多高影响力的机器学习任务涉及多维数据(例如,图像、体积医学扫描、多变量时间序列)。然而,大多数神经网络架构会将输入展平,丢弃关键的跨维度信息。我们引入了NdLinear,一种新颖的线性变换,可以保留这些结构而不增加额外开销。通过沿着每个维度单独操作,NdLinear捕捉了标准全连接层忽略的依赖关系。在卷积、循环和基于transformer的网络上进行了大量实验,结果显示了在表征能力和参数效率方面的显著改进。关键是,NdLinear作为大规模基础模型的基础构建块,可以在其原生形式下处理任何单模态或多模态数据。这消除了展平或特定模态预处理的需求。NdLinear重新思考了超越注意力的核心架构优先事项,使得大规模下更具表达力、具备上下文感知能力的模型成为可能。我们提议将NdLinear作为标准线性层的即插即用替代品,这是迈向下一代神经网络架构的重要一步。
更新时间: 2025-03-21 17:52:44
领域: cs.LG,cs.AI
RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving
Retrieval-augmented generation (RAG), which combines large language models (LLMs) with retrievals from external knowledge databases, is emerging as a popular approach for reliable LLM serving. However, efficient RAG serving remains an open challenge due to the rapid emergence of many RAG variants and the substantial differences in workload characteristics across them. In this paper, we make three fundamental contributions to advancing RAG serving. First, we introduce RAGSchema, a structured abstraction that captures the wide range of RAG algorithms, serving as a foundation for performance optimization. Second, we analyze several representative RAG workloads with distinct RAGSchema, revealing significant performance variability across these workloads. Third, to address this variability and meet diverse performance requirements, we propose RAGO (Retrieval-Augmented Generation Optimizer), a system optimization framework for efficient RAG serving. Our evaluation shows that RAGO achieves up to a 2x increase in QPS per chip and a 55% reduction in time-to-first-token latency compared to RAG systems built on LLM-system extensions.
Updated: 2025-03-21 17:51:53
标题: RAGO:面向检索增强生成服务的系统性能优化
摘要: 检索增强生成(RAG)将大型语言模型(LLMs)与外部知识数据库中的检索结合起来,正在成为可靠LLM服务的流行方法。然而,由于许多RAG变体的迅速出现以及它们之间工作负载特征的实质性差异,高效的RAG服务仍然是一个开放性挑战。在本文中,我们对推进RAG服务做出三项基本贡献。首先,我们引入了RAGSchema,这是一个结构化抽象,捕捉了各种RAG算法,作为性能优化的基础。其次,我们分析了几种具有不同RAGSchema的代表性RAG工作负载,揭示了这些工作负载之间的显著性能变化。第三,为了解决这种变化并满足不同的性能要求,我们提出了RAGO(检索增强生成优化器),这是一个用于高效RAG服务的系统优化框架。我们的评估结果显示,与基于LLM系统扩展构建的RAG系统相比,RAGO实现了每个芯片的QPS增加高达2倍,首个标记延迟减少了55%。
更新时间: 2025-03-21 17:51:53
领域: cs.IR,cs.AI,cs.CL,cs.DC,C.1; C.4; H.3
Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation
Automatically generating natural, diverse and rhythmic human dance movements driven by music is vital for virtual reality and film industries. However, generating dance that naturally follows music remains a challenge, as existing methods lack proper beat alignment and exhibit unnatural motion dynamics. In this paper, we propose Danceba, a novel framework that leverages gating mechanism to enhance rhythm-aware feature representation for music-driven dance generation, which achieves highly aligned dance poses with enhanced rhythmic sensitivity. Specifically, we introduce Phase-Based Rhythm Extraction (PRE) to precisely extract rhythmic information from musical phase data, capitalizing on the intrinsic periodicity and temporal structures of music. Additionally, we propose Temporal-Gated Causal Attention (TGCA) to focus on global rhythmic features, ensuring that dance movements closely follow the musical rhythm. We also introduce Parallel Mamba Motion Modeling (PMMM) architecture to separately model upper and lower body motions along with musical features, thereby improving the naturalness and diversity of generated dance movements. Extensive experiments confirm that Danceba outperforms state-of-the-art methods, achieving significantly better rhythmic alignment and motion diversity. Project page: https://danceba.github.io/ .
Updated: 2025-03-21 17:42:50
标题: 调整您的节奏:通过增强门控节奏感知特征表示生成高度对齐的舞蹈姿势
摘要: 自动生成自然、多样且具有节奏感的人类舞蹈动作对虚拟现实和电影产业至关重要。然而,生成自然地与音乐相伴的舞蹈仍然是一个挑战,因为现有方法缺乏适当的节奏对齐,并展示出不自然的运动动态。在本文中,我们提出Danceba,这是一个新颖的框架,利用门控机制增强音乐驱动舞蹈生成的节奏感特征表示,从而实现高度对齐的舞蹈姿势并增强节奏敏感性。具体来说,我们引入基于相位的节奏提取(PRE)来精确从音乐相位数据中提取节奏信息,利用音乐的固有周期性和时间结构。此外,我们提出了时间门控因果注意力(TGCA)来关注全局节奏特征,确保舞蹈动作紧密跟随音乐节奏。我们还引入了并行曼巴运动建模(PMMM)架构,以分别对上半身和下半身运动以及音乐特征进行建模,从而改善生成舞蹈动作的自然性和多样性。大量实验证实Danceba优于最先进的方法,实现了显著更好的节奏对齐和运动多样性。项目页面:https://danceba.github.io/。
更新时间: 2025-03-21 17:42:50
领域: cs.MM,cs.AI,cs.CV,cs.SD,eess.AS
Temporal-Spatial Attention Network (TSAN) for DoS Attack Detection in Network Traffic
Denial-of-Service (DoS) attacks remain a critical threat to network security, disrupting services and causing significant economic losses. Traditional detection methods, including statistical and rule-based models, struggle to adapt to evolving attack patterns. To address this challenge, we propose a novel Temporal-Spatial Attention Network (TSAN) architecture for detecting Denial of Service (DoS) attacks in network traffic. By leveraging both temporal and spatial features of network traffic, our approach captures complex traffic patterns and anomalies that traditional methods might miss. The TSAN model incorporates transformer-based temporal encoding, convolutional spatial encoding, and a cross-attention mechanism to fuse these complementary feature spaces. Additionally, we employ multi-task learning with auxiliary tasks to enhance the model's robustness. Experimental results on the NSL-KDD dataset demonstrate that TSAN outperforms state-of-the-art models, achieving superior accuracy, precision, recall, and F1-score while maintaining computational efficiency for real-time deployment. The proposed architecture offers an optimal balance between detection accuracy and computational overhead, making it highly suitable for real-world network security applications.
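The toy PyTorch module below sketches only the general shape named above -- transformer-based temporal encoding, convolutional spatial encoding, and cross-attention fusion -- for flow windows of shape (batch, time, features). Every layer choice and hyperparameter is an illustrative assumption; the paper's TSAN, its auxiliary multi-task heads, and its preprocessing are not reproduced.

import torch
import torch.nn as nn

class TinyTSAN(nn.Module):
    # Toy temporal-spatial attention classifier over traffic windows.
    def __init__(self, num_features, d_model=64, num_classes=2):
        super().__init__()
        self.temporal_proj = nn.Linear(num_features, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.temporal_encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # "Spatial" branch: 1-D convolutions over the feature axis of each time step.
        self.spatial_encoder = nn.Sequential(
            nn.Conv1d(1, d_model, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):                                        # x: (B, T, F)
        B, T, F = x.shape
        temporal = self.temporal_encoder(self.temporal_proj(x))  # (B, T, d)
        spatial = self.spatial_encoder(x.reshape(B * T, 1, F))   # (B*T, d, 1)
        spatial = spatial.reshape(B, T, -1)                      # (B, T, d)
        fused, _ = self.cross_attn(query=temporal, key=spatial, value=spatial)
        return self.head(fused.mean(dim=1))                      # (B, num_classes)

model = TinyTSAN(num_features=41)        # 41 matches the NSL-KDD feature count
logits = model(torch.randn(8, 16, 41))
print(logits.shape)                       # torch.Size([8, 2])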
Updated: 2025-03-21 17:40:15
标题: 时空关注网络(TSAN)用于网络流量中DoS攻击检测
摘要: 拒绝服务(DoS)攻击仍然是网络安全的一个关键威胁,会破坏服务并造成重大经济损失。传统的检测方法,包括统计和基于规则的模型,很难适应不断演变的攻击模式。为了解决这一挑战,我们提出了一种新颖的时间空间注意力网络(TSAN)架构,用于检测网络流量中的DoS攻击。通过利用网络流量的时间和空间特征,我们的方法捕捉到传统方法可能会忽略的复杂流量模式和异常。TSAN模型包括基于转换器的时间编码、卷积空间编码和交叉注意力机制,将这些互补的特征空间融合在一起。此外,我们使用多任务学习和辅助任务来增强模型的鲁棒性。在NSL-KDD数据集上的实验结果表明,TSAN优于最先进的模型,实现了更高的准确性、精确度、召回率和F1分数,同时保持了实时部署的计算效率。所提出的架构在检测精度和计算开销之间提供了最佳平衡,非常适合于实际网络安全应用。
更新时间: 2025-03-21 17:40:15
领域: cs.CR,cs.AI
Can AI expose tax loopholes? Towards a new generation of legal policy assistants
The legislative process is the backbone of a state built on solid institutions. Yet, due to the complexity of laws -- particularly tax law -- policies may lead to inequality and social tensions. In this study, we introduce a novel prototype system designed to address the issues of tax loopholes and tax avoidance. Our hybrid solution integrates a natural language interface with a domain-specific language tailored for planning. We demonstrate on a case study how tax loopholes and avoidance schemes can be exposed. We conclude that our prototype can help enhance social welfare by systematically identifying and addressing tax gaps stemming from loopholes.
Updated: 2025-03-21 17:40:06
标题: 人工智能能揭示税收漏洞吗?走向新一代法律政策助手
摘要: 立法过程是建立在坚实制度基础上的国家的支柱。然而,由于法律的复杂性,特别是税法,政策可能会导致不平等和社会紧张局势。在这项研究中,我们介绍了一个旨在解决税收漏洞和逃税问题的新型原型系统。我们的混合解决方案将自然语言界面与专为规划而设计的领域特定语言集成在一起。我们通过一个案例研究展示了如何揭露税收漏洞和逃税计划。我们得出结论,我们的原型系统可以通过系统地识别和解决源于漏洞的税收差距,有助于增强社会福利。
更新时间: 2025-03-21 17:40:06
领域: cs.CY,cs.AI
Capturing Individual Human Preferences with Reward Features
Reinforcement learning from human feedback usually models preferences using a reward model that does not distinguish between people. We argue that this is unlikely to be a good design choice in contexts with high potential for disagreement, like in the training of large language models. We propose a method to specialise a reward model to a person or group of people. Our approach builds on the observation that individual preferences can be captured as a linear combination of a set of general reward features. We show how to learn such features and subsequently use them to quickly adapt the reward model to a specific individual, even if their preferences are not reflected in the training data. We present experiments with large language models comparing the proposed architecture with a non-adaptive reward model and also adaptive counterparts, including models that do in-context personalisation. Depending on how much disagreement there is in the training data, our model either significantly outperforms the baselines or matches their performance with a simpler architecture and more stable training.
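The linear-combination idea above can be sketched directly: fix a set of reward features, then fit one weight vector per person from pairwise preferences with a Bradley-Terry / logistic objective. The synthetic features and the plain gradient-ascent fit below are stand-ins for illustration only.

import numpy as np

rng = np.random.default_rng(0)
num_features, num_pairs = 5, 400

# Pretend these came from a shared, pre-trained reward-feature model phi(x).
phi_a = rng.normal(size=(num_pairs, num_features))   # features of response A
phi_b = rng.normal(size=(num_pairs, num_features))   # features of response B

w_true = rng.normal(size=num_features)                # one individual's latent weights
p_prefer_a = 1.0 / (1.0 + np.exp(-(phi_a - phi_b) @ w_true))
y = (rng.uniform(size=num_pairs) < p_prefer_a).astype(float)   # 1 if A was preferred

# Fit the person's weights by gradient ascent on the Bradley-Terry log-likelihood.
w, lr = np.zeros(num_features), 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(phi_a - phi_b) @ w))
    w += lr * (phi_a - phi_b).T @ (y - p) / num_pairs

cosine = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
print("cosine similarity to the true weights:", round(float(cosine), 3))

Adapting the reward model to a new person then amounts to estimating a small weight vector over fixed features rather than retraining the feature model, which is the kind of fast adaptation the abstract describes.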
Updated: 2025-03-21 17:39:33
标题: 用奖励特征捕捉个体人类偏好
摘要: 人类反馈强化学习通常使用一个奖励模型来建模偏好,该模型不区分人群。我们认为,在存在高潜在分歧的情况下,例如在大型语言模型的训练中,这不太可能是一个好的设计选择。我们提出了一种方法,可以将奖励模型专门针对某个人或一组人。我们的方法基于一个观察,即个体偏好可以被捕捉为一组常规奖励特征的线性组合。我们展示了如何学习这些特征,并随后使用它们快速调整奖励模型以适应特定个体,即使他们的偏好没有在训练数据中反映出来。我们对大型语言模型进行了实验,比较了所提出的架构与非自适应奖励模型以及自适应对应模型,包括进行上下文个性化的模型。根据训练数据中存在多少分歧,我们的模型要么明显优于基线,要么在更简单的架构和更稳定的训练下与其性能相匹配。
更新时间: 2025-03-21 17:39:33
领域: cs.AI,cs.LG,stat.ML
SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement
To design effective digital interventions, experimenters face the challenge of learning decision policies that balance multiple objectives using offline data. Often, they aim to develop policies that maximize goal outcomes, while ensuring there are no undesirable changes in guardrail outcomes. To provide credible recommendations, experimenters must not only identify policies that satisfy the desired changes in goal and guardrail outcomes, but also offer probabilistic guarantees about the changes these policies induce. In practice, however, policy classes are often large, and digital experiments tend to produce datasets with small effect sizes relative to noise. In this setting, standard approaches such as data splitting or multiple testing often result in unstable policy selection and/or insufficient statistical power. In this paper, we provide safe noisy policy learning (SNPL), a novel approach that leverages the concept of algorithmic stability to address these challenges. Our method enables policy learning while simultaneously providing high-confidence guarantees using the entire dataset, avoiding the need for data-splitting. We present finite-sample and asymptotic versions of our algorithm that ensure the recommended policy satisfies high-probability guarantees for avoiding guardrail regressions and/or achieving goal outcome improvements. We test both variants of our approach empirically on a real-world application of personalizing SMS delivery. Our results on real-world data suggest that our approach offers dramatic improvements in settings with large policy classes and low signal-to-noise across both finite-sample and asymptotic safety guarantees, offering up to 300\% improvements in detection rates and 150\% improvements in policy gains at significantly smaller sample sizes.
Updated: 2025-03-21 17:38:14
标题: SNPL: 安全多目标政策改进的同时政策学习和评估
摘要: 为设计有效的数字干预,实验者面临着使用离线数据学习平衡多个目标的决策策略的挑战。通常,他们的目标是开发最大化目标结果的策略,同时确保在防护栏结果中没有不良变化。为了提供可信的建议,实验者不仅必须确定满足目标和防护栏结果期望变化的策略,还必须提供关于这些策略引发的变化的概率保证。然而,在实践中,策略类通常很大,数字实验往往产生与噪声相比效果较小的数据集。在这种情况下,诸如数据分割或多重测试等标准方法通常会导致不稳定的策略选择和/或统计功效不足。在本文中,我们提出了一种新颖的安全嘈杂策略学习(SNPL)方法,利用算法稳定性的概念解决这些挑战。我们的方法在提供高置信度保证的同时,利用整个数据集进行策略学习,避免了需要数据分割的需求。我们提出了我们算法的有限样本和渐近版本,确保推荐的策略满足高概率保证,避免防护栏回归和/或实现目标结果改善。我们在一个个性化短信传递的真实应用中对我们的方法的两个变体进行了实证测试。我们在真实数据上的结果表明,在策略类较大且信号噪声比较低的情况下,我们的方法在有限样本和渐近安全保证方面都提供了显著改进,即使在较小的样本量下,检测率提高了高达300\%,策略收益提高了150\%。
更新时间: 2025-03-21 17:38:14
领域: stat.ML,cs.LG,econ.EM
Efficient Intent-Based Filtering for Multi-Party Conversations Using Knowledge Distillation from LLMs
Large language models (LLMs) have showcased remarkable capabilities in conversational AI, enabling open-domain responses in chat-bots, as well as advanced processing of conversations like summarization, intent classification, and insights generation. However, these models are resource-intensive, demanding substantial memory and computational power. To address this, we propose a cost-effective solution that filters conversational snippets of interest for LLM processing, tailored to the target downstream application, rather than processing every snippet. In this work, we introduce an innovative approach that leverages knowledge distillation from LLMs to develop an intent-based filter for multi-party conversations, optimized for compute power constrained environments. Our method combines different strategies to create a diverse multi-party conversational dataset, that is annotated with the target intents and is then used to fine-tune the MobileBERT model for multi-label intent classification. This model achieves a balance between efficiency and performance, effectively filtering conversation snippets based on their intents. By passing only the relevant snippets to the LLM for further processing, our approach significantly reduces overall operational costs depending on the intents and the data distribution as demonstrated in our experiments.
Updated: 2025-03-21 17:34:37
标题: 利用LLMs的知识蒸馏实现多方会话的高效基于意图的过滤
摘要: 大型语言模型(LLMs)在对话人工智能中展示了卓越的能力,能够在聊天机器人中实现开放域响应,以及对话的高级处理,如摘要、意图分类和洞察生成。然而,这些模型需要大量资源,需要大量的内存和计算能力。为了解决这个问题,我们提出了一个经济高效的解决方案,为LLM处理定制了感兴趣的对话片段,而不是处理每个片段。在这项工作中,我们引入了一种创新的方法,利用LLMs的知识蒸馏,开发了一个基于意图的多方对话过滤器,针对计算能力受限的环境进行优化。我们的方法结合了不同的策略,创建了一个多方对话数据集,该数据集用目标意图进行注释,然后用于对MobileBERT模型进行多标签意图分类的微调。这个模型在效率和性能之间取得了平衡,有效地根据其意图过滤对话片段。通过仅将相关片段传递给LLM进行进一步处理,我们的方法根据意图和数据分布显著降低了总体运营成本,如我们的实验证明的那样。
更新时间: 2025-03-21 17:34:37
领域: cs.CL,cs.AI
GreenIQ: A Deep Search Platform for Comprehensive Carbon Market Analysis and Automated Report Generation
This study introduces GreenIQ, an AI-powered deep search platform designed to revolutionise carbon market intelligence through autonomous analysis and automated report generation. Carbon markets operate across diverse regulatory landscapes, generating vast amounts of heterogeneous data from policy documents, industry reports, academic literature, and real-time trading platforms. Traditional research approaches remain labour-intensive, slow, and difficult to scale. GreenIQ addresses these limitations through a multi-agent architecture powered by Large Language Models (LLMs), integrating five specialised AI agents: a Main Researcher Agent for intelligent information retrieval, a Report Writing Agent for structured synthesis, a Final Reviewer Agent for accuracy verification, a Data Visualisation Agent for enhanced interpretability, and a Translator Agent for multilingual adaptation. The system achieves seamless integration of structured and unstructured information with AI-driven citation verification, ensuring high transparency and reliability. GreenIQ delivers a 99.2\% reduction in processing time and a 99.7\% cost reduction compared to traditional research methodologies. A novel AI persona-based evaluation framework involving 16 domain-specific AI personas highlights its superior cross-jurisdictional analytical capabilities and regulatory insight generation. GreenIQ sets new standards in AI-driven research synthesis, policy analysis, and sustainability finance by streamlining carbon market research. It offers an efficient and scalable framework for environmental and financial intelligence, enabling more accurate, timely, and cost-effective decision-making in complex regulatory landscapes
Updated: 2025-03-21 17:33:33
标题: GreenIQ:一种用于全面碳市场分析和自动生成报告的深度搜索平台
摘要: 这项研究介绍了GreenIQ,这是一个由人工智能驱动的深度搜索平台,旨在通过自主分析和自动报告生成彻底改变碳市场情报。碳市场运作在不同的监管环境中,产生大量来自政策文件、行业报告、学术文献和实时交易平台的异构数据。传统的研究方法仍然劳动密集、缓慢且难以扩展。GreenIQ通过由大型语言模型(LLMs)驱动的多智能体架构,集成了五个专门的人工智能智能体:一个用于智能信息检索的主要研究人员智能体,一个用于结构化综合的报告撰写智能体,一个用于准确性验证的最终审阅者智能体,一个用于增强可解释性的数据可视化智能体,以及一个用于多语言适应的翻译智能体。该系统通过人工智能驱动的引用验证实现了结构化和非结构化信息的无缝整合,确保高透明度和可靠性。与传统研究方法相比,GreenIQ实现了99.2\%的处理时间减少和99.7\%的成本降低。一个涉及16个领域特定人工智能人物的新颖AI人物评估框架突显了其卓越的跨司法分析能力和监管洞察力。GreenIQ通过简化碳市场研究,为人工智能驱动的研究综合、政策分析和可持续金融设立了新的标准。它为环境和金融情报提供了高效且可扩展的框架,使复杂的监管环境中的决策更准确、及时和具有成本效益。
更新时间: 2025-03-21 17:33:33
领域: cs.AI
CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities
Large language model (LLM) agents are increasingly capable of autonomously conducting cyberattacks, posing significant threats to existing applications. This growing risk highlights the urgent need for a real-world benchmark to evaluate the ability of LLM agents to exploit web application vulnerabilities. However, existing benchmarks fall short as they are limited to abstracted Capture the Flag competitions or lack comprehensive coverage. Building a benchmark for real-world vulnerabilities involves both specialized expertise to reproduce exploits and a systematic approach to evaluating unpredictable threats. To address this challenge, we introduce CVE-Bench, a real-world cybersecurity benchmark based on critical-severity Common Vulnerabilities and Exposures. In CVE-Bench, we design a sandbox framework that enables LLM agents to exploit vulnerable web applications in scenarios that mimic real-world conditions, while also providing effective evaluation of their exploits. Our evaluation shows that the state-of-the-art agent framework can resolve up to 13% of vulnerabilities.
Updated: 2025-03-21 17:32:32
标题: CVE-Bench:用于评估AI代理利用现实世界Web应用程序漏洞能力的基准测试
摘要: 大型语言模型(LLM)代理越来越能够自主进行网络攻击,对现有应用程序构成重大威胁。这种不断增长的风险突显了对一个真实世界基准的迫切需求,以评估LLM代理利用Web应用程序漏洞的能力。然而,现有的基准存在不足,因为它们仅限于抽象的夺旗比赛或缺乏全面覆盖。构建一个针对真实世界漏洞的基准需要专业知识来重现攻击,并采用系统化方法来评估不可预测的威胁。为了解决这一挑战,我们介绍了CVE-Bench,这是一个基于关键严重性常见漏洞和暴露的真实世界网络安全基准。在CVE-Bench中,我们设计了一个沙箱框架,使LLM代理能够在模拟真实世界条件的情景中利用易受攻击的Web应用程序,同时有效评估他们的攻击。我们的评估表明,最先进的代理框架可以解决多达13%的漏洞。
更新时间: 2025-03-21 17:32:32
领域: cs.CR,cs.AI,I.2.1; I.2.7
Data-driven measures of high-frequency trading
High-frequency trading (HFT) accounts for almost half of equity trading volume, yet it is not identified in public data. We develop novel data-driven measures of HFT activity that separate strategies that supply and demand liquidity. We train machine learning models to predict HFT activity observed in a proprietary dataset using concurrent public intraday data. Once trained on the dataset, these models generate HFT measures for the entire U.S. stock universe from 2010 to 2023. Our measures outperform conventional proxies, which struggle to capture HFT's time dynamics. We further validate them using shocks to HFT activity, including latency arbitrage, exchange speed bumps, and data feed upgrades. Finally, our measures reveal how HFT affects fundamental information acquisition. Liquidity-supplying HFTs improve price discovery around earnings announcements while liquidity-demanding strategies impede it.
Updated: 2025-03-21 17:31:44
标题: 数据驱动的高频交易量度
摘要: 高频交易(HFT)占据了近一半的股票交易量,但在公共数据中并未被识别。我们开发了一种新颖的基于数据驱动的HFT活动度量方法,可以区分供给和需求流动性的策略。我们训练机器学习模型,使用同时进行的公共逐日数据来预测专有数据集中观察到的HFT活动。一旦在数据集上训练完成,这些模型可以为2010年至2023年美国股票市场的整个股票宇宙生成HFT度量。我们的度量表现优于传统的代理变量,后者很难捕捉HFT的时间动态。我们进一步通过对HFT活动的冲击进行验证,包括延迟套利、交易所速度阻碍和数据源升级。最后,我们的度量揭示了HFT如何影响基本信息获取。提供流动性的HFT在盈利公告周围改进了价格发现,而需求流动性策略则阻碍了它。
更新时间: 2025-03-21 17:31:44
领域: q-fin.CP,cs.LG,91G15, 62P20
Predicting Potential Customer Support Needs and Optimizing Search Ranking in a Two-Sided Marketplace
Airbnb is an online marketplace that connects hosts and guests to unique stays and experiences. When guests stay at homes booked on Airbnb, there are a small fraction of stays that lead to support needed from Airbnb's Customer Support (CS), which may cause inconvenience to guests and hosts and require Airbnb resources to resolve. In this work, we show that instances where CS support is needed may be predicted based on hosts and guests behavior. We build a model to predict the likelihood of CS support needs for each match of guest and host. The model score is incorporated into Airbnb's search ranking algorithm as one of the many factors. The change promotes more reliable matches in search results and significantly reduces bookings that require CS support.
Updated: 2025-03-21 17:30:30
标题: 预测潜在客户支持需求并优化双边市场中的搜索排名
摘要: Airbnb是一个在线市场,连接房东和客人,为他们提供独特的住宿和体验。当客人在Airbnb上预订的住宿时,有一小部分住宿会导致需要从Airbnb的客户支持(CS)得到帮助,这可能给客人和房东带来不便,并需要Airbnb投入资源来解决。在这项工作中,我们展示了基于房东和客人行为可以预测需要CS支持的情况。我们建立了一个模型,预测每对客人和房东的CS支持需求的可能性。该模型分数被纳入Airbnb的搜索排名算法中的一个因素。这一变化促进了搜索结果中更可靠的匹配,并显著减少了需要CS支持的预订。
更新时间: 2025-03-21 17:30:30
领域: cs.LG
Automated Market Makers in Cryptoeconomic Systems: A Taxonomy and Archetypes
Designing automated market makers (AMMs) is crucial for decentralized token exchanges in cryptoeconomic systems. At the intersection of software engineering and economics, AMM design is complex and, if done incorrectly, can lead to financial risks and inefficiencies. We developed an AMM taxonomy for systematically comparing AMM designs and propose three AMM archetypes that meet key requirements for token issuance and exchange. This work bridges software engineering and economic perspectives, providing insights to help developers design AMMs tailored to diverse use cases and foster sustainable cryptoeconomic systems.
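For readers unfamiliar with AMMs, the best-known design point in any such taxonomy is the constant-product market maker; a minimal Python sketch follows. This is a textbook x*y=k example with a proportional fee, not one of the paper's three proposed archetypes.

class ConstantProductAMM:
    # Minimal x * y = k automated market maker with a proportional fee.
    # Real designs add liquidity-provider accounting, slippage limits, and safety checks.
    def __init__(self, reserve_x: float, reserve_y: float, fee: float = 0.003):
        self.x, self.y, self.fee = reserve_x, reserve_y, fee

    def quote_y_for_x(self, dx: float) -> float:
        # How much Y a trader receives for depositing dx of X.
        dx_after_fee = dx * (1.0 - self.fee)
        k = self.x * self.y
        return self.y - k / (self.x + dx_after_fee)

    def swap_x_for_y(self, dx: float) -> float:
        dy = self.quote_y_for_x(dx)
        self.x += dx
        self.y -= dy
        return dy

pool = ConstantProductAMM(reserve_x=1_000.0, reserve_y=500_000.0)
print("spot price (Y per X):", pool.y / pool.x)
print("Y received for 10 X:", round(pool.swap_x_for_y(10.0), 2))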
Updated: 2025-03-21 17:23:58
标题: 加密经济系统中的自动化做市商:分类法和原型
摘要: 设计自动化做市商(AMMs)对于加密经济系统中的去中心化代币交易至关重要。在软件工程和经济学交叉点上,AMM设计是复杂的,如果设计不当,可能会导致金融风险和低效率。我们开发了一个AMM分类法,用于系统比较AMM设计,并提出了三种满足代币发行和交易关键要求的AMM原型。这项工作架起了软件工程和经济学的视角之间的桥梁,为帮助开发人员设计针对不同用例的AMMs,并促进可持续的加密经济系统提供了洞见。
更新时间: 2025-03-21 17:23:58
领域: q-fin.TR,cs.CR
SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum
We propose a new simulator, training approach, and policy architecture, collectively called SOUS VIDE, for end-to-end visual drone navigation. Our trained policies exhibit zero-shot sim-to-real transfer with robust real-world performance using only onboard perception and computation. Our simulator, called FiGS, couples a computationally simple drone dynamics model with a high visual fidelity Gaussian Splatting scene reconstruction. FiGS can quickly simulate drone flights producing photorealistic images at up to 130 fps. We use FiGS to collect 100k-300k image/state-action pairs from an expert MPC with privileged state and dynamics information, randomized over dynamics parameters and spatial disturbances. We then distill this expert MPC into an end-to-end visuomotor policy with a lightweight neural architecture, called SV-Net. SV-Net processes color image, optical flow and IMU data streams into low-level thrust and body rate commands at 20 Hz onboard a drone. Crucially, SV-Net includes a learned module for low-level control that adapts at runtime to variations in drone dynamics. In a campaign of 105 hardware experiments, we show SOUS VIDE policies to be robust to 30% mass variations, 40 m/s wind gusts, 60% changes in ambient brightness, shifting or removing objects from the scene, and people moving aggressively through the drone's visual field. Code, data, and experiment videos can be found on our project page: https://stanfordmsl.github.io/SousVide/.
Updated: 2025-03-21 17:22:28
标题: SOUS VIDE: 在高斯飞溅真空中烹饪视觉无人机导航策略
摘要: 我们提出了一种新的模拟器、训练方法和策略架构,统称为SOUS VIDE,用于端到端的视觉无人机导航。我们训练的策略展示了零样本从模拟到真实的转移,仅使用机载感知和计算即可在现实世界中表现出强大的性能。我们的模拟器名为FiGS,它将计算简单的无人机动力学模型与高视觉保真度的高斯喷溅场景重建相结合。FiGS可以快速模拟无人机飞行,以每秒130帧的速度生成逼真的图像。我们使用FiGS从具有特权状态和动力学信息的专家MPC中收集了10万至30万个图像/状态-动作对,这些对在动力学参数和空间干扰上进行了随机化。然后,我们将这个专家MPC提炼成一个端到端的视觉运动策略,使用一种轻量级的神经架构,称为SV-Net。SV-Net将彩色图像、光流和IMU数据流处理成每秒20次的低级推力和机身速率指令,机载在无人机上。关键地,SV-Net包括一个用于低级控制的学习模块,在运行时适应无人机动力学的变化。在一系列105个硬件实验中,我们展示了SOUS VIDE策略对30%的质量变化、40m/s的阵风、环境亮度变化60%、场景中移动或移除物体以及人员在无人机的视野中进行激烈移动的鲁棒性。代码、数据和实验视频可以在我们的项目页面上找到:https://stanfordmsl.github.io/SousVide/。
更新时间: 2025-03-21 17:22:28
领域: cs.RO,cs.CV,cs.LG,cs.SY,eess.SY
PA-CFL: Privacy-Adaptive Clustered Federated Learning for Transformer-Based Sales Forecasting on Heterogeneous Retail Data
Federated learning (FL) enables retailers to share model parameters for demand forecasting while maintaining privacy. However, heterogeneous data across diverse regions, driven by factors such as varying consumer behavior, poses challenges to the effectiveness of federated learning. To tackle this challenge, we propose Privacy-Adaptive Clustered Federated Learning (PA-CFL) tailored for demand forecasting on heterogeneous retail data. By leveraging differential privacy and feature importance distribution, PA-CFL groups retailers into distinct ``bubbles'', each forming its own federated learning system to effectively isolate data heterogeneity. Within each bubble, Transformer models are designed to predict local sales for each client. Our experiments demonstrate that PA-CFL significantly surpasses FedAvg and outperforms local learning in demand forecasting performance across all participating clients. Compared to local learning, PA-CFL achieves a 5.4% improvement in R^2, a 69% reduction in RMSE, and a 45% decrease in MAE. Our approach enables effective FL through adaptive adjustments to diverse noise levels and the range of clients participating in each bubble. By grouping participants and proactively filtering out high-risk clients, PA-CFL mitigates potential threats to the FL system. The findings demonstrate PA-CFL's ability to enhance federated learning in time series prediction tasks with heterogeneous data, achieving a balance between forecasting accuracy and privacy preservation in retail applications. Additionally, PA-CFL's capability to detect and neutralize poisoned data from clients enhances the system's robustness and reliability.
Updated: 2025-03-21 17:13:19
标题: PA-CFL:基于变压器的异构零售数据销售预测的隐私自适应聚类联邦学习
摘要: 联邦学习(FL)使零售商能够在保持隐私的同时共享需求预测的模型参数。然而,不同地区之间异构数据的差异,受到消费者行为等因素的影响,对联邦学习的有效性构成挑战。为了解决这一挑战,我们提出了专为异构零售数据需求预测定制的隐私自适应聚类联邦学习(PA-CFL)。通过利用差分隐私和特征重要性分布,PA-CFL将零售商分组成不同的“气泡”,每个气泡形成自己的联邦学习系统,有效地隔离数据的异构性。在每个气泡内,Transformer模型被设计用于预测每个客户的本地销售情况。我们的实验表明,与FedAvg相比,PA-CFL在所有参与客户的需求预测性能方面明显优于本地学习。与本地学习相比,PA-CFL在R^2方面实现了5.4%的改善,在RMSE方面减少了69%,在MAE方面减少了45%。我们的方法通过对每个气泡中参与客户的噪声水平和范围进行自适应调整,实现了有效的联邦学习。通过对参与者进行分组并主动过滤高风险客户,PA-CFL减轻了对FL系统的潜在威胁。研究结果表明,PA-CFL能够在具有异构数据的时间序列预测任务中增强联邦学习,在零售应用中实现了预测准确性和隐私保护之间的平衡。此外,PA-CFL的能力可以检测并中和来自客户的有毒数据,增强系统的鲁棒性和可靠性。
更新时间: 2025-03-21 17:13:19
领域: cs.LG,cs.CR
DINO-LG: A Task-Specific DINO Model for Coronary Calcium Scoring
Coronary artery disease (CAD), one of the leading causes of mortality worldwide, necessitates effective risk assessment strategies, with coronary artery calcium (CAC) scoring via computed tomography (CT) being a key method for prevention. Traditional methods, primarily based on UNET architectures implemented on pre-built models, face challenges like the scarcity of annotated CT scans containing CAC and imbalanced datasets, leading to reduced performance in segmentation and scoring tasks. In this study, we address these limitations by incorporating the self-supervised learning (SSL) technique of DINO (self-distillation with no labels), which trains without requiring CAC-specific annotations, enhancing its robustness in generating distinct features. The DINO-LG model, which leverages label guidance to focus on calcified areas, achieves significant improvements, with a sensitivity of 89% and specificity of 90% for detecting CAC-containing CT slices, compared to the standard DINO model's sensitivity of 79% and specificity of 77%. Additionally, false-negative and false-positive rates are reduced by 49% and 59%, respectively, instilling greater confidence in clinicians when ruling out calcification in low-risk patients and minimizing unnecessary imaging reviews by radiologists. Further, CAC scoring and segmentation tasks are conducted using a basic UNET architecture, applied specifically to CT slices identified by the DINO-LG model as containing calcified areas. This targeted approach enhances CAC scoring accuracy by feeding the UNET model with relevant slices, significantly improving diagnostic precision, reducing both false positives and false negatives, and ultimately lowering overall healthcare costs by minimizing unnecessary tests and treatments, presenting a valuable advancement in CAD risk assessment.
Updated: 2025-03-21 17:06:08
标题: DINO-LG:一种用于冠状动脉钙化评分的特定任务的DINO模型
摘要: 冠状动脉疾病(CAD)是全球主要死因之一,需要有效的风险评估策略,冠状动脉钙化(CAC)评分通过计算机断层扫描(CT)是预防的关键方法之一。传统方法主要基于预构建模型上实施的UNET架构,面临着CT扫描中CAC含量有限和数据集不平衡等挑战,导致分割和评分任务性能降低。在本研究中,我们通过整合DINO(无标签自我蒸馏)的自监督学习(SSL)技术来解决这些限制,该技术在不需要特定CAC注释的情况下进行训练,增强了生成独特特征的稳健性。DINO-LG模型利用标签指导来专注于钙化区域,取得了显著改进,对于检测含CAC的CT切片,灵敏度达到89%,特异度达到90%,而标准DINO模型的灵敏度为79%,特异度为77%。此外,假阴性和假阳性率分别减少了49%和59%,使临床医生在排除低风险患者的钙化时更加自信,并通过减少放射科医师的不必要的影像复查来降低整体医疗成本。此外,使用基本的UNET架构进行CAC评分和分割任务,特别应用于DINO-LG模型确定为含有钙化区域的CT切片。这种有针对性的方法通过向UNET模型提供相关切片来增强CAC评分的准确性,显着提高了诊断精度,减少假阳性和假阴性,并最终通过减少不必要的检查和治疗降低整体医疗成本,为CAD风险评估带来了有价值的进展。
更新时间: 2025-03-21 17:06:08
领域: eess.IV,cs.AI,cs.CV
LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language
Bimanual robotic manipulation provides significant versatility, but also presents an inherent challenge due to the complexity involved in the spatial and temporal coordination between two hands. Existing works predominantly focus on attaining human-level manipulation skills for robotic hands, yet little attention has been paid to task planning on long-horizon timescales. With their outstanding in-context learning and zero-shot generation abilities, Large Language Models (LLMs) have been applied and grounded in diverse robotic embodiments to facilitate task planning. However, LLMs still suffer from errors in long-horizon reasoning and from hallucinations in complex robotic tasks, lacking a guarantee of logical correctness when generating the plan. Previous works, such as LLM+P, extended LLMs with symbolic planners. However, none have been successfully applied to bimanual robots. New challenges inevitably arise in bimanual manipulation, necessitating not only effective task decomposition but also efficient task allocation. To address these challenges, this paper introduces LLM+MAP, a bimanual planning framework that integrates LLM reasoning and multi-agent planning, automating effective and efficient bimanual task planning. We conduct simulated experiments on various long-horizon manipulation tasks of differing complexity. Our method is built using GPT-4o as the backend, and we compare its performance against plans generated directly by LLMs, including GPT-4o, V3 and also recent strong reasoning models o1 and R1. By analyzing metrics such as planning time, success rate, group debits, and planning-step reduction rate, we demonstrate the superior performance of LLM+MAP, while also providing insights into robotic reasoning. Code is available at https://github.com/Kchu/LLM-MAP.
Updated: 2025-03-21 17:04:01
标题: LLM+MAP:使用大型语言模型和规划领域定义语言进行双手机器人任务规划
摘要: 双手机器人操作提供了显著的灵活性,但也由于涉及两只手之间的空间和时间协调的复杂性而带来了固有挑战。现有的研究主要集中在为机器人手获得类似于人类水平的操作技能,然而长时域上的任务规划却鲜有关注。大型语言模型(LLMs)凭借其出色的上下文学习和零样本生成能力,已被应用于各种机器人实体中以促进任务规划。然而,LLMs仍然存在长期推理中的错误和复杂机器人任务中的幻觉,生成计划时缺乏逻辑正确性的保证。以往的工作,如LLM+P,用符号规划器对LLMs进行了扩展,然而尚未成功应用于双手机器人。在双手操作中不可避免地出现新挑战,不仅需要有效的任务分解,还需要高效的任务分配。为了解决这些挑战,本文介绍了LLM+MAP,一种将LLM推理和多智能体规划相结合的双手规划框架,自动化有效且高效的双手任务规划。我们在不同复杂性的各种长期操作任务上进行了模拟实验。我们的方法是使用GPT-4o作为后端构建的,并将其性能与LLMs(包括GPT-4o、V3以及最近的强推理模型o1和R1)直接生成的计划进行比较。通过分析规划时间、成功率、团队债务和规划步骤减少率等指标,我们展示了LLM+MAP的卓越性能,同时也提供了关于机器人推理的见解。代码可在https://github.com/Kchu/LLM-MAP获取。
更新时间: 2025-03-21 17:04:01
领域: cs.RO,cs.AI
On Quantum Perceptron Learning via Quantum Search
With the growing interest in quantum machine learning, the perceptron -- a fundamental building block in traditional machine learning -- has emerged as a valuable model for exploring quantum advantages. Two quantum perceptron algorithms based on Grover's search, were developed in arXiv:1602.04799 to accelerate training and improve statistical efficiency in perceptron learning. This paper points out and corrects a mistake in the proof of Theorem 2 in arXiv:1602.04799. Specifically, we show that the probability of sampling from a normal distribution for a $D$-dimensional hyperplane that perfectly classifies the data scales as $\Omega(\gamma^{D})$ instead of $\Theta({\gamma})$, where $\gamma$ is the margin. We then revisit two well-established linear programming algorithms -- the ellipsoid method and the cutting plane random walk algorithm -- in the context of perceptron learning, and show how quantum search algorithms can be leveraged to enhance the overall complexity. Specifically, both algorithms gain a sub-linear speed-up $O(\sqrt{N})$ in the number of data points $N$ as a result of Grover's algorithm and an additional $O(D^{1.5})$ speed-up is possible for cutting plane random walk algorithm employing quantum walk search.
Updated: 2025-03-21 16:57:30
标题: 关于通过量子搜索实现的量子感知器学习
摘要: 随着量子机器学习日益受到关注,感知器——传统机器学习中的基本构建模块——已成为探索量子优势的有价值模型。基于Grover搜索的两种量子感知器算法在arXiv:1602.04799中开发,以加速训练并提高感知器学习的统计效率。本文指出并纠正了arXiv:1602.04799中定理2证明中的一个错误。具体来说,我们表明,从正态分布中采样得到一个能够完全分类数据的$D$维超平面的概率的规模为$\Omega(\gamma^{D})$而不是$\Theta({\gamma})$,其中$\gamma$为间隔。然后,我们重新审视两种成熟的线性规划算法——椭球体方法和割平面随机游走算法——在感知器学习的背景下,并展示了如何利用量子搜索算法来增强整体复杂性。具体而言,由于Grover算法,这两种算法在数据点数量$N$方面获得了次线性加速$O(\sqrt{N})$,并且采用量子漫步搜索的割平面随机游走算法可能实现额外的$O(D^{1.5})$加速。
更新时间: 2025-03-21 16:57:30
领域: quant-ph,cs.LG,stat.ML
ModServe: Scalable and Resource-Efficient Large Multimodal Model Serving
Large multimodal models (LMMs) demonstrate impressive capabilities in understanding images, videos, and audio beyond text. However, efficiently serving LMMs in production environments poses significant challenges due to their complex architectures and heterogeneous characteristics across their multi-stage inference pipelines. We present the first comprehensive systems analysis of two prominent LMM architectures, decoder-only and cross-attention, across six representative open-source models, revealing key systems design implications. We also present an in-depth analysis of production LMM inference traces, uncovering unique workload characteristics, including variable, heavy-tailed request distributions and bursty traffic patterns. Based on these insights, we propose ModServe, a modular LMM serving system that decouples stages for independent optimization and adaptive scaling. ModServe dynamically reconfigures stages and handles bursty traffic with modality-aware scheduling and autoscaling to meet tail latency SLOs while minimizing costs. ModServe achieves 3.3-5.5x higher throughput (leading to 25-41.3% cost saving) while meeting SLOs on a 128-GPU cluster with production traces.
Updated: 2025-03-21 16:53:47
标题: ModServe:可扩展且资源高效的大型多模态模型服务
摘要: 大型多模态模型(LMMs)展示了在理解图像、视频和音频等文本以外的内容方面具有令人印象深刻的能力。然而,在生产环境中有效地提供LMMs存在重大挑战,这是由于它们复杂的体系结构以及跨多阶段推理流程中的异构特征。我们提出了对两种显著的LMM架构(仅解码器和交叉注意力)进行首次全面的系统分析,跨越了六个代表性的开源模型,揭示了关键的系统设计影响。我们还对生产LMM推理跟踪进行了深入分析,发现了独特的工作负载特征,包括可变的、重尾的请求分布和突发的流量模式。基于这些见解,我们提出了ModServe,一个模块化的LMM服务系统,将阶段解耦以进行独立优化和自适应扩展。ModServe通过模态感知调度和自动扩展来动态重新配置阶段,并处理突发流量,以满足尾延迟SLOs,同时最小化成本。ModServe在一个具有生产跟踪的128-GPU集群上实现了3.3-5.5倍的吞吐量提升(导致25-41.3%的成本节约),同时满足了SLOs。
更新时间: 2025-03-21 16:53:47
领域: cs.DC,cs.AI
Bugdar: AI-Augmented Secure Code Review for GitHub Pull Requests
As software systems grow increasingly complex, ensuring security during development poses significant challenges. Traditional manual code audits are often expensive, time-intensive, and ill-suited for fast-paced workflows, while automated tools frequently suffer from high false-positive rates, limiting their reliability. To address these issues, we introduce Bugdar, an AI-augmented code review system that integrates seamlessly into GitHub pull requests, providing near real-time, context-aware vulnerability analysis. Bugdar leverages fine-tunable Large Language Models (LLMs) and Retrieval Augmented Generation (RAGs) to deliver project-specific, actionable feedback that aligns with each codebase's unique requirements and developer practices. Supporting multiple programming languages, including Solidity, Move, Rust, and Python, Bugdar demonstrates exceptional efficiency, processing an average of 56.4 seconds per pull request or 30 lines of code per second. This is significantly faster than manual reviews, which could take hours per pull request. By facilitating a proactive approach to secure coding, Bugdar reduces the reliance on manual reviews, accelerates development cycles, and enhances the security posture of software systems without compromising productivity.
Updated: 2025-03-21 16:52:03
标题: Bugdar:GitHub拉取请求的AI增强安全代码审查
摘要: 随着软件系统变得越来越复杂,确保在开发过程中的安全性面临着重大挑战。传统的手动代码审查通常昂贵、耗时,并且不适合快节奏的工作流程,而自动化工具经常存在高假阳性率的问题,限制了它们的可靠性。为了解决这些问题,我们引入了 Bugdar,这是一个AI增强的代码审查系统,可以无缝集成到GitHub的Pull请求中,提供近实时、上下文感知的漏洞分析。Bugdar利用可调整的大型语言模型(LLMs)和检索增强生成(RAGs)来提供项目特定、可操作的反馈,符合每个代码库独特的要求和开发人员实践。Bugdar支持多种编程语言,包括Solidity、Move、Rust和Python,展示了出色的效率,每个Pull请求平均处理时间为56.4秒,或每秒处理30行代码。这明显快于手动审查,可能需要数小时才能完成一个Pull请求。通过促进一种积极的安全编码方法,Bugdar减少了对手动审查的依赖,加速了开发周期,并增强了软件系统的安全性,而不影响生产力。
更新时间: 2025-03-21 16:52:03
领域: cs.CR,cs.HC,cs.SE
Preference-Guided Diffusion for Multi-Objective Offline Optimization
Offline multi-objective optimization aims to identify Pareto-optimal solutions given a dataset of designs and their objective values. In this work, we propose a preference-guided diffusion model that generates Pareto-optimal designs by leveraging a classifier-based guidance mechanism. Our guidance classifier is a preference model trained to predict the probability that one design dominates another, directing the diffusion model toward optimal regions of the design space. Crucially, this preference model generalizes beyond the training distribution, enabling the discovery of Pareto-optimal solutions outside the observed dataset. We introduce a novel diversity-aware preference guidance, augmenting Pareto dominance preference with diversity criteria. This ensures that generated solutions are optimal and well-distributed across the objective space, a capability absent in prior generative methods for offline multi-objective optimization. We evaluate our approach on various continuous offline multi-objective optimization tasks and find that it consistently outperforms other inverse/generative approaches while remaining competitive with forward/surrogate-based optimization methods. Our results highlight the effectiveness of classifier-guided diffusion models in generating diverse and high-quality solutions that approximate the Pareto front well.
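The guidance mechanism referenced above follows the generic classifier-guidance pattern: take the generative model's own update, then nudge the sample along the gradient of a guidance model's log-probability. The sketch below uses toy stand-ins for the denoising step and the preference model and an arbitrary guidance scale; the paper's dominance classifier and diversity-aware term are not reproduced.

import torch

def guided_step(x_t, base_update, preference_model, guidance_scale=0.1):
    # One generic guidance step: unconditional update plus the gradient of
    # log p(preferred | x) from a differentiable guidance model.
    x_t = x_t.detach().requires_grad_(True)
    log_p = preference_model(x_t).log().sum()
    grad = torch.autograd.grad(log_p, x_t)[0]
    with torch.no_grad():
        return base_update(x_t) + guidance_scale * grad

# Toy stand-ins so the sketch runs end to end.
preference_model = lambda x: torch.sigmoid(-(x ** 2).sum(dim=-1, keepdim=True))  # prefers designs near 0
base_update = lambda x: 0.95 * x + 0.05 * torch.randn_like(x)                    # fake denoising step

x = torch.randn(16, 8)
for _ in range(50):
    x = guided_step(x, base_update, preference_model)
print("mean squared norm after guidance:", round(float((x ** 2).sum(dim=-1).mean()), 3))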
Updated: 2025-03-21 16:49:38
标题: 偏好引导扩散用于多目标离线优化
摘要: 离线多目标优化旨在在给定设计数据集和它们的目标值的情况下,识别帕累托最优解。在这项工作中,我们提出了一种偏好引导的扩散模型,通过利用基于分类器的引导机制生成帕累托最优设计。我们的引导分类器是一个经过训练的偏好模型,用于预测一个设计支配另一个设计的概率,将扩散模型引导到设计空间的最优区域。关键是,这个偏好模型可以泛化到训练分布之外,从而发现超出观察数据集范围的帕累托最优解。我们引入了一种新颖的多样性感知偏好引导,将帕累托支配偏好与多样性准则相结合。这确保生成的解不仅最优,而且在目标空间中分布良好,这是以往离线多目标优化生成方法所不具备的能力。我们在各种连续离线多目标优化任务上评估了我们的方法,并发现它在保持与前向/基于代理模型的优化方法竞争力的同时,始终优于其他逆向/生成方法。我们的结果突出了分类器引导扩散模型在生成多样性和高质量解决方案方面的有效性,这些解决方案很好地逼近了帕累托前沿。
更新时间: 2025-03-21 16:49:38
领域: cs.LG,cs.AI
UAV Resilience Against Stealthy Attacks
Unmanned aerial vehicles (UAVs) depend on untrusted software components to automate dangerous or critical missions, making them a desirable target for attacks. Some work has been done to prevent an attacker who has either compromised a ground control station or parts of a UAV's software from sabotaging the vehicle, but not both. We present an architecture running a UAV software stack with runtime monitoring and seL4-based software isolation that prevents attackers from both exploiting software bugs and utilizing stealthy attacks. Our architecture retrofits legacy UAVs and secures the popular MAVLink protocol, making wide adoption possible.
Updated: 2025-03-21 16:48:11
标题: 无人机对抗隐身攻击的弹性
摘要: 无人机(UAVs)依赖不可信的软件组件来自动化危险或关键任务,使它们成为攻击目标。一些工作已经完成,以防止攻击者入侵地面控制站或无人机软件的部分来破坏飞行器,但还没有同时防止这两种情况发生。我们提出了一种架构,利用运行时监控和基于seL4的软件隔离来运行无人机软件堆栈,防止攻击者利用软件漏洞和隐秘攻击。我们的架构可以对传统无人机进行改造,并保护流行的MAVLink协议,使其广泛采用成为可能。
更新时间: 2025-03-21 16:48:11
领域: cs.CR
Graph Masked Language Models
Language Models (LMs) and Graph Neural Networks (GNNs) have shown great promise in their respective areas, yet integrating structured graph data with rich textual information remains challenging. In this work, we propose \emph{Graph Masked Language Models} (GMLM), a novel dual-branch architecture that combines the structural learning of GNNs with the contextual power of pretrained language models. Our approach introduces two key innovations: (i) a \emph{semantic masking strategy} that leverages graph topology to selectively mask nodes based on their structural importance, and (ii) a \emph{soft masking mechanism} that interpolates between original node features and a learnable mask token, ensuring smoother information flow during training. Extensive experiments on multiple node classification and language understanding benchmarks demonstrate that GMLM not only achieves state-of-the-art performance but also exhibits enhanced robustness and stability. This work underscores the benefits of integrating structured and unstructured data representations for improved graph learning.
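A small sketch of the two ingredients named above -- choosing nodes to mask by a structural-importance score and replacing their features with a soft interpolation toward a learnable mask token -- is given below. Node degree stands in for the paper's importance measure, and the fixed mixing weight alpha is an illustrative assumption.

import torch
import torch.nn as nn

class SoftStructuralMasker(nn.Module):
    # Softly mask the structurally most important nodes of a graph.
    def __init__(self, feat_dim, mask_ratio=0.3, alpha=0.5):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(feat_dim))
        self.mask_ratio, self.alpha = mask_ratio, alpha

    def forward(self, x, adj):                       # x: (N, d), adj: (N, N) binary
        degree = adj.sum(dim=1)                      # structural-importance proxy
        k = max(1, int(self.mask_ratio * x.size(0)))
        masked_idx = torch.topk(degree, k).indices   # mask the most central nodes
        out = x.clone()
        # Soft mask: interpolate between original features and the learnable token.
        out[masked_idx] = self.alpha * x[masked_idx] + (1 - self.alpha) * self.mask_token
        return out, masked_idx

x = torch.randn(6, 16)
adj = (torch.rand(6, 6) > 0.5).float()
adj = ((adj + adj.T) > 0).float().fill_diagonal_(0)  # symmetrize, no self-loops
x_masked, idx = SoftStructuralMasker(feat_dim=16)(x, adj)
print("masked nodes:", idx.tolist())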
Updated: 2025-03-21 16:42:49
标题: 图遮蔽语言模型
摘要: 语言模型(LMs)和图神经网络(GNNs)在各自领域表现出极大的潜力,然而将结构化图数据与丰富的文本信息整合仍然具有挑战性。在这项工作中,我们提出了一种新颖的双分支架构,名为\emph{图掩码语言模型}(Graph Masked Language Models,GMLM),将GNNs的结构学习与预训练语言模型的上下文能力相结合。我们的方法引入了两个关键创新:(i)一种\emph{语义掩码策略},利用图拓扑结构有选择地掩盖节点,基于它们的结构重要性;以及(ii)一种\emph{软掩码机制},在训练过程中在原始节点特征和可学习的掩码令牌之间插值,确保信息流的平滑传递。在多个节点分类和语言理解基准测试上进行的大量实验表明,GMLM不仅实现了最先进的性能,而且表现出增强的鲁棒性和稳定性。这项工作强调了将结构化和非结构化数据表示整合以改善图学习的好处。
更新时间: 2025-03-21 16:42:49
领域: cs.CL,cs.AI,cs.LG
Calibration Strategies for Robust Causal Estimation: Theoretical and Empirical Insights on Propensity Score Based Estimators
The partitioning of data for estimation and calibration critically impacts the performance of propensity score based estimators like inverse probability weighting (IPW) and double/debiased machine learning (DML) frameworks. We extend recent advances in calibration techniques for propensity score estimation, improving the robustness of propensity scores in challenging settings such as limited overlap, small sample sizes, or unbalanced data. Our contributions are twofold: First, we provide a theoretical analysis of the properties of calibrated estimators in the context of DML. To this end, we refine existing calibration frameworks for propensity score models, with a particular emphasis on the role of sample-splitting schemes in ensuring valid causal inference. Second, through extensive simulations, we show that calibration reduces variance of inverse-based propensity score estimators while also mitigating bias in IPW, even in small-sample regimes. Notably, calibration improves stability for flexible learners (e.g., gradient boosting) while preserving the doubly robust properties of DML. A key insight is that, even when methods perform well without calibration, incorporating a calibration step does not degrade performance, provided that an appropriate sample-splitting approach is chosen.
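A minimal sketch of the pipeline under study -- fit a propensity model on one fold, calibrate it on a second (isotonic regression here), and plug the calibrated scores into IPW on a third -- follows. The learner, the calibration method, the three-way split, and the synthetic data-generating process are illustrative choices rather than the paper's recommended scheme.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n, d = 4000, 5
X = rng.normal(size=(n, d))
true_ps = 1.0 / (1.0 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
T = rng.binomial(1, true_ps)                        # treatment assignment
Y = 2.0 * T + X[:, 0] + rng.normal(size=n)          # outcome; true ATE = 2

# Three folds: fit the propensity model, calibrate it, evaluate IPW.
fit_idx, cal_idx, eval_idx = np.array_split(rng.permutation(n), 3)

model = LogisticRegression(max_iter=1000).fit(X[fit_idx], T[fit_idx])
calibrator = IsotonicRegression(out_of_bounds="clip").fit(
    model.predict_proba(X[cal_idx])[:, 1], T[cal_idx])

ps = np.clip(calibrator.predict(model.predict_proba(X[eval_idx])[:, 1]), 1e-3, 1 - 1e-3)
t, y = T[eval_idx], Y[eval_idx]
ate_ipw = np.mean(t * y / ps - (1 - t) * y / (1 - ps))
print("calibrated IPW estimate of the ATE:", round(float(ate_ipw), 3))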
Updated: 2025-03-21 16:41:10
标题: 稳健因果估计的校准策略:基于倾向评分估计器的理论和实证见解
摘要: 数据分区对估计和校准的影响对于倾向得分基于估计器如逆概率加权(IPW)和双/无偏机器学习(DML)框架的性能至关重要。我们扩展了最近在倾向得分估计的校准技术方面的进展,改善了在挑战性环境中(如有限重叠、小样本量或不平衡数据)中倾向得分的鲁棒性。我们的贡献有两方面:首先,我们在DML的背景下提供了校准估计器性质的理论分析。为此,我们完善了现有的倾向得分模型的校准框架,特别强调了样本分割方案在确保有效因果推断中的作用。其次,通过广泛的模拟,我们展示了校准可以降低基于逆概率的倾向得分估计器的方差,同时在IPW中减少偏差,即使在小样本情况下也是如此。值得注意的是,校准改善了对于灵活学习者(如梯度提升)的稳定性,同时保持了DML的双重稳健性质。一个关键的见解是,即使在没有校准的情况下方法表现良好,只要选择适当的样本分割方法,加入校准步骤不会降低性能。
更新时间: 2025-03-21 16:41:10
领域: stat.ML,cs.LG,econ.EM,stat.ME
3D Neural Operator-Based Flow Surrogates around 3D geometries: Signed Distance Functions and Derivative Constraints
Accurate modeling of fluid dynamics around complex geometries is critical for applications such as aerodynamic optimization and biomedical device design. While advancements in numerical methods and high-performance computing have improved simulation capabilities, the computational cost of high-fidelity 3D flow simulations remains a significant challenge. Scientific machine learning (SciML) offers an efficient alternative, enabling rapid and reliable flow predictions. In this study, we evaluate Deep Operator Networks (DeepONet) and Geometric-DeepONet, a variant that incorporates geometry information via signed distance functions (SDFs), on steady-state 3D flow over complex objects. Our dataset consists of 1,000 high-fidelity simulations spanning Reynolds numbers from 10 to 1,000, enabling comprehensive training and evaluation across a range of flow regimes. To assess model generalization, we test our models on a random and extrapolatory train-test splitting. Additionally, we explore a derivative-informed training strategy that augments standard loss functions with velocity gradient penalties and incompressibility constraints, improving physics consistency in 3D flow prediction. Our results show that Geometric-DeepONet improves boundary-layer accuracy by up to 32% compared to standard DeepONet. Moreover, incorporating derivative constraints enhances gradient accuracy by 25% in interpolation tasks and up to 45% in extrapolatory test scenarios, suggesting significant improvement in generalization capabilities to unseen 3D Reynolds numbers.
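For orientation, the standard DeepONet ansatz approximates an operator as $G_\theta(u)(y) \approx \sum_{k=1}^{p} b_k\big(u(x_1), \dots, u(x_m)\big)\, t_k(y)$, where the branch network maps samples of the input function to coefficients $b_k$ and the trunk network evaluates basis functions $t_k$ at query points $y$; the signed distance function used to encode geometry assigns each point its distance to the object surface, with opposite signs inside and outside the body, so its zero level set coincides with the wall. These are standard definitions only; the Geometric-DeepONet variant and the derivative-informed loss described above add structure beyond them.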
Updated: 2025-03-21 16:40:48
标题: 基于3D神经算子的围绕3D几何形状的流速代理:有符号距离函数和导数约束
摘要: 复杂几何形态周围的流体动力学准确建模对于空气动力学优化和生物医学器件设计等应用至关重要。虽然数值方法和高性能计算的进步已经提高了模拟能力,但高保真度的三维流动模拟的计算成本仍然是一个重要挑战。科学机器学习(SciML)提供了一种高效的替代方案,可以实现快速和可靠的流动预测。在本研究中,我们评估了深度操作网络(DeepONet)和几何-深度操作网络,后者通过符号距离函数(SDFs)融入几何信息,用于复杂对象上的稳态三维流动。我们的数据集包括从雷诺数为10到1,000的高保真度模拟,为跨流动区域的全面训练和评估提供了基础。为了评估模型的泛化能力,我们在随机和外推的训练-测试分割上测试了我们的模型。此外,我们探索了一种基于导数的训练策略,通过将速度梯度惩罚和不可压缩性约束与标准损失函数相结合,改进了三维流动预测中的物理一致性。我们的结果显示,与标准DeepONet相比,几何-深度操作网络将边界层准确性提高了高达32%。此外,融合导数约束在插值任务中提高了25%的梯度准确性,在外推测试场景中提高了高达45%,表明在未见过的三维雷诺数上的泛化能力显著提高。
更新时间: 2025-03-21 16:40:48
领域: cs.LG
Exploring a Principled Framework for Deep Subspace Clustering
Subspace clustering is a classical unsupervised learning task, built on a basic assumption that high-dimensional data can be approximated by a union of subspaces (UoS). Nevertheless, the real-world data are often deviating from the UoS assumption. To address this challenge, state-of-the-art deep subspace clustering algorithms attempt to jointly learn UoS representations and self-expressive coefficients. However, the general framework of the existing algorithms suffers from a catastrophic feature collapse and lacks a theoretical guarantee to learn desired UoS representation. In this paper, we present a Principled fRamewOrk for Deep Subspace Clustering (PRO-DSC), which is designed to learn structured representations and self-expressive coefficients in a unified manner. Specifically, in PRO-DSC, we incorporate an effective regularization on the learned representations into the self-expressive model, prove that the regularized self-expressive model is able to prevent feature space collapse, and demonstrate that the learned optimal representations under certain condition lie on a union of orthogonal subspaces. Moreover, we provide a scalable and efficient approach to implement our PRO-DSC and conduct extensive experiments to verify our theoretical findings and demonstrate the superior performance of our proposed deep subspace clustering approach. The code is available at https://github.com/mengxianghan123/PRO-DSC.
Updated: 2025-03-21 16:38:37
标题: 探索深度子空间聚类的原则性框架
摘要: 子空间聚类是一种经典的无监督学习任务,建立在一个基本假设上,即高维数据可以近似表示为子空间的并集(UoS)。然而,现实世界的数据往往偏离了UoS的假设。为了解决这一挑战,最先进的深度子空间聚类算法试图同时学习UoS表示和自表达系数。然而,现有算法的一般框架存在灾难性特征崩溃,并且缺乏学习期望UoS表示的理论保证。在本文中,我们提出了一种深度子空间聚类的原则性框架(PRO-DSC),旨在以统一的方式学习结构化表示和自表达系数。具体地,在PRO-DSC中,我们将对学习表示的有效正则化纳入到自表达模型中,证明经过正则化的自表达模型能够防止特征空间崩溃,并展示在某些条件下学习到的最优表示位于一组正交子空间的并集上。此外,我们提供了一个可扩展和高效的方法来实现我们的PRO-DSC,并进行了大量实验来验证我们的理论发现,并展示我们提出的深度子空间聚类方法的卓越性能。代码可在https://github.com/mengxianghan123/PRO-DSC上找到。
更新时间: 2025-03-21 16:38:37
领域: cs.CV,cs.LG
Offline Model-Based Optimization: Comprehensive Review
Offline optimization is a fundamental challenge in science and engineering, where the goal is to optimize black-box functions using only offline datasets. This setting is particularly relevant when querying the objective function is prohibitively expensive or infeasible, with applications spanning protein engineering, material discovery, neural architecture search, and beyond. The main difficulty lies in accurately estimating the objective landscape beyond the available data, where extrapolations are fraught with significant epistemic uncertainty. This uncertainty can lead to objective hacking (reward hacking), exploiting model inaccuracies in unseen regions, or other spurious optimizations that yield misleadingly high performance estimates outside the training distribution. Recent advances in model-based optimization (MBO) have harnessed the generalization capabilities of deep neural networks to develop offline-specific surrogate and generative models. Trained with carefully designed strategies, these models are more robust against out-of-distribution issues, facilitating the discovery of improved designs. Despite its growing impact in accelerating scientific discovery, the field lacks a comprehensive review. To bridge this gap, we present the first thorough review of offline MBO. We begin by formalizing the problem for both single-objective and multi-objective settings and by reviewing recent benchmarks and evaluation metrics. We then categorize existing approaches into two key areas: surrogate modeling, which emphasizes accurate function approximation in out-of-distribution regions, and generative modeling, which explores high-dimensional design spaces to identify high-performing designs. Finally, we examine the key challenges and propose promising directions for advancement in this rapidly evolving field, including safe control of superintelligent systems.
Updated: 2025-03-21 16:35:02
标题: 离线基于模型的优化:综合评论
摘要: 线下优化是科学和工程领域中的一个基本挑战,其目标是仅使用线下数据集优化黑匣子函数。当查询目标函数成本过高或不可行时,这种设置尤为重要,应用涵盖蛋白工程、材料发现、神经结构搜索等领域。主要困难在于准确估计可用数据之外的目标景观,其中外推充满了显著的认识不确定性。这种不确定性可能导致目标黑客(奖励黑客),利用模型不准确性在未知区域进行优化,或者其他虚假优化,导致在训练分布之外误导性地高性能估计。最近,基于模型的优化(MBO)的进展利用深度神经网络的泛化能力开发了线下特定的替代和生成模型。通过精心设计的策略进行训练,这些模型更具抗击分布问题的鲁棒性,有助于发现改进设计。尽管其在加速科学发现方面的影响日益增强,但该领域缺乏全面的评估。为了弥补这一差距,我们提出了线下MBO的第一次全面评估。我们首先对单目标和多目标设置的问题进行形式化,并审查了最近的基准和评估指标。然后将现有方法分类为两个关键领域:替代模型,重点在于准确地近似分布区域中的函数,以及生成模型,探索高维设计空间以识别高性能设计。最后,我们审查了关键挑战,并提出了在这个快速发展领域中的有前途的发展方向,包括对超智能系统的安全控制。
更新时间: 2025-03-21 16:35:02
领域: cs.LG
Bridging Technology and Humanities: Evaluating the Impact of Large Language Models on Social Sciences Research with DeepSeek-R1
In recent years, the development of Large Language Models (LLMs) has produced significant breakthroughs in natural language processing, and LLMs are gradually being applied to research in the humanities and social sciences. Because of their strong text understanding, generation, and reasoning capabilities, LLMs have broad application value in this field: they can analyze large-scale text data and draw inferences from it. This article analyzes the large language model DeepSeek-R1 across seven aspects: low-resource language translation, educational question-answering, student writing improvement in higher education, logical reasoning, educational measurement and psychometrics, public health policy analysis, and art education. We then compare the answers given by DeepSeek-R1 on these seven aspects with those given by o1-preview. DeepSeek-R1 performs well in the humanities and social sciences, answering most questions correctly and logically and giving reasonable analysis processes and explanations. Compared with o1-preview, it automatically generates reasoning traces and provides more detailed explanations, which suits beginners or readers who need a detailed understanding of the topic, while o1-preview is better suited to quick reading. The analysis shows that LLMs have broad application potential in the humanities and social sciences and offer clear advantages in improving text-analysis efficiency, language communication, and related areas. Their powerful language understanding and generation capabilities enable them to explore complex problems in the field in depth and provide innovative tools for academic research and practical applications.
Updated: 2025-03-21 16:34:40
标题: 桥接技术与人文学科:评估大型语言模型在社会科学研究中的影响与DeepSeek-R1
摘要: 近年来,大型语言模型(LLMs)的发展在自然语言处理领域取得了重大突破,并逐渐应用到人文社会科学研究领域。由于其强大的文本理解、生成和推理能力,LLMs在人文社会科学领域具有广泛的应用价值。在人文社会科学研究中,LLMs能够分析大规模文本数据并作出推理。 本文从七个方面分析了大型语言模型DeepSeek-R1:低资源语言翻译、教育问答、高等教育中学生写作改进、逻辑推理、教育测量与心理测量、公共卫生政策分析和艺术教育。然后我们将DeepSeek-R1在这七个方面给出的答案与o1-preview给出的答案进行比较。DeepSeek-R1在人文社会科学领域表现出色,大部分问题得到了正确和合理的回答,并能提供合理的分析过程和解释。与o1-preview相比,它能够自动生成推理过程并提供更详细的解释,适合初学者或需要对这一知识有详细了解的人,而o1-preview更适合快速阅读。 通过分析发现,LLMs在人文社会科学领域具有广泛的应用潜力,在提高文本分析效率、语言交流等领域显示出极大优势。LLMs强大的语言理解和生成能力使其能够深入探索人文社会科学领域复杂问题,并为学术研究和实际应用提供创新工具。
更新时间: 2025-03-21 16:34:40
领域: cs.CY,cs.AI
Karyotype AI for Precision Oncology
We present a machine learning method capable of accurately detecting chromosome abnormalities that cause blood cancers directly from microscope images of the metaphase stage of cell division. The pipeline is built on a series of fine-tuned Vision Transformers. Current state of the art (and standard clinical practice) requires expensive, manual expert analysis, whereas our pipeline takes only 15 seconds per metaphase image. Using a novel pretraining-finetuning strategy to mitigate the challenge of data scarcity, we achieve a high precision-recall score of 94% AUC for the clinically significant del(5q) and t(9;22) anomalies. Our method also unlocks zero-shot detection of rare aberrations based on model latent embeddings. The ability to quickly, accurately, and scalably diagnose genetic abnormalities directly from metaphase images could transform karyotyping practice and improve patient outcomes. We will make code publicly available.
Updated: 2025-03-21 16:34:17
标题: Karyotype AI 用于精准肿瘤学
摘要: 我们提出了一种机器学习方法,能够准确地从细胞分裂的中期阶段的显微镜图像中直接检测导致血液癌症的染色体异常。该流水线建立在一系列经过微调的视觉转换器上。当前的技术水平(以及标准的临床实践)需要昂贵且手动的专家分析,而我们的流水线每个中期图像只需15秒。通过使用一种新颖的预训练微调策略来缓解数据稀缺性挑战,我们实现了对于临床显著的del(5q)和t(9;22)异常的高精确度-召回率得分达到了94%的AUC。我们的方法还通过模型潜在嵌入解锁了基于零样本检测稀有异常的能力。能够快速、准确、可扩展地直接从中期图像中诊断基因异常,可能会改变核型分析的实践并改善患者结果。我们将公开发布代码。
更新时间: 2025-03-21 16:34:17
领域: q-bio.QM,cs.CV,cs.LG,eess.IV
From Text to Talent: A Pipeline for Extracting Insights from Candidate Profiles
The recruitment process is undergoing a significant transformation with the increasing use of machine learning and natural language processing techniques. While previous studies have focused on automating candidate selection, the role of multiple vacancies in this process remains understudied. This paper addresses this gap by proposing a novel pipeline that leverages Large Language Models and graph similarity measures to suggest ideal candidates for specific job openings. Our approach represents candidate profiles as multimodal embeddings, enabling the capture of nuanced relationships between job requirements and candidate attributes. The proposed approach has significant implications for the recruitment industry, enabling companies to streamline their hiring processes and identify top talent more efficiently. Our work contributes to the growing body of research on the application of machine learning in human resources, highlighting the potential of LLMs and graph-based methods in revolutionizing the recruitment landscape.
Updated: 2025-03-21 16:18:44
标题: 从文本到人才:从候选人档案中提取见解的渠道
摘要: 招聘过程正经历着重大转变,随着机器学习和自然语言处理技术的日益应用。尽管先前的研究主要集中在自动化候选人选择上,但在这一过程中多个职位空缺的作用仍未得到充分研究。本文通过提出一种新颖的流程,利用大型语言模型和图相似度测量来为特定职位空缺建议理想候选人,填补了这一空缺。我们的方法将候选人档案表示为多模态嵌入,能够捕捉职位要求和候选人属性之间微妙的关系。所提出的方法对招聘行业有重要影响,使公司能够简化招聘流程,更有效地识别顶尖人才。我们的工作为机器学习在人力资源领域的应用研究增添了新的内容,突显了LLM和基于图的方法在改变招聘领域的潜力。
更新时间: 2025-03-21 16:18:44
领域: cs.CY,cs.CL,cs.LG
Toward a method for LLM-enabled Indoor Navigation
Indoor navigation presents unique challenges due to complex layouts, lack of GPS signals, and accessibility concerns. Existing solutions often struggle with real-time adaptability and user-specific needs. In this work, we explore the potential of a Large Language Model (LLM), i.e., ChatGPT, to generate natural, context-aware navigation instructions from indoor map images. We design and evaluate test cases across different real-world environments, analyzing the effectiveness of LLMs in interpreting spatial layouts, handling user constraints, and planning efficient routes. Our findings demonstrate the potential of LLMs for supporting personalized indoor navigation, with an average of 52% correct indications and a maximum of 62%. The results do not appear to depend on the complexity of the layout or the complexity of the expected path, but rather on the number of points of interest and the abundance of visual information, which negatively affect the performance.
Updated: 2025-03-21 16:17:59
标题: 朝向一种LLM启用的室内导航方法
摘要: 室内导航面临独特的挑战,包括复杂的布局、缺乏GPS信号和无障碍问题。现有解决方案通常难以实时适应和满足用户特定需求。在这项工作中,我们探索了使用大型语言模型(LLM),即ChatGPT,从室内地图图像生成自然、上下文感知的导航指令的潜力。我们设计并评估了不同现实环境中的测试用例,分析了LLMs在解释空间布局、处理用户约束和规划高效路径方面的有效性。我们的研究结果表明,LLMs有潜力支持个性化室内导航,平均正确指示率为52%,最高为62%。结果似乎不取决于布局的复杂性或预期路径的复杂性,而取决于兴趣点的数量和视觉信息的丰富程度,这些因素会对性能产生负面影响。
更新时间: 2025-03-21 16:17:59
领域: cs.AI,cs.CL,cs.LG
Revisiting End To End Sparse Autoencoder Training -- A Short Finetune is All You Need
Sparse autoencoders (SAEs) are widely used for interpreting language model activations. A key evaluation metric is the increase in cross-entropy loss when replacing model activations with SAE reconstructions. Typically, SAEs are trained solely on mean squared error (MSE) using precomputed, shuffled activations. Recent work introduced training SAEs directly with a combination of KL divergence and MSE ("end-to-end" SAEs), significantly improving reconstruction accuracy at the cost of substantially increased computation, which has limited their widespread adoption. We propose a brief KL+MSE fine-tuning step applied only to the final 25M training tokens (just a few percent of typical training budgets) that achieves comparable improvements, reducing the cross-entropy loss gap by 20-50%, while incurring minimal additional computational cost. We further find that multiple fine-tuning methods (KL fine-tuning, LoRA adapters, linear adapters) yield similar, non-additive cross-entropy improvements, suggesting a common, easily correctable error source in MSE-trained SAEs. We demonstrate a straightforward method for effectively transferring hyperparameters and sparsity penalties despite the scale differences between KL and MSE losses. While both ReLU and TopK SAEs see significant cross-entropy loss improvements, evaluations on supervised SAEBench metrics yield mixed results, suggesting that the practical benefits depend on both the SAE architecture and the specific downstream task. Nonetheless, our method offers meaningful improvements in interpretability applications such as circuit analysis at minor additional cost.
Updated: 2025-03-21 16:15:49
标题: 重新审视端到端稀疏自编码器训练--短期微调就足够
摘要: 稀疏自编码器(SAEs)被广泛用于解释语言模型的激活。一个关键的评估指标是用SAE重构替换模型激活时交叉熵损失的增加。通常情况下,SAEs仅使用预先计算的、打乱顺序的激活进行均方误差(MSE)训练。最近的研究直接引入了KL散度和MSE的组合(“端到端”SAEs)来训练SAEs,显著提高了重构准确性,但也大大增加了计算量,限制了它们的广泛应用。我们提出了一个简短的KL+MSE微调步骤,仅应用于最后的25M训练标记(仅占典型训练预算的几个百分比),实现了可比的改进,将交叉熵损失差距减少了20-50%,而额外计算成本非常少。我们进一步发现,多种微调方法(KL微调、LoRA适配器、线性适配器)产生类似的、非加法的交叉熵改进,表明MSE训练的SAEs存在一个常见、易于纠正的错误源。我们展示了一种简单有效的方法,可以在KL和MSE损失之间的规模差异下有效地传递超参数和稀疏惩罚。虽然ReLU和TopK SAEs都看到了显著的交叉熵损失改进,但对监督SAEBench指标的评估结果却参差不齐,这表明实际效益取决于SAE架构和具体的下游任务。尽管如此,我们的方法在诸如电路分析等解释性应用中提供了有意义的改进,而额外成本则很小。
更新时间: 2025-03-21 16:15:49
领域: cs.LG
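To make the proposed fine-tuning objective above concrete, a minimal sketch of a combined KL+MSE loss is given below. The `sae` and `model_logits_fn` interfaces, the weighting `alpha`, and the way reconstructions are spliced back into the forward pass are assumptions for illustration; the paper's training code may differ.

```python
import torch
import torch.nn.functional as F

def kl_mse_finetune_loss(model_logits_fn, sae, acts, alpha=1.0):
    """KL(original || patched) + alpha * MSE on activations (illustrative).

    model_logits_fn(acts) -> logits obtained by running the rest of the model
    from the hooked layer; sae(acts) -> (reconstruction, sparse codes).
    """
    recon, _codes = sae(acts)
    mse = F.mse_loss(recon, acts)

    with torch.no_grad():
        ref_logits = model_logits_fn(acts)     # logits from original activations
    patched_logits = model_logits_fn(recon)    # logits from SAE reconstructions

    kl = F.kl_div(
        F.log_softmax(patched_logits, dim=-1),
        F.log_softmax(ref_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    return kl + alpha * mse
```

In practice this loss would only be applied for a short final phase of training, with the earlier, cheaper MSE-only phase left unchanged.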
Multi-Aggregator Time-Warping Heterogeneous Graph Neural Network for Personalized Micro-Video Recommendation
Micro-video recommendation is attracting global attention and becoming a popular daily service for people of all ages. Recently, micro-video recommendation based on Graph Neural Networks has shown performance improvements across many kinds of recommendation tasks. However, existing works fail to fully consider the characteristics of micro-videos, such as the high timeliness of news-style micro-videos and the sequential interactions produced by frequently changing interests. In this paper, a novel Multi-aggregator Time-warping Heterogeneous Graph Neural Network (MTHGNN) is proposed for personalized recommendation of news-style micro-videos based on sequential sessions. The characteristics of micro-videos are studied comprehensively, users' preferences are mined via multiple aggregators, the temporal and dynamic changes of those preferences are captured, and timeliness is taken into account. Experimental comparisons with state-of-the-art methods validate the superiority of our MTHGNN model.
Updated: 2025-03-21 16:08:18
标题: 多聚合器时间扭曲异构图神经网络用于个性化微视频推荐
摘要: 微视频推荐正受到全球关注,并成为各年龄段人群的热门日常服务。最近,基于图神经网络的微视频推荐已经在许多推荐任务中显示出性能提升。然而,现有作品未能充分考虑微视频的特点,如新闻性微视频推荐的高及时性和经常改变兴趣的顺序交互。本文提出了一种新颖的基于多聚合器时间扭曲异构图神经网络(MTHGNN)用于基于顺序会话的个性化新闻性微视频推荐,其中综合研究了微视频的特点,通过多聚合器挖掘用户偏好,捕捉用户偏好的时间和动态变化,并考虑及时性。通过与现有技术的比较,实验结果验证了我们的MTHGNN模型的优越性。
更新时间: 2025-03-21 16:08:18
领域: cs.IR,cs.AI
Learning to Solve Related Linear Systems
Solving multiple related, parametrised linear systems is an essential component of many numerical tasks. Borrowing strength from systems that have already been solved, through learning, can make this process faster. In this work, we propose a novel probabilistic linear solver over the parameter space. It leverages information from the solved linear systems in a regression setting to provide an efficient posterior mean and covariance. We advocate using this as a companion regression model for the preconditioned conjugate gradient method, and discuss the favourable properties of the posterior mean and covariance as the initial guess and preconditioner, respectively. We also provide several design choices for this companion solver. Numerical experiments showcase the benefits of using our novel solver in a hyperparameter optimisation problem.
Updated: 2025-03-21 16:05:45
标题: 学习解决相关线性系统
摘要: 解决多个参数化相关系统是许多数值任务的重要组成部分。从已解决的系统中借鉴力量并学习将使这个过程更快。在这项工作中,我们提出了一种新颖的概率线性求解器,用于参数空间。这利用回归设置中解决的线性系统的信息,提供高效的后验均值和协方差。我们主张将此作为预条件共轭梯度法的伴随回归模型,并讨论后验均值和协方差作为初始猜测和预处理器的有利特性。我们还提供了几种设计选择用于这个伴随求解器。数值实验证明了在超参数优化问题中使用我们的新型求解器的好处。
更新时间: 2025-03-21 16:05:45
领域: stat.ML,cs.LG,cs.NA,math.NA
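A minimal sketch of the companion-regression idea in the entry above: regress previously computed solutions on their parameters with a Gaussian process and use the posterior mean at a new parameter as the initial guess for conjugate gradients. The RBF kernel, the scikit-learn/SciPy interfaces, and the omission of the posterior covariance (which the paper also exploits, e.g. as a preconditioner) are simplifying assumptions.

```python
import numpy as np
from scipy.sparse.linalg import cg
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def warm_started_solve(A_of, b, thetas_solved, xs_solved, theta_new):
    """Warm-start CG for A(theta_new) x = b using a GP over the parameter space.

    thetas_solved: (m, d) parameter points already solved; xs_solved: (m, n)
    corresponding solutions. Illustrative companion-regression sketch only.
    """
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
    gp.fit(np.atleast_2d(thetas_solved), np.asarray(xs_solved))

    x0 = gp.predict(np.atleast_2d(theta_new))[0]   # posterior mean as initial guess
    x, info = cg(A_of(theta_new), b, x0=x0)        # info == 0 means convergence
    return x, info
```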
Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras
Event cameras rely on motion to obtain information about scene appearance. In other words, for event cameras, motion and appearance are seen both or neither, which are encoded in the output event stream. Previous works consider recovering these two visual quantities as separate tasks, which does not fit with the nature of event cameras and neglects the inherent relations between both tasks. In this paper, we propose an unsupervised learning framework that jointly estimates optical flow (motion) and image intensity (appearance), with a single network. Starting from the event generation model, we newly derive the event-based photometric error as a function of optical flow and image intensity, which is further combined with the contrast maximization framework, yielding a comprehensive loss function that provides proper constraints for both flow and intensity estimation. Exhaustive experiments show that our model achieves state-of-the-art performance for both optical flow (achieves 20% and 25% improvement in EPE and AE respectively in the unsupervised learning category) and intensity estimation (produces competitive results with other baselines, particularly in high dynamic range scenarios). Last but not least, our model achieves shorter inference time than all the other optical flow models and many of the image reconstruction models, while they output only one quantity. Project page: https://github.com/tub-rip/e2fai
Updated: 2025-03-21 16:04:13
标题: 无监督联合学习事件相机的光流和亮度
摘要: 事件相机依赖于运动来获取有关场景外观的信息。换句话说,对于事件相机来说,运动和外观被同时看到或同时不被看到,这些信息被编码在输出事件流中。以前的研究考虑将这两个视觉量分别恢复,这与事件相机的性质不符,忽略了这两个任务之间的固有关系。在本文中,我们提出了一个无监督学习框架,利用单个网络联合估计光流(运动)和图像强度(外观)。从事件生成模型开始,我们新推导了基于事件的光度误差作为光流和图像强度的函数,进一步将其与对比度最大化框架结合,形成一个全面的损失函数,为光流和强度估计提供适当的约束。详尽的实验证明,我们的模型在光流(在无监督学习类别中,EPE和AE分别实现了20%和25%的改进)和强度估计方面取得了最先进的性能(在高动态范围场景中特别产生了与其他基线相竞争的结果)。最后但并非最不重要的是,我们的模型在推理时间方面比所有其他光流模型和许多图像重建模型都要短,而它们仅输出一个数量。项目页面:https://github.com/tub-rip/e2fai
更新时间: 2025-03-21 16:04:13
领域: cs.CV,cs.LG,eess.IV
TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention
Object Hallucination (OH) has been acknowledged as one of the major trustworthy challenges in Large Vision-Language Models (LVLMs). Recent advancements in Large Language Models (LLMs) indicate that internal states, such as hidden states, encode the "overall truthfulness" of generated responses. However, it remains under-explored how internal states in LVLMs function and whether they could serve as "per-token" hallucination indicators, which is essential for mitigating OH. In this paper, we first conduct an in-depth exploration of LVLM internal states in relation to OH issues and discover that (1) LVLM internal states are high-specificity per-token indicators of hallucination behaviors. Moreover, (2) different LVLMs encode universal patterns of hallucinations in common latent subspaces, indicating that there exist "generic truthful directions" shared by various LVLMs. Based on these discoveries, we propose Truthful-Guided Pre-Intervention (TruthPrInt) that first learns the truthful direction of LVLM decoding and then applies truthful-guided inference-time intervention during LVLM decoding. We further propose ComnHallu to enhance both cross-LVLM and cross-data hallucination detection transferability by constructing and aligning hallucination latent subspaces. We evaluate TruthPrInt in extensive experimental settings, including in-domain and out-of-domain scenarios, over popular LVLMs and OH benchmarks. Experimental results indicate that TruthPrInt significantly outperforms state-of-the-art methods. Codes will be available at https://github.com/jinhaoduan/TruthPrInt.
Updated: 2025-03-21 15:58:26
标题: TruthPrInt:通过潜在的真实引导预干预来减轻LVLM物体幻觉
摘要: 目标幻觉(OH)被公认为大型视觉语言模型(LVLMs)中的主要可信挑战之一。最近大型语言模型(LLMs)的进展表明,内部状态,如隐藏状态,编码生成的响应的“整体真实性”。然而,LVLMs中的内部状态如何运作以及它们是否可以作为“每个令牌”幻觉指标仍未得到充分探讨,这对于缓解OH是至关重要的。在本文中,我们首先对LVLM内部状态与OH问题进行了深入探讨,发现(1)LVLM内部状态是幻觉行为的高特异性每个令牌指示器。此外,(2)不同的LVLM在共同的潜在子空间中编码幻觉的普遍模式,表明存在各种LVLM共享的“通用真实方向”。基于这些发现,我们提出了Truthful-Guided Pre-Intervention(TruthPrInt),该方法首先学习LVLM解码的真实方向,然后在LVLM解码过程中应用真实指导的推理干预。我们进一步提出ComnHallu,通过构建和对齐幻觉潜在子空间,增强了跨LVLM和跨数据幻觉检测的可转移性。我们在广泛的实验设置中评估TruthPrInt,包括领域内和领域外的情景,涵盖了流行的LVLMs和OH基准。实验结果表明,TruthPrInt明显优于现有的方法。代码将在https://github.com/jinhaoduan/TruthPrInt上提供。
更新时间: 2025-03-21 15:58:26
领域: cs.CV,cs.AI,cs.CL
On Privately Estimating a Single Parameter
We investigate differentially private estimators for individual parameters within larger parametric models. While generic private estimators exist, the estimators we provide rest on new local notions of estimand stability, and these notions allow procedures that provide private certificates of their own stability. By leveraging these private certificates, we provide computationally and statistically efficient mechanisms that release private statistics that are, at least asymptotically in the sample size, essentially unimprovable: they achieve instance-optimal bounds. Additionally, we investigate the practicality of the algorithms on both simulated data and real-world data from the American Community Survey and US Census, highlighting scenarios in which the new procedures are successful and identifying areas for future work.
Updated: 2025-03-21 15:57:12
标题: 关于私下估计单个参数
摘要: 我们研究了在更大的参数模型中的个体参数的差分隐私估计器。虽然存在通用的私有估计器,但我们提供的估计器基于新的局部估值稳定性概念,这些概念允许提供其自身稳定性的私有证书的程序。通过利用这些私有证书,我们提供了计算和统计有效的机制,释放私有统计数据,在样本大小至少渐近地情况下,基本上无法改进:它们实现了实例最优的界限。此外,我们通过模拟数据和来自美国社区调查和美国人口普查的真实数据来研究算法的实用性,突出了新程序成功的情景,并确定了未来工作的领域。
更新时间: 2025-03-21 15:57:12
领域: cs.LG,cs.CR,math.ST,stat.TH
Breaking the Symmetries of Indistinguishable Objects
Indistinguishable objects often occur when modelling problems in constraint programming, as well as in other related paradigms. They occur when objects can be viewed as being drawn from a set of unlabelled objects, and the only operation allowed on them is equality testing. For example, the golfers in the social golfer problem are indistinguishable. If we do label the golfers, then any relabelling of the golfers in one solution gives another valid solution. Therefore, we can regard the symmetric group of size $n$ as acting on a set of $n$ indistinguishable objects. In this paper, we show how we can break the symmetries resulting from indistinguishable objects. We show how symmetries on indistinguishable objects can be defined properly in complex types, for example in a matrix indexed by indistinguishable objects. We then show how the resulting symmetries can be broken correctly. In Essence, a high-level modelling language, indistinguishable objects are encapsulated in "unnamed types". We provide an implementation of complete symmetry breaking for unnamed types in Essence.
Updated: 2025-03-21 15:56:52
标题: 打破不可区分对象的对称性
摘要: 不可区分的对象经常出现在约束编程问题建模中,以及其他相关范式中。它们出现在对象可以被视为来自一组未标记对象的情况下,且允许在它们之间进行的唯一操作是相等性测试。例如,在社交高尔夫问题中,高尔夫球手是不可区分的。如果我们对高尔夫球手进行标记,那么在一个解决方案中对高尔夫球手的任何重标记都会给出另一个有效的解决方案。因此,我们可以将大小为$n$的对称群看作作用在一组$n$个不可区分对象上。在本文中,我们展示了如何打破由不可区分对象导致的对称性。我们展示了如何在复杂类型中正确定义不可区分对象上的对称性,例如在由不可区分对象索引的矩阵中。然后我们展示了如何正确打破产生的对称性。在Essence中,一种高级建模语言中,不可区分对象被封装在“未命名类型”中。我们提供了在Essence中为未命名类型实现完全对称性打破的实现。
更新时间: 2025-03-21 15:56:52
领域: cs.AI
On-Device Federated Continual Learning on RISC-V-based Ultra-Low-Power SoC for Intelligent Nano-Drone Swarms
RISC-V-based architectures are paving the way for efficient On-Device Learning (ODL) in smart edge devices. When applied across multiple nodes, ODL enables the creation of intelligent sensor networks that preserve data privacy. However, developing ODL-capable, battery-operated embedded platforms presents significant challenges due to constrained computational resources and limited device lifetime, besides intrinsic learning issues such as catastrophic forgetting. We face these challenges by proposing a regularization-based On-Device Federated Continual Learning algorithm tailored for multiple nano-drones performing face recognition tasks. We demonstrate our approach on a RISC-V-based 10-core ultra-low-power SoC, optimizing the ODL computational requirements. We improve the classification accuracy by 24% over naive fine-tuning, requiring 178 ms per local epoch and 10.5 s per global epoch, demonstrating the effectiveness of the architecture for this task.
Updated: 2025-03-21 15:53:57
标题: 基于RISC-V超低功耗SoC的智能纳米无人机群体的设备联邦持续学习
摘要: 基于RISC-V架构的设备正在为智能边缘设备中的高效On-Device Learning(ODL)铺平道路。当在多个节点上应用时,ODL使得可以创建保护数据隐私的智能传感器网络成为可能。然而,由于计算资源受限和设备寿命有限,以及固有的学习问题,如灾难性遗忘,开发具有ODL功能的电池供电嵌入式平台面临着重大挑战。我们通过提出一种基于正则化的On-Device Federated Continual Learning算法来应对这些挑战,该算法专为执行人脸识别任务的多个纳米无人机而设计。我们在基于RISC-V的10核超低功耗SoC上展示了我们的方法,优化了ODL的计算需求。我们提高了分类准确率24%,需要每个本地时期178毫秒和每个全局时期10.5秒,展示了该架构在此任务中的有效性。
更新时间: 2025-03-21 15:53:57
领域: cs.LG,cs.CV,cs.MA,I.2.11; I.2.6; C.5.3; I.4.9
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
Diffusion Transformers have emerged as the preeminent models for a wide array of generative tasks, demonstrating superior performance and efficacy across various applications. The promising results come at the cost of slow inference, as each denoising step requires running the whole transformer model with a large number of parameters. In this paper, we show that performing the full computation of the model at each diffusion step is unnecessary, as some computations can be skipped by lazily reusing the results of previous steps. Furthermore, we show that the lower bound of similarity between outputs at consecutive steps is notably high, and this similarity can be linearly approximated using the inputs. To verify our demonstrations, we propose LazyDiT, a lazy learning framework that efficiently leverages cached results from earlier steps to skip redundant computations. Specifically, we incorporate lazy learning layers into the model, effectively trained to maximize laziness, enabling dynamic skipping of redundant computations. Experimental results show that LazyDiT outperforms the DDIM sampler across multiple diffusion transformer models at various resolutions. Furthermore, we implement our method on mobile devices, achieving better performance than DDIM with similar latency. Code: https://github.com/shawnricecake/lazydit
Updated: 2025-03-21 15:52:39
标题: 懒DiT:用于加速扩散变压器的懒学习
摘要: 扩散变压器已成为各种生成任务中卓越的模型,展示了在各种应用中的出色性能和有效性。这些有前途的结果是以推理速度慢为代价的,因为每个去噪步骤都需要运行具有大量参数的整个变压器模型。本文表明,在每个扩散步骤中执行模型的完整计算是不必要的,因为一些计算可以通过懒惰地重用先前步骤的结果来跳过。此外,我们表明,连续步骤之间输出的相似性的下界明显较高,并且可以使用输入线性近似这种相似性。为了验证我们的演示,我们提出了LazyDiT,一个懒惰学习框架,有效地利用从较早步骤缓存的结果来跳过冗余计算。具体而言,我们将懒惰学习层合并到模型中,有效地训练以最大化懒惰性,实现动态跳过冗余计算。实验结果表明,LazyDiT在各种分辨率下优于DDIM采样器的多个扩散变压器模型。此外,我们将我们的方法实现在移动设备上,实现比DDIM更好的性能,而延迟时间类似。源代码:https://github.com/shawnricecake/lazydit
更新时间: 2025-03-21 15:52:39
领域: cs.LG,cs.AI
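The lazy-reuse idea above can be illustrated with a small wrapper that caches a transformer block's output at one denoising step and returns it again at the next step when the block's input has barely changed. The fixed cosine-similarity threshold stands in for the learned lazy-learning layers and is an assumption, as is the requirement that the wrapped block return a single tensor.

```python
import torch
import torch.nn.functional as F

class LazyBlockWrapper(torch.nn.Module):
    """Reuse the previous step's block output when the input barely changed.

    Illustrative only: the paper learns when to be lazy; here a fixed cosine
    similarity threshold on the flattened block input plays that role.
    """
    def __init__(self, block, threshold=0.995):
        super().__init__()
        self.block = block
        self.threshold = threshold
        self.prev_in = None
        self.prev_out = None

    def forward(self, x, *args, **kwargs):
        if self.prev_in is not None and self.prev_in.shape == x.shape:
            sim = F.cosine_similarity(
                x.flatten(1), self.prev_in.flatten(1), dim=-1).mean()
            if sim > self.threshold:
                return self.prev_out          # skip the block, reuse cached output
        out = self.block(x, *args, **kwargs)
        self.prev_in, self.prev_out = x.detach(), out.detach()
        return out
```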
KL3M Tokenizers: A Family of Domain-Specific and Character-Level Tokenizers for Legal, Financial, and Preprocessing Applications
We present the KL3M tokenizers, a family of specialized tokenizers for legal, financial, and governmental text. Despite established work on tokenization, specialized tokenizers for professional domains remain understudied. Our paper offers two main contributions to this area. First, we introduce domain-specific BPE tokenizers for legal, financial, and governmental text. Our kl3m-004-128k-cased tokenizer uses 9-17% fewer tokens than GPT-4o and Llama3 for domain-specific documents, despite having a smaller vocabulary. For specialized terminology, our cased tokenizer is even more efficient, using up to 83% fewer tokens for legal terms and 39% fewer tokens for financial terms. Second, we develop character-level BPE tokenizers (4K, 8K, and 16K vocabulary sizes) for text correction tasks like OCR post-processing. These tokenizers keep consistent token boundaries between error-containing and correct text, making it easier for models to learn correction patterns. These tokenizers help professional applications by fitting more text in context windows, reducing computational needs, and preserving the meaning of domain-specific terms. Our analysis shows these efficiency gains directly benefit the processing of long legal and financial documents. We release all tokenizers and code through GitHub and Hugging Face to support further research in specialized tokenization.
Updated: 2025-03-21 15:51:43
标题: KL3M Tokenizers:面向法律、金融和预处理应用的领域特定和字符级分词器家族
摘要: 我们提出了KL3M分词器,这是一组专门用于法律、金融和政府文本的分词器。尽管已经有关于分词的成熟研究,但专门用于专业领域的分词器仍然研究不足。我们的论文在这个领域提供了两个主要贡献。 首先,我们介绍了专门用于法律、金融和政府文本的领域特定的BPE分词器。我们的kl3m-004-128k-cased分词器在领域特定文档中使用的标记比GPT-4o和Llama3少9-17%,尽管词汇量更小。对于专业术语,我们的cased分词器甚至更高效,对于法律术语使用的标记减少了83%,对于金融术语减少了39%。 其次,我们为OCR后处理等文本校正任务开发了字符级BPE分词器(4K、8K和16K词汇量大小)。这些分词器在错误包含和正确文本之间保持一致的标记边界,使模型更容易学习校正模式。 这些分词器通过将更多文本放入上下文窗口、减少计算需求和保留领域特定术语的含义,帮助专业应用。我们的分析显示,这些效率提升直接有益于处理长篇法律和金融文件。我们通过GitHub和Hugging Face发布了所有分词器和代码,以支持进一步研究专门分词化。
更新时间: 2025-03-21 15:51:43
领域: cs.CL,cs.AI
Deep End-to-End Posterior ENergy (DEEPEN) for image recovery
Current end-to-end (E2E) and plug-and-play (PnP) image reconstruction algorithms approximate the maximum a posteriori (MAP) estimate but cannot offer sampling from the posterior distribution, like diffusion models. By contrast, it is challenging for diffusion models to be trained in an E2E fashion. This paper introduces a Deep End-to-End Posterior ENergy (DEEPEN) framework, which enables MAP estimation as well as sampling. We learn the parameters of the posterior, which is the sum of the data consistency error and the negative log-prior distribution, using maximum likelihood optimization in an E2E fashion. The proposed approach does not require algorithm unrolling, and hence has a smaller computational and memory footprint than current E2E methods, while it does not require contraction constraints typically needed by current PnP methods. Our results demonstrate that DEEPEN offers improved performance than current E2E and PnP models in the MAP setting, while it also offers faster sampling compared to diffusion models. In addition, the learned energy-based model is observed to be more robust to changes in image acquisition settings.
Updated: 2025-03-21 15:50:54
标题: 深度端到端后验能量(DEEPEN)用于图像恢复
摘要: 当前的端到端(E2E)和即插即用(PnP)图像重建算法近似于最大后验估计(MAP),但不能提供从后验分布中抽样,就像扩散模型一样。相比之下,扩散模型很难以端到端的方式进行训练。本文介绍了一种名为深度端到端后验能量(DEEPEN)框架,该框架可以实现MAP估计和抽样。我们通过端到端的方式使用最大似然优化来学习后验的参数,后验是数据一致性误差和负对数先验分布的总和。所提出的方法不需要算法展开,因此比当前的E2E方法具有更小的计算和内存占用,同时也不需要当前PnP方法通常需要的收缩约束。我们的结果表明,DEEPEN在MAP设置中比当前的E2E和PnP模型具有更好的性能,同时与扩散模型相比,它还提供更快的抽样速度。此外,学习的基于能量的模型被观察到对图像采集设置的变化更具鲁棒性。
更新时间: 2025-03-21 15:50:54
领域: eess.IV,cs.CV,cs.LG
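A hedged sketch of the energy view described above: the posterior energy is the sum of a data-consistency term and a learned negative log-prior, MAP estimation is gradient descent on that energy, and sampling is unadjusted Langevin dynamics on the same energy. The forward operator, step sizes, optimizer, and iteration counts are illustrative assumptions.

```python
import torch

def posterior_energy(x, y, forward_op, neg_log_prior, lam=1.0):
    """E(x) = 0.5 * ||A(x) - y||^2 + lam * neg_log_prior(x)  (illustrative)."""
    data_term = 0.5 * (forward_op(x) - y).pow(2).sum()
    return data_term + lam * neg_log_prior(x).sum()

def map_estimate(y, forward_op, neg_log_prior, x0, steps=200, lr=1e-2):
    # MAP: minimize the posterior energy with a generic first-order optimizer.
    x = x0.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        posterior_energy(x, y, forward_op, neg_log_prior).backward()
        opt.step()
    return x.detach()

def langevin_sample(y, forward_op, neg_log_prior, x0, steps=500, step=1e-4):
    # Sampling: unadjusted Langevin dynamics on the same energy.
    x = x0.clone()
    for _ in range(steps):
        x.requires_grad_(True)
        grad = torch.autograd.grad(
            posterior_energy(x, y, forward_op, neg_log_prior), x)[0]
        x = (x - step * grad
             + torch.randn_like(x) * (2.0 * step) ** 0.5).detach()
    return x
```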
AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution
The influence of contextual input on the behavior of large language models (LLMs) has prompted the development of context attribution methods that aim to quantify each context span's effect on an LLM's generations. The leave-one-out (LOO) error, which measures the change in the likelihood of the LLM's response when a given span of the context is removed, provides a principled way to perform context attribution, but can be prohibitively expensive to compute for large models. In this work, we introduce AttriBoT, a series of novel techniques for efficiently computing an approximation of the LOO error for context attribution. Specifically, AttriBoT uses cached activations to avoid redundant operations, performs hierarchical attribution to reduce computation, and emulates the behavior of large target models with smaller proxy models. Taken together, AttriBoT can provide a >300x speedup while remaining more faithful to a target model's LOO error than prior context attribution methods. This stark increase in performance makes computing context attributions for a given response 30x faster than generating the response itself, empowering real-world applications that require computing attributions at scale. We release a user-friendly and efficient implementation of AttriBoT to enable efficient LLM interpretability as well as encourage future development of efficient context attribution methods.
Updated: 2025-03-21 15:47:53
标题: AttriBoT:一种有效逼近Leave-One-Out上下文归因的技巧集
摘要: 大型语言模型(LLMs)行为的背景输入对其行为的影响促使了文本归因方法的发展,旨在量化每个上下文跨度对LLM生成的影响。留一法(LOO)误差衡量了在去除给定上下文跨度时LLM响应的可能性变化,为执行上下文归因提供了一种合理的方法,但对于大型模型来说计算成本可能过高。在这项工作中,我们引入了AttriBoT,一系列新颖的技术,用于高效计算上下文归因的LOO误差的近似值。具体而言,AttriBoT利用缓存的激活来避免冗余操作,执行分层归因以减少计算量,并用较小的代理模型模拟大目标模型的行为。总的来说,AttriBoT可以提供>300倍的加速,同时保持比先前上下文归因方法更忠实于目标模型的LOO误差。这种性能的显著提高使得计算给定响应的上下文归因比生成响应本身快30倍,为需要大规模计算归因的现实应用提供了支持。我们发布了一个用户友好且高效的AttriBoT实现,以促进高效LLM可解释性的发展,同时鼓励未来高效上下文归因方法的发展。
更新时间: 2025-03-21 15:47:53
领域: cs.LG
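For reference, the exact leave-one-out quantity that AttriBoT approximates can be written down directly: the drop in the response's log-likelihood when one context span is removed. The prompt format, the tokenizer/model interface (a Hugging Face causal LM is assumed), and the span granularity below are assumptions; the point of the paper is precisely to avoid paying this full cost.

```python
import torch

@torch.no_grad()
def loo_scores(model, tokenizer, spans, question, response):
    """Exact LOO attribution scores that AttriBoT approximates (illustrative).

    spans: list of context strings.
    score_i = logp(response | full context) - logp(response | context without span i).
    """
    def response_logprob(context_spans):
        prompt = "\n".join(context_spans) + "\n" + question
        prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
        resp_ids = tokenizer(response, return_tensors="pt",
                             add_special_tokens=False).input_ids
        ids = torch.cat([prompt_ids, resp_ids], dim=1)
        logits = model(ids).logits[:, :-1]     # position t-1 predicts token t
        logps = torch.log_softmax(logits, dim=-1)
        tgt = ids[:, 1:]
        token_lp = logps.gather(-1, tgt.unsqueeze(-1)).squeeze(-1)
        return token_lp[:, -resp_ids.shape[1]:].sum().item()  # response tokens only

    full = response_logprob(spans)
    return [full - response_logprob(spans[:i] + spans[i + 1:])
            for i in range(len(spans))]
```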
RadioActive: 3D Radiological Interactive Segmentation Benchmark
Precise segmentation with minimal clinician effort could greatly streamline clinical workflows. Recent interactive segmentation models, inspired by Meta's Segment Anything, have made significant progress but face critical limitations in 3D radiology. These include impractical human interaction requirements such as slice-by-slice operations for 2D models on 3D data and a lack of iterative refinement. Prior studies have been hindered by inadequate evaluation protocols, resulting in unreliable performance assessments and inconsistent findings across studies. The RadioActive benchmark addresses these challenges by providing a rigorous and reproducible evaluation framework for interactive segmentation methods in clinically relevant scenarios. It features diverse datasets, a wide range of target structures, and the most impactful 2D and 3D interactive segmentation methods, all within a flexible and extensible codebase. We also introduce advanced prompting techniques that reduce interaction steps, enabling fair comparisons between 2D and 3D models. Surprisingly, SAM2 outperforms all specialized medical 2D and 3D models in a setting requiring only a few interactions to generate prompts for a 3D volume. This challenges prevailing assumptions and demonstrates that general-purpose models surpass specialized medical approaches. By open-sourcing RadioActive, we invite researchers to integrate their models and prompting techniques, ensuring continuous and transparent evaluation of 3D medical interactive models.
Updated: 2025-03-21 15:47:12
标题: 放射性:3D放射性交互分割基准
摘要: 轻松且精确的分割,最小程度地减轻临床医生的工作量,可以极大地简化临床工作流程。受METAs Segment Anything启发的最近的交互式分割模型取得了显著进展,但在3D放射学中面临关键限制。这些包括不切实际的人类交互需求,例如对3D数据的2D模型进行逐层操作,以及缺乏迭代细化。以往的研究受到评估协议不足的阻碍,导致性能评估不可靠,研究结果不一致。RadioActive基准通过为临床相关场景中的交互式分割方法提供严格和可重复的评估框架来解决这些挑战。它具有多样的数据集、广泛的目标结构范围以及最具影响力的2D和3D交互式分割方法,全部都在一个灵活且可扩展的代码库中。我们还引入了先进的提示技术,可以减少交互步骤,实现对2D和3D模型之间的公平比较。令人惊讶的是,SAM2在仅需要少数交互即可为3D体积生成提示的情况下,胜过了所有专门的医学2D和3D模型。这挑战了当前的假设,并表明通用模型超越了专门的医学方法。通过开源RadioActive,我们邀请研究人员整合他们的模型和提示技术,确保对3D医学交互式模型进行持续和透明的评估。
更新时间: 2025-03-21 15:47:12
领域: cs.CV,cs.AI,cs.HC,cs.LG
Analyzing Performance Bottlenecks in Zero-Knowledge Proof Based Rollups on Ethereum
Blockchain technology is rapidly evolving, with scalability remaining one of its most significant challenges. While various solutions have been proposed and continue to be developed, it is essential to consider the blockchain trilemma -- balancing scalability, security, and decentralization -- when designing new approaches. One promising solution is the zero-knowledge proof (ZKP)-based rollup, implemented on top of Ethereum. However, the performance of these systems is often limited by the efficiency of the ZKP mechanism. This paper explores the performance of ZKP-based rollups, focusing on a solution built using the Hardhat Ethereum development environment. Through detailed analysis, the paper identifies and examines key bottlenecks within the ZKP system, providing insight into potential areas for optimization to enhance scalability and overall system performance.
Updated: 2025-03-21 15:45:51
标题: 在以太坊上基于零知识证明滚动升级中分析性能瓶颈
摘要: 区块链技术正在迅速发展,可扩展性仍然是其中最重要的挑战之一。虽然已提出各种解决方案并不断开发中,但在设计新方法时必须考虑区块链三难问题——平衡可扩展性、安全性和去中心化。其中一种有前途的解决方案是基于零知识证明(ZKP)的rollup,实现在以太坊之上。然而,这些系统的性能通常受到ZKP机制的效率限制。本文探讨了基于ZKP的rollup的性能,重点关注使用Hardhat以太坊开发环境构建的解决方案。通过详细分析,本文确定并研究了ZKP系统中的关键瓶颈,为优化提升可扩展性和整体系统性能提供了见解。
更新时间: 2025-03-21 15:45:51
领域: cs.CR,cs.DC
Calibrated Computation-Aware Gaussian Processes
Gaussian processes are notorious for scaling cubically with the size of the training set, preventing application to very large regression problems. Computation-aware Gaussian processes (CAGPs) tackle this scaling issue by exploiting probabilistic linear solvers to reduce complexity, widening the posterior with additional computational uncertainty due to reduced computation. However, the most commonly used CAGP framework results in (sometimes dramatically) conservative uncertainty quantification, making the posterior unrealistic in practice. In this work, we prove that if the utilised probabilistic linear solver is calibrated, in a rigorous statistical sense, then so too is the induced CAGP. We thus propose a new CAGP framework, CAGP-GS, based on using Gauss-Seidel iterations for the underlying probabilistic linear solver. CAGP-GS performs favourably compared to existing approaches when the test set is low-dimensional and few iterations are performed. We test the calibratedness on a synthetic problem, and compare the performance to existing approaches on a large-scale global temperature regression problem.
Updated: 2025-03-21 15:45:28
标题: 校准的计算感知高斯过程
摘要: 高斯过程以其与训练集大小呈立方比例的扩展而臭名昭著,这阻碍了其在非常大的回归问题中的应用。具有计算意识的高斯过程(CAGPs)通过利用概率线性求解器来减少复杂性,扩展后验分布并增加由于减少计算而产生的额外计算不确定性,从而解决了这个扩展问题。然而,最常用的CAGP框架导致(有时是明显的)保守的不确定性量化,使后验分布在实践中变得不现实。在这项工作中,我们证明了如果利用的概率线性求解器在严格的统计意义上进行了校准,那么诱导的CAGP也是如此。因此,我们提出了一个新的CAGP框架,CAGP-GS,其基于使用高斯-赛德尔迭代来进行底层概率线性求解。与现有方法相比,当测试集是低维且迭代次数较少时,CAGP-GS表现良好。我们在一个合成问题上测试了校准性,并将性能与现有方法在大规模全球温度回归问题上进行了比较。
更新时间: 2025-03-21 15:45:28
领域: stat.ML,cs.LG,cs.NA,math.NA
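For orientation, the deterministic core of the proposed solver is the classical Gauss-Seidel sweep sketched below; the paper builds a calibrated probabilistic posterior on top of such iterations, which this sketch does not attempt to reproduce.

```python
import numpy as np

def gauss_seidel(A, b, x0=None, sweeps=10):
    """Plain Gauss-Seidel sweeps, the iteration underlying CAGP-GS (illustrative).

    Each sweep updates x[i] using the latest values of all other entries.
    """
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    for _ in range(sweeps):
        for i in range(n):
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x
```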
SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging
Fine-tuning large language models (LLMs) on downstream tasks can inadvertently erode their safety alignment, even for benign fine-tuning datasets. We address this challenge by proposing SafeMERGE, a post-fine-tuning framework that preserves safety while maintaining task utility. It achieves this by selectively merging fine-tuned and safety-aligned model layers only when those deviate from safe behavior, measured by a cosine similarity criterion. We evaluate SafeMERGE against other fine-tuning- and post-fine-tuning-stage approaches for Llama-2-7B-Chat and Qwen-2-7B-Instruct models on GSM8K and PubMedQA tasks while exploring different merging strategies. We find that SafeMERGE consistently reduces harmful outputs compared to other baselines without significantly sacrificing performance, sometimes even enhancing it. The results suggest that our selective, subspace-guided, and per-layer merging method provides an effective safeguard against the inadvertent loss of safety in fine-tuned LLMs while outperforming simpler post-fine-tuning-stage defenses.
Updated: 2025-03-21 15:44:09
标题: SafeMERGE:通过选择性逐层模型合并在精调大型语言模型中保持安全对齐
摘要: 将大型语言模型(LLMs)在下游任务上进行微调可能会无意中削弱它们的安全对齐性,即使对于良性微调数据集也是如此。我们通过提出SafeMERGE来解决这一挑战,这是一个在微调后保持安全性同时保持任务效用的框架。它通过选择性地合并仅在微调和安全对齐模型层偏离安全行为时才进行合并,通过余弦相似性标准来衡量。我们在GSM8K和PubMedQA任务上评估了SafeMERGE与其他LLama-2-7B-Chat和Qwen-2-7B-Instruct模型的微调和后微调阶段方法,同时探讨不同的合并策略。我们发现,与其他基线相比,SafeMERGE始终减少有害输出,而不会显著牺牲性能,有时甚至会提高性能。结果表明,我们的选择性、子空间引导和逐层合并方法提供了一种有效的保护措施,可以防止微调后LLMs意外丧失安全性,同时优于更简单的后微调阶段防御措施。
更新时间: 2025-03-21 15:44:09
领域: cs.CL,cs.AI
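A simplified sketch of the selective, similarity-gated merge described above: parameters whose fine-tuned weights drift too far (in cosine similarity) from the safety-aligned reference are interpolated back toward it, while the rest are kept as fine-tuned. Measuring deviation directly on flattened weights, the threshold `tau`, and the interpolation weight `alpha` are assumptions; the paper's criterion and merge rule may differ.

```python
import torch

@torch.no_grad()
def selective_safety_merge(finetuned, safe_ref, tau=0.95, alpha=0.5):
    """Merge fine-tuned and safety-aligned weights only for deviating layers."""
    merged = {}
    ref_state = safe_ref.state_dict()
    for name, w in finetuned.state_dict().items():
        w_ref = ref_state[name]
        cos = torch.nn.functional.cosine_similarity(
            w.flatten().float(), w_ref.flatten().float(), dim=0)
        if w.dim() > 1 and cos < tau:          # layer deviates from safe behaviour
            merged[name] = alpha * w + (1 - alpha) * w_ref
        else:                                  # keep the fine-tuned layer as-is
            merged[name] = w
    finetuned.load_state_dict(merged)
    return finetuned
```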
Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID
Detecting and tracking multiple unmanned aerial vehicles (UAVs) in thermal infrared video is inherently challenging due to low contrast, environmental noise, and small target sizes. This paper provides a straightforward approach to address multi-UAV tracking in thermal infrared video, leveraging recent advances in detection and tracking. Instead of relying on the YOLOv5 with the DeepSORT pipeline, we present a tracking framework built on YOLOv12 and BoT-SORT, enhanced with tailored training and inference strategies. We evaluate our approach following the metrics from the 4th Anti-UAV Challenge and demonstrate competitive performance. Notably, we achieve strong results without using contrast enhancement or temporal information fusion to enrich UAV features, highlighting our approach as a "Strong Baseline" for the multi-UAV tracking task. We provide implementation details, in-depth experimental analysis, and a discussion of potential improvements. The code is available at https://github.com/wish44165/YOLOv12-BoT-SORT-ReID .
Updated: 2025-03-21 15:40:18
标题: 强基线:利用YOLOv12与BoT-SORT-ReID进行多无人机跟踪
摘要: 在热红外视频中检测和跟踪多个无人机(UAVs)具有固有的挑战性,由于低对比度、环境噪音和小目标尺寸。本文提供了一个简单的方法来处理热红外视频中的多UAV跟踪,利用了最近在检测和跟踪方面的进展。我们提出了一个基于YOLOv12和BoT-SORT构建的跟踪框架,通过定制的训练和推理策略进行增强。我们根据第四届反无人机挑战的指标评估了我们的方法,并展示了竞争性的表现。值得注意的是,我们在不使用对比度增强或时间信息融合来丰富UAV特征的情况下取得了强大的结果,突显了我们的方法作为多UAV跟踪任务的“强基线”。我们提供了实现细节、深入的实验分析以及潜在改进的讨论。代码可在https://github.com/wish44165/YOLOv12-BoT-SORT-ReID 上找到。
更新时间: 2025-03-21 15:40:18
领域: cs.CV,cs.AI
LoGoFair: Post-Processing for Local and Global Fairness in Federated Learning
Federated learning (FL) has garnered considerable interest for its capability to learn from decentralized data sources. Given the increasing application of FL in decision-making scenarios, addressing fairness issues across different sensitive groups (e.g., female, male) in FL is crucial. Current research often focuses on facilitating fairness on each client's data (local fairness) or within the entire dataset across all clients (global fairness). However, existing approaches that focus exclusively on either local or global fairness fail to address two key challenges: (CH1) Under statistical heterogeneity, global fairness does not imply local fairness, and vice versa. (CH2) Achieving fairness in a model-agnostic setting. To tackle the aforementioned challenges, this paper proposes a novel post-processing framework for achieving both Local and Global Fairness in the FL context, namely LoGoFair. To address CH1, LoGoFair endeavors to seek the Bayes optimal classifier under local and global fairness constraints, which strikes the optimal accuracy-fairness balance in the probabilistic sense. To address CH2, LoGoFair employs a model-agnostic federated post-processing procedure that enables clients to collaboratively optimize global fairness while ensuring local fairness, thereby achieving the optimal fair classifier within FL. Experimental results on three real-world datasets further illustrate the effectiveness of the proposed LoGoFair framework.
Updated: 2025-03-21 15:33:09
标题: LoGoFair:联邦学习中的本地和全局公平后处理
摘要: 联邦学习(FL)因其能够从分散数据源中学习而引起了相当大的兴趣。鉴于FL在决策场景中的应用日益增多,解决FL中不同敏感群体(例如女性、男性)之间的公平性问题至关重要。当前的研究经常侧重于在每个客户端数据(本地公平性)或在所有客户端跨整个数据集内实现公平性(全局公平性)。然而,专注于本地或全局公平性的现有方法未能解决两个关键挑战:(CH1)在统计异质性下,全局公平性并不意味着本地公平性,反之亦然。 (CH2)在模型无关设置下实现公平性。为了解决上述挑战,本文提出了一个在FL背景下实现本地和全局公平性的新型后处理框架,即LoGoFair。为了解决CH1,LoGoFair努力寻求在本地和全局公平性约束下的贝叶斯最优分类器,从概率意义上实现最佳准确性-公平性平衡。为了解决CH2,LoGoFair采用了一种模型无关的联邦后处理程序,使客户能够共同优化全局公平性,同时确保本地公平性,从而在FL内实现最佳公平分类器。对三个真实世界数据集的实验结果进一步说明了提出的LoGoFair框架的有效性。
更新时间: 2025-03-21 15:33:09
领域: cs.LG,cs.DC
FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs
Large Language Models (LLMs) frequently generate hallucinated content, posing significant challenges for applications where factuality is crucial. While existing hallucination detection methods typically operate at the sentence level or passage level, we propose FactSelfCheck, a novel black-box sampling-based method that enables fine-grained fact-level detection. Our approach represents text as knowledge graphs consisting of facts in the form of triples. Through analyzing factual consistency across multiple LLM responses, we compute fine-grained hallucination scores without requiring external resources or training data. Our evaluation demonstrates that FactSelfCheck performs competitively with leading sampling-based methods while providing more detailed insights. Most notably, our fact-level approach significantly improves hallucination correction, achieving a 35% increase in factual content compared to the baseline, while sentence-level SelfCheckGPT yields only an 8% improvement. The granular nature of our detection enables more precise identification and correction of hallucinated content.
Updated: 2025-03-21 15:32:24
标题: FactSelfCheck:用于LLMs的事实级别黑盒幻觉检测
摘要: 大型语言模型(LLMs)经常生成虚构内容,这给需要事实性关键的应用程序带来了重大挑战。虽然现有的幻觉检测方法通常在句子级别或段落级别运作,但我们提出了FactSelfCheck,这是一种全新的基于黑盒抽样的方法,可以实现细粒度的事实级别检测。我们的方法将文本表示为由三元组形式的事实组成的知识图。通过分析多个LLM响应之间的事实一致性,我们计算出细粒度的幻觉分数,而无需外部资源或训练数据。我们的评估表明,FactSelfCheck与主流基于抽样的方法竞争力强,同时提供更详细的见解。值得注意的是,我们的事实级别方法显著改进了幻觉校正,与基线相比,实现了35%的事实内容增加,而句子级别的SelfCheckGPT仅实现了8%的改进。我们检测的细粒度性质使得更加精确地识别和纠正虚构内容成为可能。
更新时间: 2025-03-21 15:32:24
领域: cs.LG,cs.AI,cs.CL
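The fact-level scoring idea above reduces to a small amount of code once triples have been extracted and a support check is available; in practice both steps would be LLM prompts, and the string-matching check in the toy usage below is only a placeholder.

```python
def fact_hallucination_scores(triples, sampled_responses, supports):
    """Fact-level self-check scores (illustrative).

    triples: list of (subject, relation, object) extracted from the main answer.
    supports(triple, response) -> bool, e.g. an LLM judgement; a placeholder here.
    Returns {triple: score in [0, 1]}, higher meaning more likely hallucinated.
    """
    scores = {}
    for triple in triples:
        agree = sum(supports(triple, r) for r in sampled_responses)
        scores[triple] = 1.0 - agree / max(len(sampled_responses), 1)
    return scores

# Toy usage with a trivial string-matching "support" check.
if __name__ == "__main__":
    triples = [("Paris", "capital_of", "France"), ("Paris", "capital_of", "Spain")]
    samples = ["Paris is the capital of France.", "France's capital is Paris."]
    naive = lambda t, r: t[0] in r and t[2] in r
    print(fact_hallucination_scores(triples, samples, naive))
```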
Schur's Positive-Definite Network: Deep Learning in the SPD cone with structure
Estimating matrices in the symmetric positive-definite (SPD) cone is of interest for many applications ranging from computer vision to graph learning. While there exist various convex optimization-based estimators, they remain limited in expressivity due to their model-based approach. The success of deep learning motivates the use of learning-based approaches to estimate SPD matrices with neural networks in a data-driven fashion. However, designing effective neural architectures for SPD learning is challenging, particularly when the task requires additional structural constraints, such as element-wise sparsity. Current approaches either do not ensure that the output meets all desired properties or lack expressivity. In this paper, we introduce SpodNet, a novel and generic learning module that guarantees SPD outputs and supports additional structural constraints. Notably, it solves the challenging task of learning jointly SPD and sparse matrices. Our experiments illustrate the versatility and relevance of SpodNet layers for such applications.
Updated: 2025-03-21 15:31:47
标题: 舒尔正定网络:在具有结构的SPD锥内进行深度学习
摘要: 在对称正定(SPD)锥中估计矩阵对许多应用是有兴趣的,从计算机视觉到图学习等各个领域都有。虽然存在各种基于凸优化的估计器,但由于其基于模型的方法,它们在表达能力上仍然有限。深度学习的成功促使使用基于学习的方法以数据驱动的方式估计SPD矩阵与神经网络。然而,为SPD学习设计有效的神经结构是具有挑战性的,尤其是当任务需要额外的结构约束,如逐元素稀疏性时。当前的方法要么不能确保输出满足所有期望的属性,要么缺乏表达能力。在本文中,我们介绍了SpodNet,一个新颖且通用的学习模块,它保证了SPD输出并支持额外的结构约束。值得注意的是,它解决了共同学习SPD和稀疏矩阵的具有挑战性的任务。我们的实验展示了SpodNet层在这些应用中的多功能性和相关性。
更新时间: 2025-03-21 15:31:47
领域: cs.LG,stat.ML
Neuro-Symbolic Scene Graph Conditioning for Synthetic Image Dataset Generation
As machine learning models increase in scale and complexity, obtaining sufficient training data has become a critical bottleneck due to acquisition costs, privacy constraints, and data scarcity in specialised domains. While synthetic data generation has emerged as a promising alternative, a notable performance gap remains compared to models trained on real data, particularly as task complexity grows. Concurrently, Neuro-Symbolic methods, which combine neural networks' learning strengths with symbolic reasoning's structured representations, have demonstrated significant potential across various cognitive tasks. This paper explores the utility of Neuro-Symbolic conditioning for synthetic image dataset generation, focusing specifically on improving the performance of Scene Graph Generation models. The research investigates whether structured symbolic representations in the form of scene graphs can enhance synthetic data quality through explicit encoding of relational constraints. The results demonstrate that Neuro-Symbolic conditioning yields significant improvements of up to +2.59% in standard Recall metrics and +2.83% in No Graph Constraint Recall metrics when used for dataset augmentation. These findings establish that merging Neuro-Symbolic and generative approaches produces synthetic data with complementary structural information that enhances model performance when combined with real data, providing a novel approach to overcome data scarcity limitations even for complex visual reasoning tasks.
Updated: 2025-03-21 15:26:16
标题: 神经符号场景图条件化用于合成图像数据集生成
摘要: 随着机器学习模型规模和复杂性的增加,由于获取训练数据的成本、隐私约束和专业领域中数据稀缺性等原因,获得足够的训练数据已成为一个关键瓶颈。尽管合成数据生成已经成为一种有前途的替代方案,但与在真实数据上训练的模型相比,尤其在任务复杂度增加时,仍存在显著的性能差距。与此同时,神经符号方法将神经网络的学习优势与符号推理的结构化表示相结合,已在各种认知任务中展现出显著潜力。本文探讨了神经符号调节对合成图像数据集生成的实用性,重点是改善场景图生成模型的性能。研究调查了结构化符号表示(如场景图)是否可以通过明确编码关系约束来增强合成数据质量。结果表明,神经符号调节在数据集增强时可以显著提高标准召回率指标高达+2.59%和无图约束召回率指标高达+2.83%。这些发现证实,将神经符号和生成方法相结合可以产生具有互补结构信息的合成数据,当与真实数据结合时可以提高模型性能,为克服数据稀缺限制提供了一种新的方法,即使对于复杂的视觉推理任务也是如此。
更新时间: 2025-03-21 15:26:16
领域: cs.CV,cs.AI,cs.LG
Automating Adjudication of Cardiovascular Events Using Large Language Models
Cardiovascular events, such as heart attacks and strokes, remain a leading cause of mortality globally, necessitating meticulous monitoring and adjudication in clinical trials. This process, traditionally performed manually by clinical experts, is time-consuming, resource-intensive, and prone to inter-reviewer variability, potentially introducing bias and hindering trial progress. This study addresses these critical limitations by presenting a novel framework for automating the adjudication of cardiovascular events in clinical trials using Large Language Models (LLMs). We developed a two-stage approach: first, employing an LLM-based pipeline for event information extraction from unstructured clinical data and second, using an LLM-based adjudication process guided by a Tree of Thoughts approach and clinical endpoint committee (CEC) guidelines. Using cardiovascular event-specific clinical trial data, the framework achieved an F1-score of 0.82 for event extraction and an accuracy of 0.68 for adjudication. Furthermore, we introduce the CLEART score, a novel, automated metric specifically designed for evaluating the quality of AI-generated clinical reasoning in adjudicating cardiovascular events. This approach demonstrates significant potential for substantially reducing adjudication time and costs while maintaining high-quality, consistent, and auditable outcomes in clinical trials. The reduced variability and enhanced standardization also allow for faster identification and mitigation of risks associated with cardiovascular therapies.
Updated: 2025-03-21 15:25:53
标题: 使用大型语言模型自动处理心血管事件的裁决
摘要: 心血管事件,如心脏病发作和中风,仍然是全球主要的死亡原因,需要在临床试验中进行细致的监测和裁决。传统上由临床专家手动执行的这一过程耗时、资源密集,并且容易出现审阅者间的差异,可能引入偏见并阻碍试验进展。本研究通过提出一个新颖的框架,利用大型语言模型(LLM)自动裁决临床试验中的心血管事件,解决了这些关键限制。我们开发了一个两阶段方法:首先,利用基于LLM的管道从非结构化临床数据中提取事件信息,其次,使用基于LLM的裁决过程,采用"思维树"方法和临床终点委员会(CEC)指南进行指导。在心血管事件特定的临床试验数据中,该框架实现了事件提取的F1分数为0.82,裁决的准确率为0.68。此外,我们引入了CLEART分数,这是一个专门设计用于评估AI生成的临床推理在裁决心血管事件中质量的新颖、自动化指标。这种方法显示出显著的潜力,可以大幅减少裁决时间和成本,同时保持临床试验中高质量、一致和可审计的结果。降低的变异性和增强的标准化还可以更快地识别和减轻与心血管治疗相关的风险。
更新时间: 2025-03-21 15:25:53
领域: cs.CL,cs.AI
Cyber Campaign Fractals -- Geometric Analysis of Hierarchical Cyber Attack Taxonomies
This paper introduces a novel mathematical framework for analyzing cyber threat campaigns through fractal geometry. By conceptualizing hierarchical taxonomies (MITRE ATT&CK, DISARM) as snowflake-like structures with tactics, techniques, and sub-techniques forming concentric layers, we establish a rigorous method for campaign comparison using Hutchinson's Theorem and Hausdorff distance metrics. Evaluation results confirm that our fractal representation preserves hierarchical integrity while providing a dimensionality-based complexity assessment that correlates with campaign complexity. The proposed methodology bridges taxonomy-driven cyber threat analysis and computational geometry, providing analysts with both mathematical rigor and interpretable visualizations for addressing the growing complexity of adversarial operations across multiple threat domains.
Updated: 2025-03-21 15:24:18
标题: 网络攻击分形-对分层网络攻击分类法的几何分析
摘要: 这篇论文介绍了一种新颖的数学框架,用于通过分形几何分析网络威胁活动。通过将层次分类(MITRE ATT&CK、DISARM)概念化为类似雪花的结构,其中战术、技术和子技术形成同心层,我们建立了一种使用Hutchinson定理和Hausdorff距离度量进行威胁活动比较的严格方法。评估结果证实,我们的分形表示保持了层次完整性,同时提供了与活动复杂性相关的基于维度的复杂性评估。所提出的方法桥接了基于分类法的网络威胁分析和计算几何学,为分析人员提供了数学严谨性和可解释的可视化工具,以应对跨多个威胁领域日益复杂的对抗行动。
更新时间: 2025-03-21 15:24:18
领域: cs.CR
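The campaign-comparison step described above rests on the Hausdorff distance between point sets; a minimal sketch using SciPy is given below. The concentric-ring layout that turns a taxonomy into 2-D points is an illustrative assumption, not the paper's exact embedding.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two campaign point sets."""
    d_ab = directed_hausdorff(points_a, points_b)[0]
    d_ba = directed_hausdorff(points_b, points_a)[0]
    return max(d_ab, d_ba)

def radial_layout(tactic_ids, technique_ids, r_tactic=1.0, r_technique=2.0):
    """Place tactics and techniques on concentric rings (illustrative layout)."""
    def ring(ids, radius):
        angles = 2 * np.pi * np.arange(len(ids)) / max(len(ids), 1)
        return np.stack([radius * np.cos(angles), radius * np.sin(angles)], axis=1)
    return np.vstack([ring(tactic_ids, r_tactic), ring(technique_ids, r_technique)])
```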
ML-Based Bidding Price Prediction for Pay-As-Bid Ancillary Services Markets: A Use Case in the German Control Reserve Market
The increasing integration of renewable energy sources has led to greater volatility and unpredictability in electricity generation, posing challenges to grid stability. Ancillary service markets, such as the German control reserve market, allow industrial consumers and producers to offer flexibility in their power consumption or generation, contributing to grid stability while earning additional income. However, many participants use simple bidding strategies that may not maximize their revenues. This paper presents a methodology for forecasting bidding prices in pay-as-bid ancillary service markets, focusing on the German control reserve market. We evaluate various machine learning models, including Support Vector Regression, Decision Trees, and k-Nearest Neighbors, and compare their performance against benchmark models. To address the asymmetry in the revenue function of pay-as-bid markets, we introduce an offset adjustment technique that enhances the practical applicability of the forecasting models. Our analysis demonstrates that the proposed approach improves potential revenues by 27.43% to 37.31% compared to baseline models. When analyzing the relationship between the model forecasting errors and the revenue, a negative correlation is measured for three markets; according to the results, a reduction of 1 EUR/MW in the model's price forecasting error (MAE) statistically leads to a yearly revenue increase between 483 EUR/MW and 3,631 EUR/MW. The proposed methodology enables industrial participants to optimize their bidding strategies, leading to increased earnings and contributing to the efficiency and stability of the electrical grid.
Updated: 2025-03-21 15:21:43
标题: 基于机器学习的拍卖价格预测在按需辅助服务市场中的应用案例:以德国控制储备市场为例
摘要: 随着可再生能源的不断融入,电力发电的波动性和不可预测性增加,对电网稳定性提出挑战。辅助服务市场,如德国控制储备市场,允许工业消费者和生产者在其电力消费或发电中提供灵活性,从而增强电网稳定性并获得额外收入。然而,许多参与者使用简单的竞标策略,可能无法最大化其收入。本文提出了一种在按竞标付费的辅助服务市场中预测竞标价格的方法,重点关注德国控制储备市场。我们评估了各种机器学习模型,包括支持向量回归、决策树和k-最近邻算法,并将它们的性能与基准模型进行了比较。为了解决按竞标付费市场中收入函数的不对称性,我们引入了一种偏移调整技术,增强了预测模型的实际适用性。我们的分析表明,与基准模型相比,所提出的方法可将潜在收入提高27.43%至37.31%。在分析模型预测误差与收入之间的关系时,发现三个市场的负相关性;根据结果,1欧元/兆瓦的模型价格预测误差(MAE)的减少统计上会导致每年收入增加483欧元/兆瓦至3,631欧元/兆瓦。所提出的方法使工业参与者能够优化竞标策略,增加收入并有助于电力网的效率和稳定性。
更新时间: 2025-03-21 15:21:43
领域: cs.LG,cs.CE,stat.AP,stat.ML
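The offset-adjustment idea above exploits the asymmetry of pay-as-bid revenue: bidding slightly below the forecast sacrifices a little margin, while bidding above the marginal accepted price forfeits the acceptance entirely. A minimal sketch is given below; the acceptance rule, the offset grid, and the per-MW revenue definition are simplifying assumptions.

```python
import numpy as np

def pay_as_bid_revenue(bids, clearing_prices):
    """Revenue per MW: a bid earns its own price when it is at or below the
    marginal accepted price, and nothing otherwise (simplified acceptance rule)."""
    bids = np.asarray(bids, dtype=float)
    clearing_prices = np.asarray(clearing_prices, dtype=float)
    return np.where(bids <= clearing_prices, bids, 0.0).sum()

def tune_offset(forecasts_val, prices_val, offsets=np.linspace(0.0, 20.0, 201)):
    """Pick the constant downward offset that maximises validation revenue."""
    revenues = [pay_as_bid_revenue(forecasts_val - o, prices_val) for o in offsets]
    return offsets[int(np.argmax(revenues))]

# Usage sketch: bid = model.predict(X_test) - tune_offset(model.predict(X_val), y_val)
```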
PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction
Document layout analysis is a critical preprocessing step in document intelligence, enabling the detection and localization of structural elements such as titles, text blocks, tables, and formulas. Despite its importance, existing layout detection models face significant challenges in generalizing across diverse document types, handling complex layouts, and achieving real-time performance for large-scale data processing. To address these limitations, we present PP-DocLayout, which achieves high precision and efficiency in recognizing 23 types of layout regions across diverse document formats. To meet different needs, we offer three models of varying scales. PP-DocLayout-L is a high-precision model based on the RT-DETR-L detector, achieving 90.4% mAP@0.5 and an end-to-end inference time of 13.4 ms per page on a T4 GPU. PP-DocLayout-M is a balanced model, offering 75.2% mAP@0.5 with an inference time of 12.7 ms per page on a T4 GPU. PP-DocLayout-S is a high-efficiency model designed for resource-constrained environments and real-time applications, with an inference time of 8.1 ms per page on a T4 GPU and 14.5 ms on a CPU. This work not only advances the state of the art in document layout analysis but also provides a robust solution for constructing high-quality training data, enabling advancements in document intelligence and multimodal AI systems. Code and models are available at https://github.com/PaddlePaddle/PaddleX .
Updated: 2025-03-21 15:20:47
标题: PP-DocLayout: 一个统一的文档布局检测模型,用于加速大规模数据构建
摘要: 文档布局分析是文档智能中至关重要的预处理步骤,使得可以检测和定位结构元素,如标题、文本块、表格和公式。尽管其重要性,现有的布局检测模型在一般化跨不同文档类型、处理复杂布局以及实现大规模数据处理的实时性等方面面临重大挑战。为了解决这些限制,我们提出了PP-DocLayout,它在识别不同文档格式中的23种布局区域方面实现了高精度和高效率。为满足不同需求,我们提供了三种不同规模的模型。PP-DocLayout-L是基于RT-DETR-L检测器的高精度模型,实现了90.4%的mAP@0.5和每页13.4毫秒的端到端推理时间在T4 GPU上。PP-DocLayout-M是一种平衡模型,提供了75.2%的mAP@0.5,在T4 GPU上每页推理时间为12.7毫秒。PP-DocLayout-S是一种高效率模型,设计用于资源受限环境和实时应用,在T4 GPU上每页推理时间为8.1毫秒,在CPU上为14.5毫秒。这项工作不仅推动了文档布局分析的最新技术,还为构建高质量的训练数据提供了稳健的解决方案,促进了文档智能和多模态人工智能系统的发展。代码和模型可在https://github.com/PaddlePaddle/PaddleX 上找到。
更新时间: 2025-03-21 15:20:47
领域: cs.CV,cs.AI
A Language Anchor-Guided Method for Robust Noisy Domain Generalization
Real-world machine learning applications often struggle with two major challenges: distribution shift and label noise. Models tend to overfit by focusing on redundant and uninformative features in the training data, which makes it hard for them to generalize to the target domain. Noisy data worsens this problem by causing further overfitting to the noise, meaning that existing methods often fail to tell the difference between true, invariant features and misleading, spurious ones. To tackle these issues, we introduce Anchor Alignment and Adaptive Weighting (A3W). This new algorithm uses sample reweighting guided by natural language processing (NLP) anchors to extract more representative features. In simple terms, A3W leverages semantic representations from natural language models as a source of domain-invariant prior knowledge. Additionally, it employs a weighted loss function that adjusts each sample's contribution based on its similarity to the corresponding NLP anchor. This adjustment makes the model more robust to noisy labels. Extensive experiments on standard benchmark datasets show that A3W consistently outperforms state-of-the-art domain generalization methods, offering significant improvements in both accuracy and robustness across different datasets and noise levels.
Updated: 2025-03-21 15:20:28
标题: 一种基于语言锚点引导的鲁棒嘈杂域泛化方法
摘要: 现实世界中的机器学习应用经常面临两个主要挑战:分布偏移和标签噪声。模型往往会过度拟合,通过专注于训练数据中冗余和无信息的特征,这使得它们很难推广到目标域。嘈杂的数据会加剧这一问题,因为它会导致进一步对噪声过度拟合,这意味着现有方法通常无法区分真实的、不变的特征和误导性的、虚假的特征。为了解决这些问题,我们引入了锚点对齐和自适应加权(A3W)。这种新算法利用自然语言处理(NLP)锚点引导的样本重新加权来提取更具代表性的特征。简单来说,A3W利用自然语言模型的语义表示作为具有域不变性的先验知识。此外,它采用加权损失函数,根据每个样本与相应的NLP锚点的相似性调整其贡献,使模型更加鲁棒地处理嘈杂的标签。在标准基准数据集上进行的大量实验表明,A3W一直优于最先进的领域泛化方法,在不同数据集和噪声水平上都提供了显著的准确性和鲁棒性改进。
更新时间: 2025-03-21 15:20:28
领域: cs.CL,cs.CV,cs.LG
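As an illustration of the anchor-guided sample reweighting described in the A3W abstract above, here is a minimal sketch. The anchor embeddings, the cosine-similarity-to-weight mapping, and the temperature are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def anchor_weighted_loss(features, logits, labels, anchors, tau=0.1):
    """Cross-entropy reweighted by each sample's similarity to a class-specific text anchor.

    features : (B, D) image embeddings from the backbone
    logits   : (B, C) classifier outputs
    labels   : (B,)   possibly noisy class labels
    anchors  : (C, D) frozen text embeddings, one per class (e.g. from a language model)
    """
    feats = F.normalize(features, dim=-1)
    ancs = F.normalize(anchors, dim=-1)
    # cosine similarity between each sample and the anchor of its (noisy) label
    sim = (feats * ancs[labels]).sum(dim=-1)                  # (B,)
    # samples that disagree with their anchor are softly down-weighted (noise filtering)
    weights = torch.softmax(sim / tau, dim=0) * sim.numel()   # mean weight ~ 1
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    return (weights * per_sample).mean()
```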
Global-Decision-Focused Neural ODEs for Proactive Grid Resilience Management
Extreme hazard events such as wildfires and hurricanes increasingly threaten power systems, causing widespread outages and disrupting critical services. Recently, predict-then-optimize approaches have gained traction in grid operations, where system functionality forecasts are first generated and then used as inputs for downstream decision-making. However, this two-stage method often results in a misalignment between prediction and optimization objectives, leading to suboptimal resource allocation. To address this, we propose predict-all-then-optimize-globally (PATOG), a framework that integrates outage prediction with globally optimized interventions. At its core, our global-decision-focused (GDF) neural ODE model captures outage dynamics while optimizing resilience strategies in a decision-aware manner. Unlike conventional methods, our approach ensures spatially and temporally coherent decision-making, improving both predictive accuracy and operational efficiency. Experiments on synthetic and real-world datasets demonstrate significant improvements in outage prediction consistency and grid resilience.
Updated: 2025-03-21 15:16:16
标题: 全局决策导向的神经ODE用于主动电网韧性管理
摘要: 极端危险事件,如野火和飓风,越来越多地威胁到电力系统,导致大范围停电并破坏关键服务。最近,"先预测后优化"的方法在电网运营中逐渐受到关注:首先生成系统功能预测,然后将其用作下游决策的输入。然而,这种两阶段方法通常导致预测目标与优化目标之间的不匹配,从而造成次优的资源分配。为了解决这个问题,我们提出了"先预测所有,然后全局优化"(PATOG)框架,该框架将停电预测与全局优化的干预相结合。其核心是我们的全局决策导向(GDF)神经ODE模型,它在捕捉停电动态的同时以决策感知的方式优化韧性策略。与传统方法不同,我们的方法确保了空间和时间上一致的决策制定,同时提高了预测准确性和运营效率。对合成和真实世界数据集的实验表明,停电预测一致性和电网韧性都有显著改善。
更新时间: 2025-03-21 15:16:16
领域: cs.LG
GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation
Large Language Models (LLMs) are becoming integral to daily life, showcasing their vast potential across various Natural Language Processing (NLP) tasks. Beyond NLP, LLMs are increasingly used in software development tasks, such as code completion, modification, bug fixing, and code translation. Software engineers widely use tools like GitHub Copilot and Amazon Q, streamlining workflows and automating tasks with high accuracy. While the resource and energy intensity of LLM training is often highlighted, inference can be even more resource-intensive over time, as it's a continuous process with a high number of invocations. Therefore, developing resource-efficient alternatives for LLM inference is crucial for sustainability. This work proposes GREEN-CODE, a framework for energy-aware code generation in LLMs. GREEN-CODE performs dynamic early exit during LLM inference. We train a Reinforcement Learning (RL) agent that learns to balance the trade-offs between accuracy, latency, and energy consumption. Our approach is evaluated on two open-source LLMs, Llama 3.2 3B and OPT 2.7B, using the JavaCorpus and PY150 datasets. Results show that our method reduces the energy consumption between 23-50 % on average for code generation tasks without significantly affecting accuracy.
Updated: 2025-03-21 15:07:55
标题: 绿色编码:学习在基于LLM的代码生成中优化能效
摘要: 大型语言模型(LLMs)正逐渐成为日常生活中不可或缺的一部分,展示出它们在各种自然语言处理(NLP)任务中的巨大潜力。除了NLP,LLMs越来越多地用于软件开发任务,如代码补全、修改、错误修复和代码翻译。软件工程师广泛使用GitHub Copilot和Amazon Q等工具,简化工作流程并以高准确性自动化任务。尽管人们经常强调LLM训练的资源和能耗强度,但随着时间推移,推理可能会更加资源密集,因为它是一个持续的过程,并且调用次数庞大。因此,为LLM推理开发资源高效的替代方案对可持续性至关重要。本文提出了一个名为GREEN-CODE的框架,用于LLMs中的能源感知代码生成。GREEN-CODE在LLM推理过程中执行动态提前退出。我们训练了一个强化学习(RL)代理,学习在准确性、延迟和能量消耗之间进行权衡。我们的方法在两个开源LLM(Llama 3.2 3B和OPT 2.7B)上进行评估,使用JavaCorpus和PY150数据集。结果显示,我们的方法在代码生成任务中平均减少了23%-50%的能量消耗,而几乎不影响准确性。
更新时间: 2025-03-21 15:07:55
领域: cs.DC,cs.AI,cs.PF,cs.SE,C.4; D.0; E.4; I.7
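To make the dynamic early-exit idea from the GREEN-CODE abstract concrete, the toy sketch below decodes greedily and lets an intermediate exit head stop the forward pass when its predictive entropy is low. The tiny model, the intermediate heads, and the fixed entropy threshold (standing in for the RL-learned exit policy) are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Stand-in decoder: embedding, a stack of MLP 'layers', and a vocabulary head."""
    def __init__(self, vocab=50, dim=32, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.layers = nn.ModuleList(nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(n_layers))
        self.lm_head = nn.Linear(dim, vocab)

@torch.no_grad()
def generate_with_early_exit(model, exit_heads, input_ids, max_new_tokens=8, entropy_thresh=1.5):
    """Greedy decoding (batch size 1) with confidence-based early exit: at designated layers,
    an exit head predicts the next token; if its entropy is low enough, the remaining layers
    are skipped. A learned RL exit policy would replace the fixed threshold used here."""
    for _ in range(max_new_tokens):
        hidden = model.embed(input_ids)                 # (1, T, D)
        logits = None
        for i, layer in enumerate(model.layers):
            hidden = layer(hidden)
            if i in exit_heads:
                cand = exit_heads[i](hidden[:, -1])
                probs = torch.softmax(cand, dim=-1)
                entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)
                if entropy.item() < entropy_thresh:
                    logits = cand
                    break                               # confident: saved the remaining layers
        if logits is None:
            logits = model.lm_head(hidden[:, -1])       # fell through: use the final head
        next_token = logits.argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
    return input_ids

model = TinyLM()
exit_heads = {1: nn.Linear(32, 50), 2: nn.Linear(32, 50)}   # hypothetical intermediate heads
print(generate_with_early_exit(model, exit_heads, torch.tensor([[1, 2, 3]])))
```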
Uncertainty modeling for fine-tuned implicit functions
Implicit functions such as Neural Radiance Fields (NeRFs), occupancy networks, and signed distance functions (SDFs) have become pivotal in computer vision for reconstructing detailed object shapes from sparse views. Achieving optimal performance with these models can be challenging due to the extreme sparsity of inputs and distribution shifts induced by data corruptions. To this end, large, noise-free synthetic datasets can serve as shape priors to help models fill in gaps, but the resulting reconstructions must be approached with caution. Uncertainty estimation is crucial for assessing the quality of these reconstructions, particularly in identifying areas where the model is uncertain about the parts it has inferred from the prior. In this paper, we introduce Dropsembles, a novel method for uncertainty estimation in tuned implicit functions. We demonstrate the efficacy of our approach through a series of experiments, starting with toy examples and progressing to a real-world scenario. Specifically, we train a Convolutional Occupancy Network on synthetic anatomical data and test it on low-resolution MRI segmentations of the lumbar spine. Our results show that Dropsembles achieve the accuracy and calibration levels of deep ensembles but with significantly less computational cost.
Updated: 2025-03-21 15:06:41
标题: 微调隐式函数的不确定性建模
摘要: 隐式函数,如神经辐射场(NeRFs)、占用网络和有符号距离函数(SDFs),已成为计算机视觉中重要的工具,用于从稀疏视图中重建详细的物体形状。由于输入的极度稀疏性和数据损坏引起的分布偏移,使用这些模型实现最佳性能可能具有挑战性。为此,大规模、无噪声的合成数据集可以作为形状先验,帮助模型填补空白,但生成的重建结果必须谨慎对待。不确定性估计对于评估这些重建结果的质量至关重要,特别是在识别模型对先验推断的部分感到不确定的区域。在本文中,我们介绍了Dropsembles,一种用于调整隐式函数中的不确定性估计的新方法。我们通过一系列实验展示了我们方法的有效性,从玩具示例开始,逐渐发展到真实世界的场景。具体地,我们在合成解剖学数据上训练了一个卷积占用网络,并在腰椎低分辨率MRI分割上进行了测试。我们的结果表明,Dropsembles实现了深度集合的准确性和校准水平,但计算成本显著降低。
更新时间: 2025-03-21 15:06:41
领域: cs.CV,cs.AI,cs.LG
Contraction Theory for Nonlinear Stability Analysis and Learning-based Control: A Tutorial Overview
Contraction theory is an analytical tool to study differential dynamics of a non-autonomous (i.e., time-varying) nonlinear system under a contraction metric defined with a uniformly positive definite matrix, the existence of which results in a necessary and sufficient characterization of incremental exponential stability of multiple solution trajectories with respect to each other. By using a squared differential length as a Lyapunov-like function, its nonlinear stability analysis boils down to finding a suitable contraction metric that satisfies a stability condition expressed as a linear matrix inequality, indicating that many parallels can be drawn between well-known linear systems theory and contraction theory for nonlinear systems. Furthermore, contraction theory takes advantage of a superior robustness property of exponential stability used in conjunction with the comparison lemma. This yields much-needed safety and stability guarantees for neural network-based control and estimation schemes, without resorting to a more involved method of using uniform asymptotic stability for input-to-state stability. Such distinctive features permit the systematic construction of a contraction metric via convex optimization, thereby obtaining an explicit exponential bound on the distance between a time-varying target trajectory and solution trajectories perturbed externally due to disturbances and learning errors. The objective of this paper is, therefore, to present a tutorial overview of contraction theory and its advantages in nonlinear stability analysis of deterministic and stochastic systems, with an emphasis on deriving formal robustness and stability guarantees for various learning-based and data-driven automatic control methods. In particular, we provide a detailed review of techniques for finding contraction metrics and associated control and estimation laws using deep neural networks.
Updated: 2025-03-21 15:00:27
标题: 非线性稳定性分析和基于学习的控制的收缩理论:教程概述
摘要: 收缩理论是一种分析工具,用于在由一致正定矩阵定义的收缩度量下研究非自治(即时变)非线性系统的微分动力学;该度量的存在性给出了多个解轨迹彼此之间增量指数稳定性的充要刻画。通过使用平方微分长度作为类Lyapunov函数,其非线性稳定性分析归结为找到一个满足以线性矩阵不等式表示的稳定性条件的合适收缩度量,这表明在熟知的线性系统理论与非线性系统的收缩理论之间可以建立许多类比。此外,收缩理论与比较引理结合,利用了指数稳定性的卓越鲁棒性特性。这为基于神经网络的控制和估计方案提供了急需的安全性和稳定性保证,而无需借助更繁琐的方法,即利用一致渐近稳定性来获得输入到状态稳定性。这些独特特性允许通过凸优化系统地构建收缩度量,从而获得时变目标轨迹与因扰动和学习误差而受到外部干扰的解轨迹之间距离的显式指数界。因此,本文旨在对收缩理论及其在确定性和随机系统非线性稳定性分析中的优势给出教程式综述,重点是为各种基于学习和数据驱动的自动控制方法导出形式化的鲁棒性和稳定性保证。特别是,我们详细回顾了使用深度神经网络寻找收缩度量及相应控制和估计律的技术。
更新时间: 2025-03-21 15:00:27
领域: cs.LG,cs.RO,cs.SY,eess.SY,math.OC
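For readers new to the topic, the differential condition referenced in the tutorial's abstract can be stated in its standard textbook form; the notation below (contraction rate $\alpha$) is generic, not specific to the tutorial's new material. For a system $\dot{x} = f(x, t)$ and a uniformly positive definite metric $M(x, t) \succ 0$, contraction with rate $\alpha > 0$ (and hence incremental exponential stability) is characterized by
$$\dot{M} + M \frac{\partial f}{\partial x} + \left(\frac{\partial f}{\partial x}\right)^{\top} M \preceq -2\alpha M,$$
which, once a parameterization of $M$ is fixed, becomes a linear matrix inequality that can be searched for by convex optimization.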
LitLLMs, LLMs for Literature Review: Are we there yet?
Literature reviews are an essential component of scientific research, but they remain time-intensive and challenging to write, especially due to the recent influx of research papers. This paper explores the zero-shot abilities of recent Large Language Models (LLMs) in assisting with the writing of literature reviews based on an abstract. We decompose the task into two components: 1. Retrieving related works given a query abstract, and 2. Writing a literature review based on the retrieved results. We analyze how effective LLMs are for both components. For retrieval, we introduce a novel two-step search strategy that first uses an LLM to extract meaningful keywords from the abstract of a paper and then retrieves potentially relevant papers by querying an external knowledge base. Additionally, we study a prompting-based re-ranking mechanism with attribution and show that re-ranking doubles the normalized recall compared to naive search methods, while providing insights into the LLM's decision-making process. In the generation phase, we propose a two-step approach that first outlines a plan for the review and then executes steps in the plan to generate the actual review. To evaluate different LLM-based literature review methods, we create test sets from arXiv papers using a protocol designed for rolling use with newly released LLMs to avoid test set contamination in zero-shot evaluations. We release this evaluation protocol to promote additional research and development in this regard. Our empirical results suggest that LLMs show promising potential for writing literature reviews when the task is decomposed into smaller components of retrieval and planning. Our project page including a demonstration system and toolkit can be accessed here: https://litllm.github.io.
Updated: 2025-03-21 14:56:58
标题: LitLLMs,文献综述的LLMs:我们已经到达了吗?
摘要: 文献综述是科学研究的一个重要组成部分,但由于最近研究论文的大量涌入,撰写文献综述仍然是一项耗时且具有挑战性的工作。本文探讨了最近大型语言模型(LLMs)在基于摘要辅助撰写文献综述中的零样本能力。我们将任务分解为两个组成部分:1. 根据查询摘要检索相关文献,2. 根据检索结果撰写文献综述。我们分析了LLMs在这两个组成部分上的有效性。对于检索,我们引入了一种新颖的两步搜索策略,首先使用LLM从论文摘要中提取有意义的关键词,然后通过查询外部知识库检索潜在相关的论文。此外,我们研究了一种基于提示的带归因的重新排序机制,并表明重新排序相较于朴素搜索方法将归一化召回率翻倍,同时提供有关LLM决策过程的见解。在生成阶段,我们提出了一种两步方法,首先概述综述计划,然后执行计划中的步骤生成实际综述。为了评估基于不同LLM的文献综述方法,我们使用一个专门设计用于随新发布的LLM滚动使用的协议从arXiv论文中创建测试集,以避免零样本评估中的测试集污染。我们发布这一评估协议,以促进这方面的进一步研究和发展。我们的实证结果表明,当任务分解为检索和计划的较小组成部分时,LLMs在撰写文献综述方面显示出有希望的潜力。我们的项目页面包括一个演示系统和工具包,可在此处访问:https://litllm.github.io。
更新时间: 2025-03-21 14:56:58
领域: cs.CL,cs.AI,cs.DL,cs.LG
Advanced Deep Learning Methods for Protein Structure Prediction and Design
After AlphaFold won the Nobel Prize, protein prediction with deep learning once again became a hot topic. We comprehensively explore advanced deep learning methods applied to protein structure prediction and design. It begins by examining recent innovations in prediction architectures, with detailed discussions on improvements such as diffusion based frameworks and novel pairwise attention modules. The text analyses key components including structure generation, evaluation metrics, multiple sequence alignment processing, and network architecture, thereby illustrating the current state of the art in computational protein modelling. Subsequent chapters focus on practical applications, presenting case studies that range from individual protein predictions to complex biomolecular interactions. Strategies for enhancing prediction accuracy and integrating deep learning techniques with experimental validation are thoroughly explored. The later sections review the industry landscape of protein design, highlighting the transformative role of artificial intelligence in biotechnology and discussing emerging market trends and future challenges. Supplementary appendices provide essential resources such as databases and open source tools, making this volume a valuable reference for researchers and students.
Updated: 2025-03-21 14:54:59
标题: 高级深度学习方法用于蛋白质结构预测和设计
摘要: 在AlphaFold获得诺贝尔奖之后,利用深度学习进行蛋白质预测再次成为热门话题。我们全面探讨了应用于蛋白质结构预测和设计的先进深度学习方法。本文首先考察了预测架构的最新创新,详细讨论了基于扩散的框架和新颖的成对注意力模块等改进。文中分析了关键组成部分,包括结构生成、评估指标、多序列比对处理和网络架构,从而展示了计算蛋白质建模的当前技术水平。随后的章节着重于实际应用,提供了从单个蛋白质预测到复杂生物分子相互作用的案例研究。文中还深入探讨了提高预测准确性以及将深度学习技术与实验验证相结合的策略。后续部分审视了蛋白质设计的行业格局,突出了人工智能在生物技术中的变革性作用,并讨论了新兴市场趋势和未来挑战。附录提供了数据库和开源工具等基本资源,使本卷成为研究人员和学生的宝贵参考资料。
更新时间: 2025-03-21 14:54:59
领域: q-bio.BM,cs.AI,cs.LG
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
Automatic detection and prevention of open-set failures are crucial in closed-loop robotic systems. Recent studies often struggle to simultaneously identify unexpected failures reactively after they occur and prevent foreseeable ones proactively. To this end, we propose Code-as-Monitor (CaM), a novel paradigm leveraging the vision-language model (VLM) for both open-set reactive and proactive failure detection. The core of our method is to formulate both tasks as a unified set of spatio-temporal constraint satisfaction problems and use VLM-generated code to evaluate them for real-time monitoring. To enhance the accuracy and efficiency of monitoring, we further introduce constraint elements that abstract constraint-related entities or their parts into compact geometric elements. This approach offers greater generality, simplifies tracking, and facilitates constraint-aware visual programming by leveraging these elements as visual prompts. Experiments show that CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% under severe disturbances compared to baselines across three simulators and a real-world setting. Moreover, CaM can be integrated with open-loop control policies to form closed-loop systems, enabling long-horizon tasks in cluttered scenes with dynamic environments.
Updated: 2025-03-21 14:54:29
标题: 代码作为监视器:面向约束的可视化编程用于反应性和主动式机器人故障检测
摘要: 在闭环机器人系统中,自动检测和预防开放集故障至关重要。最近的研究往往难以同时在故障发生后被动地识别意外故障并主动地预防可预见的故障。为此,我们提出了Code-as-Monitor(CaM),这是一种新颖的范式,利用视觉-语言模型(VLM)同时进行开放集的反应式和主动式故障检测。我们方法的核心是将这两个任务统一表述为一组时空约束满足问题,并使用VLM生成的代码对其进行实时监测评估。为了提高监测的准确性和效率,我们进一步引入了约束元素,将与约束相关的实体或其部分抽象为紧凑的几何元素。这种方法具有更强的通用性,简化了跟踪,并通过将这些元素用作视觉提示来促进约束感知的可视化编程。实验证明,与基线相比,在三个模拟器和一个真实环境设置下,CaM在严重干扰下实现了28.7%更高的成功率,并将执行时间减少了31.8%。此外,CaM可以与开环控制策略集成,形成闭环系统,从而在具有动态环境的杂乱场景中完成长时程任务。
更新时间: 2025-03-21 14:54:29
领域: cs.RO,cs.AI,cs.CV,cs.LG
Algorithmic causal structure emerging through compression
We explore the relationship between causality, symmetry, and compression. We build on and generalize the known connection between learning and compression to a setting where causal models are not identifiable. We propose a framework where causality emerges as a consequence of compressing data across multiple environments. We define algorithmic causality as an alternative definition of causality when traditional assumptions for causal identifiability do not hold. We demonstrate how algorithmic causal and symmetric structures can emerge from minimizing upper bounds on Kolmogorov complexity, without knowledge of intervention targets. We hypothesize that these insights may also provide a novel perspective on the emergence of causality in machine learning models, such as large language models, where causal relationships may not be explicitly identifiable.
Updated: 2025-03-21 14:54:04
标题: 通过压缩算法产生的因果结构
摘要: 我们探讨了因果性,对称性和压缩之间的关系。我们在学习和压缩之间已知的联系的基础上进行了拓展,并将其推广到一个因果模型不可识别的设置中。我们提出了一个框架,因果性作为在多个环境中压缩数据的结果而出现。当传统的因果可识别性假设不成立时,我们将算法因果性定义为因果性的替代定义。我们展示了如何通过最小化Kolmogorov复杂度的上界,而不需要知道干预目标,来产生算法因果和对称结构。我们假设这些见解也可能为机器学习模型中因果性的出现提供新颖的视角,例如大型语言模型,在这些模型中因果关系可能不是显式可识别的。
更新时间: 2025-03-21 14:54:04
领域: cs.LG,cs.AI,cs.CC,cs.IT,math.IT
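The abstract above scores candidate structures by upper bounds on Kolmogorov complexity across environments. As a loose, toy stand-in for that idea (not the paper's framework), the sketch below uses zlib code length of coarsely quantized data: a mechanism that genuinely holds in every environment leaves residuals that are cheaper to encode than the raw observations. The quantization and scoring rule are illustrative assumptions.

```python
import zlib
import numpy as np

def code_length(arr):
    """Compressed length (bytes) of a coarsely quantized array -- a crude, computable
    stand-in for an upper bound on Kolmogorov complexity."""
    q = np.clip(np.round(np.asarray(arr) * 100), -30000, 30000).astype(np.int16)
    return len(zlib.compress(q.tobytes(), 9))

def shared_mechanism_score(environments, mechanism):
    """Total code length when each environment is encoded directly, versus when a single
    candidate mechanism y ~ mechanism(x) is reused everywhere and only x plus residuals
    are encoded. A mechanism valid across environments yields the shorter description."""
    direct = sum(code_length(np.concatenate([e["x"], e["y"]])) for e in environments)
    shared = sum(code_length(np.concatenate([e["x"], e["y"] - mechanism(e["x"])]))
                 for e in environments)
    return direct, shared

# toy usage: y = 2x + small noise in every environment
rng = np.random.default_rng(0)
envs = []
for _ in range(3):
    x = rng.normal(size=500)
    envs.append({"x": x, "y": 2 * x + 0.1 * rng.normal(size=500)})
print(shared_mechanism_score(envs, mechanism=lambda x: 2 * x))
```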
Sparse PCA With Multiple Components
Sparse Principal Component Analysis (sPCA) is a cardinal technique for obtaining combinations of features, or principal components (PCs), that explain the variance of high-dimensional datasets in an interpretable manner. This involves solving a sparsity and orthogonality constrained convex maximization problem, which is extremely computationally challenging. Most existing works address sparse PCA via methods-such as iteratively computing one sparse PC and deflating the covariance matrix-that do not guarantee the orthogonality, let alone the optimality, of the resulting solution when we seek multiple mutually orthogonal PCs. We challenge this status by reformulating the orthogonality conditions as rank constraints and optimizing over the sparsity and rank constraints simultaneously. We design tight semidefinite relaxations to supply high-quality upper bounds, which we strengthen via additional second-order cone inequalities when each PC's individual sparsity is specified. Further, we derive a combinatorial upper bound on the maximum amount of variance explained as a function of the support. We exploit these relaxations and bounds to propose exact methods and rounding mechanisms that, together, obtain solutions with a bound gap on the order of 0%-15% for real-world datasets with p = 100s or 1000s of features and r \in {2, 3} components. Numerically, our algorithms match (and sometimes surpass) the best performing methods in terms of fraction of variance explained and systematically return PCs that are sparse and orthogonal. In contrast, we find that existing methods like deflation return solutions that violate the orthogonality constraints, even when the data is generated according to sparse orthogonal PCs. Altogether, our approach solves sparse PCA problems with multiple components to certifiable (near) optimality in a practically tractable fashion.
Updated: 2025-03-21 14:52:20
标题: 多成分稀疏主成分分析
摘要: 稀疏主成分分析(sPCA)是一项重要技术,用于以可解释的方式获得能够解释高维数据集方差的特征组合,即主成分(PCs)。这涉及求解一个带稀疏性和正交性约束的凸最大化问题,在计算上极具挑战性。大多数现有工作采用诸如迭代计算单个稀疏PC并对协方差矩阵进行紧缩(deflation)之类的方法来求解稀疏PCA,但当我们寻求多个相互正交的PC时,这些方法无法保证所得解的正交性,更谈不上最优性。我们将正交性条件重新表述为秩约束,并同时在稀疏性与秩约束下进行优化,从而挑战了这一现状。我们设计了紧的半定松弛来提供高质量的上界,并在指定每个PC各自稀疏度时,通过额外的二阶锥不等式进一步加强这些上界。此外,我们推导了所解释方差最大值关于支撑集的组合上界。我们利用这些松弛和上界提出了精确方法和舍入机制,对于特征数p为数百或数千、成分数r∈{2,3}的真实数据集,二者共同获得界差在0%-15%量级的解。在数值上,我们的算法在所解释方差比例方面与表现最佳的方法相匹配(有时甚至超过),并系统地返回既稀疏又正交的PC。相比之下,我们发现像紧缩这样的现有方法返回的解违反了正交性约束,即使数据是按照稀疏正交PC生成的。总的来说,我们的方法以实际可行的方式,将具有多个成分的稀疏PCA问题求解到可认证的(近)最优。
更新时间: 2025-03-21 14:52:20
领域: math.OC,cs.LG,math.ST,stat.ML,stat.TH
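To illustrate the issue the abstract above criticizes, here is a small sketch of the conventional deflation approach: one sparse component at a time via a common truncated-power-iteration heuristic, followed by Hotelling deflation. The data, sparsity level, and heuristic are illustrative only; the point is that the returned components need not be orthogonal when their supports overlap.

```python
import numpy as np

def sparse_pc(S, k, n_iter=200, seed=0):
    """One sparse principal component of covariance S via truncated power iteration:
    keep only the k largest-magnitude entries at each step (a common heuristic)."""
    v = np.random.default_rng(seed).normal(size=S.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = S @ v
        keep = np.argsort(np.abs(v))[-k:]
        mask = np.zeros_like(v)
        mask[keep] = 1.0
        v = v * mask
        v /= np.linalg.norm(v) + 1e-12
    return v

def deflation_sparse_pca(S, r, k):
    """Compute r sparse PCs by repeatedly deflating the covariance matrix."""
    comps = []
    for _ in range(r):
        v = sparse_pc(S, k)
        comps.append(v)
        S = S - (v @ S @ v) * np.outer(v, v)   # Hotelling deflation
    return np.array(comps)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 30))
X[:, :10] += rng.normal(size=(500, 1))         # shared factor so sparse supports overlap
S = np.cov(X, rowvar=False)
V = deflation_sparse_pca(S, r=2, k=5)
# often nonzero when supports overlap, i.e. the orthogonality constraint is violated
print("inner product of PC1 and PC2:", abs(V[0] @ V[1]))
```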
LitLLM: A Toolkit for Scientific Literature Review
Conducting literature reviews for scientific papers is essential for understanding research, its limitations, and building on existing work. It is a tedious task which makes an automatic literature review generator appealing. Unfortunately, many existing works that generate such reviews using Large Language Models (LLMs) have significant limitations. They tend to hallucinate-generate non-factual information-and ignore the latest research they have not been trained on. To address these limitations, we propose a toolkit that operates on Retrieval Augmented Generation (RAG) principles, specialized prompting and instructing techniques with the help of LLMs. Our system first initiates a web search to retrieve relevant papers by summarizing user-provided abstracts into keywords using an off-the-shelf LLM. Authors can enhance the search by supplementing it with relevant papers or keywords, contributing to a tailored retrieval process. Second, the system re-ranks the retrieved papers based on the user-provided abstract. Finally, the related work section is generated based on the re-ranked results and the abstract. There is a substantial reduction in time and effort for literature review compared to traditional methods, establishing our toolkit as an efficient alternative. Our project page including the demo and toolkit can be accessed here: https://litllm.github.io
Updated: 2025-03-21 14:49:10
标题: LitLLM:科学文献综述工具箱
摘要: 为科学论文进行文献综述对于理解研究及其限制、并在现有工作基础上进行建设至关重要。这是一项繁琐的任务,因此自动文献综述生成器具有吸引力。不幸的是,许多现有使用大型语言模型(LLMs)生成此类综述的作品存在重大局限性。它们往往会产生非事实信息并忽视尚未接受训练的最新研究。为了解决这些限制,我们提出了一个基于检索增强生成(RAG)原则、专门提示和指导技术的工具包,借助LLMs的帮助。我们的系统首先通过总结用户提供的摘要并将其转化为关键字来启动网络搜索,以检索相关论文。作者可以通过补充相关论文或关键字来增强搜索,有助于定制检索过程。其次,系统将根据用户提供的摘要对检索到的论文重新排序。最后,基于重新排序的结果和摘要生成相关工作部分。与传统方法相比,文献综述的时间和精力成本大大降低,从而确立我们的工具包作为一种高效的替代方案。我们的项目页面包括演示和工具包,可在此处访问:https://litllm.github.io
更新时间: 2025-03-21 14:49:10
领域: cs.CL,cs.AI,cs.IR
Architecture of Information
The paper explores an approach to constructing energy landscapes of a formal neuron and multilayer artificial neural networks (ANNs). Their analysis makes it possible to determine the conceptual limitations of both classification ANNs (e.g., MLP or CNN) and generative ANN models. The study of informational and thermodynamic entropy in formal neuron and ANN models leads to the conclusion about the energetic nature of informational entropy. The application of the Gibbs free energy concept allows representing the output information of ANNs as the structured part of enthalpy. Modeling ANNs as energy systems makes it possible to interpret the structure of their internal energy as an internal model of the external world, which self-organizes based on the interaction of the system's internal energy components. The control of the self-organization and evolution process of this model is carried out through an energy function (analogous to the Lyapunov function) based on reduction operators. This makes it possible to introduce a new approach to constructing self-organizing and evolutionary ANNs with direct learning, which does not require additional external algorithms. The presented research makes it possible to formulate a formal definition of information in terms of the interaction processes between the internal and external energy of the system.
Updated: 2025-03-21 14:48:41
标题: 信息体系结构
摘要: 这篇论文探讨了构建形式神经元和多层人工神经网络(ANNs)能量景观的方法。对这些能量景观的分析使得可以确定分类ANNs(例如MLP或CNN)和生成式ANN模型的概念性局限。对形式神经元和ANN模型中信息熵与热力学熵的研究,得出了信息熵具有能量本质的结论。应用吉布斯自由能的概念,可以将ANNs的输出信息表示为焓的结构化部分。将ANNs建模为能量系统,使得可以将其内部能量的结构解释为外部世界的内部模型,该模型基于系统内部能量组件的相互作用进行自组织。该模型的自组织与演化过程通过基于约化算子的能量函数(类似于Lyapunov函数)加以控制。这使得可以引入一种新的方法来构建具有直接学习能力的自组织、进化型ANNs,而无需额外的外部算法。所提出的研究使得可以依据系统内部能量与外部能量之间的相互作用过程,给出信息的形式化定义。
更新时间: 2025-03-21 14:48:41
领域: cs.NE,cs.AI,cs.IT,cs.LG,math.IT,H.1.1; I.2.0
Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising
Non-transferable learning (NTL) has been proposed to protect model intellectual property (IP) by creating a "non-transferable barrier" to restrict generalization from authorized to unauthorized domains. Recently, well-designed attack, which restores the unauthorized-domain performance by fine-tuning NTL models on few authorized samples, highlights the security risks of NTL-based applications. However, such attack requires modifying model weights, thus being invalid in the black-box scenario. This raises a critical question: can we trust the security of NTL models deployed as black-box systems? In this work, we reveal the first loophole of black-box NTL models by proposing a novel attack method (dubbed as JailNTL) to jailbreak the non-transferable barrier through test-time data disguising. The main idea of JailNTL is to disguise unauthorized data so it can be identified as authorized by the NTL model, thereby bypassing the non-transferable barrier without modifying the NTL model weights. Specifically, JailNTL encourages unauthorized-domain disguising in two levels, including: (i) data-intrinsic disguising (DID) for eliminating domain discrepancy and preserving class-related content at the input-level, and (ii) model-guided disguising (MGD) for mitigating output-level statistics difference of the NTL model. Empirically, when attacking state-of-the-art (SOTA) NTL models in the black-box scenario, JailNTL achieves an accuracy increase of up to 55.7% in the unauthorized domain by using only 1% authorized samples, largely exceeding existing SOTA white-box attacks.
Updated: 2025-03-21 14:47:33
标题: 越狱非可转移屏障:通过测试时间数据伪装
摘要: 非可转移学习(NTL)被提出来保护模型知识产权(IP),通过创建一个“非可转移障碍”来限制从授权到未授权领域的泛化。最近,一种设计良好的攻击,通过在少量授权样本上微调NTL模型,恢复了未授权领域的性能,突显了基于NTL的应用的安全风险。然而,这种攻击需要修改模型权重,在黑盒方案中无效。这引出了一个关键问题:我们可以相信部署为黑盒系统的NTL模型的安全性吗?在这项工作中,我们通过提出一种新的攻击方法(称为JailNTL)来揭示黑盒NTL模型的第一个漏洞,通过测试时间数据伪装来越过非可转移障碍。JailNTL的主要思想是伪装未授权数据,使其被NTL模型识别为授权数据,从而绕过非可转移障碍而不修改NTL模型权重。具体来说,JailNTL在两个级别上鼓励未授权领域的伪装,包括:(i)数据内在伪装(DID)来消除领域差异并保留输入级别的与类相关的内容,以及(ii)模型引导伪装(MGD)来减轻NTL模型输出级别的统计差异。经验上,在黑盒情况下攻击最先进的NTL模型时,JailNTL通过仅使用1%的授权样本,在未授权领域的准确性提高了高达55.7%,远远超过现有的最先进的白盒攻击。
更新时间: 2025-03-21 14:47:33
领域: cs.CR,cs.CV,cs.LG
Generation of Geodesics with Actor-Critic Reinforcement Learning to Predict Midpoints
To find the shortest paths for all pairs on manifolds with infinitesimally defined metrics, we introduce a framework to generate them by predicting midpoints recursively. To learn midpoint prediction, we propose an actor-critic approach. We prove the soundness of our approach and show experimentally that the proposed method outperforms existing methods on several planning tasks, including path planning for agents with complex kinematics and motion planning for multi-degree-of-freedom robot arms.
Updated: 2025-03-21 14:44:42
标题: 利用演员-评论家强化学习预测中点以生成测地线
摘要: 为了在以无穷小方式定义度量的流形上求取所有点对之间的最短路径,我们引入了一个通过递归预测中点来生成这些路径的框架。为了学习中点预测,我们提出了一种演员-评论家方法。我们证明了该方法的合理性,并通过实验表明,所提出的方法在若干规划任务上优于现有方法,包括具有复杂运动学的智能体的路径规划,以及多自由度机械臂的运动规划。
更新时间: 2025-03-21 14:44:42
领域: cs.LG,cs.AI
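The recursive construction described in the abstract above is easy to make concrete: given any midpoint predictor, a path between two endpoints is refined by repeatedly inserting predicted midpoints. In the sketch below the learned actor is replaced by a hypothetical `predict_midpoint` callable (here, the exact Euclidean midpoint), and the recursion depth controls the path resolution.

```python
def build_path(start, goal, predict_midpoint, depth=4):
    """Recursively insert predicted midpoints between endpoints.
    Returns a list of 2**depth + 1 waypoints from start to goal."""
    if depth == 0:
        return [start, goal]
    mid = predict_midpoint(start, goal)                    # e.g. the trained actor network
    left = build_path(start, mid, predict_midpoint, depth - 1)
    right = build_path(mid, goal, predict_midpoint, depth - 1)
    return left[:-1] + right                               # drop the duplicated midpoint

# toy usage on flat Euclidean space, where the true midpoint is just the average
euclidean_mid = lambda a, b: tuple((x + y) / 2 for x, y in zip(a, b))
print(build_path((0.0, 0.0), (1.0, 1.0), euclidean_mid, depth=2))
```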
TreeSynth: Synthesizing Diverse Data from Scratch via Tree-Guided Subspace Partitioning
Model customization requires high-quality and diverse datasets, but acquiring such data remains challenging and costly. Although large language models (LLMs) can synthesize training data, current approaches are constrained by limited seed data, model bias and insufficient control over the generation process, resulting in limited diversity and biased distribution with the increase of data scales. To tackle this challenge, we present TreeSynth, a tree-guided subspace-based data synthesis framework that recursively partitions the entire data space into hierarchical subspaces, enabling comprehensive and diverse scaling of data synthesis. Briefly, given a task-specific description, we construct a data space partitioning tree by iteratively executing criteria determination and subspace coverage steps. This hierarchically divides the whole space (i.e., root node) into mutually exclusive and complementary atomic subspaces (i.e., leaf nodes). By collecting synthesized data according to the attributes of each leaf node, we obtain a diverse dataset that fully covers the data space. Empirically, our extensive experiments demonstrate that TreeSynth surpasses both human-designed datasets and the state-of-the-art data synthesis baselines, achieving maximum improvements of 45.2% in data diversity and 17.6% in downstream task performance across various models and tasks. Hopefully, TreeSynth provides a scalable solution to synthesize diverse and comprehensive datasets from scratch without human intervention.
Updated: 2025-03-21 14:43:23
标题: TreeSynth:通过树引导的子空间划分从零开始合成多样化数据
摘要: 模型定制需要高质量和多样化的数据集,但获取这样的数据仍然具有挑战性和成本高昂。尽管大型语言模型(LLMs)可以合成训练数据,但当前的方法受到有限的种子数据、模型偏差和对生成过程的不足控制的限制,导致数据规模增加时多样性有限且分布偏向。为了解决这一挑战,我们提出了TreeSynth,这是一个基于树引导的子空间数据合成框架,可以将整个数据空间递归地划分为层次子空间,实现全面和多样化的数据合成。简而言之,根据特定任务的描述,我们通过迭代执行标准确定和子空间覆盖步骤来构建数据空间划分树。这样可以将整个空间(即根节点)层次化地划分为互斥和互补的原子子空间(即叶节点)。通过根据每个叶节点的属性收集合成的数据,我们可以获得一个完全覆盖数据空间的多样化数据集。从经验上看,我们的广泛实验证明,TreeSynth超过了人工设计的数据集和最先进的数据合成基线,各种模型和任务中的数据多样性和下游任务性能最大提高了45.2%和17.6%。希望TreeSynth提供了一种可扩展的解决方案,可以在没有人为干预的情况下,从头开始合成多样化和全面的数据集。
更新时间: 2025-03-21 14:43:23
领域: cs.LG,cs.AI
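The tree-guided partitioning in the TreeSynth abstract above can be sketched schematically: each level of the tree splits the space along one attribute, and every leaf (atomic subspace) becomes a generation prompt. The attribute lists and prompt wording below are illustrative assumptions, not the paper's actual criteria-determination step.

```python
from dataclasses import dataclass, field

@dataclass
class SpaceNode:
    constraints: dict = field(default_factory=dict)   # attribute -> chosen value
    children: list = field(default_factory=list)

def partition(node, criteria, depth=0):
    """Recursively split the data space along one attribute per level; return the leaves."""
    if depth == len(criteria):
        return [node]                                  # leaf: an atomic subspace
    attr, values = criteria[depth]
    leaves = []
    for v in values:
        child = SpaceNode({**node.constraints, attr: v})
        node.children.append(child)
        leaves.extend(partition(child, criteria, depth + 1))
    return leaves

criteria = [("topic", ["algebra", "geometry"]),
            ("difficulty", ["easy", "hard"]),
            ("format", ["word problem", "proof"])]
leaves = partition(SpaceNode(), criteria)
for leaf in leaves[:3]:
    # each leaf would be turned into an LLM prompt covering one atomic subspace
    print("Generate a question with attributes:", leaf.constraints)
print(f"{len(leaves)} mutually exclusive subspaces cover the whole space")
```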
Curriculum RL meets Monte Carlo Planning: Optimization of a Real World Container Management Problem
In this work, we augment reinforcement learning with an inference-time collision model to ensure safe and efficient container management in a waste-sorting facility with limited processing capacity. Each container has two optimal emptying volumes that trade off higher throughput against overflow risk. Conventional reinforcement learning (RL) approaches struggle under delayed rewards, sparse critical events, and high-dimensional uncertainty -- failing to consistently balance higher-volume empties with the risk of safety-limit violations. To address these challenges, we propose a hybrid method comprising: (1) a curriculum-learning pipeline that incrementally trains a PPO agent to handle delayed rewards and class imbalance, and (2) an offline pairwise collision model used at inference time to proactively avert collisions with minimal online cost. Experimental results show that our targeted inference-time collision checks significantly improve collision avoidance, reduce safety-limit violations, maintain high throughput, and scale effectively across varying container-to-PU ratios. These findings offer actionable guidelines for designing safe and efficient container-management systems in real-world facilities.
Updated: 2025-03-21 14:43:11
标题: 课程RL遇见蒙特卡洛规划:优化现实世界集装箱管理问题
摘要: 在这项工作中,我们用一个推理时的碰撞模型来增强强化学习,以确保在处理能力有限的废物分类设施中进行安全高效的容器管理。每个容器都有两个最佳清空容量,在更高吞吐量与溢出风险之间进行权衡。传统的强化学习(RL)方法在延迟奖励、稀疏关键事件和高维不确定性下表现不佳,无法始终在高容量清空与违反安全限制的风险之间取得平衡。为了解决这些挑战,我们提出了一种混合方法,包括:(1)一个课程学习流水线,逐步训练PPO代理以处理延迟奖励和类别不平衡;(2)一个在推理时使用的离线成对碰撞模型,以最小的在线成本主动避免碰撞。实验结果表明,我们有针对性的推理时碰撞检查显著改善了碰撞回避,减少了违反安全限制的情况,保持了高吞吐量,并能在不同的容器与处理单元比例下有效扩展。这些发现为在现实世界设施中设计安全高效的容器管理系统提供了可操作的指导。
更新时间: 2025-03-21 14:43:11
领域: cs.LG
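A minimal sketch of the inference-time safety check described above: the trained policy proposes actions in order of estimated return, and an offline collision model vetoes any proposal whose predicted risk exceeds a threshold. The policy and collision-model interfaces, action names, and fallback rule are hypothetical stand-ins.

```python
class ToyPolicy:
    def rank_actions(self, state):
        return ["empty_C3", "empty_C1", "empty_C2"]        # best-first by estimated return

class ToyCollisionModel:
    def predict_risk(self, action, fill_levels):
        return 0.30 if action == "empty_C3" else 0.01      # offline pairwise model stand-in

def safe_act(policy, collision_model, state, fill_levels, risk_thresh=0.05):
    """Take the highest-value action that the collision model also deems admissible."""
    for action in policy.rank_actions(state):
        if collision_model.predict_risk(action, fill_levels) <= risk_thresh:
            return action
    return "wait"                                          # conservative fallback

print(safe_act(ToyPolicy(), ToyCollisionModel(), state=None, fill_levels={"C1": 0.8}))
```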
Zero-Shot Reinforcement Learning via Function Encoders
Although reinforcement learning (RL) can solve many challenging sequential decision making problems, achieving zero-shot transfer across related tasks remains a challenge. The difficulty lies in finding a good representation for the current task so that the agent understands how it relates to previously seen tasks. To achieve zero-shot transfer, we introduce the function encoder, a representation learning algorithm which represents a function as a weighted combination of learned, non-linear basis functions. By using a function encoder to represent the reward function or the transition function, the agent has information on how the current task relates to previously seen tasks via a coherent vector representation. Thus, the agent is able to achieve transfer between related tasks at run time with no additional training. We demonstrate state-of-the-art data efficiency, asymptotic performance, and training stability in three RL fields by augmenting basic RL algorithms with a function encoder task representation.
Updated: 2025-03-21 14:37:37
标题: 通过功能编码器的零样本强化学习
摘要: 尽管强化学习(RL)可以解决许多具有挑战性的顺序决策问题,但在相关任务之间实现零样本迁移仍然是一个挑战。困难在于为当前任务找到一个良好的表示,以便智能体了解它与先前见过任务之间的关系。为了实现零样本迁移,我们引入了函数编码器,这是一种表示学习算法,它将函数表示为学习得到的非线性基函数的加权组合。通过使用函数编码器表示奖励函数或转移函数,智能体可以通过连贯的向量表示了解当前任务与先前见过任务之间的关系。因此,智能体能够在运行时在相关任务之间实现迁移,而无需额外训练。我们通过将基本RL算法与函数编码器任务表示相结合,在三个RL领域展示了最先进的数据效率、渐近性能和训练稳定性。
更新时间: 2025-03-21 14:37:37
领域: cs.LG,cs.AI
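The core idea in the abstract above, representing a task's reward function as a weighted combination of basis functions, can be sketched in a few lines: given example evaluations, the coefficients are obtained by least squares and then reused zero-shot on new states. The random-feature basis below is a stand-in for the learned, non-linear bases of the paper; the goal-reaching reward is a toy task.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 4))                  # stand-in for learned non-linear basis functions
b = rng.normal(size=64)

def basis(states):                            # phi: R^4 -> R^64
    return np.tanh(states @ W.T + b)

def encode_task(states, rewards):
    """Least-squares coefficients c such that sum_i c_i * phi_i approximates the reward."""
    coeffs, *_ = np.linalg.lstsq(basis(states), rewards, rcond=None)
    return coeffs

def predict_reward(coeffs, states):
    return basis(states) @ coeffs             # zero-shot: same bases, task-specific coefficients

# toy task: reward is the negative distance to a goal; encode it from 200 samples
goal = np.array([1.0, -1.0, 0.5, 0.0])
S = rng.normal(size=(200, 4))
R = -np.linalg.norm(S - goal, axis=1)
c = encode_task(S, R)
S_test = rng.normal(size=(50, 4))
R_test = -np.linalg.norm(S_test - goal, axis=1)
print("held-out MAE:", np.mean(np.abs(predict_reward(c, S_test) - R_test)))
```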
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
Multimodal Large Language Models have advanced AI in applications like text-to-video generation and visual question answering. These models rely on visual encoders to convert non-text data into vectors, but current encoders either lack semantic alignment or overlook non-salient objects. We propose the Guiding Visual Encoder to Perceive Overlooked Information (GiVE) approach. GiVE enhances visual representation with an Attention-Guided Adapter (AG-Adapter) module and an Object-focused Visual Semantic Learning module. These incorporate three novel loss terms: Object-focused Image-Text Contrast (OITC) loss, Object-focused Image-Image Contrast (OIIC) loss, and Object-focused Image Discrimination (OID) loss, improving object consideration, retrieval accuracy, and comprehensiveness. Our contributions include dynamic visual focus adjustment, novel loss functions to enhance object retrieval, and the Multi-Object Instruction (MOInst) dataset. Experiments show our approach achieves state-of-the-art performance.
Updated: 2025-03-21 14:36:09
标题: GiVE: 引导视觉编码器感知被忽视的信息
摘要: 多模态大型语言模型在文本到视频生成和视觉问答等应用中推动了人工智能的发展。这些模型依赖于视觉编码器将非文本数据转换为向量,但当前的编码器要么缺乏语义对齐,要么忽视非显著对象。我们提出了引导视觉编码器感知被忽视信息(GiVE)方法。GiVE通过注意引导适配器(AG-Adapter)模块和面向对象的视觉语义学习模块增强了视觉表示。这些模块包括三个新颖的损失项:面向对象的图像-文本对比(OITC)损失,面向对象的图像-图像对比(OIIC)损失和面向对象的图像判别(OID)损失,改善了对象考虑、检索准确性和全面性。我们的贡献包括动态视觉焦点调整、新颖的损失函数以增强对象检索以及多对象指导(MOInst)数据集。实验表明我们的方法实现了最先进的性能。
更新时间: 2025-03-21 14:36:09
领域: cs.CV,cs.AI,cs.MM
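The object-focused contrastive terms (OITC/OIIC) named in the GiVE abstract above belong to the family of InfoNCE-style contrastive objectives. The sketch below shows a generic symmetric image-text contrastive loss of that family; the object-focused variants, adapters, and MOInst data themselves are not reproduced here.

```python
import torch
import torch.nn.functional as F

def info_nce(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss: matched image/text pairs are positives,
    every other pairing in the batch is a negative."""
    img = F.normalize(image_emb, dim=-1)
    txt = F.normalize(text_emb, dim=-1)
    logits = img @ txt.t() / temperature                  # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```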
D2Fusion: Dual-domain Fusion with Feature Superposition for Deepfake Detection
Deepfake detection is crucial for curbing the harm it causes to society. However, current Deepfake detection methods fail to thoroughly explore artifact information across different domains due to insufficient intrinsic interactions. These interactions refer to the fusion and coordination after feature extraction processes across different domains, which are crucial for recognizing complex forgery clues. Focusing on more generalized Deepfake detection, in this work, we introduce a novel bi-directional attention module to capture the local positional information of artifact clues from the spatial domain. This enables accurate artifact localization, thus addressing the coarse processing with artifact features. To further address the limitation that the proposed bi-directional attention module may not well capture global subtle forgery information in the artifact feature (e.g., textures or edges), we employ a fine-grained frequency attention module in the frequency domain. By doing so, we can obtain high-frequency information in the fine-grained features, which contains the global and subtle forgery information. Although these features from the diverse domains can be effectively and independently improved, fusing them directly does not effectively improve the detection performance. Therefore, we propose a feature superposition strategy that complements information from spatial and frequency domains. This strategy turns the feature components into the form of wave-like tokens, which are updated based on their phase, such that the distinctions between authentic and artifact features can be amplified. Our method demonstrates significant improvements over state-of-the-art (SOTA) methods on five public Deepfake datasets in capturing abnormalities across different manipulated operations and real-life.
Updated: 2025-03-21 14:31:33
标题: D2Fusion:双域融合与特征叠加用于深度伪造检测
摘要: Deepfake检测对遏制其对社会造成的伤害至关重要。然而,当前的Deepfake检测方法由于内在交互不足,未能充分挖掘不同域中的伪影信息。这些交互指的是不同域特征提取过程之后的融合与协调,对于识别复杂的伪造线索至关重要。着眼于更具泛化性的Deepfake检测,本文提出了一种新颖的双向注意力模块,用于从空间域捕获伪影线索的局部位置信息。这使得伪影的准确定位成为可能,从而缓解了对伪影特征的粗糙处理。为了进一步解决所提双向注意力模块可能无法很好捕获伪影特征中全局细微伪造信息(例如纹理或边缘)的局限性,我们在频率域中采用了一个细粒度频率注意力模块。通过这样做,我们可以获得细粒度特征中的高频信息,其中包含全局且细微的伪造信息。尽管这些来自不同域的特征可以得到有效且独立的改进,但直接融合它们并不能有效提升检测性能。因此,我们提出了一种特征叠加策略,以互补来自空间域和频率域的信息。该策略将特征分量转化为类似波的令牌形式,并根据其相位进行更新,从而放大真实特征与伪影特征之间的差异。我们的方法在五个公共Deepfake数据集上相对于最新方法(SOTA)展示了显著改进,能够捕捉不同操纵操作及真实场景中的异常。
更新时间: 2025-03-21 14:31:33
领域: cs.CV,cs.AI
LLMs Love Python: A Study of LLMs' Bias for Programming Languages and Libraries
Programming language and library choices are crucial to software reliability and security. Poor or inconsistent choices can lead to increased technical debt, security vulnerabilities, and even catastrophic failures in safety-critical systems. As Large Language Models (LLMs) play an increasing role in code generation, it is essential to understand how they make these decisions. However, little is known about their preferences when selecting programming languages and libraries for different coding tasks. To fill this gap, this study provides the first in-depth investigation into LLM preferences for programming languages and libraries used when generating code. We assess the preferences of eight diverse LLMs by prompting them to complete various coding tasks, including widely-studied benchmarks and the more practical task of generating the initial structural code for new projects (a crucial step that often determines a project's language or library choices). Our findings reveal that LLMs heavily favour Python when solving language-agnostic problems, using it in 90%-97% of cases for benchmark tasks. Even when generating initial project code where Python is not a suitable language, it remains the most-used language in 58% of instances. Moreover, LLMs contradict their own language recommendations in 83% of project initialisation tasks, raising concerns about their reliability in guiding language selection. Similar biases toward well-established libraries further create serious discoverability challenges for newer open-source projects. These results highlight the need to improve LLMs' adaptability to diverse programming contexts and to develop mechanisms for mitigating programming language and library bias.
Updated: 2025-03-21 14:29:35
标题: LLMs喜欢Python:LLMs对编程语言和库的偏好研究
摘要: 编程语言和库的选择对软件的可靠性和安全性至关重要。糟糕或不一致的选择可能导致技术债务的增加、安全漏洞甚至在安全关键系统中发生灾难性故障。随着大型语言模型(LLMs)在代码生成中发挥越来越重要的作用,了解它们如何做出这些决定至关重要。然而,我们对它们在为不同的编码任务选择编程语言和库时的偏好知之甚少。为了填补这一空白,本研究首次深入调查了LLMs在生成代码时对编程语言和库的偏好。我们通过要求八种不同的LLMs完成各种编码任务,包括广泛研究的基准测试和更实际的任务,即为新项目生成初始结构化代码(这是一个关键步骤,通常决定项目的语言或库选择)来评估它们的偏好。 我们的研究结果显示,LLMs在解决与语言无关的问题时极大地偏爱Python,在基准测试任务中使用Python的情况占90%-97%。即使在生成不适合Python的语言的初始项目代码时,Python仍然是58%实例中最常用的语言。此外,LLMs在83%的项目初始化任务中与自己的语言推荐相矛盾,这引发了对它们在指导语言选择方面可靠性的担忧。对已建立的库的类似偏见进一步为新的开源项目带来了严重的可发现性挑战。这些结果突显了改善LLMs适应多样化编程环境的必要性,以及开发减轻编程语言和库偏见的机制的需求。
更新时间: 2025-03-21 14:29:35
领域: cs.SE,cs.AI
Robustness of deep learning classification to adversarial input on GPUs: asynchronous parallel accumulation is a source of vulnerability
The ability of machine learning (ML) classification models to resist small, targeted input perturbations - known as adversarial attacks - is a key measure of their safety and reliability. We show that floating-point non associativity (FPNA) coupled with asynchronous parallel programming on GPUs is sufficient to result in misclassification, without any perturbation to the input. Additionally, we show this misclassification is particularly significant for inputs close to the decision boundary and that standard adversarial robustness results may be overestimated up to 4.6% when not considering machine-level details. We first study a linear classifier, before focusing on standard Graph Neural Network (GNN) architectures and datasets. We present a novel black-box attack using Bayesian optimization to determine external workloads that bias the output of reductions on GPUs and reliably lead to misclassification. Motivated by these results, we present a new learnable permutation (LP) gradient-based approach, to learn floating point operation orderings that lead to misclassifications, making the assumption that any reduction or permutation ordering is possible. This LP approach provides a worst-case estimate in a computationally efficient manner, avoiding the need to run identical experiments tens of thousands of times over a potentially large set of possible GPU states or architectures. Finally, we investigate parallel reduction ordering across different GPU architectures for a reduction under three conditions: (1) executing external background workloads, (2) utilizing multi-GPU virtualization, and (3) applying power capping. Our results demonstrate that parallel reduction ordering varies significantly across architectures under the first two conditions. The results and methods developed here can help to include machine-level considerations into adversarial robustness assessments.
Updated: 2025-03-21 14:19:45
标题: 深度学习分类在GPU上对敌对输入的稳健性:异步并行累积是一种脆弱性来源
摘要: 机器学习(ML)分类模型抵抗小型、有针对性的输入扰动(称为对抗攻击)的能力是它们安全性和可靠性的关键衡量标准。我们展示了浮点非结合性(FPNA)与异步并行编程在GPU上的结合足以导致误分类,而无需对输入进行任何扰动。此外,我们发现这种误分类对于接近决策边界的输入尤为显著,并且在不考虑机器级细节时,标准对抗鲁棒性结果可能被高估多达4.6%。我们首先研究了一个线性分类器,然后专注于标准图神经网络(GNN)架构和数据集。我们提出了一种利用贝叶斯优化确定偏置GPU上的规约输出并可靠导致误分类的外部工作负载的新型黑盒攻击。受到这些结果的启发,我们提出了一种新的可学习的置换(LP)基于梯度的方法,以学习导致误分类的浮点操作顺序,假设任何规约或置换顺序都是可能的。这种LP方法以计算高效的方式提供了最坏情况的估计,避免了在可能的大量GPU状态或架构上运行数万次相同实验的需要。最后,我们调查了在三种条件下不同GPU架构上的并行规约排序:(1)执行外部后台工作负载,(2)利用多GPU虚拟化,以及(3)应用功耗限制。我们的结果表明,在前两种条件下,不同GPU架构之间的并行规约排序显著变化。这里开发的结果和方法可以帮助将机器级考虑因素纳入对抗鲁棒性评估中。
更新时间: 2025-03-21 14:19:45
领域: cs.LG,cs.DC,I.2.11; B.8.1
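The numerical effect underlying the abstract above is easy to reproduce: float32 addition is not associative, so different (parallel) accumulation orders yield slightly different sums, which can flip a decision that sits near the boundary. The two-class logit-margin setup below is a toy, not the paper's GNN or GPU experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
contrib = rng.normal(scale=1e3, size=10_000).astype(np.float32)   # per-thread partial results
contrib -= contrib.mean()                                          # logit margin ~ 0 (near boundary)

def reduce_in_order(x, order):
    """Sequential float32 accumulation in a prescribed order (mimicking one schedule)."""
    total = np.float32(0.0)
    for i in order:
        total = np.float32(total + x[i])
    return total

orders = [np.arange(contrib.size), np.argsort(contrib), rng.permutation(contrib.size)]
margins = [reduce_in_order(contrib, o) for o in orders]
print("logit margins under three accumulation orders:", margins)
print("predicted classes:", [int(m > 0) for m in margins])   # may disagree purely from ordering
```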
Principal Eigenvalue Regularization for Improved Worst-Class Certified Robustness of Smoothed Classifiers
Recent studies have identified a critical challenge in deep neural networks (DNNs) known as ``robust fairness", where models exhibit significant disparities in robust accuracy across different classes. While prior work has attempted to address this issue in adversarial robustness, the study of worst-class certified robustness for smoothed classifiers remains unexplored. Our work bridges this gap by developing a PAC-Bayesian bound for the worst-class error of smoothed classifiers. Through theoretical analysis, we demonstrate that the largest eigenvalue of the smoothed confusion matrix fundamentally influences the worst-class error of smoothed classifiers. Based on this insight, we introduce a regularization method that optimizes the largest eigenvalue of smoothed confusion matrix to enhance worst-class accuracy of the smoothed classifier and further improve its worst-class certified robustness. We provide extensive experimental validation across multiple datasets and model architectures to demonstrate the effectiveness of our approach.
Updated: 2025-03-21 14:18:18
标题: 主特征值正则化用于改善平滑分类器的最差类别认证鲁棒性
摘要: 最近的研究已经确定了深度神经网络(DNNs)中的一个关键挑战,即“健壮公平性”,其中模型在不同类别之间显示出明显的健壮准确性差异。尽管先前的研究已经试图解决对抗性健壮性问题,但对于平滑分类器的最坏类别认证健壮性的研究仍未被探索。我们的工作填补了这一空白,通过为平滑分类器开发PAC-Bayesian界限来研究平滑分类器的最坏类别错误。通过理论分析,我们证明了平滑混淆矩阵的最大特征值根本影响了平滑分类器的最坏类别错误。基于这一见解,我们引入了一种正则化方法,通过优化平滑混淆矩阵的最大特征值来增强平滑分类器的最坏类别准确性,并进一步提高其最坏类别认证健壮性。我们在多个数据集和模型架构上进行了广泛的实验验证,以展示我们方法的有效性。
更新时间: 2025-03-21 14:18:18
领域: cs.LG
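To make the eigenvalue-based regularization idea above concrete, here is a sketch that forms a soft (differentiable) confusion matrix from a batch and penalizes its largest eigenvalue alongside cross-entropy. How the smoothed confusion matrix is formed and the regularization weight are illustrative choices, not the paper's exact construction.

```python
import torch
import torch.nn.functional as F

def soft_confusion(probs, labels, num_classes):
    """Rows: true class; columns: average predicted distribution for that class."""
    one_hot = F.one_hot(labels, num_classes).float()        # (B, C)
    counts = one_hot.sum(dim=0).clamp_min(1.0)              # samples per class
    return (one_hot.t() @ probs) / counts.unsqueeze(1)      # (C, C)

def eig_regularized_loss(logits, labels, num_classes, lam=0.1):
    probs = logits.softmax(dim=-1)
    ce = F.cross_entropy(logits, labels)
    C = soft_confusion(probs, labels, num_classes)
    # largest eigenvalue of the symmetrized confusion matrix (differentiable in PyTorch)
    lam_max = torch.linalg.eigvalsh(0.5 * (C + C.t()))[-1]
    return ce + lam * lam_max

# toy usage
logits = torch.randn(32, 10, requires_grad=True)
labels = torch.randint(0, 10, (32,))
print(eig_regularized_loss(logits, labels, num_classes=10))
```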
Generative adversarial framework to calibrate excursion set models for the 3D morphology of all-solid-state battery cathodes
This paper presents a computational method for generating virtual 3D morphologies of functional materials using low-parametric stochastic geometry models, i.e., digital twins, calibrated with 2D microscopy images. These digital twins allow systematic parameter variations to simulate various morphologies, that can be deployed for virtual materials testing by means of spatially resolved numerical simulations of macroscopic properties. Generative adversarial networks (GANs) have gained popularity for calibrating models to generate realistic 3D morphologies. However, GANs often comprise of numerous uninterpretable parameters make systematic variation of morphologies for virtual materials testing challenging. In contrast, low-parametric stochastic geometry models (e.g., based on Gaussian random fields) enable targeted variation but may struggle to mimic complex morphologies. Combining GANs with advanced stochastic geometry models (e.g., excursion sets of more general random fields) addresses these limitations, allowing model calibration solely from 2D image data. This approach is demonstrated by generating a digital twin of all-solid-state battery (ASSB) cathodes. Since the digital twins are parametric, they support systematic exploration of structural scenarios and their macroscopic properties. The proposed method facilitates simulation studies for optimizing 3D morphologies, benefiting not only ASSB cathodes but also other materials with similar structures.
Updated: 2025-03-21 14:18:15
标题: 生成对抗框架用于校准全固态电池阴极三维形态的超越集模型
摘要: 这篇论文提出了一种利用低参数随机几何模型生成功能材料虚拟3D形态的计算方法,即数字孪生体,通过与2D显微镜图像校准。这些数字孪生体允许系统参数变化,模拟各种形态,可通过宏观性质的空间分辨数值模拟进行虚拟材料测试。生成对抗网络(GANs)因为可以校准模型生成逼真的3D形态而备受推崇。然而,GANs通常由许多无法解释的参数组成,使得对于虚拟材料测试而言形态的系统变化具有挑战性。相反,低参数随机几何模型(例如基于高斯随机场)能够实现有针对性的变化,但可能难以模拟复杂的形态。将GANs与先进的随机几何模型(例如更一般随机场的外沿集)结合起来,可以解决这些限制,允许仅通过2D图像数据进行模型校准。通过生成全固态电池(ASSB)正极的数字孪生体来展示这种方法。由于数字孪生体是参数化的,它们支持对结构情景及其宏观性质的系统探索。所提出的方法促进了用于优化3D形态的模拟研究,不仅有益于ASSB正极,也有益于具有类似结构的其他材料。
更新时间: 2025-03-21 14:18:15
领域: stat.ML,cs.LG
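A compact sketch of the excursion-set construction that such low-parametric models build on: smooth white noise into a Gaussian random field, then threshold it to obtain a two-phase 3D morphology. The correlation length and target volume fraction are exactly the kind of parameters one would calibrate; this is the generic construction, not the paper's GAN-calibrated model. It assumes scipy is available.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def excursion_set(shape=(64, 64, 64), corr_length=4.0, volume_fraction=0.4, seed=0):
    """Binary 3D morphology as the excursion set {Z >= t} of a smoothed Gaussian field."""
    rng = np.random.default_rng(seed)
    field = gaussian_filter(rng.normal(size=shape), sigma=corr_length)
    field = (field - field.mean()) / field.std()
    threshold = np.quantile(field, 1.0 - volume_fraction)   # hit the target solid fraction
    return field >= threshold

phase = excursion_set()
print("solid volume fraction:", phase.mean())   # ~0.4 by construction
```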
Hi-ALPS -- An Experimental Robustness Quantification of Six LiDAR-based Object Detection Systems for Autonomous Driving
Light Detection and Ranging (LiDAR) is an essential sensor technology for autonomous driving as it can capture high-resolution 3D data. As 3D object detection systems (OD) can interpret such point cloud data, they play a key role in the driving decisions of autonomous vehicles. Consequently, such 3D OD must be robust against all types of perturbations and must therefore be extensively tested. One approach is the use of adversarial examples, which are small, sometimes sophisticated perturbations in the input data that change, i.e., falsify, the prediction of the OD. These perturbations are carefully designed based on the weaknesses of the OD. The robustness of the OD cannot be quantified with adversarial examples in general, because if the OD is vulnerable to a given attack, it is unclear whether this is due to the robustness of the OD or whether the attack algorithm produces particularly strong adversarial examples. The contribution of this work is Hi-ALPS -- Hierarchical Adversarial-example-based LiDAR Perturbation Level System, where higher robustness of the OD is required to withstand the perturbations as the perturbation levels increase. In doing so, the Hi-ALPS levels successively implement a heuristic followed by established adversarial example approaches. In a series of comprehensive experiments using Hi-ALPS, we quantify the robustness of six state-of-the-art 3D OD under different types of perturbations. The results of the experiments show that none of the OD is robust against all Hi-ALPS levels; an important factor for the ranking is that human observers can still correctly recognize the perturbed objects, as the respective perturbations are small. To increase the robustness of the OD, we discuss the applicability of state-of-the-art countermeasures. In addition, we derive further suggestions for countermeasures based on our experimental results.
Updated: 2025-03-21 14:17:02
标题: Hi-ALPS - 自动驾驶六个基于LiDAR的物体检测系统的实验鲁棒性量化
摘要: 光探测与测距(LiDAR)是自动驾驶的重要传感器技术,因为它可以捕捉高分辨率的3D数据。3D目标检测系统(OD)可以解释这种点云数据,它在自动驾驶车辆的驾驶决策中起着关键作用。因此,这种3D目标检测必须对各种干扰具有鲁棒性,因此必须进行广泛的测试。一种方法是使用对抗样本,这些对抗样本是输入数据中的小型、有时是复杂的扰动,可以改变目标检测的预测,即伪造。这些扰动是根据目标检测的弱点精心设计的。通常情况下,无法用对抗样本来量化目标检测的鲁棒性,因为如果目标检测对某种攻击脆弱,就不清楚这是目标检测的鲁棒性导致的,还是攻击算法产生了特别强大的对抗样本。本文的贡献是Hi-ALPS——基于对抗样本的激光雷达扰动级别系统,随着扰动水平的增加,需要目标检测具有更高的鲁棒性来抵抗扰动。为此,Hi-ALPS级别依次实施一种启发式方法,然后是已建立的对抗样本方法。通过使用Hi-ALPS进行一系列全面的实验,我们量化了六种最先进的3D目标检测在不同类型扰动下的鲁棒性。实验结果显示,没有一种目标检测对所有的Hi-ALPS级别都具有鲁棒性;排名的一个重要因素是人类观察者仍然可以正确识别被扰动的对象,因为相应的扰动很小。为增加目标检测的鲁棒性,我们讨论了最先进的对抗措施的适用性。此外,根据我们的实验结果,我们提出了进一步的对抗措施建议。
更新时间: 2025-03-21 14:17:02
领域: cs.CV,cs.LG
DiTEC-WDN: A Large-Scale Dataset of Water Distribution Network Scenarios under Diverse Hydraulic Conditions
Privacy restrictions hinder the sharing of real-world Water Distribution Network (WDN) models, limiting the application of emerging data-driven machine learning, which typically requires extensive observations. To address this challenge, we propose the dataset DiTEC-WDN that comprises 36,000 unique scenarios simulated over either short-term (24 hours) or long-term (1 year) periods. We constructed this dataset using an automated pipeline that optimizes crucial parameters (e.g., pressure, flow rate, and demand patterns), facilitates large-scale simulations, and records discrete, synthetic but hydraulically realistic states under standard conditions via rule validation and post-hoc analysis. With a total of 228 million generated graph-based states, DiTEC-WDN can support a variety of machine-learning tasks, including graph-level, node-level, and link-level regression, as well as time-series forecasting. This contribution, released under a public license, encourages open scientific research in the critical water sector, eliminates the risk of exposing sensitive data, and fulfills the need for a large-scale water distribution network benchmark for study comparisons and scenario analysis.
Updated: 2025-03-21 14:14:03
标题: DiTEC-WDN:多种水力条件下的大规模水配水网络场景数据集
摘要: 隐私限制阻碍了分享现实世界的水配水网络(WDN)模型,限制了新兴数据驱动的机器学习的应用,这通常需要大量的观察。为了解决这一挑战,我们提出了数据集DiTEC-WDN,包括在短期(24小时)或长期(1年)内模拟的36,000个独特场景。我们使用自动化流程构建了这个数据集,该流程优化了关键参数(例如压力、流量和需求模式),促进了大规模模拟,并通过规则验证和事后分析记录了离散的、合成但在标准条件下具有水力学现实性的状态。DiTEC-WDN总共生成了2.28亿个基于图的状态,可支持各种机器学习任务,包括图级、节点级和链接级回归,以及时间序列预测。这一贡献以公共许可发布,鼓励在关键水域进行开放的科学研究,消除了泄露敏感数据的风险,并满足了大规模水配水网络基准的研究比较和场景分析的需求。
更新时间: 2025-03-21 14:14:03
领域: cs.LG,cs.AI
Data-driven Camera and Lidar Simulation Models for Autonomous Driving: A Review from Generative Models to Volume Renderers
Perception sensors, particularly camera and Lidar, are key elements of Autonomous Driving Systems (ADS) that enable them to comprehend their surroundings to informed driving and control decisions. Therefore, developing realistic simulation models for these sensors is essential for conducting effective simulation-based testing of ADS. Moreover, the rise of deep learning-based perception models has increased the utility of sensor simulation models for synthesising diverse training datasets. The traditional sensor simulation models rely on computationally expensive physics-based algorithms, specifically in complex systems such as ADS. Hence, the current potential resides in data-driven approaches, fuelled by the exceptional performance of deep generative models in capturing high-dimensional data distribution and volume renderers in accurately representing scenes. This paper reviews the current state-of-the-art data-driven camera and Lidar simulation models and their evaluation methods. It explores a spectrum of models from the novel perspective of generative models and volume renderers. Generative models are discussed in terms of their input-output types, while volume renderers are categorised based on their input encoding. Finally, the paper illustrates commonly used evaluation techniques for assessing sensor simulation models and highlights the existing research gaps in the area.
Updated: 2025-03-21 14:13:38
标题: 基于数据驱动的自动驾驶相机和激光雷达模拟模型:从生成模型到体积渲染器的综述
摘要: 感知传感器,尤其是摄像头和激光雷达,是自动驾驶系统(ADS)的关键元素,使它们能够理解周围环境以做出明智的驾驶和控制决策。因此,为这些传感器开发逼真的仿真模型对于进行有效的基于仿真的ADS测试至关重要。此外,基于深度学习的感知模型的崛起增加了传感器仿真模型在合成多样化训练数据集方面的实用性。传统的传感器仿真模型依赖于计算密集型的基于物理的算法,特别是在复杂系统如ADS中。因此,当前的潜力在于数据驱动方法,受到深度生成模型在捕捉高维数据分布和体积渲染器在准确表示场景方面出色性能的推动。本文回顾了当前最先进的数据驱动摄像头和激光雷达仿真模型及其评估方法。它从生成模型和体积渲染器的新颖视角探讨了一系列模型。生成模型根据其输入输出类型进行讨论,而体积渲染器则根据其输入编码进行分类。最后,本文展示了用于评估传感器仿真模型的常用评估技术,并强调了该领域中存在的研究空白。
更新时间: 2025-03-21 14:13:38
领域: cs.CV,cs.GR,cs.LG,cs.RO
Low-Rank Thinning
The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical sub-Gaussian thinning approaches that improve upon the best known guarantees for approximating attention in transformers, accelerating stochastic gradient training through reordering, and distinguishing distributions in near-linear time.
Updated: 2025-03-21 14:13:04
标题: 低秩稀疏化
摘要: 稀疏化(thinning)的目标是使用一小组有代表性的点来概括数据集。值得注意的是,像Kernel Halving和Compress这样的亚高斯稀疏化算法可以达到与均匀子抽样相当的质量,同时大大减少概括点的数量。然而,现有的保证仅涵盖有限范围的分布和基于核的质量度量,并且受到悲观的维度依赖的影响。为了解决这些缺点,我们引入了一种新的亚高斯稀疏化的低秩分析,适用于任何分布和任何核,保证在核或数据矩阵近似低秩时提供高质量的压缩。为了展示这些技术的广泛适用性,我们设计了实用的亚高斯稀疏化方法,改进了用于近似Transformer中注意力机制的已知最佳保证、通过重新排序加速随机梯度训练,以及在近线性时间内区分分布。
更新时间: 2025-03-21 14:13:04
领域: stat.ML,cs.LG,math.OC,math.ST,stat.ME,stat.TH
Learning Robust Reward Machines from Noisy Labels
This paper presents PROB-IRM, an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces. The key aspect of RM-driven RL is the exploitation of a finite-state machine that decomposes the agent's task into different subtasks. PROB-IRM uses a state-of-the-art inductive logic programming framework robust to noisy examples to learn RMs from noisy traces using the Bayesian posterior degree of beliefs, thus ensuring robustness against inconsistencies. Pivotal for the results is the interleaving between RM learning and policy learning: a new RM is learned whenever the RL agent generates a trace that is believed not to be accepted by the current RM. To speed up the training of the RL agent, PROB-IRM employs a probabilistic formulation of reward shaping that uses the posterior Bayesian beliefs derived from the traces. Our experimental analysis shows that PROB-IRM can learn (potentially imperfect) RMs from noisy traces and exploit them to train an RL agent to solve its tasks successfully. Despite the complexity of learning the RM from noisy traces, agents trained with PROB-IRM perform comparably to agents provided with handcrafted RMs.
Updated: 2025-03-21 14:07:55
标题: 学习如何从嘈杂标签中获取鲁棒的奖励机制
摘要: 本文介绍了PROB-IRM,一种从嘈杂的执行轨迹中学习鲁棒奖励机器(RMs)的方法,用于强化学习(RL)代理。RM驱动的RL的关键方面是利用一个有限状态机将代理的任务分解为不同的子任务。PROB-IRM使用了一个能够从嘈杂示例中学习RMs的最新归纳逻辑编程框架,利用贝叶斯后验信念的程度来确保抵抗不一致性。结果的关键在于RM学习和策略学习的交替:每当RL代理生成一个被认为不被当前RM接受的轨迹时,就会学习一个新的RM。为了加快RL代理的训练速度,PROB-IRM采用了一种奖励塑造的概率公式,利用从轨迹中得出的后验贝叶斯信念。我们的实验分析表明PROB-IRM可以从嘈杂轨迹中学习(可能不完美的)RMs,并利用它们来成功训练RL代理以解决其任务。尽管从嘈杂轨迹中学习RM的复杂性,使用PROB-IRM训练的代理表现与提供手工制作的RMs的代理相当。
更新时间: 2025-03-21 14:07:55
领域: cs.AI,cs.LG
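The reward machines referred to in the abstract above are finite-state machines over high-level propositional events, emitting rewards on transitions. A minimal example is below; the events and task (fetch a key, then open a door) are purely illustrative, and the learned, probabilistic machinery of PROB-IRM is not reproduced.

```python
class RewardMachine:
    """States: u0 (start) -> u1 (has key) -> u_acc (door opened). Transitions fire on
    propositional events detected from the environment trace."""
    def __init__(self):
        # (state, event) -> (next_state, reward); unmatched events leave the state unchanged
        self.delta = {("u0", "key"): ("u1", 0.0),
                      ("u1", "door"): ("u_acc", 1.0)}
        self.state = "u0"

    def step(self, event):
        self.state, reward = self.delta.get((self.state, event), (self.state, 0.0))
        return reward, self.state == "u_acc"

rm = RewardMachine()
for ev in ["empty", "key", "empty", "door"]:
    r, done = rm.step(ev)
    print(ev, "->", rm.state, "reward", r, "done", done)
```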
MedAgent-Pro: Towards Multi-modal Evidence-based Medical Diagnosis via Reasoning Agentic Workflow
Developing reliable AI systems to assist human clinicians in multi-modal medical diagnosis has long been a key objective for researchers. Recently, Multi-modal Large Language Models (MLLMs) have gained significant attention and achieved success across various domains. With strong reasoning capabilities and the ability to perform diverse tasks based on user instructions, they hold great potential for enhancing medical diagnosis. However, directly applying MLLMs to the medical domain still presents challenges. They lack detailed perception of visual inputs, limiting their ability to perform quantitative image analysis, which is crucial for medical diagnostics. Additionally, MLLMs often exhibit hallucinations and inconsistencies in reasoning, whereas clinical diagnoses must adhere strictly to established criteria. To address these challenges, we propose MedAgent-Pro, an evidence-based reasoning agentic system designed to achieve reliable, explainable, and precise medical diagnoses. This is accomplished through a hierarchical workflow: at the task level, knowledge-based reasoning generate reliable diagnostic plans for specific diseases following retrieved clinical criteria. While at the case level, multiple tool agents process multi-modal inputs, analyze different indicators according to the plan, and provide a final diagnosis based on both quantitative and qualitative evidence. Comprehensive experiments on both 2D and 3D medical diagnosis tasks demonstrate the superiority and effectiveness of MedAgent-Pro, while case studies further highlight its reliability and interpretability. The code is available at https://github.com/jinlab-imvr/MedAgent-Pro.
Updated: 2025-03-21 14:04:18
标题: MedAgent-Pro:通过推理代理工作流实现基于多模态证据的医学诊断
摘要: 开发可靠的人工智能系统,以协助人类临床医生进行多模态医学诊断长期以来一直是研究人员的主要目标。最近,多模态大型语言模型(MLLMs)引起了广泛关注,并在各个领域取得了成功。凭借强大的推理能力和根据用户指令执行多样任务的能力,它们对增强医学诊断具有巨大潜力。然而,直接将MLLMs应用于医学领域仍然存在挑战。它们缺乏对视觉输入的详细感知,限制了它们进行定量图像分析的能力,这对医学诊断至关重要。此外,MLLMs通常表现出幻觉和推理不一致,而临床诊断必须严格遵循已建立的标准。为了解决这些挑战,我们提出了MedAgent-Pro,这是一个基于证据的推理代理系统,旨在实现可靠、可解释和精确的医学诊断。这是通过层次化工作流程实现的:在任务级别,基于知识的推理为特定疾病生成可靠的诊断计划,遵循检索到的临床标准。而在案例级别,多个工具代理处理多模态输入,根据计划分析不同指标,并根据定量和定性证据提供最终诊断。在2D和3D医学诊断任务上进行的全面实验证明了MedAgent-Pro的优越性和有效性,而病例研究进一步突出了其可靠性和可解释性。代码可在https://github.com/jinlab-imvr/MedAgent-Pro上获得。
更新时间: 2025-03-21 14:04:18
领域: cs.AI
Causality Is Key to Understand and Balance Multiple Goals in Trustworthy ML and Foundation Models
Ensuring trustworthiness in machine learning (ML) systems is crucial as they become increasingly embedded in high-stakes domains. This paper advocates for integrating causal methods into machine learning to navigate the trade-offs among key principles of trustworthy ML, including fairness, privacy, robustness, accuracy, and explainability. While these objectives should ideally be satisfied simultaneously, they are often addressed in isolation, leading to conflicts and suboptimal solutions. Drawing on existing applications of causality in ML that successfully align goals such as fairness and accuracy or privacy and robustness, this paper argues that a causal approach is essential for balancing multiple competing objectives in both trustworthy ML and foundation models. Beyond highlighting these trade-offs, we examine how causality can be practically integrated into ML and foundation models, offering solutions to enhance their reliability and interpretability. Finally, we discuss the challenges, limitations, and opportunities in adopting causal frameworks, paving the way for more accountable and ethically sound AI systems.
Updated: 2025-03-21 14:02:38
标题: 因果关系是理解和平衡可信机器学习和基础模型中多重目标的关键。
摘要: 随着机器学习(ML)系统越来越多地嵌入到高风险领域中,确保其可信性至关重要。本文主张将因果方法整合到机器学习中,以平衡可信ML的关键原则,包括公平性、隐私、稳健性、准确性和可解释性之间的权衡。虽然这些目标理想情况下应同时满足,但通常会分别处理,导致冲突和次优解。借鉴因果性在ML中成功实现公平性和准确性或隐私和稳健性等目标的现有应用,本文认为因果方法对于在可信ML和基础模型中平衡多个竞争目标至关重要。除了强调这些权衡之外,我们还探讨了如何将因果关系实际整合到ML和基础模型中,提供增强其可靠性和可解释性的解决方案。最后,我们讨论了采用因果框架所面临的挑战、限制和机遇,为更加负责任和道德合理的人工智能系统铺平道路。
更新时间: 2025-03-21 14:02:38
领域: cs.LG,cs.AI
Instant Adversarial Purification with Adversarial Consistency Distillation
Neural networks have revolutionized numerous fields with their exceptional performance, yet they remain susceptible to adversarial attacks through subtle perturbations. While diffusion-based purification methods like DiffPure offer promising defense mechanisms, their computational overhead presents a significant practical limitation. In this paper, we introduce One Step Control Purification (OSCP), a novel defense framework that achieves robust adversarial purification in a single Neural Function Evaluation (NFE) within diffusion models. We propose Gaussian Adversarial Noise Distillation (GAND) as the distillation objective and Controlled Adversarial Purification (CAP) as the inference pipeline, which makes OSCP demonstrate remarkable efficiency while maintaining defense efficacy. Our proposed GAND addresses a fundamental tension between consistency distillation and adversarial perturbation, bridging the gap between natural and adversarial manifolds in the latent space, while remaining computationally efficient through Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA, eliminating the high computational budget request from full parameter fine-tuning. The CAP guides the purification process through the unlearnable edge detection operator calculated by the input image as an extra prompt, effectively preventing the purified images from deviating from their original appearance when large purification steps are used. Our experimental results on ImageNet showcase OSCP's superior performance, achieving a 74.19% defense success rate with merely 0.1s per purification -- a 100-fold speedup compared to conventional approaches.
Updated: 2025-03-21 13:58:47
标题: 使用对抗一致性蒸馏进行即时对抗净化
摘要: 神经网络以其卓越的性能已经在许多领域引起了革命,但它们仍然容易受到微小扰动的对抗攻击。虽然像DiffPure这样基于扩散的净化方法提供了有希望的防御机制,但它们的计算开销却带来了显著的实际限制。在本文中,我们介绍了一种名为一步控制净化(OSCP)的新颖防御框架,它在扩散模型中通过单个神经功能评估(NFE)实现了鲁棒的对抗净化。我们提出了高斯对抗噪声蒸馏(GAND)作为蒸馏目标,控制对抗净化(CAP)作为推理流水线,使OSCP在保持防御效果的同时表现出卓越的效率。我们提出的GAND解决了一致性蒸馏和对抗扰动之间的基本矛盾,弥合了潜在空间中自然流形与对抗流形之间的差距,同时借助LoRA等参数高效微调(PEFT)方法保持计算高效,避免了全参数微调带来的高计算开销。CAP通过由输入图像计算的不可学习边缘检测运算符引导净化过程,有效防止被净化图像在使用大的净化步骤时偏离其原始外观。我们在ImageNet上的实验结果展示了OSCP的卓越性能,实现了74.19%的防御成功率,每次净化仅需0.1秒 -- 与传统方法相比,速度提升了100倍。
更新时间: 2025-03-21 13:58:47
领域: cs.CV,cs.AI,cs.LG
TransURL: Improving malicious URL detection with multi-layer Transformer encoding and multi-scale pyramid features
Machine learning progress is advancing the detection of malicious URLs. However, advanced Transformers applied to URLs face difficulties in extracting local information, character-level details, and structural relationships. To address these challenges, we propose a novel approach for malicious URL detection, named TransURL. This method is implemented by co-training the character-aware Transformer with three feature modules: Multi-Layer Encoding, Multi-Scale Feature Learning, and Spatial Pyramid Attention. This specialized Transformer enables TransURL to extract embeddings with character-level information from URL token sequences, with the three modules aiding the fusion of multi-layer Transformer encodings and the capture of multi-scale local details and structural relationships. The proposed method is evaluated across several challenging scenarios, including class imbalance learning, multi-classification, cross-dataset testing, and adversarial sample attacks. Experimental results demonstrate a significant improvement compared to previous methods. For instance, it achieved a peak F1-score improvement of 40% in class-imbalanced scenarios and surpassed the best baseline by 14.13% in accuracy for adversarial attack scenarios. Additionally, a case study demonstrated that our method accurately identified all 30 active malicious web pages, whereas two previous state-of-the-art methods missed 4 and 7 malicious web pages, respectively. The codes and data are available at: https://github.com/Vul-det/TransURL/.
Updated: 2025-03-21 13:48:59
标题: TransURL:利用多层Transformer编码和多尺度金字塔特征改进恶意URL检测
摘要: 机器学习进展正在推动恶意URL的检测。然而,应用于URL的高级Transformer面临提取局部信息、字符级细节和结构关系的困难。为了解决这些挑战,我们提出了一种用于检测恶意URL的新方法,名为TransURL。该方法通过与三个特征模块共同训练字符感知Transformer来实施,这三个特征模块分别是多层编码、多尺度特征学习和空间金字塔注意力。这种专门的Transformer使TransURL能够从URL令牌序列中提取具有字符级信息的嵌入,这三个模块有助于融合多层Transformer编码并捕获多尺度局部细节和结构关系。该方法在多个具有挑战性的场景中进行了评估,包括类别不平衡学习、多分类、跨数据集测试和对抗样本攻击。实验结果表明,与先前的方法相比,该方法取得了显著的改进。例如,在类别不平衡场景中,其峰值F1分数提高了40%,在对抗攻击场景中的准确性超过了最佳基准14.13%。此外,一项案例研究表明,我们的方法准确识别了所有30个活动恶意网页,而两种先前的最新方法分别错过了4个和7个恶意网页。代码和数据可在以下网址获取:https://github.com/Vul-det/TransURL/。
更新时间: 2025-03-21 13:48:59
领域: cs.CR
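As a rough illustration of the character-level, multi-scale idea described above, the sketch below embeds URL characters and applies parallel 1-D convolutions with different kernel sizes before a global max-pool. It is a generic construction for intuition only; the actual TransURL co-trains a character-aware Transformer with multi-layer encoding, multi-scale feature learning, and spatial pyramid attention.

# Illustrative multi-scale character-level feature extractor for URLs (PyTorch).
import torch
import torch.nn as nn

class MultiScaleURLFeatures(nn.Module):
    def __init__(self, vocab_size=128, emb_dim=64, channels=32, kernel_sizes=(2, 3, 5)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # One 1-D convolution per scale: small kernels capture character n-grams,
        # larger kernels capture longer structural fragments of the URL.
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, channels, k, padding=k // 2) for k in kernel_sizes]
        )
        self.classifier = nn.Linear(channels * len(kernel_sizes), 2)

    def forward(self, char_ids):                      # (batch, url_len) integer tensor
        x = self.embed(char_ids).transpose(1, 2)      # (batch, emb_dim, url_len)
        feats = [conv(x).amax(dim=-1) for conv in self.convs]  # global max-pool per scale
        return self.classifier(torch.cat(feats, dim=-1))

url = "http://example.com/login.php?id=1"
ids = torch.tensor([[min(ord(c), 127) for c in url]])
print(MultiScaleURLFeatures()(ids).shape)             # torch.Size([1, 2])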
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
Vision-Language Models (VLMs) learn a shared feature space for text and images, enabling the comparison of inputs of different modalities. While prior works demonstrated that VLMs organize natural language representations into regular structures encoding composite meanings, it remains unclear if compositional patterns also emerge in the visual embedding space. In this work, we investigate compositionality in the image domain, where the analysis of compositional properties is challenged by noise and sparsity of visual data. We address these problems and propose a framework, called Geodesically Decomposable Embeddings (GDE), that approximates image representations with geometry-aware compositional structures in the latent space. We demonstrate that visual embeddings of pre-trained VLMs exhibit a compositional arrangement, and evaluate the effectiveness of this property in the tasks of compositional classification and group robustness. GDE achieves stronger performance in compositional classification compared to its counterpart method that assumes linear geometry of the latent space. Notably, it is particularly effective for group robustness, where we achieve higher results than task-specific solutions. Our results indicate that VLMs can automatically develop a human-like form of compositional reasoning in the visual domain, making their underlying processes more interpretable. Code is available at https://github.com/BerasiDavide/vlm_image_compositionality.
Updated: 2025-03-21 13:46:53
标题: 不仅仅是文本:探索视觉-语言模型中视觉表示的组合性
摘要: 视觉语言模型(VLM)学习了一个用于文本和图像的共享特征空间,使得可以比较不同模态输入。以往的研究表明,VLM将自然语言表示组织成编码复合含义的规则结构,但在视觉嵌入空间中是否也出现了组合模式尚不清楚。在这项工作中,我们研究了图像领域中的组合性,在这里,由于视觉数据的噪声和稀疏性,分析组合性属性是一项挑战。我们解决了这些问题,并提出了一个名为Geodesically Decomposable Embeddings(GDE)的框架,该框架在潜在空间中用具有几何感知的组合结构逼近图像表示。我们证明了预训练的VLM的视觉嵌入展示了组合排列,并评估了这种属性在组合分类和组鲁棒性任务中的有效性。与假设潜在空间具有线性几何的对应方法相比,GDE在组合分类方面表现更强。值得注意的是,它在组鲁棒性方面特别有效,我们的结果比特定任务解决方案更高。我们的结果表明,VLM可以在视觉领域自动发展出一种类似于人类的组合推理形式,使其潜在过程更具可解释性。代码可在https://github.com/BerasiDavide/vlm_image_compositionality找到。
更新时间: 2025-03-21 13:46:53
领域: cs.CV,cs.LG
HiFi-Stream: Streaming Speech Enhancement with Generative Adversarial Networks
Speech Enhancement techniques have become core technologies in mobile devices and voice software, simplifying downstream speech tasks. Still, modern Deep Learning (DL) solutions often require a high amount of computational resources, which makes their usage on low-resource devices challenging. We present HiFi-Stream, an optimized version of the recently published HiFi++ model. Our experiments demonstrate that HiFi-Stream preserves most of the qualities of the original model despite its reduced size and computational complexity: the lightest version has only around 490k parameters, a 3.5x reduction in comparison to the original HiFi++, making it one of the smallest and fastest models available. The model is evaluated in a streaming setting, where it demonstrates superior performance in comparison to modern baselines.
Updated: 2025-03-21 13:44:12
标题: HiFi-Stream: 使用生成对抗网络进行流式语音增强
摘要: 语音增强技术已经成为移动设备和语音软件中的核心技术,简化了下游语音任务。然而,现代深度学习(DL)解决方案通常需要大量的计算资源,这使得它们在低资源设备上的使用具有挑战性。我们提出了HiFi-Stream,这是最近发布的HiFi++模型的优化版本。我们的实验证明,尽管其尺寸和计算复杂性,HiFiStream保存了原始模型的大部分特性:最轻量级的版本仅有大约490k个参数,相较于原始HiFi++减少了3.5倍,使其成为当前最小和最快的模型之一。该模型在流式设置中进行评估,表现出比现代基线更优越的性能。
更新时间: 2025-03-21 13:44:12
领域: cs.SD,cs.LG,eess.AS
Adiabatic Fine-Tuning of Neural Quantum States Enables Detection of Phase Transitions in Weight Space
Neural quantum states (NQS) have emerged as a powerful tool for approximating quantum wavefunctions using deep learning. While these models achieve remarkable accuracy, understanding how they encode physical information remains an open challenge. In this work, we introduce adiabatic fine-tuning, a scheme that trains NQS across a phase diagram, leading to strongly correlated weight representations across different models. This correlation in weight space enables the detection of phase transitions in quantum systems by analyzing the trained network weights alone. We validate our approach on the transverse field Ising model and the J1-J2 Heisenberg model, demonstrating that phase transitions manifest as distinct structures in weight space. Our results establish a connection between physical phase transitions and the geometry of neural network parameters, opening new directions for the interpretability of machine learning models in physics.
Updated: 2025-03-21 13:42:11
标题: 绝热微调神经量子态使得在权重空间中检测相变成为可能
摘要: 神经量子态(NQS)已经成为利用深度学习逼近量子波函数的强大工具。虽然这些模型能够达到令人瞩目的准确度,但理解它们如何编码物理信息仍然是一个未解之谜。在这项工作中,我们引入了绝热微调,一种在相图上训练NQS的方案,导致不同模型之间具有强相关的权重表示。权重空间中的这种相关性使得仅通过分析训练过的网络权重就能够检测量子系统中的相变。我们在横向场伊辛模型和J1-J2海森堡模型上验证了我们的方法,证明相变表现为权重空间中的不同结构。我们的结果建立了物理相变与神经网络参数几何之间的联系,为物理中机器学习模型的可解释性开辟了新的方向。
更新时间: 2025-03-21 13:42:11
领域: quant-ph,cs.LG
Structure Is Not Enough: Leveraging Behavior for Neural Network Weight Reconstruction
The weights of neural networks (NNs) have recently gained prominence as a new data modality in machine learning, with applications ranging from accuracy and hyperparameter prediction to representation learning or weight generation. One approach to leverage NN weights involves training autoencoders (AEs), using contrastive and reconstruction losses. This allows such models to be applied to a wide variety of downstream tasks, and they demonstrate strong predictive performance and low reconstruction error. However, despite the low reconstruction error, these AEs reconstruct NN models with deteriorated performance compared to the original ones, limiting their usability with regard to model weight generation. In this paper, we identify a limitation of weight-space AEs, specifically highlighting that a structural loss, that uses the Euclidean distance between original and reconstructed weights, fails to capture some features critical for reconstructing high-performing models. We analyze the addition of a behavioral loss for training AEs in weight space, where we compare the output of the reconstructed model with that of the original one, given some common input. We show a strong synergy between structural and behavioral signals, leading to increased performance in all downstream tasks evaluated, in particular NN weights reconstruction and generation.
Updated: 2025-03-21 13:39:04
标题: 结构不足以重建神经网络权重:利用行为进行重建
摘要: 神经网络(NNs)的权重最近作为机器学习中的一种新数据模态而备受关注,应用范围从准确性和超参数预测到表示学习或权重生成。利用神经网络权重的一种方法涉及训练自动编码器(AEs),使用对比和重构损失。这使得这些模型可以应用于各种下游任务,并且它们展示出强大的预测性能和低重构误差。然而,尽管重构误差较低,这些AEs重构出来的NN模型性能与原始模型相比有所下降,限制了它们在模型权重生成方面的可用性。在本文中,我们确定了权重空间AEs的一个限制,特别强调使用原始权重和重构权重之间的欧几里得距离作为结构损失,未能捕捉一些对于重建高性能模型至关重要的特征。我们分析了在权重空间中为训练AEs添加行为损失,其中我们比较了重构模型的输出与原始模型在给定一些通用输入时的输出。我们展示了结构和行为信号之间的强大协同作用,导致在所有评估的下游任务中性能提升,特别是在NN权重重建和生成方面。
更新时间: 2025-03-21 13:39:04
领域: cs.LG
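The structural-plus-behavioral training signal described above can be written down in a few lines. The sketch below uses a toy linear "reconstructed network" and equal loss weighting, both of which are assumptions; the point is only that the behavioral term compares outputs of the original and reconstructed weights on common probe inputs while keeping gradients flowing into the autoencoder.

# Sketch of a weight autoencoder trained with structural + behavioral losses (PyTorch).
import torch
import torch.nn as nn

def tiny_model_forward(w, x):
    # Functional forward of a toy linear model (10 -> 2) parameterised by a flat
    # 22-dim weight vector, so gradients flow back into w.
    return x @ w[:20].view(2, 10).t() + w[20:22]

class WeightAE(nn.Module):
    def __init__(self, dim=22, latent=8):
        super().__init__()
        self.enc, self.dec = nn.Linear(dim, latent), nn.Linear(latent, dim)
    def forward(self, w):
        return self.dec(torch.relu(self.enc(w)))

ae = WeightAE()
w_orig = torch.randn(22)                     # weights of one "model zoo" network, flattened
probe = torch.randn(16, 10)                  # common inputs for the behavioral term
w_rec = ae(w_orig)
structural = nn.functional.mse_loss(w_rec, w_orig)
behavioral = nn.functional.mse_loss(tiny_model_forward(w_rec, probe),
                                    tiny_model_forward(w_orig, probe).detach())
loss = structural + behavioral               # equal weighting is an illustrative choice
loss.backward()
print(structural.item(), behavioral.item())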
Embedded Visual Prompt Tuning
Foundation models pre-trained on large-scale data have been widely witnessed to achieve success in various natural imaging downstream tasks. Parameter-efficient fine-tuning (PEFT) methods aim to adapt foundation models to new domains by updating only a small portion of parameters in order to reduce computational overhead. However, the effectiveness of these PEFT methods, especially in cross-domain few-shot scenarios, e.g., medical image analysis, has not been fully explored. In this work, we facilitate the study of the performance of PEFT when adapting foundation models to medical image classification tasks. Furthermore, to alleviate the limitations of prompt introducing ways and approximation capabilities on Transformer architectures of mainstream prompt tuning methods, we propose the Embedded Prompt Tuning (EPT) method by embedding prompt tokens into the expanded channels. We also find that there are anomalies in the feature space distribution of foundation models during pre-training process, and prompt tuning can help mitigate this negative impact. To explain this phenomenon, we also introduce a novel perspective to understand prompt tuning: Prompt tuning is a distribution calibrator. And we support it by analyzing patch-wise scaling and feature separation operations contained in EPT. Our experiments show that EPT outperforms several state-of-the-art fine-tuning methods by a significant margin on few-shot medical image classification tasks, and completes the fine-tuning process within highly competitive time, indicating EPT is an effective PEFT method. The source code is available at github.com/zuwenqiang/EPT.
Updated: 2025-03-21 13:38:56
标题: 嵌入式视觉提示调整
摘要: 基于大规模数据预训练的基础模型已被广泛证明在各种自然图像下游任务中取得成功。参数高效微调(PEFT)方法旨在通过仅更新少量参数来将基础模型适应新领域,以减少计算开销。然而,这些PEFT方法的有效性,特别是在跨领域少样本情景下,如医学图像分析,尚未完全探索。在这项工作中,我们促进了对PEFT在将基础模型调整到医学图像分类任务中的性能的研究。此外,为了缓解主流提示调整方法在Transformer架构上的方式和近似能力的局限性,我们提出了嵌入提示调整(EPT)方法,通过将提示令牌嵌入扩展通道。我们还发现,在基础模型的特征空间分布在预训练过程中存在异常,提示调整可以帮助减轻这种负面影响。为了解释这一现象,我们还引入了一个新颖的视角来理解提示调整:提示调整是一个分布校准器。通过分析EPT中包含的基于补丁的缩放和特征分离操作,我们支持这一观点。我们的实验表明,EPT在少样本医学图像分类任务中明显优于几种最先进的微调方法,并在高度竞争的时间内完成微调过程,表明EPT是一种有效的PEFT方法。源代码可在github.com/zuwenqiang/EPT获取。
更新时间: 2025-03-21 13:38:56
领域: cs.CV,cs.AI
Semigroup-homomorphic Signature
In 2002, Johnson et al. posed an open problem at the Cryptographers' Track of the RSA Conference: how to construct a secure homomorphic signature on a semigroup, rather than on a group. In this paper, we introduce, for the first time, a semigroup-homomorphic signature scheme. Under certain conditions, we prove that the security of this scheme is based on the hardness of the Short Integer Solution (SIS) problem and is tightly secure. Furthermore, we extend it to a linear semigroup-homomorphic signature scheme over lattices, and this scheme can also ensure privacy.
Updated: 2025-03-21 13:38:07
标题: 半群同态签名
摘要: 在2002年,Johnson等人在RSA会议的密码学家专场提出了一个开放问题:如何在半群而非群上构建一个安全的同态签名。本文首次介绍了一个半群同态签名方案。在特定条件下,我们证明该方案的安全性基于短整数解(SIS)问题的难度,并且具有严格的安全性。此外,我们将其扩展为基于格的线性半群同态签名方案,该方案还可以确保隐私。
更新时间: 2025-03-21 13:38:07
领域: cs.CR
Autonomous AI imitators increase diversity in homogeneous information ecosystems
Recent breakthroughs in large language models (LLMs) have facilitated autonomous AI agents capable of imitating human-generated content. This technological advancement raises fundamental questions about AI's impact on the diversity and democratic value of information ecosystems. We introduce a large-scale simulation framework to examine AI-based imitation within news, a context crucial for public discourse. By systematically testing two distinct imitation strategies across a range of information environments varying in initial diversity, we demonstrate that AI-generated articles do not uniformly homogenize content. Instead, AI's influence is strongly context-dependent: AI-generated content can introduce valuable diversity in originally homogeneous news environments but diminish diversity in initially heterogeneous contexts. These results illustrate that the initial diversity of an information environment critically shapes AI's impact, challenging assumptions that AI-driven imitation uniformly threatens diversity. Instead, when information is initially homogeneous, AI-driven imitation can expand perspectives, styles, and topics. This is especially important in news contexts, where information diversity fosters richer public debate by exposing citizens to alternative viewpoints, challenging biases, and preventing narrative monopolies, which is essential for a resilient democracy.
Updated: 2025-03-21 13:35:52
标题: 自主AI模拟器增加同质信息生态系统中的多样性
摘要: 最近大语言模型(LLMs)的突破使得能够模仿人类生成内容的自主AI代理成为可能。这一技术进步引发了关于AI对信息生态多样性和民主价值的影响的基本问题。我们引入了一个大规模模拟框架,以研究新闻领域内基于AI的模仿,这对公共话语至关重要。通过在初始多样性不同的一系列信息环境中系统测试两种不同的模仿策略,我们证明AI生成的文章并不会统一同质化内容。相反,AI的影响受到强烈的上下文依赖性:AI生成的内容可以在原本同质化的新闻环境中引入有价值的多样性,但会在最初异质化的情况下降低多样性。这些结果表明,信息环境的初始多样性至关重要地塑造了AI的影响,挑战了AI驱动的模仿会统一威胁多样性的假设。相反,当信息最初同质化时,AI驱动的模仿可以拓展观点、风格和主题。这在新闻环境中尤为重要,因为信息多样性通过向公民展示替代观点、挑战偏见和防止叙事垄断来促进更丰富的公共辩论,这对于一个有韧性的民主至关重要。
更新时间: 2025-03-21 13:35:52
领域: cs.CY,cs.AI,cs.CL,J.4
Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens
Recent advancements in large language models and their multi-modal extensions have demonstrated the effectiveness of unifying generation and understanding through autoregressive next-token prediction. However, despite the critical role of 3D structural generation and understanding (3D GU) in AI for science, these tasks have largely evolved independently, with autoregressive methods remaining underexplored. To bridge this gap, we introduce Uni-3DAR, a unified framework that seamlessly integrates 3D GU tasks via autoregressive prediction. At its core, Uni-3DAR employs a novel hierarchical tokenization that compresses 3D space using an octree, leveraging the inherent sparsity of 3D structures. It then applies an additional tokenization for fine-grained structural details, capturing key attributes such as atom types and precise spatial coordinates in microscopic 3D structures. We further propose two optimizations to enhance efficiency and effectiveness. The first is a two-level subtree compression strategy, which reduces the octree token sequence by up to 8x. The second is a masked next-token prediction mechanism tailored for dynamically varying token positions, significantly boosting model performance. By combining these strategies, Uni-3DAR successfully unifies diverse 3D GU tasks within a single autoregressive framework. Extensive experiments across multiple microscopic 3D GU tasks, including molecules, proteins, polymers, and crystals, validate its effectiveness and versatility. Notably, Uni-3DAR surpasses previous state-of-the-art diffusion models by a substantial margin, achieving up to 256\% relative improvement while delivering inference speeds up to 21.8x faster. The code is publicly available at https://github.com/dptech-corp/Uni-3DAR.
Updated: 2025-03-21 13:32:47
标题: Uni-3DAR:通过对压缩空间标记进行自回归实现统一的3D生成和理解
摘要: 最近关于大型语言模型及其多模态扩展的进展已经证明了通过自回归下一个标记预测统一生成和理解的有效性。然而,尽管在科学人工智能中3D结构生成和理解(3D GU)的关键作用,这些任务在很大程度上是独立发展的,自回归方法仍未得到充分探索。为了弥合这一差距,我们引入了Uni-3DAR,这是一个统一的框架,通过自回归预测无缝集成3D GU任务。在其核心,Uni-3DAR采用一种新颖的分层标记化,通过使用八叉树压缩3D空间,利用3D结构的固有稀疏性。然后,它应用了额外的标记化细化结构细节,捕捉微观3D结构中的原子类型和精确的空间坐标等关键属性。我们进一步提出了两种优化方法来增强效率和有效性。第一种是两级子树压缩策略,可以将八叉树标记序列减少多达8倍。第二种是针对动态变化的标记位置量身定制的掩码下一个标记预测机制,显著提升模型性能。通过结合这些策略,Uni-3DAR成功地在单个自回归框架内统一了各种3D GU任务。在包括分子、蛋白质、聚合物和晶体在内的多个微观3D GU任务上进行的大量实验验证了其有效性和多功能性。值得注意的是,Uni-3DAR在很大程度上超越了以前的最先进扩散模型,实现了高达256%的相对改进,并且推理速度提高了高达21.8倍。该代码可以在https://github.com/dptech-corp/Uni-3DAR 上公开获取。
更新时间: 2025-03-21 13:32:47
领域: cs.LG,cond-mat.mtrl-sci,q-bio.BM
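A stripped-down version of the octree compression idea can be written as a recursion over occupied cells: each occupied node emits one 8-bit occupancy token and only occupied children are expanded, so empty space costs no tokens. The token format, traversal order, and maximum depth below are illustrative assumptions; the actual Uni-3DAR tokenizer adds a second fine-grained tokenization for atom types and exact coordinates, plus subtree compression.

# Simplified octree-style tokenization of a sparse 3-D point set.
import numpy as np

def octree_tokens(points, lo, hi, depth, max_depth):
    if len(points) == 0 or depth == max_depth:
        return []
    mid = (lo + hi) / 2.0
    # Child index: 3 bits, one per axis (upper half of the cell sets the bit).
    child_idx = ((points >= mid).astype(int) * np.array([1, 2, 4])).sum(axis=1)
    occupancy, children = 0, []
    for c in range(8):
        mask = child_idx == c
        if mask.any():
            occupancy |= 1 << c
            c_lo = np.where([c & 1, c & 2, c & 4], mid, lo)
            c_hi = np.where([c & 1, c & 2, c & 4], hi, mid)
            children.append((points[mask], c_lo, c_hi))
    tokens = [occupancy]                      # one byte-valued token per occupied node
    for pts, c_lo, c_hi in children:
        tokens += octree_tokens(pts, c_lo, c_hi, depth + 1, max_depth)
    return tokens

pts = np.random.rand(50, 3)                   # toy "atom positions" in the unit cube
print(octree_tokens(pts, np.zeros(3), np.ones(3), 0, max_depth=4))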
Temporal-Guided Spiking Neural Networks for Event-Based Human Action Recognition
This paper explores the promising interplay between spiking neural networks (SNNs) and event-based cameras for privacy-preserving human action recognition (HAR). The unique feature of event cameras in capturing only the outlines of motion, combined with SNNs' proficiency in processing spatiotemporal data through spikes, establishes a highly synergistic compatibility for event-based HAR. Previous studies, however, have been limited by SNNs' ability to process long-term temporal information, essential for precise HAR. In this paper, we introduce two novel frameworks to address this: temporal segment-based SNN (TS-SNN) and 3D convolutional SNN (3D-SNN). The TS-SNN extracts long-term temporal information by dividing actions into shorter segments, while the 3D-SNN replaces 2D spatial elements with 3D components to facilitate the transmission of temporal information. To promote further research in event-based HAR, we create a dataset, FallingDetection-CeleX, collected using the high-resolution CeleX-V event camera $(1280 \times 800)$, comprising 7 distinct actions. Extensive experimental results show that our proposed frameworks surpass state-of-the-art SNN methods on our newly collected dataset and three other neuromorphic datasets, showcasing their effectiveness in handling long-range temporal information for event-based HAR.
Updated: 2025-03-21 13:31:16
标题: 基于时间引导的脉冲神经网络用于基于事件的人体动作识别
摘要: 本文探讨了脉冲神经网络(SNNs)与基于事件的摄像头在隐私保护人类动作识别(HAR)中的有希望的相互作用。事件摄像头捕捉运动轮廓的独特特征,结合SNNs在通过脉冲处理时空数据方面的精通,为基于事件的HAR建立了高度协同兼容性。然而,先前的研究受限于SNNs处理长期时间信息的能力,这对于精确的HAR至关重要。在本文中,我们引入了两个新颖的框架来解决这个问题:基于时间段的SNN(TS-SNN)和3D卷积SNN(3D-SNN)。TS-SNN通过将动作分成较短的段来提取长期时间信息,而3D-SNN将2D空间元素替换为3D组件,以促进时间信息的传输。为了推动基于事件的HAR的进一步研究,我们创建了一个数据集,FallingDetection-CeleX,使用高分辨率CeleX-V事件摄像头(1280×800)收集,包括7个不同的动作。广泛的实验结果表明,我们提出的框架在我们新收集的数据集和其他三个神经形态数据集上均优于最先进的SNN方法,展示了它们在处理基于事件的HAR的长期时间信息方面的有效性。
更新时间: 2025-03-21 13:31:16
领域: cs.CV,cs.AI,cs.CR,cs.NE
OptionZero: Planning with Learned Options
Planning with options -- a sequence of primitive actions -- has been shown effective in reinforcement learning within complex environments. Previous studies have focused on planning with predefined options or learned options through expert demonstration data. Inspired by MuZero, which learns superhuman heuristics without any human knowledge, we propose a novel approach, named OptionZero. OptionZero incorporates an option network into MuZero, providing autonomous discovery of options through self-play games. Furthermore, we modify the dynamics network to provide environment transitions when using options, allowing searching deeper under the same simulation constraints. Empirical experiments conducted in 26 Atari games demonstrate that OptionZero outperforms MuZero, achieving a 131.58% improvement in mean human-normalized score. Our behavior analysis shows that OptionZero not only learns options but also acquires strategic skills tailored to different game characteristics. Our findings show promising directions for discovering and using options in planning. Our code is available at https://rlg.iis.sinica.edu.tw/papers/optionzero.
Updated: 2025-03-21 13:30:42
标题: OptionZero:利用学习到的选项进行规划
摘要: 使用选项进行规划-一系列基本动作-已被证明在复杂环境中的强化学习中是有效的。先前的研究集中在使用预定义选项或通过专家演示数据学习选项进行规划。受MuZero的启发,该方法学习超人启发式而没有任何人类知识,我们提出了一种新方法,称为OptionZero。OptionZero将一个选项网络融入到MuZero中,通过自我对弈游戏提供选项的自主发现。此外,我们修改动力学网络以在使用选项时提供环境转换,允许在相同的模拟约束下进行更深入的搜索。在26个Atari游戏中进行的实证实验表明,OptionZero优于MuZero,平均人类标准化得分提高了131.58%。我们的行为分析显示,OptionZero不仅学习选项,还获得了针对不同游戏特征量身定制的战略技能。我们的研究结果展示了在规划中发现和使用选项的有希望的方向。我们的代码可在https://rlg.iis.sinica.edu.tw/papers/optionzero获取。
更新时间: 2025-03-21 13:30:42
领域: cs.AI,cs.LG
Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification
Multimodal Large Language Models (MLLMs) have achieved remarkable success in vision understanding, reasoning, and interaction. However, the inference computation and memory increase progressively with the generation of output tokens during decoding, directly affecting the efficacy of MLLMs. Existing methods attempt to reduce the vision context redundancy to achieve efficient MLLMs. Unfortunately, the efficiency benefits of the vision context reduction in the prefill stage gradually diminish during the decoding stage. To address this problem, we proposed a dynamic vision-language context sparsification framework Dynamic-LLaVA, which dynamically reduces the redundancy of vision context in the prefill stage and decreases the memory and computation overhead of the generated language context during decoding. Dynamic-LLaVA designs a tailored sparsification inference scheme for different inference modes, i.e., prefill, decoding with and without KV cache, to achieve efficient inference of MLLMs. In practice, Dynamic-LLaVA can reduce computation consumption by $\sim$75\% in the prefill stage. Meanwhile, throughout the entire generation process of MLLMs, Dynamic-LLaVA reduces the $\sim$50\% computation consumption under decoding without KV cache, while saving $\sim$50\% GPU memory overhead when decoding with KV cache, due to the vision-language context sparsification. Extensive experiments also demonstrate that Dynamic-LLaVA achieves efficient inference for MLLMs with negligible understanding and generation ability degradation or even performance gains compared to the full-context inference baselines. Code is available at https://github.com/Osilly/dynamic_llava .
Updated: 2025-03-21 13:30:33
标题: Dynamic-LLaVA: 通过动态视觉-语言上下文稀疏化实现高效的多模态大型语言模型
摘要: 多模态大型语言模型(MLLMs)在视觉理解、推理和交互方面取得了显著的成功。然而,在解码过程中,随着输出标记的生成,推理计算和内存的增加逐渐增加,直接影响了MLLMs的有效性。现有方法试图减少视觉上下文的冗余,以实现高效的MLLMs。然而,视觉上下文减少在填充阶段的效率收益在解码阶段逐渐减弱。为了解决这个问题,我们提出了一个动态视觉-语言上下文稀疏化框架Dynamic-LLaVA,动态减少了填充阶段的视觉上下文的冗余,并在解码过程中减少了生成的语言上下文的内存和计算开销。Dynamic-LLaVA为不同推理模式(即填充、解码以及带有和不带有KV缓存的解码)设计了一个定制的稀疏化推理方案,以实现MLLMs的高效推理。在实践中,Dynamic-LLaVA可以在填充阶段将计算消耗减少约75%。与此同时,在整个MLLMs的生成过程中,Dynamic-LLaVA在解码不带KV缓存时将计算消耗减少约50%,在解码时带有KV缓存时,由于视觉-语言上下文的稀疏化,节省了约50%的GPU内存开销。大量实验还表明,与完整上下文推理基线相比,Dynamic-LLaVA实现了MLLMs的高效推理,几乎没有理解和生成能力的降级,甚至有性能提升。源代码可在https://github.com/Osilly/dynamic_llava找到。
更新时间: 2025-03-21 13:30:33
领域: cs.CV,cs.AI,cs.CL,cs.LG
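The prefill-stage sparsification can be illustrated with a small token-pruning routine: score the vision tokens and keep only a fraction of them before they enter the language model. The norm-based score and the keep ratio below are placeholders; Dynamic-LLaVA uses learned predictors and also sparsifies the generated language context during decoding.

# Illustrative vision-token sparsification before the LLM sees the image tokens (PyTorch).
import torch

def sparsify_vision_tokens(vision_tokens, keep_ratio=0.25):
    """vision_tokens: (batch, num_tokens, dim). Returns kept tokens and their indices."""
    scores = vision_tokens.norm(dim=-1)                       # (batch, num_tokens)
    k = max(1, int(vision_tokens.shape[1] * keep_ratio))
    top = scores.topk(k, dim=1).indices.sort(dim=1).values    # preserve original order
    idx = top.unsqueeze(-1).expand(-1, -1, vision_tokens.shape[-1])
    return vision_tokens.gather(1, idx), top

tokens = torch.randn(2, 576, 1024)            # e.g. 24x24 patch tokens from a ViT encoder
kept, kept_idx = sparsify_vision_tokens(tokens)
print(kept.shape)                             # torch.Size([2, 144, 1024])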
Statistical exploration of the Manifold Hypothesis
The Manifold Hypothesis is a widely accepted tenet of Machine Learning which asserts that nominally high-dimensional data are in fact concentrated near a low-dimensional manifold, embedded in high-dimensional space. This phenomenon is observed empirically in many real world situations, has led to development of a wide range of statistical methods in the last few decades, and has been suggested as a key factor in the success of modern AI technologies. We show that rich and sometimes intricate manifold structure in data can emerge from a generic and remarkably simple statistical model -- the Latent Metric Model -- via elementary concepts such as latent variables, correlation and stationarity. This establishes a general statistical explanation for why the Manifold Hypothesis seems to hold in so many situations. Informed by the Latent Metric Model we derive procedures to discover and interpret the geometry of high-dimensional data, and explore hypotheses about the data generating mechanism. These procedures operate under minimal assumptions and make use of well known graph-analytic algorithms.
Updated: 2025-03-21 13:30:13
标题: 统计探索流形假设
摘要: 流形假设是机器学习中被广泛接受的一个观点,它断言名义上高维数据实际上集中在一个低维流形中,该流形嵌入在高维空间中。这种现象在许多真实世界情况下得到了经验观察,在过去几十年中已经导致了广泛的统计方法的发展,并且被认为是现代人工智能技术成功的关键因素之一。我们展示,数据中丰富且有时错综复杂的流形结构可以从一个通用且非常简单的统计模型 - 潜在度量模型 - 中产生,通过潜在变量、相关性和平稳性等基本概念。这为为什么流形假设在如此多情况下似乎成立提供了一个一般性的统计解释。受潜在度量模型的启发,我们推导出了一些程序来发现和解释高维数据的几何结构,并探索有关数据生成机制的假设。这些程序在最小假设下运行,并利用众所周知的图分析算法。
更新时间: 2025-03-21 13:30:13
领域: stat.ME,cs.LG,stat.ML,62R20, 62R40, 62G05, 62G20, 62R07, 62-08, 62H25, 62H30
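The latent-variable mechanism behind the Manifold Hypothesis can be illustrated numerically: draw a low-dimensional latent variable, map it through smooth coordinate functions into a nominally high-dimensional space, and observe that the variance concentrates in far fewer directions than the ambient dimension. This toy construction and the PCA diagnostic are illustrative only; they are not the paper's Latent Metric Model or its graph-analytic procedures.

# Toy demonstration: 100-d observations generated from a 2-d latent variable.
import numpy as np

rng = np.random.default_rng(0)
n, ambient_dim, latent_dim = 2000, 100, 2
z = rng.uniform(-1, 1, size=(n, latent_dim))               # latent variables
freqs = 0.5 * rng.normal(size=(latent_dim, ambient_dim))   # smooth coordinate functions
phases = rng.uniform(0, 2 * np.pi, size=ambient_dim)
x = np.sin(z @ freqs + phases) + 0.01 * rng.normal(size=(n, ambient_dim))

# The singular-value spectrum decays far faster than for isotropic 100-d noise,
# reflecting the low-dimensional latent structure.
sv = np.linalg.svd(x - x.mean(axis=0), compute_uv=False)
var = sv ** 2 / (sv ** 2).sum()
print("components needed for 95% of variance:",
      int(np.searchsorted(np.cumsum(var), 0.95)) + 1)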
Vul-LMGNNs: Fusing language models and online-distilled graph neural networks for code vulnerability detection
Code Language Models (codeLMs) and Graph Neural Networks (GNNs) are widely used in code vulnerability detection. However, GNNs often rely on aggregating information from adjacent nodes, limiting structural information propagation across layers. While codeLMs can supplement GNNs with semantic information, existing integration methods underexplore their collaborative potential. To address these challenges, we propose Vul-LMGNNs, integrating pre-trained codeLMs with GNNs to enable cross-layer propagation of semantic and structural information. Vul-LMGNNs leverage Code Property Graphs (CPGs) to incorporate syntax, control flow, and data dependencies, using gated GNNs for structural extraction. An online knowledge distillation (KD) mechanism allows a student GNN to capture structural information from a trained counterpart via alternating training. Additionally, an "implicit-explicit" joint training framework leverages codeLMs to initialize embeddings and propagate code semantics. In the explicit phase, it performs late fusion via linear interpolation. Evaluations on real-world vulnerability datasets show Vul-LMGNNs outperform 17 state-of-the-art approaches. Source code is available at: https://github.com/Vul-LMGNN/vul-LMGNN.
Updated: 2025-03-21 13:29:30
标题: Vul-LMGNNs: 将语言模型与在线提炼的图神经网络融合用于代码漏洞检测
摘要: 代码语言模型(codeLMs)和图神经网络(GNNs)广泛应用于代码漏洞检测。然而,GNNs通常依赖于聚合相邻节点的信息,限制了跨层结构信息的传播。虽然codeLMs可以用语义信息补充GNNs,但现有的集成方法很少探索它们的协作潜力。为了解决这些挑战,我们提出了Vul-LMGNNs,将预训练的codeLMs与GNNs集成在一起,以实现语义和结构信息的跨层传播。Vul-LMGNNs利用代码属性图(CPGs)来整合语法、控制流和数据依赖关系,使用门控GNNs进行结构提取。在线知识蒸馏(KD)机制允许学生GNN通过交替训练从经过训练的对应模型中捕获结构信息。此外,一个“隐式-显式”联合训练框架利用codeLMs初始化嵌入并传播代码语义。在显式阶段,它通过线性插值执行后期融合。对真实世界漏洞数据集的评估显示,Vul-LMGNNs优于17种最先进的方法。源代码可在以下网址找到:https://github.com/Vul-LMGNN/vul-LMGNN。
更新时间: 2025-03-21 13:29:30
领域: cs.CR
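The explicit late-fusion step, described above as linear interpolation, amounts to a one-line combination of the two branches' predictions. The alpha value and the toy score vectors below are placeholders, not Vul-LMGNN outputs.

# Minimal sketch of late fusion of code-LM and GNN predictions (PyTorch).
import torch

def late_fusion(lm_logits, gnn_logits, alpha=0.5):
    """Interpolate in probability space; alpha weights the language-model branch."""
    p_lm = torch.softmax(lm_logits, dim=-1)
    p_gnn = torch.softmax(gnn_logits, dim=-1)
    return alpha * p_lm + (1 - alpha) * p_gnn

lm_logits = torch.tensor([[0.2, 1.5]])     # e.g. [not-vulnerable, vulnerable]
gnn_logits = torch.tensor([[0.9, 0.1]])
print(late_fusion(lm_logits, gnn_logits))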
Modifying Large Language Model Post-Training for Diverse Creative Writing
As creative writing tasks do not have singular correct answers, large language models (LLMs) trained to perform these tasks should be able to generate diverse valid outputs. However, LLM post-training often focuses on improving generation quality but neglects to facilitate output diversity. Hence, in creative writing generation, we investigate post-training approaches to promote both output diversity and quality. Our core idea is to include deviation -- the degree of difference between a training sample and all other samples with the same prompt -- in the training objective to facilitate learning from rare high-quality instances. By adopting our approach to direct preference optimization (DPO) and odds ratio preference optimization (ORPO), we demonstrate that we can promote the output diversity of trained models while minimally decreasing quality. Our best model with 8B parameters could achieve on-par diversity as a human-created dataset while having output quality similar to the best instruction-tuned models we examined, GPT-4o and DeepSeek-R1. We further validate our approaches with a human evaluation, an ablation, and a comparison to an existing diversification approach, DivPO.
Updated: 2025-03-21 13:21:45
标题: 修改大型语言模型的后训练,用于多样化创意写作
摘要: 由于创造性写作任务没有单一正确答案,因此训练用于执行这些任务的大型语言模型(LLMs)应该能够生成多样化的有效输出。然而,LLM后训练通常侧重于提高生成质量,但忽视了促进输出多样性的重要性。因此,在创造性写作生成中,我们研究了后训练方法来促进输出的多样性和质量。我们的核心思想是在训练目标中包含偏差--训练样本与所有具有相同提示的其他样本之间的差异程度--以促进从稀有高质量实例中学习。通过采用我们的方法进行直接偏好优化(DPO)和赔率比偏好优化(ORPO),我们证明我们可以促进训练模型的输出多样性,同时最小程度地降低质量。我们的最佳模型具有8B个参数,可以实现与人类创建的数据集相当的多样性,同时输出质量与我们考察的最佳指令调整模型GPT-4o和DeepSeek-R1相似。我们进一步通过人类评估、削弱实验和与现有多样化方法DivPO的比较来验证我们的方法。
更新时间: 2025-03-21 13:21:45
领域: cs.CL,cs.LG
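The deviation quantity, i.e. how different a training sample is from the other samples sharing its prompt, can be approximated, for instance, as the mean embedding distance to the other responses. The embedding model, the cosine-distance choice, and how deviation is then folded into the DPO/ORPO objective are all assumptions in this sketch.

# Sketch of per-response deviation among responses to one prompt.
import numpy as np

def deviation_scores(embeddings):
    """embeddings: (n_responses, dim) array for responses to one prompt.
    Returns, per response, its mean cosine distance to all other responses."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = e @ e.T                                  # pairwise cosine similarity
    n = len(e)
    return (1.0 - sim).sum(axis=1) / (n - 1)       # self-distance is 0, so divide by n - 1

emb = np.random.default_rng(0).normal(size=(4, 384))  # 4 candidate stories, toy embeddings
print(deviation_scores(emb))                          # higher = rarer / more distinctive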
Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning
Deep Reinforcement Learning (DRL) has demonstrated strong performance in robotic control but remains susceptible to out-of-distribution (OOD) states, often resulting in unreliable actions and task failure. While previous methods have focused on minimizing or preventing OOD occurrences, they largely neglect recovery once an agent encounters such states. Although the latest research has attempted to address this by guiding agents back to in-distribution states, their reliance on uncertainty estimation hinders scalability in complex environments. To overcome this limitation, we introduce Language Models for Out-of-Distribution Recovery (LaMOuR), which enables recovery learning without relying on uncertainty estimation. LaMOuR generates dense reward codes that guide the agent back to a state where it can successfully perform its original task, leveraging the capabilities of LVLMs in image description, logical reasoning, and code generation. Experimental results show that LaMOuR substantially enhances recovery efficiency across diverse locomotion tasks and even generalizes effectively to complex environments, including humanoid locomotion and mobile manipulation, where existing methods struggle. The code and supplementary materials are available at https://lamour-rl.github.io/.
Updated: 2025-03-21 13:20:39
标题: 利用语言模型实现强化学习中的分布外恢复
摘要: 深度强化学习(DRL)在机器人控制方面表现出色,但仍然容易受到超出分布(OOD)状态的影响,通常导致不可靠的动作和任务失败。虽然先前的方法着重于最小化或防止OOD的发生,但它们往往忽视了一旦代理遇到这些状态就需要进行恢复。尽管最新的研究尝试通过将代理引导回分布状态来解决这个问题,但它们对不确定性估计的依赖限制了在复杂环境中的可扩展性。为了克服这一限制,我们引入了用于超出分布恢复的语言模型(LaMOuR),它可以在不依赖不确定性估计的情况下进行恢复学习。LaMOuR生成密集的奖励代码,引导代理回到一个状态,从而成功执行其原始任务,充分利用LVLM在图像描述、逻辑推理和代码生成方面的能力。实验结果表明,LaMOuR显著提高了在各种运动任务中的恢复效率,甚至在复杂环境中也能有效泛化,包括人形运动和移动操作,在这些环境中现有方法往往难以应对。代码和补充材料可在 https://lamour-rl.github.io/ 上获得。
更新时间: 2025-03-21 13:20:39
领域: cs.RO,cs.AI
Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection
Attention mechanisms have revolutionized several domains of artificial intelligence, such as natural language processing and computer vision, by enabling models to selectively focus on relevant parts of the input data. While recent work has characterized the optimization dynamics of gradient descent (GD) in attention-based models and the structural properties of its preferred solutions, less is known about more general optimization algorithms such as mirror descent (MD). In this paper, we investigate the convergence properties and implicit biases of a family of MD algorithms tailored for softmax attention mechanisms, with the potential function chosen as the $p$-th power of the $\ell_p$-norm. Specifically, we show that these algorithms converge in direction to a generalized hard-margin SVM with an $\ell_p$-norm objective when applied to a classification problem using a softmax attention model. Notably, our theoretical results reveal that the convergence rate is comparable to that of traditional GD in simpler models, despite the highly nonlinear and nonconvex nature of the present problem. Additionally, we delve into the joint optimization dynamics of the key-query matrix and the decoder, establishing conditions under which this complex joint optimization converges to their respective hard-margin SVM solutions. Lastly, our numerical experiments on real data demonstrate that MD algorithms improve generalization over standard GD and excel in optimal token selection.
Updated: 2025-03-21 13:15:52
标题: 利用镜像下降优化注意力:广义最大间隔令牌选择
摘要: 注意机制已经在人工智能的几个领域中产生了革命性的影响,例如自然语言处理和计算机视觉,通过使模型能够有选择地关注输入数据的相关部分。尽管最近的研究已经表征了基于梯度下降(GD)的注意力模型的优化动态和其首选解的结构特性,但对于更一般的优化算法如镜像下降(MD),了解较少。在本文中,我们研究了一类针对 softmax 注意力机制定制的 MD 算法的收敛性质和隐含偏差,其中潜在函数选择为 $\ell_p$-范数的 $p$ 次幂。具体而言,我们表明这些算法在应用于使用 softmax 注意力模型的分类问题时,朝着具有 $\ell_p$-范数目标的广义硬间隔支持向量机(SVM)收敛。值得注意的是,我们的理论结果显示,尽管当前问题的高度非线性和非凸性质,但收敛速度与简单模型中传统 GD 的收敛速度相当。此外,我们深入讨论了关键-查询矩阵和解码器的联合优化动态,建立了这种复杂联合优化收敛到各自硬间隔 SVM 解的条件。最后,我们对真实数据进行的数值实验表明,MD 算法在泛化能力上优于标准 GD,并在最佳令牌选择方面表现出色。
更新时间: 2025-03-21 13:15:52
领域: cs.LG,cs.AI,cs.CL
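A mirror-descent step with a potential proportional to the $p$-th power of the $\ell_p$-norm only requires the mirror map and its inverse, both of which are coordinate-wise power functions. The toy quadratic loss, the step size, and the choice p = 1.5 below are illustrative; p = 2 recovers ordinary gradient descent.

# Sketch of a mirror-descent step with potential psi(w) proportional to ||w||_p^p.
import numpy as np

def grad_psi(w, p):
    return np.sign(w) * np.abs(w) ** (p - 1)

def grad_psi_inv(theta, p):
    return np.sign(theta) * np.abs(theta) ** (1.0 / (p - 1))

def mirror_descent_step(w, grad_loss, lr, p):
    # Map to the dual via grad psi, take a gradient step there, map back.
    theta = grad_psi(w, p) - lr * grad_loss(w)
    return grad_psi_inv(theta, p)

# Toy problem: minimize ||w - w*||^2.
w_star = np.array([1.0, 0.0, 0.0, -2.0])
grad_loss = lambda w: 2 * (w - w_star)
w = np.full(4, 0.1)
for _ in range(200):
    w = mirror_descent_step(w, grad_loss, lr=0.05, p=1.5)
print(np.round(w, 3))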
TamedPUMA: safe and stable imitation learning with geometric fabrics
Using the language of dynamical systems, Imitation learning (IL) provides an intuitive and effective way of teaching stable task-space motions to robots with goal convergence. Yet, IL techniques are affected by serious limitations when it comes to ensuring safety and fulfillment of physical constraints. With this work, we solve this challenge via TamedPUMA, an IL algorithm augmented with a recent development in motion generation called geometric fabrics. As both the IL policy and geometric fabrics describe motions as artificial second-order dynamical systems, we propose two variations where IL provides a navigation policy for geometric fabrics. The result is a stable imitation learning strategy within which we can seamlessly blend geometrical constraints like collision avoidance and joint limits. Beyond providing a theoretical analysis, we demonstrate TamedPUMA with simulated and real-world tasks, including a 7-DoF manipulator.
Updated: 2025-03-21 13:13:17
标题: TamedPUMA: 基于几何织物的安全稳定模仿学习
摘要: 利用动力系统的语言,模仿学习(IL)为机器人教授稳定任务空间运动提供了直观而有效的方式,实现目标收敛。然而,当涉及确保安全和满足物理约束时,IL技术受到严重限制。通过TamedPUMA,我们解决了这一挑战,这是一种IL算法,增加了一种称为几何织物的最新运动生成技术。由于IL策略和几何织物都将运动描述为人工二阶动力系统,我们提出了两种变体,其中IL为几何织物提供导航策略。结果是一种稳定的模仿学习策略,其中我们可以无缝地混合几何约束,如碰撞避免和关节限制。除了提供理论分析外,我们还通过模拟和真实世界任务展示了TamedPUMA,包括一个7自由度的操纵器。
更新时间: 2025-03-21 13:13:17
领域: eess.SY,cs.LG,cs.RO,cs.SY
A New Statistical Model of Star Speckles for Learning to Detect and Characterize Exoplanets in Direct Imaging Observations
The search for exoplanets is an active field in astronomy, with direct imaging as one of the most challenging methods due to faint exoplanet signals buried within stronger residual starlight. Successful detection requires advanced image processing to separate the exoplanet signal from this nuisance component. This paper presents a novel statistical model that captures nuisance fluctuations using a multi-scale approach, leveraging problem symmetries and a joint spectral channel representation grounded in physical principles. Our model integrates into an interpretable, end-to-end learnable framework for simultaneous exoplanet detection and flux estimation. The proposed algorithm is evaluated against the state of the art using datasets from the SPHERE instrument operating at the Very Large Telescope (VLT). It significantly improves the precision-recall trade-off, notably on challenging datasets that are otherwise unusable by astronomers. The proposed approach is computationally efficient, robust to varying data quality, and well suited for large-scale observational surveys.
Updated: 2025-03-21 13:07:55
标题: 一种用于在直接成像观测中学习检测和表征系外行星的恒星散斑新统计模型
摘要: 寻找系外行星是天文学中的一个活跃领域,直接成像是其中最具挑战性的方法之一,因为微弱的系外行星信号被淹没在更强的残余恒星光中。成功的检测需要先进的图像处理技术来将系外行星信号与这种干扰成分区分开来。本文提出了一种捕捉干扰波动的新颖统计模型,采用多尺度方法,利用问题的对称性和基于物理原理的联合谱通道表示。我们的模型整合到一个可解释的、端到端可学习的框架中,用于同时进行系外行星检测和通量估计。所提出的算法使用在非常大望远镜(VLT)上运行的SPHERE仪器的数据集进行评估,与现有技术相比,显著改善了精确率-召回率的权衡,特别是在对天文学家来说原本无法使用的具有挑战性的数据集上。所提出的方法在计算效率上表现出色,对于各种数据质量变化都很稳健,并且非常适合大规模观测调查。
更新时间: 2025-03-21 13:07:55
领域: astro-ph.IM,astro-ph.EP,cs.CV,cs.LG,stat.AP
Sparse autoencoders reveal selective remapping of visual concepts during adaptation
Adapting foundation models for specific purposes has become a standard approach to build machine learning systems for downstream applications. Yet, it is an open question which mechanisms take place during adaptation. Here we develop a new Sparse Autoencoder (SAE) for the CLIP vision transformer, named PatchSAE, to extract interpretable concepts at granular levels (e.g., shape, color, or semantics of an object) and their patch-wise spatial attributions. We explore how these concepts influence the model output in downstream image classification tasks and investigate how recent state-of-the-art prompt-based adaptation techniques change the association of model inputs to these concepts. While activations of concepts slightly change between adapted and non-adapted models, we find that the majority of gains on common adaptation tasks can be explained with the existing concepts already present in the non-adapted foundation model. This work provides a concrete framework to train and use SAEs for Vision Transformers and provides insights into explaining adaptation mechanisms.
Updated: 2025-03-21 13:02:14
标题: 稀疏自编码器揭示了在适应过程中视觉概念的选择性重映射。
摘要: 将基础模型调整为特定目的已成为构建用于下游应用的机器学习系统的标准方法。然而,在适应过程中发生了哪些机制是一个待解开的问题。在这里,我们开发了一种新的稀疏自动编码器(SAE)用于CLIP视觉变换器,命名为PatchSAE,以提取细粒度水平的可解释概念(例如,物体的形状、颜色或语义)及其基于补丁的空间属性。我们探讨这些概念如何影响下游图像分类任务中的模型输出,并调查最近的最先进的基于提示的适应技术如何改变模型输入与这些概念之间的关联。虽然适应模型和非适应模型之间的概念激活略有变化,但我们发现常见适应任务中的大部分收益可以用已经存在于非适应基础模型中的现有概念来解释。这项工作提供了一个具体的框架来训练和使用SAE用于视觉变换器,并提供了解释适应机制的见解。
更新时间: 2025-03-21 13:02:14
领域: cs.CV,cs.LG
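The kind of sparse autoencoder used for this analysis follows a standard recipe: an overcomplete linear encoder with a ReLU, a linear decoder, and an L1 sparsity penalty on the latent activations, trained on patch-token activations. Dimensions, the penalty weight, and the toy training loop below are placeholders rather than the PatchSAE configuration.

# Generic sparse autoencoder on (synthetic) patch-token activations (PyTorch).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_hidden=8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))        # sparse, non-negative concept activations
        return self.decoder(z), z

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
acts = torch.randn(4096, 768)                  # stand-in for CLIP ViT patch activations
for step in range(3):                          # a real run would use many more steps
    x = acts[torch.randint(0, len(acts), (256,))]
    x_hat, z = sae(x)
    loss = nn.functional.mse_loss(x_hat, x) + 1e-3 * z.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())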
The CASTLE 2024 Dataset: Advancing the Art of Multimodal Understanding
Egocentric video has seen increased interest in recent years, as it is used in a range of areas. However, most existing datasets are limited to a single perspective. In this paper, we present the CASTLE 2024 dataset, a multimodal collection containing ego- and exo-centric (i.e., first- and third-person perspective) video and audio from 15 time-aligned sources, as well as other sensor streams and auxiliary data. The dataset was recorded by volunteer participants over four days in a fixed location and includes the point of view of 10 participants, with an additional 5 fixed cameras providing an exocentric perspective. The entire dataset contains over 600 hours of UHD video recorded at 50 frames per second. In contrast to other datasets, CASTLE 2024 does not contain any partial censoring, such as blurred faces or distorted audio. The dataset is available via https://castle-dataset.github.io/.
Updated: 2025-03-21 13:01:07
标题: CASTLE 2024数据集:推动多模态理解艺术的进步
摘要: 近年来,自我中心视频引起了人们的兴趣,因为它被应用于各个领域。然而,大多数现有数据集仅限于单一视角。本文介绍了CASTLE 2024数据集,这是一个多模态集合,包含了来自15个时间对齐源的自我中心和外在中心(即第一和第三人称视角)视频和音频,以及其他传感器流和辅助数据。该数据集由志愿者参与者在固定位置连续四天录制,包括10名参与者的视角,另外5台固定摄像头提供外在中心视角。整个数据集包含超过600小时的UHD视频,每秒50帧。与其他数据集不同,CASTLE 2024不包含任何部分审查,如模糊的面部或扭曲的音频。该数据集可以通过https://castle-dataset.github.io/获得。
更新时间: 2025-03-21 13:01:07
领域: cs.MM,cs.AI,cs.CV,cs.IR
SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage
Large language models (LLMs) have made significant advancements across various tasks, but their safety alignment remain a major concern. Exploring jailbreak prompts can expose LLMs' vulnerabilities and guide efforts to secure them. Existing methods primarily design sophisticated instructions for the LLM to follow, or rely on multiple iterations, which could hinder the performance and efficiency of jailbreaks. In this work, we propose a novel jailbreak paradigm, Simple Assistive Task Linkage (SATA), which can effectively circumvent LLM safeguards and elicit harmful responses. Specifically, SATA first masks harmful keywords within a malicious query to generate a relatively benign query containing one or multiple [MASK] special tokens. It then employs a simple assistive task such as a masked language model task or an element lookup by position task to encode the semantics of the masked keywords. Finally, SATA links the assistive task with the masked query to jointly perform the jailbreak. Extensive experiments show that SATA achieves state-of-the-art performance and outperforms baselines by a large margin. Specifically, on AdvBench dataset, with mask language model (MLM) assistive task, SATA achieves an overall attack success rate (ASR) of 85% and harmful score (HS) of 4.57, and with element lookup by position (ELP) assistive task, SATA attains an overall ASR of 76% and HS of 4.43.
Updated: 2025-03-21 13:00:44
标题: SATA:通过简单辅助任务链接实现LLM越狱的范式
摘要: 大型语言模型(LLMs)在各种任务上取得了重大进展,但它们的安全对齐仍然是一个主要关注点。探索越狱提示可以暴露LLMs的漏洞并指导安全工作。现有方法主要设计复杂的指令供LLM遵循,或依赖多次迭代,这可能会妨碍越狱的性能和效率。在这项工作中,我们提出了一种新颖的越狱范式,简单辅助任务链接(SATA),它可以有效地绕过LLM的保护措施,并引发有害响应。具体而言,SATA首先在恶意查询中掩盖有害关键词,以生成一个相对良性的查询,其中包含一个或多个[MASK]特殊标记。然后,它使用一个简单的辅助任务,比如一个掩盖语言模型任务或一个按位置查找元素的任务,来编码掩盖关键词的语义。最后,SATA将辅助任务与掩盖查询链接起来,共同执行越狱。广泛的实验表明,SATA取得了最先进的性能,并且在很大程度上优于基线。具体来说,在AdvBench数据集上,使用掩盖语言模型(MLM)辅助任务,SATA实现了85%的攻击成功率(ASR)和4.57的有害评分(HS),而使用按位置查找元素(ELP)辅助任务,SATA实现了76%的总体ASR和4.43的HS。
更新时间: 2025-03-21 13:00:44
领域: cs.CR,cs.AI,cs.CL
Semi-Implicit Functional Gradient Flow for Efficient Sampling
Particle-based variational inference methods (ParVIs) use nonparametric variational families represented by particles to approximate the target distribution according to the kernelized Wasserstein gradient flow for the Kullback-Leibler (KL) divergence. Although functional gradient flows have been introduced to expand the kernel space for better flexibility, the deterministic updating mechanism may limit exploration and require expensive repetitive runs for new samples. In this paper, we propose Semi-Implicit Functional Gradient flow (SIFG), a functional gradient ParVI method that uses perturbed particles with Gaussian noise as the approximation family. We show that the corresponding functional gradient flow, which can be estimated via denoising score matching with neural networks, exhibits strong theoretical convergence guarantees due to a higher-order smoothness brought to the approximation family via Gaussian perturbation. In addition, we present an adaptive version of our method that automatically selects the appropriate noise magnitude during sampling, striking a good balance between exploration efficiency and approximation accuracy. Extensive experiments on both simulated and real-world datasets demonstrate the effectiveness and efficiency of the proposed framework.
Updated: 2025-03-21 12:56:31
标题: 半隐式功能梯度流用于高效抽样
摘要: 基于粒子的变分推断方法(ParVIs)使用由粒子表示的非参数变分族来近似目标分布,根据Kullback-Leibler(KL)散度的核化Wasserstein梯度流。虽然功能梯度流已被引入以扩展核空间以获得更好的灵活性,但确定性更新机制可能限制探索,并且需要昂贵的重复运行来获取新样本。在本文中,我们提出了一种半隐式功能梯度流(SIFG),这是一种使用带有高斯噪声的扰动粒子作为近似族的功能梯度ParVI方法。我们展示了相应的功能梯度流,可以通过使用神经网络进行去噪得分匹配来估计,由于通过高斯扰动给近似族带来的高阶平滑性,因此展现了强大的理论收敛保证。此外,我们提出了我们方法的自适应版本,该版本在采样过程中自动选择适当的噪声幅度,以在探索效率和近似准确性之间取得良好的平衡。对模拟和真实世界数据集的广泛实验表明了提出框架的有效性和效率。
更新时间: 2025-03-21 12:56:31
领域: stat.ML,cs.LG
A General Adaptive Dual-level Weighting Mechanism for Remote Sensing Pansharpening
Currently, deep learning-based methods for remote sensing pansharpening have advanced rapidly. However, many existing methods struggle to fully leverage feature heterogeneity and redundancy, thereby limiting their effectiveness. We use the covariance matrix to model the feature heterogeneity and redundancy and propose Correlation-Aware Covariance Weighting (CACW) to adjust them. CACW captures these correlations through the covariance matrix, which is then processed by a nonlinear function to generate weights for adjustment. Building upon CACW, we introduce a general adaptive dual-level weighting mechanism (ADWM) to address these challenges from two key perspectives, enhancing a wide range of existing deep-learning methods. First, Intra-Feature Weighting (IFW) evaluates correlations among channels within each feature to reduce redundancy and enhance unique information. Second, Cross-Feature Weighting (CFW) adjusts contributions across layers based on inter-layer correlations, refining the final output. Extensive experiments demonstrate the superior performance of ADWM compared to recent state-of-the-art (SOTA) methods. Furthermore, we validate the effectiveness of our approach through generality experiments, redundancy visualization, comparison experiments, key variables and complexity analysis, and ablation studies. Our code is available at https://github.com/Jie-1203/ADWM.
Updated: 2025-03-21 12:55:38
标题: 一种用于遥感图像融合的通用自适应双层加权机制
摘要: 目前,基于深度学习的遥感全色增强方法发展迅速。然而,许多现有方法很难充分利用特征的异质性和冗余性,从而限制了它们的有效性。我们使用协方差矩阵来建模特征的异质性和冗余性,并提出了关联感知协方差加权(CACW)来进行调整。CACW通过协方差矩阵捕获这些相关性,然后通过非线性函数处理以生成调整权重。在CACW的基础上,我们引入了一个通用的自适应双层权重机制(ADWM)来从两个关键角度解决这些挑战,增强了广泛的现有深度学习方法。首先,内部特征加权(IFW)评估每个特征中通道之间的相关性,以减少冗余并增强独特信息。其次,跨特征加权(CFW)根据层间相关性调整各层的贡献,优化最终输出。大量实验证明了ADWM相对于最近的最先进方法的卓越性能。此外,我们通过通用性实验、冗余可视化、比较实验、关键变量和复杂性分析以及消融研究验证了我们方法的有效性。我们的代码可在https://github.com/Jie-1203/ADWM上找到。
更新时间: 2025-03-21 12:55:38
领域: cs.CV,cs.AI
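The covariance-based weighting idea can be sketched as follows: flatten each channel, form the channel-by-channel covariance matrix, summarize each channel's correlation with the others, and pass that summary through a nonlinearity to obtain per-channel weights. The particular summary (off-diagonal mass) and nonlinearity (sigmoid) below are assumptions, not the CACW formulation.

# Illustrative covariance-aware channel weighting (PyTorch).
import torch

def covariance_channel_weights(feat):
    """feat: (batch, channels, H, W). Returns per-channel weights (batch, channels, 1, 1)."""
    b, c, h, w = feat.shape
    x = feat.reshape(b, c, h * w)
    x = x - x.mean(dim=-1, keepdim=True)
    cov = x @ x.transpose(1, 2) / (h * w - 1)           # (batch, c, c) channel covariance
    # Summarise how strongly each channel co-varies with the others (off-diagonal mass).
    redundancy = (cov.abs().sum(dim=-1) - cov.abs().diagonal(dim1=1, dim2=2)) / (c - 1)
    weights = torch.sigmoid(-redundancy)                # down-weight highly redundant channels
    return weights.view(b, c, 1, 1)

feat = torch.randn(2, 32, 64, 64)
print((feat * covariance_channel_weights(feat)).shape)  # torch.Size([2, 32, 64, 64])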
Beyond Accuracy: What Matters in Designing Well-Behaved Models?
Deep learning has become an essential part of computer vision, with deep neural networks (DNNs) excelling in predictive performance. However, they often fall short in other critical quality dimensions, such as robustness, calibration, or fairness. While existing studies have focused on a subset of these quality dimensions, none have explored a more general form of "well-behavedness" of DNNs. With this work, we address this gap by simultaneously studying nine different quality dimensions for image classification. Through a large-scale study, we provide a bird's-eye view by analyzing 326 backbone models and how different training paradigms and model architectures affect the quality dimensions. We reveal various new insights such that (i) vision-language models exhibit high fairness on ImageNet-1k classification and strong robustness against domain changes; (ii) self-supervised learning is an effective training paradigm to improve almost all considered quality dimensions; and (iii) the training dataset size is a major driver for most of the quality dimensions. We conclude our study by introducing the QUBA score (Quality Understanding Beyond Accuracy), a novel metric that ranks models across multiple dimensions of quality, enabling tailored recommendations based on specific user needs.
Updated: 2025-03-21 12:54:18
标题: 超越准确性:在设计良好行为模型时重要的是什么?
摘要: 深度学习已成为计算机视觉的重要组成部分,深度神经网络(DNNs)在预测性能方面表现出色。然而,它们在其他关键质量维度上经常表现不佳,如鲁棒性、校准性或公平性。尽管现有研究集中在这些质量维度的子集上,但没有探索更一般形式的DNN的“行为良好性”。通过这项工作,我们通过对图像分类进行九个不同质量维度的同时研究来填补这一空白。通过大规模研究,我们通过分析326个骨干模型以及不同的训练范式和模型架构如何影响质量维度,提供了鸟瞰图。我们揭示了各种新的见解,例如(i)视觉语言模型在ImageNet-1k分类中表现出高公平性和对域变化的强鲁棒性;(ii)自监督学习是提高几乎所有考虑的质量维度的有效训练范式;(iii)训练数据集大小是大多数质量维度的主要驱动因素。我们通过引入QUBA分数(超越准确性的质量理解)来总结我们的研究,这是一种新的度量标准,可根据特定用户需求对模型在多个质量维度上进行排名,从而实现个性化推荐。
更新时间: 2025-03-21 12:54:18
领域: cs.CV,cs.LG
NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms
We introduce NotaGen, a symbolic music generation model aiming to explore the potential of producing high-quality classical sheet music. Inspired by the success of Large Language Models (LLMs), NotaGen adopts pre-training, fine-tuning, and reinforcement learning paradigms (henceforth referred to as the LLM training paradigms). It is pre-trained on 1.6M pieces of music in ABC notation, and then fine-tuned on approximately 9K high-quality classical compositions conditioned on "period-composer-instrumentation" prompts. For reinforcement learning, we propose the CLaMP-DPO method, which further enhances generation quality and controllability without requiring human annotations or predefined rewards. Our experiments demonstrate the efficacy of CLaMP-DPO in symbolic music generation models with different architectures and encoding schemes. Furthermore, subjective A/B tests show that NotaGen outperforms baseline models against human compositions, greatly advancing musical aesthetics in symbolic music generation.
Updated: 2025-03-21 12:53:04
标题: NotaGen:通过大型语言模型训练范式在符号音乐生成中推进音乐性
摘要: 我们介绍了NotaGen,这是一个旨在探索产生高质量古典乐谱潜力的符号音乐生成模型。受大型语言模型(LLMs)成功的启发,NotaGen采用了预训练、微调和强化学习范式(以下简称LLM训练范式)。它在ABC符号音乐的1.6M首乐曲上进行了预训练,然后在约9K首高质量古典作品上进行微调,条件是“时期-作曲家-乐器”提示。对于强化学习,我们提出了CLaMP-DPO方法,进一步提高了生成质量和可控性,而不需要人类注释或预定义奖励。我们的实验表明CLaMP-DPO在具有不同架构和编码方案的符号音乐生成模型中的有效性。此外,主观A/B测试显示,NotaGen在人类作品面前优于基准模型,极大地推动了符号音乐生成中的音乐美学。
更新时间: 2025-03-21 12:53:04
领域: cs.SD,cs.AI,eess.AS
SPDZCoder: Combining Expert Knowledge with LLMs for Generating Privacy-Computing Code
Privacy computing receives increasing attention, but writing privacy computing code remains challenging for developers due to limited library functions, which necessitate implementing functions from scratch, and the data-oblivious requirement, which contradicts programmers' intuitive thinking and usual practices. Automating the generation of privacy computing code with Large Language Models can streamline development effort and lower the barrier to using privacy computing frameworks. However, existing LLMs still encounter challenges in code translation for privacy-preserving computation, such as translating Python to MP-SPDZ, due to the scarcity of MP-SPDZ data required for effective pre-training or fine-tuning. Moreover, the lack of a benchmark further complicates the evaluation of translation quality. To address these limitations, this work proposes SPDZCoder, a rule-based framework that combines LLMs with expert knowledge for generating privacy-computing code without requiring additional training data. Specifically, SPDZCoder employs a rigorous procedure for collecting high-quality expert knowledge to represent the semantic-expressing differences between Python and MP-SPDZ, and to derive transformation rules for translating Python to MP-SPDZ based on this knowledge. Then, SPDZCoder progressively converts Python code into MP-SPDZ code using transformation rules in a three-stage pipeline. To evaluate SPDZCoder, we manually constructed a benchmark dataset, SPDZEval, which comprises six data splits, each representing a distinct class of challenging tasks in MP-SPDZ implementation. Extensive experiments show that SPDZCoder achieves superior performance, significantly surpassing baselines in pass@1 and pass@2. Specifically, SPDZCoder attains an overall correctness of 85.94% and 92.01% in pass@1 and pass@2, respectively, whereas the best-performing baseline achieves 63.58% and 76.36%, respectively.
Updated: 2025-03-21 12:52:57
标题: SPDZCoder:将专家知识与LLMs结合,生成隐私计算代码
摘要: 隐私计算受到越来越多的关注,但由于库函数有限,开发人员编写隐私计算代码仍然具有挑战性,需要从头开始实现功能,并且需要数据无视要求,这与程序员的直觉思维和通常的实践相矛盾。使用大型语言模型自动生成隐私计算代码可以简化开发工作并降低使用隐私计算框架的门槛。然而,现有的LLMs在隐私保护计算的代码翻译方面仍然面临挑战,例如将Python翻译为MP-SPDZ,由于缺乏有效的预训练或微调所需的MP-SPDZ数据。此外,缺乏基准进一步复杂化了翻译质量的评估。为了解决这些限制,本文提出了SPDZCoder,这是一个基于规则的框架,结合了LLMs和专家知识,用于生成隐私计算代码,无需额外的训练数据。具体来说,SPDZCoder采用一套严格的程序来收集高质量的专家知识,以表示Python和MP-SPDZ之间的语义表达差异,并根据这些知识推导出将Python翻译为MP-SPDZ的转换规则。然后,SPDZCoder在一个三阶段的流水线中逐步将Python代码转换为MP-SPDZ代码。为了评估SPDZCoder,我们手动构建了一个基准数据集SPDZEval,该数据集包括六个数据分割,每个分割代表MP-SPDZ实现中具有挑战性任务的不同类别。大量实验证明,SPDZCoder取得了优越的性能,显著超过了基线的pass@1和pass@2。具体来说,SPDZCoder在pass@1和pass@2中分别达到了85.94%和92.01%的总体正确率,而表现最佳的基线分别达到了63.58%和76.36%。
更新时间: 2025-03-21 12:52:57
领域: cs.CR,cs.SE
Long-term excitation energy transfer predicted by a modified convolutional neural networks in the FMO complexes
In machine learning (ML), the risk of recursive strategies overfitting historical data has driven the development of convolutional neural networks (CNNs) in simulating quantum dissipative dynamics. In this work, we propose an efficient CNN scheme incorporating novel redundant time-functions to predict 100 picosecond (ps) excitation energy transfer (EET) in Fenna-Matthews-Olson (FMO) complexes, in which the original time $t$ is normalized by mapping it to the [0, 1] range, allowing different functions to focus on distinct time intervals, thereby effectively capturing the multi-timescale characteristics of EET dynamics. This method simplifies optimization, enhances learning efficiency, and demonstrates the superior accuracy, robustness, and efficiency of our approach in predicting quantum dissipative dynamics.
Updated: 2025-03-21 12:40:39
标题: 使用修改后的卷积神经网络预测FMO复合物中的长期激发能量转移
摘要: 在机器学习(ML)中,递归策略过度拟合历史数据的风险推动了卷积神经网络(CNNs)在模拟量子耗散动力学方面的发展。在这项工作中,我们提出了一种高效的CNNs方案,将新颖的冗余时间函数纳入其中,以预测Fenna-Matthews-Olson(FMO)复合物中100皮秒(ps)的激发能量转移(EET),其中原始时间$t$通过将其映射到[0, 1]范围进行归一化,从而使不同函数专注于不同的时间间隔,从而有效地捕捉EET动力学的多时间尺度特征。这种方法简化了优化并增强了学习效率,并展示了我们方法在预测量子耗散动力学方面的卓越准确性、稳健性和效率。
更新时间: 2025-03-21 12:40:39
领域: physics.chem-ph,cs.LG,quant-ph,2020: 05C70
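The time encoding described above, normalizing t to [0, 1] over the 100 ps window and feeding several redundant functions of the normalized time so that different features emphasize different time scales, might look like the following; the specific functions chosen here are assumptions.

# Sketch of redundant time-feature construction for the EET prediction inputs.
import numpy as np

T_MAX_PS = 100.0

def time_features(t_ps):
    """t_ps: array of times in picoseconds. Returns a (len(t), n_features) design matrix."""
    u = np.asarray(t_ps, dtype=float) / T_MAX_PS          # map t to [0, 1]
    return np.stack([
        u,                        # uniform emphasis
        np.sqrt(u),               # resolves early, fast dynamics
        u ** 2,                   # emphasises late, slow relaxation
        np.exp(-5.0 * u),         # short-time window
        1.0 - np.exp(-5.0 * u),   # long-time window
    ], axis=-1)

print(time_features([0.0, 1.0, 10.0, 100.0]).round(3))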
Large Language Model Compression via the Nested Activation-Aware Decomposition
In this paper, we tackle the critical challenge of compressing large language models (LLMs) to facilitate their practical deployment and broader adoption. We introduce a novel post-training compression paradigm that focuses on low-rank decomposition of LLM weights. Our analysis identifies two main challenges in this task: the variability in LLM activation distributions and handling unseen activations from different datasets and models. To address these challenges, we propose a nested activation-aware framework (NSVD) for LLMs, a training-free approach designed to enhance the accuracy of low-rank decompositions by managing activation outliers through transforming the weight matrix based on activation distribution and the original weight matrix. This method allows for the absorption of outliers into the transformed weight matrix, improving decomposition accuracy. Our comprehensive evaluation across eight datasets and six models from three distinct LLM families demonstrates the superiority of NSVD over current state-of-the-art methods, especially at medium to large compression ratios or in multilingual and multitask settings.
Updated: 2025-03-21 12:39:16
标题: 通过嵌套激活感知分解对大型语言模型进行压缩
摘要: 在这篇论文中,我们致力于解决压缩大型语言模型(LLMs)的关键挑战,以促进它们在实际部署和更广泛应用中的使用。我们引入了一种新颖的训练后压缩范式,重点放在对LLM权重进行低秩分解上。我们的分析确定了这个任务中的两个主要挑战:LLM激活分布的变化性以及处理来自不同数据集和模型的未见激活。 为了解决这些挑战,我们提出了一个针对LLMs的嵌套激活感知框架(NSVD),这是一个无需训练的方法,旨在通过基于激活分布和原始权重矩阵对权重矩阵进行变换,从而管理激活异常值,以提高低秩分解的准确性。这种方法允许将异常值吸收到转换后的权重矩阵中,从而提高分解的准确性。我们在来自三种不同LLM家族的八个数据集和六个模型上进行了全面评估,结果显示NSVD在当前最先进的方法上具有优越性,特别是在中等到大的压缩比或多语言和多任务设置下。
更新时间: 2025-03-21 12:39:16
领域: cs.LG
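The general activation-aware decomposition idea can be demonstrated with a diagonal scaling: truncate the SVD of W·S rather than of W, where S reflects typical activation magnitudes, so that directions that matter for real inputs survive the truncation. The diagonal S used below is a simplification for illustration; NSVD's nested transform handles activation outliers and unseen activations differently.

# Toy comparison of plain vs. activation-aware low-rank decomposition (NumPy).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 256, 256, 32
W = rng.normal(size=(d_out, d_in))
X = rng.normal(size=(1024, d_in)) * (1 + 5 * (rng.random(d_in) < 0.02))  # a few outlier channels

s = np.sqrt((X ** 2).mean(axis=0))                   # per-input-channel activation scale
U, sig, Vt = np.linalg.svd(W * s, full_matrices=False)
W_aware = (U[:, :rank] * sig[:rank]) @ Vt[:rank] / s  # undo the scaling after truncation

U2, sig2, Vt2 = np.linalg.svd(W, full_matrices=False)
W_plain = (U2[:, :rank] * sig2[:rank]) @ Vt2[:rank]

Y = X @ W.T
print("plain SVD output error:      ", np.linalg.norm(X @ W_plain.T - Y) / np.linalg.norm(Y))
print("activation-aware output error:", np.linalg.norm(X @ W_aware.T - Y) / np.linalg.norm(Y))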
Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models
Large language models (LLMs) exhibit remarkable capabilities in natural language processing but face catastrophic forgetting when learning new tasks, where adaptation to a new domain leads to a substantial decline in performance on previous tasks. In this paper, we propose Controlled LoRA (CLoRA), a subspace regularization method on the LoRA structure. Aiming to reduce the scale of output change while introducing minimal constraints on model capacity, CLoRA imposes a constraint on the direction of the updating matrix's null space. Experimental results on one-stage LLM finetuning tasks and continual learning settings highlight the superiority of CLoRA as an effective parameter-efficient finetuning method that mitigates catastrophic forgetting. Further investigation of model parameters indicates that CLoRA effectively balances the trade-off between model capacity and degree of forgetting.
Updated: 2025-03-21 12:34:15
标题: 大型语言模型的控制低秩调整与子空间正则化在持续训练中的应用
摘要: 大型语言模型(LLMs)在自然语言处理中展现出卓越的能力,但在学习新任务时面临灾难性遗忘,即适应新领域会导致在先前任务中性能大幅下降。在本文中,我们提出了一种名为Controlled LoRA(CLoRA)的子空间正则化方法,基于LoRA结构。CLoRA旨在减少输出变化的规模,同时最小限度地约束模型容量,对更新矩阵的零空间方向施加约束。对一阶段LLM微调任务和持续学习设置的实验结果突显了CLoRA作为一种有效的参数高效微调方法,能够减轻灾难性遗忘。对模型参数的进一步研究表明,CLoRA有效地平衡了模型容量和遗忘程度之间的权衡。
更新时间: 2025-03-21 12:34:15
领域: cs.CL,cs.AI
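A minimal PyTorch sketch of the general mechanism suggested by the entry above: a LoRA update DeltaW = BA with an added penalty on its projection onto a prescribed input subspace, so updates are steered away from directions one wants preserved. The projection matrix, penalty form, and hyperparameters are illustrative assumptions rather than CLoRA's exact formulation.

```python
import torch

class LoRALinear(torch.nn.Module):
    def __init__(self, base: torch.nn.Linear, r: int = 8):
        super().__init__()
        self.base = base.requires_grad_(False)            # frozen pretrained weight
        d_out, d_in = base.weight.shape
        self.A = torch.nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(d_out, r))

    def forward(self, x):
        return self.base(x) + x @ self.A.t() @ self.B.t()

    def subspace_penalty(self, P: torch.Tensor) -> torch.Tensor:
        """Penalize the component of the update DeltaW = B A that falls inside the
        input subspace spanned by the columns of P (d_in, k), i.e. discourage
        changing outputs for directions we want preserved (assumed penalty form)."""
        delta_w = self.B @ self.A                         # (d_out, d_in)
        return (delta_w @ P).pow(2).sum()

# Toy usage: protect a random 16-dimensional input subspace (assumed choice).
layer = LoRALinear(torch.nn.Linear(64, 64), r=4)
P, _ = torch.linalg.qr(torch.randn(64, 16))               # orthonormal basis
x, y = torch.randn(32, 64), torch.randn(32, 64)
loss = torch.nn.functional.mse_loss(layer(x), y) + 0.1 * layer.subspace_penalty(P)
loss.backward()
```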
PMANet: Malicious URL detection via post-trained language model guided multi-level feature attention network
The proliferation of malicious URLs has made their detection crucial for enhancing network security. While pre-trained language models offer promise, existing methods struggle with domain-specific adaptability, character-level information, and local-global encoding integration. To address these challenges, we propose PMANet, a pre-trained Language Model-Guided multi-level feature attention network. PMANet employs a post-training process with three self-supervised objectives: masked language modeling, noisy language modeling, and domain discrimination, effectively capturing subword and character-level information. It also includes a hierarchical representation module and a dynamic layer-wise attention mechanism for extracting features from low to high levels. Additionally, spatial pyramid pooling integrates local and global features. Experiments on diverse scenarios, including small-scale data, class imbalance, and adversarial attacks, demonstrate PMANet's superiority over state-of-the-art models, achieving a 0.9941 AUC and correctly detecting all 20 malicious URLs in a case study. Code and data are available at https://github.com/Alixyvtte/Malicious-URL-Detection-PMANet.
Updated: 2025-03-21 12:26:20
标题: PMANet:通过后训练的语言模型引导的多级特征注意力网络检测恶意URL
摘要: The proliferation of malicious URLs has made their detection crucial for enhancing network security. While pre-trained language models offer promise, existing methods struggle with domain-specific adaptability, character-level information, and local-global encoding integration. To address these challenges, we propose PMANet, a pre-trained Language Model-Guided multi-level feature attention network. PMANet employs a post-training process with three self-supervised objectives: masked language modeling, noisy language modeling, and domain discrimination, effectively capturing subword and character-level information. It also includes a hierarchical representation module and a dynamic layer-wise attention mechanism for extracting features from low to high levels. Additionally, spatial pyramid pooling integrates local and global features. Experiments on diverse scenarios, including small-scale data, class imbalance, and adversarial attacks, demonstrate PMANet's superiority over state-of-the-art models, achieving a 0.9941 AUC and correctly detecting all 20 malicious URLs in a case study. Code and data are available at https://github.com/Alixyvtte/Malicious-URL-Detection-PMANet.
更新时间: 2025-03-21 12:26:20
领域: cs.CR
FFaceNeRF: Few-shot Face Editing in Neural Radiance Fields
Recent 3D face editing methods using masks have produced high-quality edited images by leveraging Neural Radiance Fields (NeRF). Despite their impressive performance, existing methods often provide limited user control due to the use of pre-trained segmentation masks. To utilize masks with a desired layout, an extensive training dataset is required, which is challenging to gather. We present FFaceNeRF, a NeRF-based face editing technique that can overcome the challenge of limited user control due to the use of fixed mask layouts. Our method employs a geometry adapter with feature injection, allowing for effective manipulation of geometry attributes. Additionally, we adopt latent mixing for tri-plane augmentation, which enables training with a few samples. This facilitates rapid model adaptation to desired mask layouts, crucial for applications in fields like personalized medical imaging or creative face editing. Our comparative evaluations demonstrate that FFaceNeRF surpasses existing mask based face editing methods in terms of flexibility, control, and generated image quality, paving the way for future advancements in customized and high-fidelity 3D face editing. The code is available on the project page: https://kwanyun.github.io/FFaceNeRF_page/.
Updated: 2025-03-21 12:24:58
标题: FFaceNeRF:神经辐射场中的少样本人脸编辑
摘要: 最近使用面具的3D面部编辑方法利用神经辐射场(NeRF)生成了高质量的编辑图像。尽管它们的表现令人印象深刻,但由于使用预训练的分割面具,现有方法通常提供有限的用户控制。为了利用具有所需布局的面具,需要一个大量的训练数据集,这在收集上是具有挑战性的。我们提出了FFaceNeRF,一种基于NeRF的面部编辑技术,可以克服由于使用固定面具布局而导致的有限用户控制的挑战。我们的方法采用几何适配器与特征注入,实现了对几何属性的有效操作。此外,我们采用了潜在混合三平面增强,这使得可以用少量样本进行训练。这有助于快速模型适应所需的面具布局,这对于个性化医学成像或创意面部编辑等领域的应用至关重要。我们的比较评估表明,FFaceNeRF在灵活性、控制和生成图像质量方面超越了现有基于面具的面部编辑方法,为定制和高保真度的3D面部编辑的未来发展铺平了道路。代码可在项目页面上找到:https://kwanyun.github.io/FFaceNeRF_page/。
更新时间: 2025-03-21 12:24:58
领域: cs.GR,cs.AI,cs.CV,68T45, 68U05,I.3.3; I.3.8
Does a Rising Tide Lift All Boats? Bias Mitigation for AI-based CMR Segmentation
Artificial intelligence (AI) is increasingly being used for medical imaging tasks. However, there can be biases in the resulting models, particularly when they were trained using imbalanced training datasets. One such example has been the strong race bias effect in cardiac magnetic resonance (CMR) image segmentation models. Although this phenomenon has been reported in a number of publications, little is known about the effectiveness of bias mitigation algorithms in this domain. We aim to investigate the impact of common bias mitigation methods to address bias between Black and White subjects in AI-based CMR segmentation models. Specifically, we use oversampling, importance reweighing and Group DRO as well as combinations of these techniques to mitigate the race bias. Furthermore, motivated by recent findings on the root causes of AI-based CMR segmentation bias, we evaluate the same methods using models trained and evaluated on cropped CMR images. We find that bias can be mitigated using oversampling, significantly improving performance for the underrepresented Black subjects whilst not significantly reducing the majority White subjects' performance. Group DRO also improves performance for Black subjects but not significantly, while reweighing decreases performance for Black subjects. Using a combination of oversampling and Group DRO also improves performance for Black subjects but not significantly. Using cropped images increases performance for both races and reduces the bias, whilst adding oversampling as a bias mitigation technique with cropped images reduces the bias further.
Updated: 2025-03-21 12:17:43
标题: 增长的潮水是否能提升所有船只?基于人工智能的CMR分割的偏见缓解
摘要: 人工智能(AI)越来越多地被用于医学成像任务。然而,当这些模型使用不平衡的训练数据集进行训练时,结果模型中可能存在偏见。一个例子是心脏磁共振(CMR)图像分割模型中存在的明显种族偏见效应。尽管这种现象在许多出版物中已经报道,但关于在这一领域使用偏见缓解算法的有效性知之甚少。我们旨在研究常见的偏见缓解方法对解决AI基础的CMR分割模型中黑人和白人之间的种族偏见的影响。具体来说,我们使用过采样、重要性重新加权和Group DRO以及这些技术的组合来缓解种族偏见。此外,受最近关于AI基础CMR分割偏见根本原因的发现的启发,我们评估了在裁剪的CMR图像上训练和评估的模型使用相同的方法。我们发现,通过过采样可以缓解偏见,显著提高了被低估的黑人主体的性能,同时并没有显著降低大多数白人主体的性能。Group DRO也提高了黑人主体的性能,但并不显著,而重新加权则降低了黑人主体的性能。使用过采样和Group DRO的组合也可以提高黑人主体的性能,但并不显著。使用裁剪图像可以提高两种种族的性能并减少偏见,同时在裁剪图像中添加过采样作为偏见缓解技术可以进一步减少偏见。
更新时间: 2025-03-21 12:17:43
领域: eess.IV,cs.AI,cs.CV
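A minimal sketch of the oversampling mitigation discussed above, using a PyTorch WeightedRandomSampler so that minority-group subjects are drawn as often as majority-group ones during training; the toy tensors and group labels are placeholders for an actual CMR dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy stand-in for a CMR dataset: 900 majority-group (0) and 100 minority-group (1) subjects.
images = torch.randn(1000, 1, 64, 64)
masks = torch.randint(0, 2, (1000, 64, 64))
group = torch.cat([torch.zeros(900, dtype=torch.long), torch.ones(100, dtype=torch.long)])

# Weight each sample inversely to its group frequency so expected draws per group are equal.
counts = torch.bincount(group).float()
weights = (1.0 / counts)[group]                            # shape (1000,)
sampler = WeightedRandomSampler(weights, num_samples=len(group), replacement=True)

loader = DataLoader(TensorDataset(images, masks, group), batch_size=16, sampler=sampler)
batch_imgs, batch_masks, batch_group = next(iter(loader))
print("minority fraction in one batch:", (batch_group == 1).float().mean().item())
```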
Deterministic AI Agent Personality Expression through Standard Psychological Diagnostics
Artificial intelligence (AI) systems powered by large language models have become increasingly prevalent in modern society, enabling a wide range of applications through natural language interaction. As AI agents proliferate in our daily lives, their generic and uniform expressiveness presents a significant limitation to their appeal and adoption. Personality expression represents a key prerequisite for creating more human-like and distinctive AI systems. We show that AI models can express deterministic and consistent personalities when instructed using established psychological frameworks, with varying degrees of accuracy depending on model capabilities. We find that more advanced models like GPT-4o and o1 demonstrate the highest accuracy in expressing specified personalities across both Big Five and Myers-Briggs assessments, and further analysis suggests that personality expression emerges from a combination of intelligence and reasoning capabilities. Our results reveal that personality expression operates through holistic reasoning rather than question-by-question optimization, with response-scale metrics showing higher variance than test-scale metrics. Furthermore, we find that model fine-tuning affects communication style independently of personality expression accuracy. These findings establish a foundation for creating AI agents with diverse and consistent personalities, which could significantly enhance human-AI interaction across applications from education to healthcare, while additionally enabling a broader range of more unique AI agents. The ability to quantitatively assess and implement personality expression in AI systems opens new avenues for research into more relatable, trustworthy, and ethically designed AI.
Updated: 2025-03-21 12:12:05
标题: 通过标准心理诊断确定性人工智能代理人格表达
摘要: 由大型语言模型驱动的人工智能(AI)系统在现代社会中变得越来越普遍,通过自然语言互动实现了广泛的应用。随着AI代理在我们日常生活中的增多,它们的通用和统一表达能力对其吸引力和采纳性构成了重要限制。个性表达是创造更具人类特点和独特性的AI系统的关键前提。我们展示了当使用确立的心理框架指导时,AI模型可以表达确定性和一致性的个性,具体程度取决于模型的能力。我们发现像GPT-4o和o1这样更先进的模型在表达指定个性方面表现出最高的准确性,跨大五和迈尔斯-布里格斯评估都是如此,并且进一步分析表明个性表达源自智能和推理能力的结合。我们的结果显示,个性表达通过整体推理而非逐题优化运作,响应标度指标显示出比测试标度指标更高的差异。此外,我们发现模型微调会独立于个性表达准确性影响通信风格。这些发现为创造具有多样和一致个性的AI代理奠定了基础,这可以显著增强从教育到医疗等各种应用中的人机交互,同时还能够使更多独特的AI代理获得发展。在AI系统中定量评估和实施个性表达的能力为研究更具相关性、可信赖和道德设计的AI开辟了新的研究途径。
更新时间: 2025-03-21 12:12:05
领域: cs.LG,cs.AI,cs.CY,cs.HC
Unified continuous-time q-learning for mean-field game and mean-field control problems
This paper studies the continuous-time q-learning in mean-field jump-diffusion models when the population distribution is not directly observable. We propose the integrated q-function in decoupled form (decoupled Iq-function) from the representative agent's perspective and establish its martingale characterization, which provides a unified policy evaluation rule for both mean-field game (MFG) and mean-field control (MFC) problems. Moreover, we consider the learning procedure where the representative agent updates the population distribution based on his own state values. Depending on the task to solve the MFG or MFC problem, we can employ the decoupled Iq-function differently to characterize the mean-field equilibrium policy or the mean-field optimal policy respectively. Based on these theoretical findings, we devise a unified q-learning algorithm for both MFG and MFC problems by utilizing test policies and the averaged martingale orthogonality condition. For several financial applications in the jump-diffusion setting, we obtain the exact parameterization of the decoupled Iq-functions and the value functions, and illustrate our q-learning algorithm with satisfactory performance.
Updated: 2025-03-21 12:10:30
标题: 统一的连续时间q学习用于均场博弈和均场控制问题
摘要: 本文研究了连续时间q-learning在均场跳跃扩散模型中的应用,当人口分布不是直接可观测时。我们从代表性代理人的角度提出了解耦形式的集成q函数(解耦Iq函数),并建立了其鞅特征化,为均场博弈(MFG)和均场控制(MFC)问题提供了统一的策略评估规则。此外,我们考虑了代表性代理人根据自身状态值更新人口分布的学习过程。根据解决MFG或MFC问题的任务,我们可以不同地利用解耦Iq函数来刻画均场平衡策略或均场最优策略。基于这些理论发现,我们通过利用测试策略和平均鞅正交条件,设计了一个统一的q-learning算法,用于MFG和MFC问题。对于跳跃扩散设置中的一些金融应用,我们获得了解耦Iq函数和价值函数的精确参数化,并展示了我们的q-learning算法具有令人满意的性能。
更新时间: 2025-03-21 12:10:30
领域: math.OC,cs.LG,q-fin.CP
Invariant Federated Learning for Edge Intelligence: Mitigating Heterogeneity and Asynchrony via Exit Strategy and Invariant Penalty
This paper presents an invariant federated learning system for resource-constrained edge intelligence. The framework avoids the impact of heterogeneity and asynchrony through an exit strategy and an invariant penalty. We decompose local information into two orthogonal components to measure the contribution or impact of heterogeneous and asynchronous clients, and we argue that the exit of abnormal clients preserves the model's effectiveness for most clients. Meanwhile, to ensure the model's performance on exited abnormal clients and on clients that lack training resources, we propose Federated Learning with Invariant Penalty for Generalization (FedIPG), based on the invariant orthogonal decomposition of parameters. A theoretical proof shows that FedIPG reduces the out-of-distribution prediction loss without increasing the communication burden. The performance of FedIPG combined with the exit strategy is tested empirically at multiple scales using four datasets. The results show that our system can enhance in-distribution performance and outperform the state-of-the-art algorithm in out-of-distribution generalization while maintaining model convergence. Additionally, the results of a visual experiment indicate that FedIPG exhibits preliminary causal behavior by ignoring confounding features.
Updated: 2025-03-21 12:03:44
标题: 不变的边缘智能联邦学习:通过退出策略和不变惩罚减轻异质性和异步性
摘要: 这篇论文提供了一个适用于资源受限边缘智能的不变联邦学习系统。该框架可以通过退出策略和不变惩罚来避免异构性和异步性的影响。我们将本地信息分解为两个正交分量,以衡量异构和异步客户的贡献或影响。我们提出,异常客户的退出可以保证模型对大多数客户的影响。同时,为了确保模型在退出的异常客户和缺乏训练资源的客户身上的表现,我们提出了基于参数不变正交分解的通用化惩罚联邦学习(FedIPG)。理论证明表明,FedIPG减少了预测失误的分布外损失,而不增加通信负担。通过在四个数据集上多个尺度的实证测试,结合退出策略的FedIPG的性能得到验证。它显示了我们的系统可以增强分布内性能,并在分布外泛化方面胜过最先进的算法,同时保持模型收敛。此外,视觉实验的结果证明了FedIPG在忽略混淆特征方面包含初步因果关系。
更新时间: 2025-03-21 12:03:44
领域: cs.LG
DITTO: Offline Imitation Learning with World Models
For imitation learning algorithms to scale to real-world challenges, they must handle high-dimensional observations, offline learning, and policy-induced covariate-shift. We propose DITTO, an offline imitation learning algorithm which addresses all three of these problems. DITTO optimizes a novel distance metric in the latent space of a learned world model: First, we train a world model on all available trajectory data, then, the imitation agent is unrolled from expert start states in the learned model, and penalized for its latent divergence from the expert dataset over multiple time steps. We optimize this multi-step latent divergence using standard reinforcement learning algorithms, which provably induces imitation learning, and empirically achieves state-of-the art performance and sample efficiency on a range of Atari environments from pixels, without any online environment access. We also adapt other standard imitation learning algorithms to the world model setting, and show that this considerably improves their performance. Our results show how creative use of world models can lead to a simple, robust, and highly-performant policy-learning framework.
Updated: 2025-03-21 12:00:05
标题: DITTO:具有世界模型的离线模仿学习
摘要: 为了使模仿学习算法能够应对现实世界的挑战,它们必须处理高维观察数据、离线学习和策略诱发的协变量转移。我们提出了一种名为DITTO的离线模仿学习算法,它解决了这三个问题。DITTO在学习的世界模型的潜在空间中优化了一种新颖的距离度量:首先,在所有可用的轨迹数据上训练一个世界模型,然后,在学习模型中从专家的起始状态展开模仿代理,并且在多个时间步骤上对其与专家数据集的潜在差异进行惩罚。我们使用标准的强化学习算法优化这个多步潜在差异,这可以证明诱发模仿学习,并在一系列Atari环境中的像素上实现了最先进的性能和样本效率,而不需要在线环境访问。我们还将其他标准的模仿学习算法调整到世界模型设置中,并展示这显著改善了它们的性能。我们的结果表明,对世界模型的创造性使用可以导致一个简单、稳健且高性能的策略学习框架。
更新时间: 2025-03-21 12:00:05
领域: cs.LG,cs.AI
Knowledge Transfer based Evolutionary Deep Neural Network for Intelligent Fault Diagnosis
A fast response with commendable accuracy is essential for intelligent systems to ensure the reliability and smooth operation of industrial machines. Two main challenges affect the design of such intelligent systems: (i) the selection of a suitable model and (ii) domain adaptation when operating conditions change continuously. Therefore, we propose an evolutionary Net2Net transformation (EvoN2N) that finds the best-suited DNN architecture with limited availability of labeled data samples. A Net2Net transformation-based quick learning algorithm is used within the evolutionary framework of the Non-dominated Sorting Genetic Algorithm II to obtain the best DNN architecture; it uses the concept of knowledge transfer from one generation to the next for faster fitness evaluation. The proposed framework can obtain the best model for intelligent fault diagnosis without a long and time-consuming search process. It has been validated on the Case Western Reserve University dataset, the Paderborn University dataset, and a gearbox fault detection dataset under different operating conditions. The best models obtained demonstrate excellent diagnostic performance, with classification accuracy of nearly 100% for most operating conditions.
Updated: 2025-03-21 11:54:41
标题: 基于知识传递的演化深度神经网络用于智能故障诊断
摘要: 智能系统在工业机器的可靠性和平稳运行中提供更快的响应和值得称赞的准确性是必不可少的。设计这种智能系统面临两个主要挑战:(i)选择合适的模型和(ii)如果操作条件持续变化,则需要域适应。因此,我们提出了一种进化Net2Net变换(EvoN2N),该变换在有限的标记数据样本可用的情况下找到最适合的DNN架构。Net2Net变换基于快速学习算法已被用于非支配排序遗传算法II的进化框架中,以获得最佳的DNN架构。基于Net2Net变换的快速学习算法利用从一代到下一代的知识传递的概念来加快适应度评估。所提出的框架可以在不经过漫长和耗时的搜索过程的情况下获得最佳的智能故障诊断模型。提出的框架已在Case Western Reserve University数据集、Paderborn University数据集和齿轮箱故障检测数据集在不同操作条件下进行了验证。获得的最佳模型能够在大多数操作条件下展示出卓越的诊断性能和接近100%的分类准确性。
更新时间: 2025-03-21 11:54:41
领域: eess.SP,cs.AI,cs.SY,eess.SY,math.OC
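A minimal NumPy sketch of the function-preserving Net2WiderNet transformation that underlies this kind of generation-to-generation knowledge transfer: a hidden layer is widened by replicating units and splitting their outgoing weights so the network output is unchanged. Layer sizes are illustrative, and the evolutionary search itself is not shown.

```python
import numpy as np

def net2wider(W1, b1, W2, new_width):
    """Widen a hidden layer from W1:(h, d), b1:(h,), W2:(out, h) to new_width units
    while preserving the network function (Net2WiderNet rule)."""
    h = W1.shape[0]
    assert new_width >= h
    mapping = np.concatenate([np.arange(h), np.random.randint(0, h, new_width - h)])
    counts = np.bincount(mapping, minlength=h)            # how often each unit is copied
    W1_new = W1[mapping]                                   # replicate incoming weights
    b1_new = b1[mapping]
    W2_new = W2[:, mapping] / counts[mapping]              # split outgoing weights
    return W1_new, b1_new, W2_new

rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(8, 4)), rng.normal(size=8), rng.normal(size=(3, 8))
W1w, b1w, W2w = net2wider(W1, b1, W2, new_width=12)

x = rng.normal(size=4)
relu = lambda z: np.maximum(z, 0.0)
y_old = W2 @ relu(W1 @ x + b1)
y_new = W2w @ relu(W1w @ x + b1w)
print(np.allclose(y_old, y_new))                           # True: function preserved
```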
Multi-Span Optical Power Spectrum Evolution Modeling using ML-based Multi-Decoder Attention Framework
We implement an ML-based attention framework with component-specific decoders, improving optical power spectrum prediction in multi-span networks. By reducing the need for in-depth training on each component, the framework can be scaled to multi-span topologies with minimal data collection, making it suitable for brown-field scenarios.
Updated: 2025-03-21 11:54:36
标题: 使用基于ML的多解码器注意力框架对多跨光功率谱演变进行建模
摘要: 我们实施了一个基于机器学习的注意力框架,其中包含特定组件的解码器,提高了多跨网络中光学功率谱的预测能力。通过减少对每个组件深度培训的需求,该框架可以扩展到多跨拓扑结构,并最大限度地减少数据采集,使其适用于棕地场景。
更新时间: 2025-03-21 11:54:36
领域: cs.LG,cs.NI
A Thorough Assessment of the Non-IID Data Impact in Federated Learning
Federated learning (FL) allows collaborative machine learning (ML) model training among decentralized clients' information, ensuring data privacy. The decentralized nature of FL deals with non-independent and identically distributed (non-IID) data. This open problem has notable consequences, such as decreased model performance and more significant convergence times. Despite its importance, experimental studies systematically addressing all types of data heterogeneity (a.k.a. non-IIDness) remain scarce. We aim to fill this gap by assessing and quantifying the non-IID effect through a thorough empirical analysis. We use the Hellinger Distance (HD) to measure differences in distribution among clients. Our study benchmarks four state-of-the-art strategies for handling non-IID data, including label, feature, quantity, and spatiotemporal skewness, under realistic and controlled conditions. This is the first comprehensive analysis of the spatiotemporal skew effect in FL. Our findings highlight the significant impact of label and spatiotemporal skew non-IID types on FL model performance, with notable performance drops occurring at specific HD thresholds. Additionally, the FL performance is heavily affected mainly when the non-IIDness is extreme. Thus, we provide recommendations for FL research to tackle data heterogeneity effectively. Our work represents the most extensive examination of non-IIDness in FL, offering a robust foundation for future research.
Updated: 2025-03-21 11:53:36
标题: 《联邦学习中非独立同分布数据影响的全面评估》
摘要: 联邦学习(FL)允许在去中心化客户信息之间进行协作机器学习(ML)模型训练,确保数据隐私。FL的去中心化特性处理非独立和同分布(非IID)数据。这个开放问题具有显著的后果,如模型性能降低和收敛时间更长。尽管其重要性,系统地解决所有类型数据异质性(也称为非IID性)的实验研究仍然很少。我们旨在通过彻底的实证分析评估和量化非IID效应。我们使用Hellinger距离(HD)来衡量客户之间分布的差异。我们在现实和受控条件下对处理非IID数据的四种最先进策略进行了基准测试,包括标签、特征、数量和时空偏斜。这是FL中对时空偏斜效应进行的第一次全面分析。我们的研究结果突出了标签和时空偏斜非IID类型对FL模型性能的显著影响,在特定HD阈值处出现明显性能下降。此外,当非IID性极端时,FL性能受到严重影响。因此,我们提供了处理数据异质性的FL研究建议。我们的工作代表了FL中最全面的非IID性考察,为未来研究奠定了坚实的基础。
更新时间: 2025-03-21 11:53:36
领域: cs.LG,cs.AI,stat.ML
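A minimal sketch of quantifying label skew between federated clients with the Hellinger Distance used in the study; the client label histograms below are toy numbers.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions p and q (each sums to 1)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# Toy 10-class label histograms for two clients and the global distribution.
global_dist = np.full(10, 0.1)
client_iid = np.array([0.11, 0.09, 0.10, 0.10, 0.12, 0.08, 0.10, 0.10, 0.10, 0.10])
client_skewed = np.array([0.55, 0.30, 0.05, 0.02, 0.02, 0.02, 0.01, 0.01, 0.01, 0.01])

print("HD(iid client, global)    =", round(hellinger(client_iid, global_dist), 3))
print("HD(skewed client, global) =", round(hellinger(client_skewed, global_dist), 3))
# HD ranges from 0 (identical) to 1 (disjoint support); higher values mean stronger non-IIDness.
```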
Inverting Transformer-based Vision Models
Understanding the mechanisms underlying deep neural networks in computer vision remains a fundamental challenge. While many previous approaches have focused on visualizing intermediate representations within deep neural networks, particularly convolutional neural networks, these techniques have yet to be thoroughly explored in transformer-based vision models. In this study, we apply a modular approach of training inverse models to reconstruct input images from intermediate layers within a Detection Transformer and a Vision Transformer, showing that this approach is efficient and feasible. Through qualitative and quantitative evaluations of reconstructed images, we generate insights into the underlying mechanisms of these architectures, highlighting their similarities and differences in terms of contextual shape and preservation of image details, inter-layer correlation, and robustness to color perturbations. Our analysis illustrates how these properties emerge within the models, contributing to a deeper understanding of transformer-based vision models. The code for reproducing our experiments is available at github.com/wiskott-lab/inverse-detection-transformer.
Updated: 2025-03-21 11:50:06
标题: 倒置变压器型视觉模型
摘要: 理解计算机视觉中深度神经网络的机制仍然是一个基本挑战。虽然许多先前的方法着重于可视化深度神经网络中的中间表示,特别是卷积神经网络,但这些技术尚未在基于Transformer的视觉模型中得到彻底探索。在这项研究中,我们应用了一个模块化的方法,训练逆模型以从检测Transformer和视觉Transformer中的中间层重建输入图像,表明这种方法是高效且可行的。通过对重建图像进行定性和定量评估,我们深入了解了这些架构的基本机制,突出了它们在上下文形状和图像细节的保留、层间相关性和对颜色扰动的稳健性方面的相似性和差异。我们的分析说明了这些特性是如何在模型中出现的,有助于更深入地理解基于Transformer的视觉模型。重现我们实验的代码可在github.com/wiskott-lab/inverse-detection-transformer找到。
更新时间: 2025-03-21 11:50:06
领域: cs.CV,cs.AI,cs.LG,cs.NE
PVChat: Personalized Video Chat with One-Shot Learning
Video large language models (ViLLMs) excel in general video understanding, e.g., recognizing activities like talking and eating, but struggle with identity-aware comprehension, such as "Wilson is receiving chemotherapy" or "Tom is discussing with Sarah", limiting their applicability in smart healthcare and smart home environments. To address this limitation, we propose a one-shot learning framework PVChat, the first personalized ViLLM that enables subject-aware question answering (QA) from a single video for each subject. Our approach optimizes a Mixture-of-Heads (MoH) enhanced ViLLM on a synthetically augmented video-QA dataset, leveraging a progressive image-to-video learning strategy. Specifically, we introduce an automated augmentation pipeline that synthesizes identity-preserving positive samples and retrieves hard negatives from existing video corpora, generating a diverse training dataset with four QA types: existence, appearance, action, and location inquiries. To enhance subject-specific learning, we propose a ReLU Routing MoH attention mechanism, alongside two novel objectives: (1) Smooth Proximity Regularization for progressive learning through exponential distance scaling and (2) Head Activation Enhancement for balanced attention routing. Finally, we adopt a two-stage training strategy, transitioning from image pre-training to video fine-tuning, enabling a gradual learning process from static attributes to dynamic representations. We evaluate PVChat on diverse datasets covering medical scenarios, TV series, anime, and real-world footage, demonstrating its superiority in personalized feature understanding after learning from a single video, compared to state-of-the-art ViLLMs.
Updated: 2025-03-21 11:50:06
标题: PVChat:个性化视频聊天与一次性学习
摘要: 视频大型语言模型(ViLLMs)在一般视频理解方面表现出色,例如,识别像谈话和进食这样的活动,但在认知身份感知方面存在困难,比如“Wilson正在接受化疗”或“Tom正在与Sarah讨论”,从而限制了它们在智能医疗和智能家居环境中的适用性。为了解决这一限制,我们提出了一种一次性学习框架PVChat,这是第一个个性化的ViLLM,可以从每个主体的单个视频中进行主体感知问题回答(QA)。我们的方法在一个经过合成增强的视频-QA数据集上优化了一个增强的Mixture-of-Heads(MoH)ViLLM,利用渐进的图像到视频学习策略。具体而言,我们引入了一个自动增强管道,从现有视频语料库中合成保留身份的正样本并检索困难的负样本,生成一个包含四种QA类型的多样化训练数据集:存在、外观、动作和位置查询。为了增强主体特定学习,我们提出了一个ReLU Routing MoH注意机制,以及两个新颖的目标:(1)平滑接近正则化,通过指数距离缩放实现渐进学习,和(2)头部激活增强,用于平衡注意力路由。最后,我们采用了一个两阶段训练策略,从图像预训练过渡到视频微调,实现了从静态属性到动态表征的渐进学习过程。我们在涵盖医疗场景、电视剧、动漫和真实片段的多样化数据集上评估了PVChat,在从单个视频学习后的个性化特征理解方面表现出优越性,相比于最先进的ViLLMs。
更新时间: 2025-03-21 11:50:06
领域: cs.CV,cs.AI
ATHENA: An In-vehicle CAN Intrusion Detection Framework Based on Physical Characteristics of Vehicle Systems
With the growing interconnection between In-Vehicle Networks (IVNs) and external environments, intelligent vehicles are increasingly vulnerable to sophisticated external network attacks. This paper proposes ATHENA, the first IVN intrusion detection framework that adopts a vehicle-cloud integrated architecture to achieve better security performance in the resource-constrained vehicular environment. Specifically, in the cloud, where resources are sufficient, ATHENA uses a multi-distribution mixture-model clustering method combined with deep data mining to generate the raw Payload Rule Bank of IVN CAN messages, and then improves rule quality by exploiting first-principles physical knowledge of the vehicle system, after which the payload rules are periodically sent to the vehicle terminal. At the vehicle terminal, a simple LSTM component is used to generate the Time Rule Bank representing the long-term time-series dependencies and periodic characteristics of CAN messages; unlike in traditional usage scenarios, it is not used for any detection task, and only the generated time rules serve as candidates for the subsequent IVN intrusion detection tasks. Based on both the payload and time rules generated in the cloud and at the vehicle terminal, ATHENA achieves efficient intrusion detection through simple rule-based matching operations rather than the complex black-box reasoning of resource-intensive neural network models, which in our framework are used only in the rule generation phase, not in the actual intrusion detection phase. Comparative experimental results on the ROAD dataset, currently the most outstanding real-world in-vehicle CAN dataset covering new instances of sophisticated and stealthy masquerade attacks, demonstrate that ATHENA significantly outperforms state-of-the-art IVN intrusion detection methods in detecting complex attacks.
Updated: 2025-03-21 11:49:08
标题: ATHENA:基于车辆系统物理特性的车载CAN入侵检测框架
摘要: 随着车载网络(IVNs)与外部环境之间日益增长的互联性,智能车辆越来越容易受到复杂的外部网络攻击。本文提出了ATHENA,这是第一个采用车辆-云集成架构的IVN入侵检测框架,以实现对资源受限的车载环境的更好安全性能。具体而言,在具有足够资源的云端,ATHENA采用多分布混合模型的聚类方法结合深度数据挖掘技术生成IVN CAN消息的原始有效负载规则库,然后通过对车辆系统的第一原则物理知识进行利用来改进规则质量,之后周期性地将有效负载规则发送到车载终端。在车载终端,使用简单的LSTM组件生成代表CAN消息的长期时间序列依赖性和周期特征的时间规则库,但不用于传统使用场景中的任何检测任务,其中仅生成的时间规则是进一步进行IVN入侵检测任务的候选项。基于云端和车载终端生成的有效负载和时间规则,ATHENA可以通过简单的规则匹配操作实现高效的入侵检测能力,而不是使用资源密集的神经网络模型的复杂黑匣推理,实际上该模型仅用于我们框架中规则逻辑生成阶段而非实际入侵检测阶段。在ROAD数据集上的对比实验结果,该数据集是当前最杰出的覆盖新型复杂和隐蔽伪装攻击实例的实车CAN数据集,证明了ATHENA在检测复杂攻击方面明显优于最先进的IVN入侵检测方法。
更新时间: 2025-03-21 11:49:08
领域: cs.CR
Application of linear regression method to the deep reinforcement learning in continuous action cases
The linear regression (LR) method offers the advantage that optimal parameters can be calculated relatively easily, although its representational capability is more limited than that of deep learning techniques. To improve deep reinforcement learning, Levine et al. proposed the Least Squares Deep Q Network (LS-DQN) method, which combines the Deep Q Network (DQN) with the LR method. However, the LS-DQN method assumes that the actions are discrete. In this study, we propose the Double Least Squares Deep Deterministic Policy Gradient (DLS-DDPG) method to address this limitation. This method combines the LR method with the Deep Deterministic Policy Gradient (DDPG) technique, one of the representative deep reinforcement learning algorithms for continuous-action settings. Numerical experiments conducted in MuJoCo environments showed that the LR update improved performance in at least some tasks, although difficulties remain, such as the inability to keep the regularization terms small.
Updated: 2025-03-21 11:40:42
标题: 线性回归方法在连续动作情况下深度强化学习中的应用
摘要: 线性回归(LR)方法具有优点,可以相对容易地计算出最优参数,尽管其表示能力比深度学习技术有限。为了改进深度强化学习,Levine等人提出了最小二乘深度Q网络(LS-DQN)方法,将深度Q网络(DQN)与LR方法结合起来。然而,LS-DQN方法假定动作是离散的。在这项研究中,我们提出了双最小二乘深度确定性策略梯度(DLS-DDPG)方法来解决这一限制。该方法将LR方法与连续动作情况下的代表性深度强化学习算法之一的深度确定性策略梯度(DDPG)技术结合起来。在MuJoCo环境中进行的数值实验显示,LR更新至少在某些任务中提高了性能,尽管存在一些困难,如无法使正则项变小。
更新时间: 2025-03-21 11:40:42
领域: cs.LG,cs.AI
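A minimal sketch of the least-squares ingredient behind LS-DQN-style methods: refitting the final linear layer of a critic in closed form against bootstrapped targets instead of taking a gradient step. The features and targets are synthetic stand-ins, and the full DLS-DDPG update differs in detail.

```python
import numpy as np

def least_squares_last_layer(features, targets, reg=1e-3):
    """Solve min_w ||features @ w - targets||^2 + reg * ||w||^2 in closed form.
    features: (n, d) penultimate-layer activations, targets: (n,) Q-value targets."""
    d = features.shape[1]
    A = features.T @ features + reg * np.eye(d)
    b = features.T @ targets
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
phi = rng.normal(size=(5000, 64))                        # critic features for sampled (s, a) pairs
w_true = rng.normal(size=64)
q_targets = phi @ w_true + 0.1 * rng.normal(size=5000)   # noisy bootstrapped targets

w_ls = least_squares_last_layer(phi, q_targets, reg=1e-2)
print("relative parameter error:", np.linalg.norm(w_ls - w_true) / np.linalg.norm(w_true))
```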
An Attack on $p$-adic Lattice Public-key Cryptosystems and Signature Schemes
Lattices have many significant applications in cryptography. In 2021, the $p$-adic signature scheme and public-key encryption cryptosystem were introduced. They are based on the Longest Vector Problem (LVP) and the Closest Vector Problem (CVP) in $p$-adic lattices. These problems are considered to be challenging and there are no known deterministic polynomial time algorithms to solve them. In this paper, we improve the LVP algorithm in local fields. The modified LVP algorithm is a deterministic polynomial time algorithm when the field is totally ramified and $p$ is a polynomial in the rank of the input lattice. We utilize this algorithm to attack the above schemes so that we are able to forge a valid signature of any message and decrypt any ciphertext. Although these schemes are broken, this work does not mean that $p$-adic lattices are not suitable in constructing cryptographic primitives. We propose some possible modifications to avoid our attack at the end of this paper.
Updated: 2025-03-21 11:34:50
标题: 对$p$-adic格公钥密码系统和签名方案的攻击
摘要: 格有许多重要的应用。在2021年,引入了$p$-adic签名方案和公钥加密密码系统。它们是基于$p$-adic格中的最长向量问题(LVP)和最近向量问题(CVP)。这些问题被认为具有挑战性,目前没有已知的确定性多项式时间算法来解决它们。在本文中,我们改进了局部域中的LVP算法。修改后的LVP算法是一个确定性多项式时间算法,当域完全分裂且$p$是输入格的秩的多项式时。我们利用这个算法来攻击上述方案,从而能够伪造任何消息的有效签名并解密任何密文。尽管这些方案已经被破解,但这项工作并不意味着$p$-adic格不适合构建密码原语。我们在本文末尾提出了一些可能的修改来避免我们的攻击。
更新时间: 2025-03-21 11:34:50
领域: cs.CR,math.NT,Primary 11F85, Secondary 94A60
Replay4NCL: An Efficient Memory Replay-based Methodology for Neuromorphic Continual Learning in Embedded AI Systems
The Neuromorphic Continual Learning (NCL) paradigm leverages Spiking Neural Networks (SNNs) to give AI systems continual learning (CL) capabilities for adapting to dynamically changing environments. Currently, the state-of-the-art employs a memory replay-based method to maintain old knowledge. However, this technique relies on long timesteps and compression-decompression steps, thereby incurring significant latency and energy overheads that are not suitable for tightly constrained embedded AI systems (e.g., mobile agents/robotics). To address this, we propose Replay4NCL, a novel, efficient memory replay-based methodology for enabling NCL in embedded AI systems. Specifically, Replay4NCL compresses the latent data (old knowledge) and then replays it during the NCL training phase with small timesteps, minimizing processing latency and energy consumption. To compensate for the information loss from reduced spikes, we adjust the neuron threshold potential and learning rate settings. Experimental results on the class-incremental scenario with the Spiking Heidelberg Digits (SHD) dataset show that Replay4NCL preserves old knowledge with a Top-1 accuracy of 90.43%, compared to 86.22% for the state-of-the-art, while effectively learning new tasks and achieving a 4.88x latency speed-up, 20% latent memory saving, and 36.43% energy saving. These results highlight the potential of our Replay4NCL methodology to further advance NCL capabilities for embedded AI systems.
Updated: 2025-03-21 11:33:22
标题: Replay4NCL:一种用于嵌入式AI系统中神经形态连续学习的高效基于内存回放的方法论
摘要: 神经形态的持续学习(NCL)范式利用脉冲神经网络(SNN)使人工智能系统具有持续学习(CL)能力,以适应动态变化的环境。目前,最先进的技术采用基于记忆重放的方法来保持旧知识。然而,这种技术依赖于长时间步和压缩解压步骤,从而产生显著的延迟和能量开销,这对于严格受限的嵌入式人工智能系统(例如移动代理/机器人)并不合适。为了解决这个问题,我们提出了Replay4NCL,一种新颖高效的基于记忆重放的方法论,用于在嵌入式人工智能系统中实现NCL。具体来说,Replay4NCL压缩潜在数据(旧知识),然后在NCL训练阶段以较小的时间步骤重放它们,以最小化处理延迟和能量消耗。为了弥补减少脉冲所导致的信息损失,我们调整了神经元阈值电位和学习速率设置。在具有Spiking Heidelberg Digits(SHD)数据集的类增量场景上的实验结果显示,与最先进技术的86.22%相比,Replay4NCL能够以90.43%的Top-1准确率保留旧知识,同时有效地学习新任务,实现4.88倍的延迟加速,20%的潜在记忆节省和36.43%的能量节省。这些结果突显了我们的Replay4NCL方法论进一步推动了嵌入式人工智能系统的NCL能力的潜力。
更新时间: 2025-03-21 11:33:22
领域: cs.NE,cs.AI,cs.LG
3D variational autoencoder for fingerprinting microstructure volume elements
Microstructure quantification is an important step towards establishing structure-property relationships in materials. Machine learning-based image processing methods have been shown to outperform conventional image processing techniques and are increasingly applied to microstructure quantification tasks. In this work, we present a 3D variational autoencoder (VAE) for encoding microstructure volume elements (VEs) comprising voxelated crystallographic orientation data. Crystal symmetries in the orientation space are accounted for by mapping to the crystallographic fundamental zone as a preprocessing step, which allows for a continuous loss function to be used and improves the training convergence rate. The VAE is then used to encode a training set of VEs with an equiaxed polycrystalline microstructure with random texture. Accurate reconstructions are achieved with a relative average misorientation error of 9x10-3 on the test dataset, for a continuous latent space with dimension 256. We show that the model generalises well to microstructures with textures, grain sizes and aspect ratios outside the training distribution. Structure-property relationships are explored through using the training set of VEs as initial configurations in various crystal plasticity (CP) simulations. Microstructural fingerprints extracted from the VAE, which parameterise the VEs in a low-dimensional latent space, are stored alongside the volume-averaged stress response, at each strain increment, to uniaxial tensile deformation from CP simulations. This is then used to train a fully connected neural network mapping the input fingerprint to the resulting stress response, which acts as a surrogate model for the CP simulation. The fingerprint-based surrogate model is shown to accurately predict the microstructural dependence in the CP stress response, with a relative mean-squared error of 8.9x10-4 on unseen test data.
Updated: 2025-03-21 11:17:10
标题: 3D变分自编码器用于指纹微结构体积元素
摘要: 微观结构量化是建立材料结构-性能关系的重要步骤。基于机器学习的图像处理方法已被证明在微观结构量化任务中优于传统图像处理技术,并越来越多地应用于该领域。在本研究中,我们提出了一个用于编码微观结构体积元素(VEs)的3D变分自动编码器(VAE),其中包括体素化的晶体学取向数据。通过将晶体对称性映射到晶体学基本区作为预处理步骤来考虑取向空间中的晶体对称性,这允许使用连续损失函数并提高训练收敛速度。然后使用VAE对具有随机纹理的等轴多晶微观结构的训练集进行编码。在测试数据集上,通过具有256个维度的连续潜在空间,实现了相对平均错配误差为9x10-3的准确重建。我们展示了该模型对具有不同于训练分布的纹理、晶粒尺寸和纵横比的微观结构具有良好的泛化能力。通过使用VEs的训练集作为各种晶体塑性(CP)模拟中的初始配置来探索结构-性能关系。从VAE中提取的微观结构指纹,将VEs在低维潜在空间中参数化,并与来自CP模拟的单轴拉伸变形时的体积平均应力响应一起存储。然后使用这些数据来训练一个完全连接的神经网络,将输入指纹映射到结果应力响应,作为CP模拟的代理模型。基于指纹的代理模型被证明能够准确预测CP应力响应中的微观结构依赖性,对未见过的测试数据具有8.9x10-4的相对均方误差。
更新时间: 2025-03-21 11:17:10
领域: cond-mat.mtrl-sci,cs.LG
Data-Driven Optimization of EV Charging Station Placement Using Causal Discovery
This paper addresses the critical challenge of optimizing electric vehicle charging station placement through a novel data-driven methodology employing causal discovery techniques. While traditional approaches prioritize economic factors or power grid constraints, they often neglect empirical charging patterns that ultimately determine station utilization. We analyze extensive charging data from Palo Alto and Boulder (337,344 events across 100 stations) to uncover latent relationships between station characteristics and utilization. Applying structural learning algorithms (NOTEARS and DAGMA) to this data reveals that charging demand is primarily determined by three factors: proximity to amenities, EV registration density, and adjacency to high-traffic routes. These findings, consistent across multiple algorithms and urban contexts, challenge conventional infrastructure distribution strategies. We develop an optimization framework that translates these insights into actionable placement recommendations, identifying locations likely to experience high utilization based on the discovered dependency structures. The resulting site selection model prioritizes strategic clustering in high-amenity areas with substantial EV populations rather than uniform spatial distribution. Our approach contributes a framework that integrates empirical charging behavior into infrastructure planning, potentially enhancing both station utilization and user convenience. By focusing on data-driven insights instead of theoretical distribution models, we provide a more effective strategy for expanding charging networks that can adjust to various stages of EV market development.
Updated: 2025-03-21 11:15:02
标题: 基于数据驱动的因果发现优化电动汽车充电站位置的方法
摘要: 本文通过一种新颖的数据驱动方法,利用因果发现技术,解决了优化电动车充电站布置的关键挑战。传统方法往往优先考虑经济因素或电网约束,却经常忽视最终决定充电站利用率的经验充电模式。我们分析了来自Palo Alto和Boulder的大量充电数据(100个充电站的337,344次事件),以发现站点特征和利用率之间的潜在关系。将结构学习算法(NOTEARS和DAGMA)应用于这些数据,揭示了充电需求主要由三个因素决定:靠近便利设施、电动车注册密度和毗邻高交通路线。这些发现在多个算法和城市环境下保持一致,挑战了传统的基础设施分布策略。我们开发了一个优化框架,将这些见解转化为可操作的布置建议,确定基于发现的依赖结构可能经历高利用率的位置。由此产生的站点选择模型优先考虑在拥有大量电动车人口的高便利区域进行战略性集群,而不是均匀空间分布。我们的方法提供了一个将经验充电行为整合到基础设施规划中的框架,可能提高充电站的利用率和用户便利性。通过关注数据驱动的见解而不是理论分布模型,我们为扩展充电网络提供了一种更有效的策略,可以适应电动车市场发展的各个阶段。
更新时间: 2025-03-21 11:15:02
领域: cs.LG,cs.AI
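A minimal sketch of the differentiable acyclicity measure used by NOTEARS-style structural learning, h(W) = tr(e^{W∘W}) − d, which is zero exactly when the nonzero pattern of W is a DAG; the 3-variable matrices below are toy examples loosely labeled after the factors mentioned in the abstract.

```python
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W):
    """h(W) = tr(exp(W * W)) - d; equals 0 iff the graph of nonzero entries is a DAG."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d

# Toy 3-variable example: amenities -> demand, ev_density -> demand (a DAG) ...
W_dag = np.array([[0.0, 0.0, 0.8],
                  [0.0, 0.0, 0.5],
                  [0.0, 0.0, 0.0]])
# ... versus a 2-cycle between amenities and demand (not a DAG).
W_cyclic = np.array([[0.0, 0.0, 0.8],
                     [0.0, 0.0, 0.5],
                     [0.6, 0.0, 0.0]])

print("h(DAG)    =", notears_acyclicity(W_dag))      # ~0
print("h(cyclic) =", notears_acyclicity(W_cyclic))   # > 0
```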
HAPI: A Model for Learning Robot Facial Expressions from Human Preferences
Automatic robotic facial expression generation is crucial for human-robot interaction, as handcrafted methods based on fixed joint configurations often yield rigid and unnatural behaviors. Although recent automated techniques reduce the need for manual tuning, they tend to fall short by not adequately bridging the gap between human preferences and model predictions, resulting in a deficiency of nuanced and realistic expressions due to limited degrees of freedom and insufficient perceptual integration. In this work, we propose a novel learning-to-rank framework that leverages human feedback to address this discrepancy and enhance the expressiveness of robotic faces. Specifically, we conduct pairwise comparison annotations to collect human preference data and develop the Human Affective Pairwise Impressions (HAPI) model, a Siamese RankNet-based approach that refines expression evaluation. Results obtained via Bayesian Optimization and an online expression survey on a 35-DOF android platform demonstrate that our approach produces significantly more realistic and socially resonant expressions of Anger, Happiness, and Surprise than those generated by baseline and expert-designed methods. This confirms that our framework effectively bridges the gap between human preferences and model predictions while robustly aligning robotic expression generation with human affective responses.
Updated: 2025-03-21 11:04:01
标题: HAPI:从人类偏好学习机器人面部表情的模型
摘要: 自动生成面部表情对于人机交互至关重要,因为基于固定关节配置的手工方法往往会产生僵硬和不自然的行为。尽管最近的自动化技术减少了手动调整的需求,但它们往往未能充分弥合人类偏好与模型预测之间的差距,导致表达受限于有限的自由度和不足的感知整合,缺乏细致和逼真的表达。在这项工作中,我们提出了一种新颖的学习排序框架,利用人类反馈来解决这种差异,并增强机器人面部的表现力。具体地,我们进行成对比较注释来收集人类偏好数据,并开发了人类情感成对印象(HAPI)模型,这是一种基于Siamese RankNet的方法,用于优化表达评估。通过贝叶斯优化和在线表达调查,在一个35自由度的机器人平台上获得的结果表明,我们的方法比基线和专家设计方法产生了更加逼真和社会共鸣的愤怒、快乐和惊讶表达。这证实了我们的框架有效地弥合了人类偏好和模型预测之间的差距,同时使机器人表情生成与人类情感反应牢固地保持一致。
更新时间: 2025-03-21 11:04:01
领域: cs.RO,cs.AI,cs.CV,cs.HC,cs.LG
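A minimal PyTorch sketch of a Siamese RankNet-style pairwise objective of the kind described above: a shared scorer rates two expressions and a logistic loss pushes the human-preferred one to score higher. The scorer architecture, feature dimensionality, and random data are assumptions.

```python
import torch

class ExpressionScorer(torch.nn.Module):
    """Shared (Siamese) scorer: maps an expression feature vector to a scalar score."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def ranknet_loss(scorer, x_a, x_b, a_preferred):
    """a_preferred: 1.0 if annotators preferred expression A over B, else 0.0."""
    diff = scorer(x_a) - scorer(x_b)                   # higher => A looks better
    return torch.nn.functional.binary_cross_entropy_with_logits(diff, a_preferred)

scorer = ExpressionScorer()
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)
x_a, x_b = torch.randn(16, 32), torch.randn(16, 32)    # paired expression features
prefs = torch.randint(0, 2, (16,)).float()             # human pairwise judgments
loss = ranknet_loss(scorer, x_a, x_b, prefs)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```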
Blended Conditional Gradients: the unconditioning of conditional gradients
We present a blended conditional gradient approach for minimizing a smooth convex function over a polytope P, combining the Frank--Wolfe algorithm (also called conditional gradient) with gradient-based steps, different from away steps and pairwise steps, but still achieving linear convergence for strongly convex functions, along with good practical performance. Our approach retains all favorable properties of conditional gradient algorithms, notably avoidance of projections onto P and maintenance of iterates as sparse convex combinations of a limited number of extreme points of P. The algorithm is lazy, making use of inexpensive inexact solutions of the linear programming subproblem that characterizes the conditional gradient approach. It decreases measures of optimality (primal and dual gaps) rapidly, both in the number of iterations and in wall-clock time, outperforming even the lazy conditional gradient algorithms of [arXiv:1410.8816]. We also present a streamlined version of the algorithm for the probability simplex.
Updated: 2025-03-21 11:03:00
标题: 混合条件梯度:条件梯度的无条件化
摘要: 我们提出了一种混合条件梯度方法,用于在一个多面体P上最小化一个光滑凸函数,将Frank--Wolfe算法(也称为条件梯度)与基于梯度的步骤结合起来,不同于远离步骤和成对步骤,但仍然实现了对于强凸函数的线性收敛性,同时具有良好的实际性能。我们的方法保留了条件梯度算法的所有有利特性,特别是避免对P进行投影并保持迭代点作为P的有限数量的极点的稀疏凸组合。该算法是懒惰的,利用表征条件梯度方法的线性规划子问题的廉价不精确解。它在迭代次数和挂钟时间方面快速减小了最优性度量(原始和对偶差),甚至胜过了[arXiv:1410.8816]的懒惰条件梯度算法。我们还提出了一种简化版本的算法,适用于概率单纯形。
更新时间: 2025-03-21 11:03:00
领域: math.OC,cs.CC,cs.LG,68Q32, 90C52
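For orientation, a minimal sketch of a plain Frank-Wolfe (conditional gradient) iteration over the probability simplex, the setting targeted by the paper's streamlined variant; the quadratic objective is a toy example, and neither blending, laziness, nor away/pairwise steps are included.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_iters=200):
    """Minimize a smooth convex f over the simplex {x >= 0, sum x = 1} using
    conditional gradient steps: the linear subproblem over the simplex is solved
    by picking the vertex (coordinate) with the smallest partial derivative."""
    x = x0.copy()
    for t in range(n_iters):
        g = grad(x)
        v = np.zeros_like(x)
        v[np.argmin(g)] = 1.0                  # vertex minimizing <g, v>
        gamma = 2.0 / (t + 2.0)                # standard step size
        x = (1 - gamma) * x + gamma * v        # stays a sparse convex combination
    return x

# Toy objective: f(x) = 0.5 * ||x - y||^2, minimized over the simplex.
y = np.array([0.7, 0.4, -0.2, 0.1])
grad = lambda x: x - y
x0 = np.full(4, 0.25)
x_star = frank_wolfe_simplex(grad, x0)
print(x_star, x_star.sum())
```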
Offload Rethinking by Cloud Assistance for Efficient Environmental Sound Recognition on LPWANs
Learning-based environmental sound recognition has emerged as a crucial method for ultra-low-power environmental monitoring in biological research and city-scale sensing systems. These systems usually operate under limited resources and are often powered by harvested energy in remote areas. Recent efforts in on-device sound recognition suffer from low accuracy due to resource constraints, whereas cloud offloading strategies are hindered by high communication costs. In this work, we introduce ORCA, a novel resource-efficient cloud-assisted environmental sound recognition system on batteryless devices operating over the Low-Power Wide-Area Networks (LPWANs), targeting wide-area audio sensing applications. We propose a cloud assistance strategy that remedies the low accuracy of on-device inference while minimizing the communication costs for cloud offloading. By leveraging a self-attention-based cloud sub-spectral feature selection method to facilitate efficient on-device inference, ORCA resolves three key challenges for resource-constrained cloud offloading over LPWANs: 1) high communication costs and low data rates, 2) dynamic wireless channel conditions, and 3) unreliable offloading. We implement ORCA on an energy-harvesting batteryless microcontroller and evaluate it in a real world urban sound testbed. Our results show that ORCA outperforms state-of-the-art methods by up to $80 \times$ in energy savings and $220 \times$ in latency reduction while maintaining comparable accuracy.
Updated: 2025-03-21 11:01:05
标题: 通过云辅助重新思考离线环境声音识别在LPWANs上的高效性
摘要: 基于学习的环境声音识别已经成为生物研究和城市规模感知系统中超低功耗环境监测的关键方法。这些系统通常在资源有限的情况下运行,并且通常由远程地区的能量收集供电。最近在设备上进行声音识别的努力受到资源约束导致的低准确性的影响,而云卸载策略受到高通信成本的阻碍。在这项工作中,我们介绍了ORCA,这是一个新颖的资源高效的云辅助环境声音识别系统,它在无电池设备上通过低功耗广域网络(LPWANs)运行,针对广域音频感知应用。我们提出了一种云辅助策略,可以纠正设备上推断的低准确性,同时最小化云卸载的通信成本。通过利用基于自注意力的云子谱特征选择方法促进高效的设备上推断,ORCA解决了LPWAN上资源受限的云卸载面临的三个关键挑战:1)高通信成本和低数据传输速率,2)动态无线信道条件,3)不可靠的卸载。我们在一个真实的城市声音测试平台上实现了ORCA,并对其进行了评估。我们的结果表明,ORCA在节能方面超过了现有技术方法高达80倍,并且在减少延迟方面高达220倍,同时保持可比较的准确性。
更新时间: 2025-03-21 11:01:05
领域: cs.SD,cs.AI,cs.DC,cs.NI,eess.AS
Generating Likely Counterfactuals Using Sum-Product Networks
The need to explain decisions made by AI systems is driven by both recent regulation and user demand. The decisions are often explainable only post hoc. In counterfactual explanations, one may ask what constitutes the best counterfactual explanation. Clearly, multiple criteria must be taken into account, although "distance from the sample" is a key criterion. Recent methods that consider the plausibility of a counterfactual seem to sacrifice this original objective. Here, we present a system that provides high-likelihood explanations that are, at the same time, close and sparse. We show that the search for the most likely explanations satisfying many common desiderata for counterfactual explanations can be modeled using Mixed-Integer Optimization (MIO). We use a Sum-Product Network (SPN) to estimate the likelihood of a counterfactual. To achieve that, we propose an MIO formulation of an SPN, which can be of independent interest. The source code with examples is available at https://github.com/Epanemu/LiCE.
Updated: 2025-03-21 10:55:53
标题: 使用求和-乘积网络生成可能的反事实情况
摘要: AI系统所做出的决策需要解释,这一需求既受到最近的监管,也受到用户的需求推动。这些决策通常只能事后解释。在反事实解释中,人们可能会问什么构成最佳的反事实解释。显然,必须考虑多个标准,尽管“与样本的距离”是一个关键标准。最近的方法似乎牺牲了这一最初的目标,而提出了一个系统,提供高概率解释,同时又接近且稀疏。我们展示了,寻找满足反事实解释的许多常见愿望的最可能解释可以使用混合整数优化(MIO)进行建模。我们使用Sum-Product Network(SPN)来估计反事实的概率。为了实现这一点,我们提出了一个SPN的MIO公式,这可能会引起独立的兴趣。源代码和示例可在https://github.com/Epanemu/LiCE 上找到。
更新时间: 2025-03-21 10:55:53
领域: cs.AI,cs.LG,math.OC
Zero Trust Architecture: A Systematic Literature Review
The increasing complexity of digital ecosystems and evolving cybersecurity threats have highlighted the limitations of traditional perimeter-based security models, leading to the growing adoption of Zero Trust Architecture (ZTA). ZTA operates on the principle of "never trust, always verify", enforcing continuous authentication, conditional access, dynamic trust evaluation, and the principle of least privilege to enhance security across diverse domains. This study applies the PRISMA framework to analyze 10 years of research (2016-2025) on ZTA, presenting a systematic literature review (SLR) that synthesizes its applications, enabling technologies, and associated challenges. It provides a detailed taxonomy that organizes ZTA's application domains, together with the emerging technologies that facilitate its implementation, and critically examines the barriers to ZTA adoption. Additionally, the study traces the historical evolution of ZTA alongside notable events and publications trends while highlighting some potential factors for the surge over the past few years. This comprehensive analysis serves as a practical guide for researchers and practitioners seeking to leverage ZTA for stronger, more adaptive security frameworks in a rapidly shifting threat landscape.
Updated: 2025-03-21 10:52:22
标题: 零信任架构:系统文献综述
摘要: 数字生态系统的日益复杂化和不断发展的网络安全威胁凸显了传统基于边界的安全模型的局限性,导致零信任架构(ZTA)的日益普及。ZTA的运作原则是“永不信任,始终验证”,强调持续认证、条件访问、动态信任评估和最小权限原则,以增强在不同领域的安全性。本研究应用PRISMA框架分析了10年(2016-2025年)关于ZTA的研究,提出了一篇系统文献综述(SLR),综合分析了其应用、支持技术和相关挑战。它提供了一个详细的分类法,组织了ZTA的应用领域,以及促进其实施的新兴技术,并批判性地审视了ZTA采用的障碍。此外,该研究追踪了ZTA的历史演变,同时突出了一些过去几年涌现的潜在因素。这一全面的分析为寻求利用ZTA构建更强大、更具适应性的安全框架的研究人员和从业者提供了实用指南。
更新时间: 2025-03-21 10:52:22
领域: cs.CR
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?
Studies on evaluation metrics and LLM-as-a-Judge models for automatic text summarization have largely been focused on English, limiting our understanding of their effectiveness in other languages. Through our new dataset BASSE (BAsque and Spanish Summarization Evaluation), we address this situation by collecting human judgments on 2,040 abstractive summaries in Basque and Spanish, generated either manually or by five LLMs with four different prompts. For each summary, annotators evaluated five criteria on a 5-point Likert scale: coherence, consistency, fluency, relevance, and 5W1H. We use these data to reevaluate traditional automatic metrics used for evaluating summaries, as well as several LLM-as-a-Judge models that show strong performance on this task in English. Our results show that currently proprietary judge LLMs have the highest correlation with human judgments, followed by criteria-specific automatic metrics, while open-sourced judge LLMs perform poorly. We release BASSE and our code publicly, along with the first large-scale Basque summarization dataset containing 22,525 news articles with their subheads.
Updated: 2025-03-21 10:52:20
标题: 西班牙语和巴斯克语摘要评估指标:自动评分和LLM评委与人类评分相关吗?
摘要: 研究自动文本摘要的评估指标和LLM作为评判模型的研究主要集中在英语上,限制了我们对它们在其他语言中的有效性的理解。通过我们的新数据集BASSE(Basque and Spanish Summarization Evaluation),我们通过收集人类对2040个巴斯克语和西班牙语抽象摘要的评价来解决这一问题,这些摘要是手动生成或通过五个LLM使用四个不同提示生成的。对于每个摘要,注释者根据5点Likert量表评估了五个标准:连贯性、一致性、流畅性、相关性和5W1H。我们利用这些数据重新评估了用于评估摘要的传统自动指标,以及在英语中表现出色的几个LLM作为评判模型。我们的结果显示,目前专有的评判LLM与人类评价具有最高的相关性,其次是特定标准的自动指标,而开源的评判LLM表现不佳。我们公开发布BASSE和我们的代码,以及第一个包含22525篇新闻文章及其副标题的大规模巴斯克语摘要数据集。
更新时间: 2025-03-21 10:52:20
领域: cs.CL,cs.AI
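A minimal sketch of the kind of meta-evaluation reported above: rank correlation between an automatic metric (or judge score) and human Likert ratings, computed with SciPy; the score arrays are invented toy numbers.

```python
from scipy.stats import spearmanr, kendalltau

# Toy example: 8 summaries, 1-5 human coherence ratings vs. an automatic metric's scores.
human = [5, 4, 4, 2, 3, 1, 5, 2]
metric = [0.91, 0.74, 0.80, 0.35, 0.55, 0.30, 0.88, 0.47]

rho, rho_p = spearmanr(human, metric)
tau, tau_p = kendalltau(human, metric)
print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3f})")
print(f"Kendall tau  = {tau:.3f} (p = {tau_p:.3f})")
```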
Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment
This paper presents Babel, the expandable modality alignment model, specially designed for multi-modal sensing. While there has been considerable work on multi-modality alignment, they all struggle to effectively incorporate multiple sensing modalities due to the data scarcity constraints. How to utilize multi-modal data with partial pairings in sensing remains an unresolved challenge. Babel tackles this challenge by introducing the concept of expandable modality alignment. The key idea involves transforming the N-modality alignment into a series of binary-modality alignments. Novel techniques are also proposed to further mitigate data scarcity issue and balance the contribution of the newly incorporated modality with the previously established modality alignment during the expandable alignment process. We provide the comprehensive implementation. In the pre-training phase, Babel currently aligns 6 sensing modalities, namely Wi-Fi, mmWave, IMU, LiDAR, video, and depth. For the deployment phase, as a foundation model, any single or combination of aligned modalities could be selected from Babel and applied to downstream tasks. Evaluation demonstrates Babel's outstanding performance on eight human activity recognition datasets, compared to a broad range of baselines e.g., the SOTA single-modal sensing networks, multi-modal sensing framework, and multi-modal large language models. Babel not only improves the performance of individual modality sensing (12% averaged accuracy improvement), but also effectively fuses multiple available modalities (up to 22% accuracy increase). Case studies also highlight emerging application scenarios empowered by Babel, including cross-modality retrieval (i.e., sensing imaging), and bridging LLM for sensing comprehension.
Updated: 2025-03-21 10:51:22
标题: Babel:一种可扩展的预训练模型,用于多模态感知的可扩展模态对齐
摘要: 本文介绍了Babel,一种专为多模态感知设计的可扩展模态对齐模型。虽然已经有相当多的工作在多模态对齐方面进行了研究,但由于数据稀缺的限制,它们都很难有效地整合多个感知模态。如何在感知中利用部分配对的多模态数据仍然是一个未解决的挑战。Babel通过引入可扩展模态对齐的概念来解决这一挑战。其关键思想是将N模态对齐转化为一系列二元模态对齐。还提出了新颖的技术来进一步缓解数据稀缺问题,并在可扩展对齐过程中平衡新加入模态的贡献与先前建立的模态对齐。我们提供了全面的实现。在预训练阶段,Babel目前对齐了6种感知模态,即Wi-Fi、毫米波、IMU、LiDAR、视频和深度。在部署阶段,作为基础模型,可以从Babel中选择任何一个或多个对齐的模态,并应用于下游任务。评估表明,与多种基线方法(例如SOTA单模态感知网络、多模态感知框架和多模态大型语言模型)相比,Babel在八个人体活动识别数据集上表现出色。Babel不仅提高了单一模态感知的性能(平均准确率提高了12%),还有效地融合了多个可用的模态(准确率提高了高达22%)。案例研究还突出了Babel赋予的新应用场景,包括跨模态检索(即感知成像)和连接LLM以理解感知。
更新时间: 2025-03-21 10:51:22
领域: cs.AI,cs.CV,cs.LG,eess.SP
Unitless Unrestricted Markov-Consistent SCM Generation: Better Benchmark Datasets for Causal Discovery
Causal discovery aims to extract qualitative causal knowledge in the form of causal graphs from data. Because causal ground truth is rarely known in the real world, simulated data plays a vital role in evaluating the performance of the various causal discovery algorithms proposed in the literature. But recent work highlighted certain artifacts of commonly used data generation techniques for a standard class of structural causal models (SCM) that may be nonphysical, including var- and R2-sortability, where the variables' variance and coefficients of determination (R2) after regressing on all other variables, respectively, increase along the causal order. Some causal methods exploit such artifacts, leading to unrealistic expectations for their performance on real-world data. Some modifications have been proposed to remove these artifacts; notably, the internally-standardized structural causal model (iSCM) avoids varsortability and largely alleviates R2-sortability on sparse causal graphs, but exhibits a reversed R2-sortability pattern for denser graphs not featured in their work. We analyze which sortability patterns we expect to see in real data, and propose a method for drawing coefficients that we argue more effectively samples the space of SCMs. Finally, we propose a novel extension of our SCM generation method to the time series setting.
Updated: 2025-03-21 10:46:50
标题: 无单位无限制马尔可夫一致SCM生成:用于因果发现的更好基准数据集
摘要: 因果发现旨在从数据中提取定性因果知识,以因果图的形式呈现。由于真实世界中很少知道因果真相,模拟数据在评估文献中提出的各种因果发现算法的性能方面发挥了重要作用。但最近的研究突显了一些常用数据生成技术的特定伪现象,适用于标准结构因果模型(SCM),这些伪现象可能是非物理的,包括变-和R2-排序性,其中变量的方差和决定系数(R2)在回归其他所有变量之后,分别沿因果顺序增加。一些因果方法利用这些伪现象,导致对它们在真实世界数据上的表现产生不切实际的期望。已经提出了一些修改来消除这些伪现象;值得注意的是,内部标准化的结构因果模型(iSCM)避免了变排序性,并在稀疏因果图上大部分缓解了R2排序性,但对于未在其工作中展示的更密集因果图,呈现了反向的R2排序性模式。我们分析了我们预计在真实数据中会看到哪些排序性模式,并提出了一种绘制系数的方法,我们认为这种方法更有效地采样了SCM的空间。最后,我们提出了一种新颖的扩展我们的SCM生成方法到时间序列设置的方法。
更新时间: 2025-03-21 10:46:50
领域: cs.LG
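A minimal sketch of checking var-sortability on simulated SCM data, here in a simplified edge-wise form: for each directed edge, test whether the child's marginal variance exceeds the parent's. Values near 1 signal the artifact discussed above; the linear-Gaussian chain below is a toy simulator.

```python
import numpy as np

def var_sortability(X, adjacency):
    """Fraction of directed edges i -> j whose child variance exceeds the parent's
    (simplified edge-wise variant). Values near 1 or 0 mean variance leaks the causal order."""
    variances = X.var(axis=0)
    edges = np.argwhere(adjacency != 0)
    agree = sum(variances[j] > variances[i] for i, j in edges)
    return agree / len(edges)

# Toy linear-Gaussian SCM over a chain 0 -> 1 -> 2 with unit noise (raw, unstandardized).
rng = np.random.default_rng(0)
n = 10_000
x0 = rng.normal(size=n)
x1 = 1.5 * x0 + rng.normal(size=n)
x2 = 1.5 * x1 + rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
A = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
print("var-sortability:", var_sortability(X, A))      # close to 1 for this raw simulation
```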
Enhanced Smart Contract Reputability Analysis using Multimodal Data Fusion on Ethereum
The evaluation of smart contract reputability is essential to foster trust in decentralized ecosystems. However, existing methods that rely solely on static code analysis or transactional data, offer limited insight into evolving trustworthiness. We propose a multimodal data fusion framework that integrates static code features with transactional data to enhance reputability prediction. Our framework initially focuses on static code analysis, utilizing GAN-augmented opcode embeddings to address class imbalance, achieving 97.67% accuracy and a recall of 0.942 in detecting illicit contracts, surpassing traditional oversampling methods. This forms the crux of a reputability-centric fusion strategy, where combining static and transactional data improves recall by 7.25% over single-source models, demonstrating robust performance across validation sets. By providing a holistic view of smart contract behaviour, our approach enhances the model's ability to assess reputability, identify fraudulent activities, and predict anomalous patterns. These capabilities contribute to more accurate reputability assessments, proactive risk mitigation, and enhanced blockchain security.
Updated: 2025-03-21 10:45:17
标题: 在以太坊上使用多模态数据融合增强智能合约声誉分析
摘要: 智能合约声誉评估对于在去中心化生态系统中建立信任至关重要。然而,现有仅依赖静态代码分析或交易数据的方法对于不断发展的可信度提供有限的洞察力。我们提出了一个多模态数据融合框架,将静态代码特征与交易数据相结合,以增强声誉预测能力。我们的框架最初专注于静态代码分析,利用增强型 GAN 操作码嵌入来解决类别不平衡问题,实现了97.67%的准确率和0.942的召回率,超过传统的过采样方法。这构成了一个以声誉为中心的融合策略,通过结合静态和交易数据,使召回率比单一来源模型提高了7.25%,在验证集上展现出稳健的性能。通过提供对智能合约行为的全面视图,我们的方法增强了模型评估声誉、识别欺诈活动和预测异常模式的能力。这些能力有助于更准确地评估声誉、积极化风险缓解,并增强区块链安全性。
更新时间: 2025-03-21 10:45:17
领域: cs.LG,cs.AI,cs.CR,cs.ET
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis
Despite advances in Large Multi-modal Models, applying them to long and untrimmed video content remains challenging due to limitations in context length and substantial memory overhead. These constraints often lead to significant information loss and reduced relevance in the model responses. With the exponential growth of video data across web platforms, understanding long-form video is crucial for advancing generalized intelligence. In this paper, we introduce SALOVA: Segment-Augmented LOng Video Assistant, a novel video-LLM framework designed to enhance the comprehension of lengthy video content through targeted retrieval process. We address two main challenges to achieve it: (i) We present the SceneWalk dataset, a high-quality collection of 87.8K long videos, each densely captioned at the segment level to enable models to capture scene continuity and maintain rich descriptive context. (ii) We develop robust architectural designs integrating dynamic routing mechanism and spatio-temporal projector to efficiently retrieve and process relevant video segments based on user queries. Our framework mitigates the limitations of current video-LMMs by allowing for precise identification and retrieval of relevant video segments in response to queries, thereby improving the contextual relevance of the generated responses. Through extensive experiments, SALOVA demonstrates enhanced capability in processing complex long-form videos, showing significant capability to maintain contextual integrity across extended sequences.
Updated: 2025-03-21 10:44:15
标题: SALOVA:用于长视频分析中的定向检索和路由的段增强长视频助手
摘要: 尽管大型多模态模型取得了进展,但将它们应用于长时间未剪辑的视频内容仍然具有挑战性,原因是上下文长度有限且内存开销巨大。这些限制通常会导致重要信息的丢失和模型响应的相关性降低。随着网络平台上视频数据的指数增长,理解长格式视频对于推进普适智能至关重要。在本文中,我们介绍了SALOVA:Segment-Augmented LOng Video Assistant,这是一种新颖的视频-LLM框架,旨在通过有针对性的检索过程增强对长视频内容的理解。我们解决了实现这一目标的两个主要挑战:(i) 我们提出了SceneWalk数据集,这是一个高质量的长视频集合,每个视频在段落级别密集标注,以便模型捕捉场景连续性并保持丰富的描述性上下文。(ii) 我们开发了强大的架构设计,集成了动态路由机制和时空投影仪,以高效检索和处理基于用户查询的相关视频片段。我们的框架通过允许精确识别和检索响应查询的相关视频片段,从而改善生成响应的上下文相关性,以缓解当前视频-LMM的限制。通过广泛的实验,SALOVA展示了在处理复杂的长格式视频方面的增强能力,显著地展示了在扩展序列中保持上下文完整性的能力。
更新时间: 2025-03-21 10:44:15
领域: cs.CV,cs.AI
An Attentive Representative Sample Selection Strategy Combined with Balanced Batch Training for Skin Lesion Segmentation
An often overlooked problem in medical image segmentation research is the effective selection of training subsets to annotate from a complete set of unlabelled data. Many studies select their training sets at random, which may lead to suboptimal model performance, especially in the minimal supervision setting where each training image has a profound effect on performance outcomes. This work aims to address this issue. We use prototypical contrastive learning and clustering to extract representative and diverse samples for annotation. We improve upon prior works with a bespoke cluster-based image selection process. Additionally, we introduce the concept of unsupervised balanced batch dataloading to medical image segmentation, which aims to improve model learning with minimally annotated data. We evaluated our method on a public skin lesion dataset (ISIC 2018) and compared it to another state-of-the-art data sampling method. Our method achieved superior performance in a low annotation budget scenario.
Updated: 2025-03-21 10:42:22
标题: 一种结合平衡批量训练的关注代表性样本选择策略用于皮肤病变分割
摘要: 医学图像分割研究中经常被忽视的问题是从完整的未标记数据中有效选择训练子集进行注释。许多研究随机选择其训练集,这可能导致次优的模型性能,特别是在最小监督设置中,其中每个训练图像对性能结果有深远影响。本研究旨在解决这一问题。我们利用样本对比学习和聚类来提取具有代表性和多样性的样本进行注释。我们通过一种专门设计的基于聚类的图像选择过程改进了先前的工作。此外,我们引入了无监督平衡批量数据加载的概念到医学图像分割中,旨在通过最小标注数据改善模型学习。我们在公共皮肤病变数据集(ISIC 2018)上评估了我们的方法,并将其与另一种最先进的数据采样方法进行了比较。在低标注预算情况下,我们的方法实现了卓越的性能。
更新时间: 2025-03-21 10:42:22
领域: cs.CV,cs.AI
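A minimal sketch of cluster-based representative selection under an annotation budget, in the spirit of the abstract above; it assumes per-image feature embeddings are already available and does not reproduce the paper's bespoke selection rules or the contrastive encoder:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_representatives(features: np.ndarray, budget: int) -> np.ndarray:
    """Cluster unlabelled images and pick the sample closest to each cluster centroid."""
    km = KMeans(n_clusters=budget, n_init=10, random_state=0).fit(features)
    chosen = []
    for c in range(budget):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
        chosen.append(members[np.argmin(dists)])   # most central image in the cluster
    return np.array(chosen)

feats = np.random.default_rng(1).normal(size=(500, 128))   # placeholder embeddings
print(select_representatives(feats, budget=10))             # indices to send for annotation
```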
Exploring the Efficacy of Partial Denoising Using Bit Plane Slicing for Enhanced Fracture Identification: A Comparative Study of Deep Learning-Based Approaches and Handcrafted Feature Extraction Techniques
Computer vision has transformed medical diagnosis, treatment, and research through advanced image processing and machine learning techniques. Fracture classification, a critical area in healthcare, has greatly benefited from these advancements, yet accurate detection is challenged by complex patterns and image noise. Bit plane slicing enhances medical images by reducing noise interference and extracting informative features. This research explores partial denoising techniques to provide practical solutions for improved fracture analysis, ultimately enhancing patient care. The study explores the deep learning model DenseNet alongside handcrafted feature extraction; Decision Tree and Random Forest classifiers were employed to train and evaluate distinct image representations. These include the original image, the concatenation of the four bit planes from the LSB as well as MSB, the fully denoised image, and an image consisting of 6 bit planes from MSB and 2 denoised bit planes from LSB. The purpose of forming these diverse image representations is to analyze SNR as well as classification accuracy and identify the bit planes that contain the most informative features. Moreover, the study delves into the significance of partial denoising techniques in preserving crucial features, leading to improvements in classification results. Notably, this study shows that, employing the Random Forest classifier, the partially denoised image representation exhibited a testing accuracy of 95.61%, surpassing the performance of other image representations. The outcomes of this research provide valuable insights into the development of efficient preprocessing, feature extraction and classification approaches for fracture identification. By enhancing diagnostic accuracy, these advancements hold the potential to positively impact patient care and overall medical outcomes.
Updated: 2025-03-21 10:39:21
标题: 探索使用比特平面切片进行部分去噪以增强断裂识别的有效性:基于深度学习方法和手工特征提取技术的比较研究
摘要: 计算机视觉通过先进的图像处理和机器学习技术,已经改变了医学诊断、治疗和研究。骨折分类是医疗保健领域的一个关键领域,受益于这些进步,但准确的检测受到复杂模式和图像噪声的挑战。比特平面切片通过减少噪声干扰和提取信息特征,增强了医学图像。本研究探讨了部分去噪技术,为改善骨折分析提供实用解决方案,最终提高了患者护理水平。该研究探讨了深度学习模型DenseNet和手工特征提取。决策树和随机森林被用来训练和评估不同的图像表示。这些包括原始图像、从最低有效位(LSB)和最高有效位(MSB)连接的四个比特平面、完全去噪的图像,以及由6个MSB比特平面和2个LSB去噪比特平面组成的图像。形成这些多样化的图像表示的目的是分析信噪比以及分类准确度,并确定包含最具信息性特征的比特平面。此外,该研究深入探讨了部分去噪技术在保留关键特征方面的重要性,从而改善分类结果。值得注意的是,这项研究表明,利用随机森林分类器,部分去噪图像表示展示了95.61%的测试准确率,超过了其他图像表示的表现。这项研究的结果为骨折识别的高效预处理、特征提取和分类方法的发展提供了宝贵的见解。通过提高诊断准确性,这些进步有可能对患者护理和整体医疗结果产生积极影响。
更新时间: 2025-03-21 10:39:21
领域: eess.IV,cs.AI,cs.CV
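A minimal sketch of bit-plane slicing and a partial-denoising recomposition for an 8-bit image, loosely following the "6 MSB planes + 2 denoised LSB planes" representation described above; the median filter stands in for whichever denoiser the study actually used, which the abstract does not specify:

```python
import numpy as np
from scipy.ndimage import median_filter

def bit_planes(img: np.ndarray) -> np.ndarray:
    """Return the 8 bit planes of an 8-bit grayscale image, least significant first."""
    return np.stack([(img >> b) & 1 for b in range(8)], axis=0).astype(np.uint8)

def partial_denoise(img: np.ndarray) -> np.ndarray:
    """Keep the 6 MSB planes as-is and median-filter the 2 LSB planes before recombining."""
    planes = bit_planes(img)
    for b in (0, 1):                                 # the two least significant planes
        planes[b] = median_filter(planes[b], size=3)
    return sum(planes[b] << b for b in range(8)).astype(np.uint8)

img = np.random.default_rng(2).integers(0, 256, size=(64, 64)).astype(np.uint8)
out = partial_denoise(img)
print(out.shape, out.dtype)
```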
A Guide to Bayesian Networks Software Packages for Structure and Parameter Learning -- 2025 Edition
A representation of the cause-effect mechanism is needed to enable artificial intelligence to represent how the world works. Bayesian Networks (BNs) have proven to be an effective and versatile tool for this task. BNs require constructing a structure of dependencies among variables and learning the parameters that govern these relationships. These tasks, referred to as structural learning and parameter learning, are actively investigated by the research community, with several algorithms proposed and no single method having established itself as standard. A wide range of software, tools, and packages have been developed for BNs analysis and made available to academic researchers and industry practitioners. As a consequence of having no one-size-fits-all solution, taking the first practical steps and getting oriented in this field proves challenging for outsiders and beginners. In this paper, we review the most relevant tools and software for BNs structural and parameter learning to date, providing our subjective recommendations directed to an audience of beginners. In addition, we provide an extensive easy-to-consult overview table summarizing all software packages and their main features. By improving the reader's understanding of which available software might best suit their needs, we improve accessibility to the field and make it easier for beginners to take their first step into it.
Updated: 2025-03-21 10:36:11
标题: 一个贝叶斯网络软件包结构和参数学习指南-- 2025版
摘要: 需要一种因果机制的表征,以使人工智能能够表征世界运作的方式。贝叶斯网络(BNs)已被证明是这一任务的有效和多功能工具。BNs 需要构建变量之间的依赖关系结构,并学习控制这些关系的参数。这些任务被称为结构学习和参数学习,受到研究界的积极调查,提出了几种算法,没有一个方法确立为标准。已经开发了各种软件、工具和程序包用于 BNs 分析,并提供给学术研究人员和行业从业者。由于没有一种大小适合所有的解决方案,对于外部人员和初学者来说,迈出第一步并进入这一领域是具有挑战性的。在本文中,我们回顾了迄今为止最相关的 BNs 结构和参数学习工具和软件,提供我们主观的建议,针对初学者的受众。此外,我们提供了一个广泛且易于咨询的概述表,总结了所有软件包及其主要特点。通过提高读者对哪种可用软件可能最适合他们需求的理解,我们提高了对该领域的可访问性,使初学者更容易迈出第一步。
更新时间: 2025-03-21 10:36:11
领域: cs.AI,I.2
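As a concrete taste of the workflow such packages support, here is a minimal structure- and parameter-learning sketch with pgmpy on synthetic discrete data; the class names (HillClimbSearch, BicScore, BayesianNetwork, MaximumLikelihoodEstimator) are assumed to match the installed pgmpy release, since some of them have been renamed across versions:

```python
import numpy as np
import pandas as pd
from pgmpy.estimators import BicScore, HillClimbSearch, MaximumLikelihoodEstimator
from pgmpy.models import BayesianNetwork

# Synthetic data: C influences both X and Y.
rng = np.random.default_rng(3)
c = rng.integers(0, 2, 2000)
x = np.where(rng.random(2000) < 0.8, c, 1 - c)                    # X mostly copies C
y = np.where(rng.random(2000) < 0.7, c, rng.integers(0, 2, 2000))
data = pd.DataFrame({"C": c, "X": x, "Y": y})

# Structure learning: greedy hill climbing scored by BIC.
dag = HillClimbSearch(data).estimate(scoring_method=BicScore(data))

# Parameter learning: maximum-likelihood CPDs on the learned structure.
model = BayesianNetwork(dag.edges())
model.fit(data, estimator=MaximumLikelihoodEstimator)
print(dag.edges())
for cpd in model.get_cpds():
    print(cpd)
```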
A Tale of Two Classes: Adapting Supervised Contrastive Learning to Binary Imbalanced Datasets
Supervised contrastive learning (SupCon) has proven to be a powerful alternative to the standard cross-entropy loss for classification of multi-class balanced datasets. However, it struggles to learn well-conditioned representations of datasets with long-tailed class distributions. This problem is potentially exacerbated for binary imbalanced distributions, which are commonly encountered during many real-world problems such as medical diagnosis. In experiments on seven binary datasets of natural and medical images, we show that the performance of SupCon decreases with increasing class imbalance. To substantiate these findings, we introduce two novel metrics that evaluate the quality of the learned representation space. By measuring the class distribution in local neighborhoods, we are able to uncover structural deficiencies of the representation space that classical metrics cannot detect. Informed by these insights, we propose two new supervised contrastive learning strategies tailored to binary imbalanced datasets that improve the structure of the representation space and increase downstream classification accuracy over standard SupCon by up to 35%. We make our code available.
Updated: 2025-03-21 10:34:51
标题: 两个类别的故事:将监督对比学习调整为二元不平衡数据集
摘要: Supervised contrastive learning (SupCon)已被证明是对于多类平衡数据集分类的标准交叉熵损失的一个强大替代方法。然而,它在学习长尾类分布数据集的良好表示方面存在困难。这个问题在二元不平衡分布中可能会加剧,这在许多实际问题中经常遇到,比如医学诊断。在对七个自然和医学图像的二元数据集进行实验时,我们展示了SupCon的性能随着类别不平衡程度的增加而下降。为了证实这些发现,我们引入了两个评估学习表示空间质量的新指标。通过测量局部邻域中的类分布,我们能够发现表示空间的结构缺陷,这是传统指标无法检测到的。基于这些见解,我们提出了两种新的针对二元不平衡数据集的监督对比学习策略,改进了表示空间的结构,并使下游分类准确性比标准SupCon提高了高达35%。我们提供了我们的代码。
更新时间: 2025-03-21 10:34:51
领域: cs.LG,cs.CV
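The abstract's local-neighbourhood diagnostics can be approximated with a simple check like the following sketch, which measures how often a minority-class sample's nearest neighbours share its label; the embeddings are assumed to come from an already-trained encoder, and this is not the paper's exact pair of metrics:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_minority_fraction(embeddings: np.ndarray, labels: np.ndarray, k: int = 10) -> float:
    """Average fraction of same-class points among each minority sample's k nearest neighbours."""
    minority_class = int(np.argmin(np.bincount(labels)))
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    _, idx = nn.kneighbors(embeddings[labels == minority_class])
    neighbour_labels = labels[idx[:, 1:]]            # drop each query point itself
    return float((neighbour_labels == minority_class).mean())

rng = np.random.default_rng(4)
emb = np.vstack([rng.normal(0.0, 1.0, (900, 32)), rng.normal(0.5, 1.0, (100, 32))])
lab = np.array([0] * 900 + [1] * 100)                # a 9:1 binary imbalance
print(local_minority_fraction(emb, lab, k=10))       # low values signal a poorly structured space
```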
From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment
Large language models (LLMs) have traditionally been aligned through one-size-fits-all approaches that assume uniform human preferences, fundamentally overlooking the diversity in user values and needs. This paper introduces a comprehensive framework for scalable personalized alignment of LLMs. We establish a systematic preference space characterizing psychological and behavioral dimensions, alongside diverse persona representations for robust preference inference in real-world scenarios. Building upon this foundation, we introduce \textsc{AlignX}, a large-scale dataset of over 1.3 million personalized preference examples, and develop two complementary alignment approaches: \textit{in-context alignment} directly conditioning on persona representations and \textit{preference-bridged alignment} modeling intermediate preference distributions. Extensive experiments demonstrate substantial improvements over existing methods, with an average 17.06\% accuracy gain across four benchmarks while exhibiting a strong adaptation capability to novel preferences, robustness to limited user data, and precise preference controllability. These results validate our framework's effectiveness, advancing toward truly user-adaptive AI systems.
Updated: 2025-03-21 10:33:21
标题: 从100万用户到每个用户:扩展个性化偏好以实现用户级对齐
摘要: 大型语言模型(LLMs)传统上通过一揽子方法进行对齐,假设人类偏好是统一的,从根本上忽视了用户价值观和需求的多样性。本文介绍了一个可扩展的个性化对齐LLMs的综合框架。我们建立了一个系统的偏好空间,描述心理和行为维度,以及多样化的个人形象表示,用于在现实场景中进行偏好推断。在此基础上,我们引入了一个名为\textsc{AlignX}的大规模数据集,包含超过130万个个性化偏好示例,并开发了两种互补的对齐方法:直接在个人形象上进行条件化的\textit{上下文对齐}和建模中间偏好分布的\textit{偏好桥接对齐}。广泛的实验表明,与现有方法相比,我们的方法在四个基准测试中平均准确率提高了17.06\%,同时表现出强大的适应新偏好的能力,对有限用户数据的稳健性和精确的偏好可控性。这些结果验证了我们框架的有效性,推进了真正用户自适应的人工智能系统。
更新时间: 2025-03-21 10:33:21
领域: cs.CL,cs.AI
Sample-Efficient Bayesian Transfer Learning for Online Machine Parameter Optimization
Correctly setting the parameters of a production machine is essential to improve product quality, increase efficiency, and reduce production costs while also supporting sustainability goals. Identifying optimal parameters involves an iterative process of producing an object and evaluating its quality. Minimizing the number of iterations is, therefore, desirable to reduce the costs associated with unsuccessful attempts. This work introduces a method to optimize the machine parameters in the system itself using a Bayesian optimization algorithm. By leveraging existing machine data, we use a transfer learning approach in order to identify an optimum with minimal iterations, resulting in a cost-effective transfer learning algorithm. We validate our approach on a laser machine for cutting sheet metal in the real world.
Updated: 2025-03-21 10:32:21
标题: 样本高效的贝叶斯迁移学习用于在线机器参数优化
摘要: 正确设置生产机器的参数对于提高产品质量、增加效率、减少生产成本并支持可持续发展目标至关重要。确定最佳参数涉及通过生产物体并评估其质量的迭代过程。因此,减少迭代次数是理想的,以减少与失败尝试相关的成本。本研究介绍了一种使用贝叶斯优化算法在系统内优化机器参数的方法。通过利用现有的机器数据,我们采用迁移学习方法来识别最小迭代次数的最佳结果,从而实现一种具有成本效益的迁移学习算法。我们在现实世界中的一个用于切割金属板的激光机上验证了我们的方法。
更新时间: 2025-03-21 10:32:21
领域: cs.LG
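A minimal sketch of a warm-started Bayesian optimisation loop in the spirit of the abstract above, assuming prior runs from a similar machine are available as transfer data and that producing a part and measuring its quality is wrapped in a hypothetical `run_machine` function (the paper's actual transfer-learning scheme is not reproduced):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(mu, sigma, best):
    """EI for minimisation: expected improvement of a candidate over the best observed cost."""
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def run_machine(x):
    """Placeholder for a real production run plus quality measurement (lower is better)."""
    return float((x[0] - 0.3) ** 2 + (x[1] + 0.1) ** 2 + 0.01 * np.random.randn())

rng = np.random.default_rng(5)
X = rng.uniform(-1, 1, (20, 2))                        # transferred data from a similar machine
y = np.array([run_machine(x) + 0.05 for x in X])       # slightly biased, as old data would be

for _ in range(10):                                    # only a handful of costly new runs
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    cand = rng.uniform(-1, 1, (500, 2))
    mu, sd = gp.predict(cand, return_std=True)
    x_next = cand[np.argmax(expected_improvement(mu, sd, y.min()))]
    X, y = np.vstack([X, x_next]), np.append(y, run_machine(x_next))

print("best parameters:", X[np.argmin(y)], "cost:", y.min())
```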
Benign Overfitting with Quantum Kernels
Quantum kernels quantify similarity between data points by measuring the inner product between quantum states, computed through quantum circuit measurements. By embedding data into quantum systems, quantum kernel feature maps, that may be classically intractable to compute, could efficiently exploit high-dimensional Hilbert spaces to capture complex patterns. However, designing effective quantum feature maps remains a major challenge. Many quantum kernels, such as the fidelity kernel, suffer from exponential concentration, leading to near-identity kernel matrices that fail to capture meaningful data correlations and lead to overfitting and poor generalization. In this paper, we propose a novel strategy for constructing quantum kernels that achieve good generalization performance, drawing inspiration from benign overfitting in classical machine learning. Our approach introduces the concept of local-global quantum kernels, which combine two complementary components: a local quantum kernel based on measurements of small subsystems and a global quantum kernel derived from full-system measurements. Through numerical experiments, we demonstrate that local-global quantum kernels exhibit benign overfitting, supporting the effectiveness of our approach in enhancing quantum kernel methods.
Updated: 2025-03-21 10:30:42
标题: 使用量子核函数的良性过拟合
摘要: 量子核通过测量量子状态之间的内积来量化数据点之间的相似性,这是通过量子电路测量计算的。通过将数据嵌入量子系统,量子核特征映射可以有效地利用高维希尔伯特空间来捕捉复杂模式,这在经典计算中可能是难以计算的。然而,设计有效的量子特征映射仍然是一个重大挑战。许多量子核,比如保真度核,存在指数集中问题,导致近似恒等的核矩阵无法捕捉有意义的数据相关性,从而导致过拟合和泛化能力差。在本文中,我们提出了一种构建能够实现良好泛化性能的量子核的新策略,灵感来源于经典机器学习中的良性过拟合。我们的方法引入了局部-全局量子核的概念,结合了两个互补的组成部分:基于小子系统测量的局部量子核和从完整系统测量导出的全局量子核。通过数值实验,我们证明了局部-全局量子核表现出良性过拟合的特性,支持我们的方法在增强量子核方法方面的有效性。
更新时间: 2025-03-21 10:30:42
领域: quant-ph,cs.LG,stat.ML
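A purely classical stand-in for the local-global combination idea, assuming the two Gram matrices have already been estimated (from quantum circuit measurements in the paper; here they are simulated with ordinary RBF kernels, and the mixing weight `alpha` is a placeholder):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def gram(A, B, alpha=0.5):
    """Mix a 'local' kernel (small subsystem, here the first two features) with a 'global' one."""
    k_local = rbf_kernel(A[:, :2], B[:, :2], gamma=1.0)
    k_global = rbf_kernel(A, B, gamma=0.1)
    return alpha * k_local + (1 - alpha) * k_global

clf = SVC(kernel="precomputed").fit(gram(X_tr, X_tr), y_tr)
print("test accuracy:", clf.score(gram(X_te, X_tr), y_te))
```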
Symbolic Audio Classification via Modal Decision Tree Learning
The range of potential applications of acoustic analysis is wide. Classification of sounds, in particular, is a typical machine learning task that received a lot of attention in recent years. The most common approaches to sound classification are sub-symbolic, typically based on neural networks, and result in black-box models with high performances but very low transparency. In this work, we consider several audio tasks, namely, age and gender recognition, emotion classification, and respiratory disease diagnosis, and we approach them with a symbolic technique, that is, (modal) decision tree learning. We prove that such tasks can be solved using the same symbolic pipeline, that allows to extract simple rules with very high accuracy and low complexity. In principle, all such tasks could be associated to an autonomous conversation system, which could be useful in different contexts, such as an automatic reservation agent for an hospital or a clinic.
Updated: 2025-03-21 10:27:16
标题: 通过模态决策树学习进行符号音频分类
摘要: 声学分析的潜在应用范围很广。尤其是声音分类是一个典型的机器学习任务,在近年来受到了很多关注。声音分类最常见的方法是亚符号方法,通常基于神经网络,产生的黑箱模型性能高但透明度非常低。在这项工作中,我们考虑了几个音频任务,即年龄和性别识别、情绪分类和呼吸道疾病诊断,并采用了一种符号技术,即(模态)决策树学习。我们证明了这些任务可以使用相同的符号管道来解决,从而能够提取简单规则,具有非常高的准确性和低的复杂性。原则上,所有这些任务都可以与自主对话系统关联起来,这对于不同的情境,如医院或诊所的自动预约代理系统,可能是有用的。
更新时间: 2025-03-21 10:27:16
领域: cs.SD,cs.AI,cs.LG,eess.AS,68T05,I.2.6
Specifying What You Know or Not for Multi-Label Class-Incremental Learning
Existing class incremental learning is mainly designed for single-label classification task, which is ill-equipped for multi-label scenarios due to the inherent contradiction of learning objectives for samples with incomplete labels. We argue that the main challenge to overcome this contradiction in multi-label class-incremental learning (MLCIL) lies in the model's inability to clearly distinguish between known and unknown knowledge. This ambiguity hinders the model's ability to retain historical knowledge, master current classes, and prepare for future learning simultaneously. In this paper, we target at specifying what is known or not to accommodate Historical, Current, and Prospective knowledge for MLCIL and propose a novel framework termed as HCP. Specifically, (i) we clarify the known classes by dynamic feature purification and recall enhancement with distribution prior, enhancing the precision and retention of known information. (ii) We design prospective knowledge mining to probe the unknown, preparing the model for future learning. Extensive experiments validate that our method effectively alleviates catastrophic forgetting in MLCIL, surpassing the previous state-of-the-art by 3.3% on average accuracy for MS-COCO B0-C10 setting without replay buffers.
Updated: 2025-03-21 10:26:32
标题: 指定多标签类增量学习中的已知或未知内容
摘要: 现有的类增量学习主要设计用于单标签分类任务,这对于多标签情况并不适用,因为对于具有不完整标签的样本的学习目标存在固有矛盾。我们认为,克服多标签类增量学习(MLCIL)中的这种矛盾的主要挑战在于模型无法清晰区分已知和未知知识。这种模糊性阻碍了模型同时保留历史知识,掌握当前类别并为未来学习做准备的能力。在本文中,我们致力于指定什么是已知或未知以适应MLCIL的历史、当前和未来知识,并提出了一种名为HCP的新框架。具体来说,(i)我们通过动态特征纯化和分布先验增强召回来澄清已知类别,增强已知信息的精准性和保留度。(ii)我们设计了未来知识挖掘来探测未知,为模型未来学习做准备。大量实验证实我们的方法有效地减轻了MLCIL中的灾难性遗忘,平均准确率超过了以往的最新技术水平,MS-COCO B0-C10 设置下比没有重放缓冲区的情况高出 3.3%。
更新时间: 2025-03-21 10:26:32
领域: cs.LG,cs.CV
HAL 9000: a Risk Manager for ITSs
HAL 9000 is an Intrusion Tolerant Systems (ITSs) Risk Manager, which assesses configuration risks against potential intrusions. It utilizes gathered threat knowledge and remains operational, even in the absence of updated information. Based on its advice, the ITSs can dynamically and proactively adapt to recent threats to minimize and mitigate future intrusions from malicious adversaries. Our goal is to reduce the risk linked to the exploitation of recently uncovered vulnerabilities that have not been classified and/or do not have a script to reproduce the exploit, considering the potential that they may have already been exploited as zero-day exploits. Our experiments demonstrate that the proposed solution can effectively learn and replicate National Vulnerability Database's evaluation process with 99% accuracy.
Updated: 2025-03-21 10:25:32
标题: HAL 9000:ITS的风险管理者
摘要: HAL 9000是一种入侵容忍系统(ITSs)风险管理器,评估配置风险与潜在入侵之间的关系。它利用收集到的威胁知识,并在缺乏更新信息的情况下保持运行。根据其建议,ITSs可以动态和主动地适应最近的威胁,以最小化和减轻未来恶意对手的入侵。我们的目标是减少与利用最近揭示的漏洞相关的风险,这些漏洞尚未分类和/或没有脚本来复制利用,考虑到它们可能已经被作为零日漏洞利用。我们的实验证明,所提出的解决方案可以有效地学习和复制国家漏洞数据库评估过程,准确率为99%。
更新时间: 2025-03-21 10:25:32
领域: cs.CR,cs.AI,cs.OS
Do regularization methods for shortcut mitigation work as intended?
Mitigating shortcuts, where models exploit spurious correlations in training data, remains a significant challenge for improving generalization. Regularization methods have been proposed to address this issue by enhancing model generalizability. However, we demonstrate that these methods can sometimes overregularize, inadvertently suppressing causal features along with spurious ones. In this work, we analyze the theoretical mechanisms by which regularization mitigates shortcuts and explore the limits of its effectiveness. Additionally, we identify the conditions under which regularization can successfully eliminate shortcuts without compromising causal features. Through experiments on synthetic and real-world datasets, our comprehensive analysis provides valuable insights into the strengths and limitations of regularization techniques for addressing shortcuts, offering guidance for developing more robust models.
Updated: 2025-03-21 10:24:43
标题: 用于缓解捷径的正则化方法是否如预期那样有效?
摘要: 缓解捷径问题,即模型在训练数据中利用虚假相关性的情况,仍然是改善泛化能力的重大挑战。已经提出了正则化方法来解决这个问题,通过增强模型的泛化能力。然而,我们证明这些方法有时会过度正则化,无意中抑制了因果特征以及虚假特征。在这项工作中,我们分析了正则化缓解捷径的理论机制,并探讨其有效性的限制。此外,我们确定了正则化能够成功消除捷径而不损害因果特征的条件。通过对合成和真实数据集的实验,我们的综合分析为解决捷径问题的正则化技术的优势和局限性提供了有价值的见解,为开发更加强大的模型提供了指导。
更新时间: 2025-03-21 10:24:43
领域: cs.LG,stat.ML
Developing Critical Thinking in Second Language Learners: Exploring Generative AI like ChatGPT as a Tool for Argumentative Essay Writing
This study employs the Paul-Elder Critical Thinking Model and Tan's argumentative writing framework to create a structured methodology. This methodology, ChatGPT Guideline for Critical Argumentative Writing (CGCAW) framework, integrates the models with ChatGPT's capabilities to guide L2 learners in utilizing ChatGPT to enhance their critical thinking skills. A quantitative experiment was conducted with 10 participants from a state university, divided into experimental and control groups. The experimental group utilized the CGCAW framework, while the control group used ChatGPT without specific guidelines. Participants wrote an argumentative essay within a 40-minute timeframe, and essays were evaluated by three assessors: ChatGPT, Grammarly, and a course instructor. Results indicated that the experimental group showed improvements in clarity, logical coherence, and use of evidence, demonstrating ChatGPT's potential to enhance specific aspects of argumentative writing. However, the control group performed better in overall language mechanics and articulation of main arguments, indicating areas where the CGCAW framework could be further refined. This study highlights the need for further research to optimize the use of AI tools like ChatGPT in L2 learning environments to enhance critical thinking and writing skills.
Updated: 2025-03-21 10:22:58
标题: 在第二语言学习者中培养批判性思维:探索像ChatGPT这样的生成型人工智能作为辩论性文章写作工具
摘要: 这项研究采用了Paul-Elder批判性思维模型和Tan的论证写作框架来创建一个结构化方法论。这种方法论,ChatGPT批判性论证写作指南(CGCAW)框架,将这些模型与ChatGPT的能力整合在一起,以指导第二语言学习者利用ChatGPT提高他们的批判性思维能力。在一个州立大学的10名参与者中进行了一项定量实验,分为实验组和对照组。实验组利用了CGCAW框架,而对照组在没有特定指南的情况下使用ChatGPT。参与者在40分钟的时间内写了一篇论证性文章,文章由三名评估员评估:ChatGPT、Grammarly和一名课程导师。结果表明,实验组在清晰度、逻辑连贯性和证据使用方面有所改善,表明ChatGPT有潜力提高论证写作的特定方面。然而,对照组在整体语言机械和主要论点的表达方面表现更好,表明CGCAW框架需要进一步完善。这项研究强调了需要进一步研究,以优化像ChatGPT这样的人工智能工具在第二语言学习环境中的应用,以增强批判性思维和写作能力。
更新时间: 2025-03-21 10:22:58
领域: cs.HC,cs.AI,I.2.7; K.3.1
Privacy Enhanced QKD Networks: Zero Trust Relay Architecture based on Homomorphic Encryption
Quantum key distribution (QKD) enables unconditionally secure symmetric key exchange between parties. However, terrestrial fibre-optic links face inherent distance constraints due to quantum signal degradation. Traditional solutions to overcome these limits rely on trusted relay nodes, which perform intermediate re-encryption of keys using one-time pad (OTP) encryption. This approach, however, exposes keys as plaintext at each relay, requiring significant trust and stringent security controls at every intermediate node. These "trusted" relays become a security liability if compromised. To address this issue, we propose a zero-trust relay design that applies fully homomorphic encryption (FHE) to perform intermediate OTP re-encryption without exposing plaintext keys, effectively mitigating the risks associated with potentially compromised or malicious relay nodes. Additionally, the architecture enhances crypto-agility by incorporating external quantum random number generators, thus decoupling key generation from specific QKD hardware and reducing vulnerabilities tied to embedded key-generation modules. The solution is designed with the existing European Telecommunication Standards Institute (ETSI) QKD standards in mind, enabling straightforward integration into current infrastructures. Its feasibility has been successfully demonstrated through a hybrid network setup combining simulated and commercially available QKD equipment. The proposed zero-trust architecture thus significantly advances the scalability and practical security of large-scale QKD networks, greatly reducing reliance on fully trusted infrastructure.
Updated: 2025-03-21 10:20:06
标题: 隐私增强型量子密钥分发网络:基于同态加密的零信任中继架构
摘要: 量子密钥分发(QKD)实现了各方之间无条件安全的对称密钥交换。然而,地面光纤链路面临量子信号衰减导致的固有距离限制。传统解决方案依赖于可信中继节点,通过使用一次性密码(OTP)加密对密钥进行中间重新加密来克服这些限制。然而,这种方法在每个中继节点将密钥暴露为明文,需要在每个中间节点实施严格的信任和安全控制。如果这些“可信”中继被破坏,将成为安全隐患。 为解决这个问题,我们提出了一种零信任中继设计,应用全同态加密(FHE)进行中间OTP重新加密,而不暴露明文密钥,有效减轻了与潜在被破坏或恶意的中继节点相关的风险。此外,该架构通过整合外部量子随机数生成器增强了密码敏捷性,从而将密钥生成与特定QKD硬件分离,减少了与嵌入式密钥生成模块相关的漏洞。 该解决方案设计时考虑了现有的欧洲电信标准化协会(ETSI)QKD标准,可以轻松集成到当前基础设施中。通过结合模拟和商业可用的QKD设备的混合网络设置,已成功展示了该方案的可行性。因此,所提出的零信任架构显著推进了大规模QKD网络的可伸缩性和实际安全性,大大减少了对完全信任基础设施的依赖。
更新时间: 2025-03-21 10:20:06
领域: cs.CR
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX. To improve the architecture and training efficiency, we modify the SPHINX framework by removing redundant visual encoders, bypassing fully-padded sub-images with skip tokens, and simplifying multi-stage training into a one-stage all-in-one paradigm. To fully unleash the potential of MLLMs, we assemble a comprehensive multi-domain and multimodal dataset covering publicly available resources in language, vision, and vision-language tasks. We further enrich this collection with our curated OCR intensive and Set-of-Mark datasets, extending the diversity and generality. By training over different base LLMs including TinyLlama1.1B, InternLM2-7B, LLaMA2-13B, and Mixtral8x7B, we obtain a spectrum of MLLMs that vary in parameter size and multilingual capabilities. Comprehensive benchmarking reveals a strong correlation between the multi-modal performance with the data and parameter scales. Code and models are released at https://github.com/Alpha-VLLM/LLaMA2-Accessory
Updated: 2025-03-21 10:19:01
标题: SPHINX-X:扩展数据和参数以适用于一系列多模态大型语言模型
摘要: 我们提出了SPHINX-X,这是一个基于SPHINX开发的广泛的多模态大型语言模型(MLLM)系列。为了改进架构和训练效率,我们通过去除冗余的视觉编码器、使用跳过令牌绕过完全填充的子图像以及将多阶段训练简化为一阶段全功能模式来修改了SPHINX框架。为了充分释放MLLM的潜力,我们组装了一个包含语言、视觉和视觉语言任务中公开可用资源的全面多域多模态数据集。我们进一步通过我们策划的OCR密集型和Set-of-Mark数据集丰富了这个收藏,扩展了多样性和泛化性。通过在不同的基础LLM(包括TinyLlama1.1B、InternLM2-7B、LLaMA2-13B和Mixtral8x7B)上进行训练,我们获得了一系列在参数大小和多语言能力方面变化的MLLM。全面的基准测试揭示了多模态性能与数据和参数规模之间的强相关性。代码和模型发布在https://github.com/Alpha-VLLM/LLaMA2-Accessory。
更新时间: 2025-03-21 10:19:01
领域: cs.CV,cs.AI,cs.CL,cs.LG
Beyond Negation Detection: Comprehensive Assertion Detection Models for Clinical NLP
Assertion status detection is a critical yet often overlooked component of clinical NLP, essential for accurately attributing extracted medical facts. Past studies have narrowly focused on negation detection, leading to underperforming commercial solutions such as AWS Medical Comprehend, Azure AI Text Analytics, and GPT-4o due to their limited domain adaptation. To address this gap, we developed state-of-the-art assertion detection models, including fine-tuned LLMs, transformer-based classifiers, few-shot classifiers, and deep learning (DL) approaches. We evaluated these models against cloud-based commercial API solutions, the legacy rule-based NegEx approach, and GPT-4o. Our fine-tuned LLM achieves the highest overall accuracy (0.962), outperforming GPT-4o (0.901) and commercial APIs by a notable margin, particularly excelling in Present (+4.2%), Absent (+8.4%), and Hypothetical (+23.4%) assertions. Our DL-based models surpass commercial solutions in Conditional (+5.3%) and Associated-with-Someone-Else (+10.1%) categories, while the few-shot classifier offers a lightweight yet highly competitive alternative (0.929), making it ideal for resource-constrained environments. Integrated within Spark NLP, our models consistently outperform black-box commercial solutions while enabling scalable inference and seamless integration with medical NER, Relation Extraction, and Terminology Resolution. These results reinforce the importance of domain-adapted, transparent, and customizable clinical NLP solutions over general-purpose LLMs and proprietary APIs.
Updated: 2025-03-21 10:18:47
标题: 超越否定性检测:临床自然语言处理的全面断言检测模型
摘要: 断言状态检测是临床自然语言处理中关键但常常被忽视的组成部分,对于准确归因提取的医疗事实至关重要。过去的研究狭窄地专注于否定检测,导致像AWS Medical Comprehend、Azure AI Text Analytics和GPT-4o这样的商业解决方案表现不佳,原因在于它们的领域适应能力有限。为了填补这一空白,我们开发了最先进的断言检测模型,包括经过微调的LLMs、基于transformer的分类器、少样本分类器和深度学习(DL)方法。我们将这些模型与基于云的商业API解决方案、传统基于规则的NegEx方法和GPT-4o进行了评估。我们经过微调的LLM取得了最高的整体准确率(0.962),明显优于GPT-4o(0.901)和商业API,特别在Present(+4.2%)、Absent(+8.4%)和Hypothetical(+23.4%)断言方面表现突出。我们基于DL的模型在Conditional(+5.3%)和Associated-with-Someone-Else(+10.1%)类别中超越了商业解决方案,而少样本分类器提供了一种轻量但竞争力强的替代方案(0.929),使其成为资源受限环境下的理想选择。集成在Spark NLP中,我们的模型在保持可扩展推理和与医疗NER、关系提取和术语解析的无缝集成的同时,始终优于黑匣子商业解决方案。这些结果强调了领域适应、透明和可定制的临床自然语言处理解决方案在一般用途LLMs和专有API之上的重要性。
更新时间: 2025-03-21 10:18:47
领域: cs.CL,cs.IR,cs.LG,H.3
AutArch: An AI-assisted workflow for object detection and automated recording in archaeological catalogues
The context of this paper is the creation of large uniform archaeological datasets from heterogeneous published resources, such as find catalogues - with the help of AI and Big Data. The paper is concerned with the challenge of consistent assemblages of archaeological data. We cannot simply combine existing records, as they differ in terms of quality and recording standards. Thus, records have to be recreated from published archaeological illustrations. This is only a viable path with the help of automation. The contribution of this paper is a new workflow for collecting data from archaeological find catalogues available as legacy resources, such as archaeological drawings and photographs in large unsorted PDF files; the workflow relies on custom software (AutArch) supporting image processing, object detection, and interactive means of validating and adjusting automatically retrieved data. We integrate artificial intelligence (AI) in terms of neural networks for object detection and classification into the workflow, thereby speeding up, automating, and standardising data collection. Objects commonly found in archaeological catalogues - such as graves, skeletons, ceramics, ornaments, stone tools and maps - are detected. Those objects are spatially related and analysed to extract real-life attributes, such as the size and orientation of graves based on the north arrow and the scale. We also automate recording of geometric whole-outlines through contour detection, as an alternative to landmark-based geometric morphometrics. Detected objects, contours, and other automatically retrieved data can be manually validated and adjusted. We use third millennium BC Europe (encompassing cultures such as 'Corded Ware' and 'Bell Beaker', and their burial practices) as a 'testing ground' and for evaluation purposes; this includes a user study for the workflow and the AutArch software.
Updated: 2025-03-21 10:15:21
标题: AutArch:考古目录中物体检测和自动记录的AI辅助工作流程
摘要: 这篇论文的背景是从异质性出版资源(如发现目录)中创建大型统一的考古数据集,借助人工智能和大数据的帮助。论文关注的是考古数据一致性组合的挑战。我们不能简单地合并现有记录,因为它们在质量和记录标准方面有所不同。因此,必须从已发表的考古插图中重新创建记录。只有通过自动化才能实现这一路径。本文的贡献是提出了一种新的工作流程,用于从考古发现目录中收集数据,这些目录作为遗留资源可用,如大量未分类的PDF文件中的考古图纸和照片;该工作流程依赖于支持图像处理、目标检测和交互式验证和调整自动检索数据的定制软件(AutArch)。我们将人工智能(AI)集成到工作流程中,通过神经网络进行目标检测和分类,从而加快、自动化和标准化数据收集。在考古目录中常见的对象 - 如墓穴、骨骼、陶器、装饰品、石器和地图 - 被检测到。这些对象在空间上相关联并分析,以提取真实属性,如根据北箭头和比例尺确定墓穴的大小和方向。我们还通过轮廓检测自动记录几何整体轮廓,作为基于地标的几何形态测量的替代方法。检测到的对象、轮廓和其他自动检索的数据可以手动验证和调整。我们将公元前第三千纪的欧洲(包括'绳纹器'和'钟形杯'等文化及其埋葬习俗)作为“测试场”,用于评估目的;这包括对工作流程和AutArch软件进行用户研究。
更新时间: 2025-03-21 10:15:21
领域: cs.CV,cs.GR,cs.LG
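A minimal sketch of the contour-based whole-outline recording step mentioned above, assuming a catalogue figure has already been cropped to a single drawing; the Otsu threshold and the toy input are placeholders for AutArch's actual pipeline:

```python
import cv2
import numpy as np

def largest_outline(gray: np.ndarray):
    """Binarise a catalogue drawing and return its largest closed contour, its area and perimeter."""
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    outline = max(contours, key=cv2.contourArea)
    return outline, cv2.contourArea(outline), cv2.arcLength(outline, True)

# Toy image: a white page with one dark ellipse standing in for a drawn vessel.
page = np.full((200, 200), 255, np.uint8)
cv2.ellipse(page, (100, 100), (60, 40), 0, 0, 360, 0, -1)
outline, area, perimeter = largest_outline(page)
print(len(outline), area, perimeter)
```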
Targetless 6DoF Calibration of LiDAR and 2D Scanning Radar Based on Cylindrical Occupancy
Owing to the capability for reliable and all-weather long-range sensing, the fusion of LiDAR and Radar has been widely applied to autonomous vehicles for robust perception. In practical operation, well manually calibrated extrinsic parameters, which are crucial for the fusion of multi-modal sensors, may drift due to the vibration. To address this issue, we present a novel targetless calibration approach, termed LiRaCo, for the extrinsic 6DoF calibration of LiDAR and Radar sensors. Although both types of sensors can obtain geometric information, bridging the geometric correspondences between multi-modal data without any clues of explicit artificial markers is nontrivial, mainly due to the low vertical resolution of scanning Radar. To achieve the targetless calibration, LiRaCo leverages a spatial occupancy consistency between LiDAR point clouds and Radar scans in a common cylindrical representation, considering the increasing data sparsity with distance for both sensors. Specifically, LiRaCo expands the valid Radar scanned pixels into 3D occupancy grids to constrain LiDAR point clouds based on spatial consistency. Consequently, a cost function involving extrinsic calibration parameters is formulated based on the spatial overlap of 3D grids and LiDAR points. Extrinsic parameters are finally estimated by optimizing the cost function. Comprehensive quantitative and qualitative experiments on two real outdoor datasets with different LiDAR sensors demonstrate the feasibility and accuracy of the proposed method. The source code will be publicly available.
Updated: 2025-03-21 10:09:04
标题: 基于圆柱体占用的LiDAR和2D扫描雷达的无目标6DoF校准
摘要: 由于LiDAR和雷达具有可靠和全天候的长距离感知能力,因此将二者融合应用于自主车辆的健壮感知已得到广泛应用。在实际操作中,对于多模态传感器融合至关重要的手动校准外参数可能会由于振动而漂移。为解决这一问题,我们提出了一种新颖的无目标校准方法,称为LiRaCo,用于对LiDAR和雷达传感器的外参数6DoF校准。尽管两种类型的传感器都可以获取几何信息,但在没有任何明确人工标记的情况下,跨越多模态数据之间的几何对应关系并不容易,主要是由于扫描雷达的垂直分辨率较低。为实现无目标校准,LiRaCo在一个共同的柱状表示中利用LiDAR点云和雷达扫描之间的空间占用一致性,考虑到两种传感器随距离增加数据稀疏性。具体而言,LiRaCo将有效雷达扫描像素扩展为3D占用栅格,以便基于空间一致性约束LiDAR点云。因此,基于3D栅格和LiDAR点的空间重叠,形成涉及外参数校准参数的成本函数。最终通过优化成本函数来估计外参数。对具有不同LiDAR传感器的两个真实室外数据集进行的全面定量和定性实验表明了所提出方法的可行性和准确性。源代码将公开提供。
更新时间: 2025-03-21 10:09:04
领域: cs.RO,cs.AI
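A minimal sketch of projecting a point cloud into the kind of cylindrical occupancy grid the abstract builds its consistency cost on; the bin resolutions are arbitrary placeholders, and LiRaCo's actual cost function and optimisation are not reproduced:

```python
import numpy as np

def cylindrical_occupancy(points, r_max=50.0, n_r=64, n_az=360, n_z=16, z_min=-2.0, z_max=6.0):
    """Mark occupied (range, azimuth, height) cells for an (N, 3) point cloud in the sensor frame."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r, az = np.hypot(x, y), np.arctan2(y, x)
    keep = (r < r_max) & (z >= z_min) & (z < z_max)
    ri = (r[keep] / r_max * n_r).astype(int)
    ai = ((az[keep] + np.pi) / (2 * np.pi) * n_az).astype(int) % n_az
    zi = ((z[keep] - z_min) / (z_max - z_min) * n_z).astype(int)
    grid = np.zeros((n_r, n_az, n_z), dtype=bool)
    grid[ri, ai, zi] = True
    return grid

pts = np.random.default_rng(6).uniform(-40, 40, (10000, 3)) * np.array([1.0, 1.0, 0.1])
print(cylindrical_occupancy(pts).sum(), "occupied cells")
```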
Can Zero-Shot Commercial APIs Deliver Regulatory-Grade Clinical Text DeIdentification?
We systematically assess the performance of three leading API-based de-identification systems - Azure Health Data Services, AWS Comprehend Medical, and OpenAI GPT-4o - against our de-identification systems on a ground truth dataset of 48 clinical documents annotated by medical experts. Our analysis, conducted at both entity-level and token-level, demonstrates that our solution, Healthcare NLP, achieves the highest accuracy, with a 96% F1-score in protected health information (PHI) detection, significantly outperforming Azure (91%), AWS (83%), and GPT-4o (79%). Beyond accuracy, Healthcare NLP is also the most cost-effective solution, reducing processing costs by over 80% compared to Azure and GPT-4o. Its fixed-cost local deployment model avoids the escalating per-request fees of cloud-based services, making it a scalable and economical choice. Our results underscore a critical limitation: zero-shot commercial APIs fail to meet the accuracy, adaptability, and cost-efficiency required for regulatory-grade clinical de-identification. Healthcare NLP's superior performance, customization capabilities, and economic advantages position it as the more viable solution for healthcare organizations seeking compliance and scalability in clinical NLP workflows.
Updated: 2025-03-21 10:05:04
标题: 零样本商业API能否提供符合监管级别的临床文本去识别?
摘要: 我们系统评估了三种领先的基于API的去识别系统 - Azure Health Data Services、AWS Comprehend Medical和OpenAI GPT-4o - 与我们的去识别系统在由医学专家注释的48份临床文件的基准数据集上的性能。我们的分析在实体级和标记级两个层面进行,表明我们的解决方案Healthcare NLP在受保护健康信息(PHI)检测方面实现了最高准确率,F1分数达到96%,明显优于Azure(91%)、AWS(83%)和GPT-4o(79%)。除了准确性外,Healthcare NLP也是最具成本效益的解决方案,与Azure和GPT-4o相比,可将处理成本降低超过80%。其固定成本的本地部署模型避免了基于云的服务逐请求费用不断上涨,使其成为一种可扩展和经济的选择。我们的结果强调了一个关键限制:零样本商业API未能满足监管级临床去识别所需的准确性、适应性和成本效益。Healthcare NLP的卓越性能、定制能力和经济优势使其成为寻求合规性和临床NLP工作流程中的可扩展性的医疗机构更可行的解决方案。
更新时间: 2025-03-21 10:05:04
领域: cs.CL,cs.CR,cs.IR,cs.LG,H.3, F.2.2, I.2.7
Strength Estimation and Human-Like Strength Adjustment in Games
Strength estimation and adjustment are crucial in designing human-AI interactions, particularly in games where AI surpasses human players. This paper introduces a novel strength system, including a strength estimator (SE) and an SE-based Monte Carlo tree search, denoted as SE-MCTS, which predicts strengths from games and offers different playing strengths with human styles. The strength estimator calculates strength scores and predicts ranks from games without direct human interaction. SE-MCTS utilizes the strength scores in a Monte Carlo tree search to adjust playing strength and style. We first conduct experiments in Go, a challenging board game with a wide range of ranks. Our strength estimator achieves over 80% accuracy in predicting ranks from only 15 observed games, whereas the previous method reached 49% accuracy with 100 games. For strength adjustment, SE-MCTS successfully adjusts to designated ranks while achieving a 51.33% accuracy in aligning to human actions, outperforming a previous state-of-the-art method, which reached only 42.56% accuracy. To demonstrate the generality of our strength system, we further apply SE and SE-MCTS to chess and obtain consistent results. These results show a promising approach to strength estimation and adjustment, enhancing human-AI interactions in games. Our code is available at https://rlg.iis.sinica.edu.tw/papers/strength-estimator.
Updated: 2025-03-21 09:57:03
标题: 游戏中的力量估计和类人力量调整
摘要: 强度估计和调整在设计人工智能交互中至关重要,特别是在人工智能超过人类玩家的游戏中。本文介绍了一种新颖的强度系统,包括一个强度估计器(SE)和一个基于SE的蒙特卡洛树搜索,称为SE-MCTS,它可以从游戏中预测强度,并提供具有不同人类风格的不同游戏强度。强度估计器可以计算强度分数并从游戏中预测等级,而无需直接与人类互动。SE-MCTS利用强度分数在蒙特卡洛树搜索中调整游戏强度和风格。我们首先在围棋中进行实验,这是一款具有广泛等级范围的挑战性棋盘游戏。我们的强度估计器仅观察15场比赛就显著实现了80%以上的等级预测准确率,而之前的方法在观察100场比赛后仅达到49%的准确率。对于强度调整,SE-MCTS成功地调整到指定的等级,同时在与人类行为对齐时达到了51.33%的准确率,优于之前的最新技术,其准确率仅为42.56%。为了展示我们强度系统的普适性,我们进一步将SE和SE-MCTS应用于国际象棋,并获得了一致的结果。这些结果显示了一种有前途的强度估计和调整方法,可以增强游戏中的人工智能交互。我们的代码可在https://rlg.iis.sinica.edu.tw/papers/strength-estimator 上找到。
更新时间: 2025-03-21 09:57:03
领域: cs.AI,cs.HC,cs.LG
TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions
Adversarial robustness is a critical challenge in deploying deep neural networks for real-world applications. While adversarial training is a widely recognized defense strategy, most existing studies focus on balanced datasets, overlooking the prevalence of long-tailed distributions in real-world data, which significantly complicates robustness. This paper provides a comprehensive analysis of adversarial training under long-tailed distributions and identifies limitations in the current state-of-the-art method, AT-BSL, in achieving robust performance under such conditions. To address these challenges, we propose a novel training framework, TAET, which integrates an initial stabilization phase followed by a stratified equalization adversarial training phase. Additionally, prior work on long-tailed robustness has largely ignored the crucial evaluation metric of balanced accuracy. To bridge this gap, we introduce the concept of balanced robustness, a comprehensive metric tailored for assessing robustness under long-tailed distributions. Extensive experiments demonstrate that our method surpasses existing advanced defenses, achieving significant improvements in both memory and computational efficiency. This work represents a substantial advancement in addressing robustness challenges in real-world applications. Our code is available at: https://github.com/BuhuiOK/TAET-Two-Stage-Adversarial-Equalization-Training-on-Long-Tailed-Distributions.
Updated: 2025-03-21 09:56:29
标题: TAET:长尾分布上的两阶段对抗均衡训练
摘要: 对抗鲁棒性是在部署深度神经网络用于现实世界应用中面临的关键挑战。虽然对抗训练是一种被广泛认可的防御策略,但大多数现有研究集中在平衡数据集上,忽视了现实世界数据中长尾分布的普遍存在,这显著复杂化了鲁棒性。本文对长尾分布下的对抗训练进行了全面分析,并确定了当前最先进方法AT-BSL在实现这种条件下的鲁棒性表现上的限制。为解决这些挑战,我们提出了一种新的训练框架TAET,该框架整合了一个初始稳定化阶段,然后是一个分层均衡对抗训练阶段。此外,长尾鲁棒性的先前研究很大程度上忽视了平衡准确率这一关键评估指标。为弥补这一差距,我们引入了平衡鲁棒性的概念,这是一个专门为评估长尾分布下的鲁棒性而设计的综合指标。大量实验表明,我们的方法超越了现有的先进防御方法,在内存和计算效率方面取得了显著提高。这项工作代表了在现实世界应用中应对鲁棒性挑战方面的重大进展。我们的代码可在以下网址找到:https://github.com/BuhuiOK/TAET-Two-Stage-Adversarial-Equalization-Training-on-Long-Tailed-Distributions。
更新时间: 2025-03-21 09:56:29
领域: cs.LG,cs.AI,stat.ML
Friend or Foe? Navigating and Re-configuring "Snipers' Alley"
In a 'digital by default' society, essential services must be accessed online. This opens users to digital deception not only from criminal fraudsters but from a range of actors in a marketised digital economy. Using grounded empirical research from northern England, we show how supposedly 'trusted' actors, such as governments,(re)produce the insecurities and harms that they seek to prevent. Enhanced by a weakening of social institutions amid a drive for efficiency and scale, this has built a constricted, unpredictable digital channel. We conceptualise this as a "snipers' alley". Four key snipers articulated by participants' lived experiences are examined: 1) Governments; 2) Business; 3) Criminal Fraudsters; and 4) Friends and Family to explore how snipers are differentially experienced and transfigure through this constricted digital channel. We discuss strategies to re-configure the alley, and how crafting and adopting opportunity models can enable more equitable forms of security for all.
Updated: 2025-03-21 09:56:25
标题: 友军还是敌军?导航和重新配置“狙击手巷”
摘要: 在一个“数字优先”的社会中,必须在线上获取基本服务。这不仅使用户容易受到犯罪欺诈分子的数字欺骗,还使他们容易受到市场化数字经济中各种行为者的数字欺骗。通过对英格兰北部的基于实证的研究,我们展示了所谓“信任”的行为者,如政府,如何(重新)产生他们试图阻止的不安全和伤害。在社会制度减弱的背景下,为了追求效率和规模,这种情况变得更加严重,构建了一个受限、不可预测的数字渠道。我们将这种情况概念化为“狙击手巷”。参与者生活经验中表达的四个关键狙击手被审视:1)政府;2)企业;3)犯罪欺诈分子;4)朋友和家人,以探讨狙击手在这个受限数字渠道中如何不同程度地被体验和转变。我们讨论了重新配置巷道的策略,以及如何制定和采纳机会模式可以为所有人提供更公平的安全形式。
更新时间: 2025-03-21 09:56:25
领域: cs.HC,cs.CR
Building Multilingual Datasets for Predicting Mental Health Severity through LLMs: Prospects and Challenges
Large Language Models (LLMs) are increasingly being integrated into various medical fields, including mental health support systems. However, there is a gap in research regarding the effectiveness of LLMs in non-English mental health support applications. To address this problem, we present a novel multilingual adaptation of widely-used mental health datasets, translated from English into six languages (e.g., Greek, Turkish, French, Portuguese, German, and Finnish). This dataset enables a comprehensive evaluation of LLM performance in detecting mental health conditions and assessing their severity across multiple languages. By experimenting with GPT and Llama, we observe considerable variability in performance across languages, despite being evaluated on the same translated dataset. This inconsistency underscores the complexities inherent in multilingual mental health support, where language-specific nuances and mental health data coverage can affect the accuracy of the models. Through comprehensive error analysis, we emphasize the risks of relying exclusively on LLMs in medical settings (e.g., their potential to contribute to misdiagnoses). Moreover, our proposed approach offers significant cost savings for multilingual tasks, presenting a major advantage for broad-scale implementation.
Updated: 2025-03-21 09:56:15
标题: 通过LLMs构建用于预测心理健康严重程度的多语言数据集:前景与挑战
摘要: 大型语言模型(LLMs)越来越多地被整合到各个医疗领域,包括心理健康支持系统。然而,关于LLMs在非英语心理健康支持应用中的有效性存在研究空白。为了解决这个问题,我们提出了一种新颖的多语言适应方法,将常用的心理健康数据集从英语翻译成六种语言(例如希腊语、土耳其语、法语、葡萄牙语、德语和芬兰语)。这个数据集使得能够全面评估LLMs在检测心理健康状况和评估其严重程度方面在多种语言中的表现。通过对GPT和Llama进行实验,我们观察到在不同语言之间表现存在显著的变化,尽管评估的是同一翻译数据集。这种不一致强调了多语言心理健康支持中固有的复杂性,其中语言特定的细微差别和心理健康数据的覆盖范围可能影响模型的准确性。通过全面的错误分析,我们强调了在医疗环境中仅依赖LLMs的风险(例如可能导致误诊)。此外,我们提出的方法为多语言任务带来了显著的成本节约,为广泛实施提供了重大优势。
更新时间: 2025-03-21 09:56:15
领域: cs.CL,cs.LG
TRACE: Time SeRies PArameter EffiCient FinE-tuning
We propose an efficient fine-tuning method for time series foundation models, termed TRACE: Time Series Parameter Efficient Fine-tuning. While pretrained time series foundation models are gaining popularity, they face the following challenges: (1) Unlike natural language tasks, time series data vary in frequency, channel numbers, historical/prediction lengths. For long-term forecasting tasks in particular, tailored fine-tuning can significantly enhance performance.(2) Existing parameter-efficient tuning methods like LoRA remain applicable but require adaptation to temporal characteristics. To address these challenges, our TRACE framework introduces two key innovations: (1) Gated DSIC (Gated Dynamic Simulation Importance Calculation), an unbiased LoRA module importance selection mechanism that ensures conditional parameter consistency before and after masking. Experiments demonstrate that Gated DSIC outperforms common fine-tuning. (2) Reconstructed prediction heads for long-term forecasting tasks, which achieve comparable or superior performance to linear probing heads while drastically reducing parameter counts. Extensive experiments on long-/short-term forecasting and anomaly detection tasks across diverse datasets, coupled with ablation studies, validate the effectiveness of our method.
Updated: 2025-03-21 09:55:43
标题: TRACE:时间序列参数高效微调
摘要: 我们提出了一种高效的时间序列基础模型微调方法,称为TRACE:时间序列参数高效微调。虽然预训练的时间序列基础模型越来越受欢迎,但它们面临以下挑战:(1)与自然语言任务不同,时间序列数据在频率、通道数量、历史/预测长度上变化。特别是对于长期预测任务,定制的微调可以显著提升性能。(2)现有的参数高效调整方法如LoRA仍然适用,但需要适应时间特征。 为了解决这些挑战,我们的TRACE框架引入了两个关键创新:(1)门控DSIC(门控动态模拟重要性计算),一个无偏的LoRA模块重要性选择机制,确保在掩模前后条件参数一致性。实验证明,门控DSIC优于常见的微调。(2)重建的长期预测任务预测头,实现与线性探测头相当或更优秀的性能,同时大幅减少参数数量。 在不同数据集上进行的长期/短期预测和异常检测任务的大量实验,结合消融研究,验证了我们方法的有效性。
更新时间: 2025-03-21 09:55:43
领域: cs.LG
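A minimal PyTorch sketch of the LoRA-style adapter that parameter-efficient methods like TRACE build on, assuming a frozen pretrained linear layer; the rank, scaling, and especially the Gated DSIC importance selection are not reproduced here:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update: y = Wx + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                    # keep pretrained weights frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(64, 64), r=4)
out = layer(torch.randn(2, 96, 64))                    # (batch, time steps, channels)
print(out.shape, sum(p.numel() for p in layer.parameters() if p.requires_grad))
```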
Survival Analysis with Machine Learning for Predicting Li-ion Battery Remaining Useful Life
The accurate prediction of remaining useful life (RUL) for lithium-ion batteries is crucial for enhancing the reliability and longevity of energy storage systems. Traditional methods for RUL prediction often struggle with issues such as data sparsity, varying battery chemistries, and the inability to capture complex degradation patterns over time. In this study, we propose a survival analysis-based framework combined with deep learning models to predict the RUL of lithium-ion batteries. Specifically, we utilize five advanced models: the Cox-type models (Cox, CoxPH, and CoxTime) and two machine-learning-based models (DeepHit and MTLR). These models address the challenges of accurate RUL estimation by transforming raw time-series battery data into survival data, including key degradation indicators such as voltage, current, and internal resistance. Advanced feature extraction techniques enhance the model's robustness in diverse real-world scenarios, including varying charging conditions and battery chemistries. Our models are tested using 10-fold cross-validation, ensuring generalizability and minimizing overfitting. Experimental results show that our survival-based framework significantly improves RUL prediction accuracy compared to traditional methods, providing a reliable tool for battery management and maintenance optimization. This study contributes to the advancement of predictive maintenance in battery technology, offering valuable insights for both researchers and industry practitioners aiming to enhance the operational lifespan of lithium-ion batteries.
Updated: 2025-03-21 09:53:22
标题: 用机器学习进行生存分析以预测锂离子电池剩余寿命
摘要: 锂离子电池寿命预测的准确性对于提升能源存储系统的可靠性和寿命至关重要。传统的寿命预测方法通常面临数据稀缺、不同电池化学性质变化以及无法捕捉随时间复杂恶化模式等问题。本研究提出了一种基于生存分析的框架结合深度学习模型来预测锂离子电池的寿命。具体来说,我们利用了五种先进模型:Cox型模型(Cox、CoxPH和CoxTime)和两种基于机器学习的模型(DeepHit和MTLR)。这些模型通过将原始时间序列电池数据转换为生存数据,包括关键的恶化指标如电压、电流和内阻,来解决准确估计寿命的挑战。先进的特征提取技术增强了模型在不同充电条件和电池化学性质下的稳健性。我们的模型通过10折交叉验证进行测试,确保泛化性并最小化过拟合。实验结果显示,我们基于生存的框架相比传统方法显著提高了寿命预测的准确性,为电池管理和维护优化提供了可靠工具。这项研究为电池技术的预测性维护做出了贡献,为旨在提升锂离子电池运行寿命的研究人员和工业从业者提供了宝贵见解。
更新时间: 2025-03-21 09:53:22
领域: eess.SP,cs.AI,cs.LG
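A minimal Cox proportional-hazards sketch with the lifelines package, in the spirit of the CoxPH variant above; each row is assumed to summarise one cell with a lifetime and an event flag, and both the column names and the synthetic data are placeholders:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(7)
n = 200
df = pd.DataFrame({
    "mean_voltage_drop": rng.normal(0.05, 0.01, n),
    "internal_resistance": rng.normal(30.0, 5.0, n),
    "cycles_to_eol": rng.integers(300, 1500, n),   # observed lifetime in charge cycles
    "reached_eol": rng.integers(0, 2, n),          # 1 = end of life observed, 0 = censored
})

cph = CoxPHFitter()
cph.fit(df, duration_col="cycles_to_eol", event_col="reached_eol")
cph.print_summary()

# Median remaining-life estimate for a few new cells from the fitted model.
new_cells = df.drop(columns=["cycles_to_eol", "reached_eol"]).head(3)
print(cph.predict_median(new_cells))
```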
A Stateless and Secure Delivery versus Payment across two Blockchains
We propose a lean and functional transaction scheme to establish a secure delivery-versus-payment across two blockchains, where a) no intermediary is required and b) the operator of the payment chain/payment system has a small overhead and does not need to store state. The main idea comes with two requirements: First, the payment chain operator hosts a stateless decryption service that allows decrypting messages with his secret key. Second, a "Payment Contract" is deployed on the payment chain that implements a function transferAndDecrypt(uint id, address from, address to, string keyEncryptedSuccess, string keyEncryptedFail) that processes the payment and emits the decrypted key depending on the success or failure of the transaction. The respective key can then trigger an associated transaction, e.g. claiming delivery by the buyer or re-claiming the locked asset by the seller. The contract interfaces described here are available as ERC 7573. A reference implementation of the decryption oracle is available via GitLab.
Updated: 2025-03-21 09:51:50
标题: 跨两个区块链的无状态和安全交付与支付
摘要: 我们提出了一种精简而功能性的交易方案,用于在两个区块链之间建立安全的交付对付交易,其中a)不需要中介,b)支付链/支付系统的运营商有很小的开销且不需要存储状态。主要思想包括两个要求:首先,支付链运营商托管一个无状态解密服务,允许使用其秘钥解密消息。其次,在支付链上部署了一个“支付合同”,实现一个函数transferAndDecrypt(uint id, address from, address to, string keyEncryptedSuccess, string keyEncryptedFail),该函数处理支付并根据交易成功或失败发出解密密钥。然后,相应的密钥可以触发相关交易,比如买方要求交付或卖方重新索取锁定资产。这里描述的合同接口可作为ERC 7573使用。解密预言机的参考实现可通过GitLab获得。
更新时间: 2025-03-21 09:51:50
领域: cs.CR,E.4
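A Python-only sketch of the protocol logic described above (not the on-chain ERC 7573 Solidity interface), with a toy ledger and symmetric Fernet keys standing in for the payment operator's key pair; everything here is illustrative rather than the paper's actual contract code:

```python
from cryptography.fernet import Fernet

operator_key = Fernet.generate_key()          # stand-in for the operator's decryption key

def decrypt(blob: bytes) -> str:
    """Stateless decryption service hosted by the payment-chain operator (toy symmetric version)."""
    return Fernet(operator_key).decrypt(blob).decode()

def encrypt(msg: str) -> bytes:
    return Fernet(operator_key).encrypt(msg.encode())

balances = {"buyer": 100, "seller": 0}        # toy payment ledger

def transfer_and_decrypt(amount, frm, to, key_encrypted_success, key_encrypted_fail):
    """Process the payment and emit the decrypted key matching success or failure."""
    if balances.get(frm, 0) >= amount:
        balances[frm] -= amount
        balances[to] = balances.get(to, 0) + amount
        return decrypt(key_encrypted_success)  # buyer can now claim delivery
    return decrypt(key_encrypted_fail)         # seller can re-claim the locked asset

success_key = encrypt("claim-delivery-key")
fail_key = encrypt("reclaim-asset-key")
print(transfer_and_decrypt(40, "buyer", "seller", success_key, fail_key))
print(balances)
```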
Data to Decisions: A Computational Framework to Identify skill requirements from Advertorial Data
Among the factors of production, human capital or skilled manpower is the one that keeps evolving and adapts to changing conditions and resources. This adaptability makes human capital the most crucial factor in ensuring a sustainable growth of industry/sector. As new technologies are developed and adopted, the new generations are required to acquire skills in newer technologies in order to be employable. At the same time, professionals are required to upskill and reskill themselves to remain relevant in the industry. There is, however, no straightforward method to identify the skill needs of the industry at a given point in time. Therefore, this paper proposes a data-to-decision framework that can successfully identify the desired skill set in a given area by analysing the advertorial data collected from popular online job portals and supplied as input to the framework. The proposed framework uses techniques of statistical analysis, data mining and natural language processing for the purpose. The applicability of the framework is demonstrated on CS&IT job advertisement data from India. The analytical results not only provide useful insights about the current state of skill needs in the CS&IT industry but also provide practical implications for prospective job applicants, training agencies, and institutions of higher education & professional training.
Updated: 2025-03-21 09:49:31
标题: 数据到决策:从广告数据中识别技能需求的计算框架
摘要: 在生产要素中,人力资本或熟练人才是一个不断发展并适应变化条件和资源的因素。这种适应性使人力资本成为确保工业/部门可持续增长的最关键因素。随着新技术的开发和应用,新一代人需要掌握新技术的技能以便就业。与此同时,专业人士需要提升和更新自己的技能以保持在行业中的相关性。然而,在某一特定时间点上确定行业技能需求并不是一件简单的事情。因此,本文提出了一个数据到决策框架,通过分析从热门在线招聘网站收集的广告数据作为框架的输入,成功地识别特定领域所需的技能集。所提出的框架使用统计分析、数据挖掘和自然语言处理技术来实现这一目的。该框架的适用性在印度的CS&IT工作广告数据上得到了证明。分析结果不仅提供了关于CS&IT行业当前技能需求状况的有用见解,还为潜在求职者、培训机构和高等教育和专业培训机构提供了实际意义。
更新时间: 2025-03-21 09:49:31
领域: cs.CY,cs.AI
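A minimal sketch of the text-mining step, assuming job advertisements have already been scraped into a list of strings; the TF-IDF ranking below is only one of the techniques such a framework would combine, and the sample ads are invented:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

ads = [
    "Looking for a Python developer with machine learning and SQL experience",
    "Java backend engineer, Spring Boot, microservices, SQL",
    "Data scientist: Python, deep learning, NLP, cloud deployment",
    "Frontend developer with React, TypeScript and testing skills",
]

vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
tfidf = vec.fit_transform(ads)
scores = np.asarray(tfidf.mean(axis=0)).ravel()        # average term importance across ads
terms = np.array(vec.get_feature_names_out())
top = np.argsort(-scores)[:10]
print(list(zip(terms[top], scores[top].round(3))))     # candidate in-demand skills
```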
EVSOAR: Security Orchestration, Automation and Response via EV Charging Stations
Vehicle cybersecurity has emerged as a critical concern, driven by the innovation in the automotive industry, e.g., autonomous, electric, or connected vehicles. Current efforts to address these challenges are constrained by the limited computational resources of vehicles and the reliance on connected infrastructures. This motivated the foundation of Vehicle Security Operations Centers (VSOCs) that extend IT-based Security Operations Centers (SOCs) to cover the entire automotive ecosystem, both the in-vehicle and off-vehicle scopes. Security Orchestration, Automation, and Response (SOAR) tools are considered key for implementing an effective cybersecurity solution. However, existing state-of-the-art solutions depend on infrastructure networks such as 4G, 5G, and WiFi, which often face scalability and congestion issues. To address these limitations, we propose a novel SOAR architecture EVSOAR that leverages the EV charging stations for connectivity and computing to enhance vehicle cybersecurity. Our EV-specific SOAR architecture enables real-time analysis and automated responses to cybersecurity threats closer to the EV, reducing the cellular latency, bandwidth, and interference limitations. Our experimental results demonstrate a significant improvement in latency, stability, and scalability through the infrastructure and the capacity to deploy computationally intensive applications that are otherwise infeasible within the resource constraints of individual vehicles.
Updated: 2025-03-21 09:48:29
标题: EVSOAR:通过电动汽车充电站实现安全编排、自动化和响应
摘要: 车辆网络安全已经成为一个关键问题,受到汽车行业创新的推动,例如自动驾驶、电动或互联车辆。目前解决这些挑战的努力受到车辆计算资源有限和依赖于互联基础设施的限制。这促使建立了车辆安全运营中心(VSOCs),将基于IT的安全运营中心(SOCs)扩展到覆盖整个汽车生态系统,包括车内和车外范围。安全编排、自动化和响应(SOAR)工具被认为是实施有效网络安全解决方案的关键。然而,现有的最先进解决方案依赖于基础设施网络,如4G、5G和WiFi,往往面临可扩展性和拥塞问题。为了解决这些限制,我们提出了一种新颖的SOAR架构EVSOAR,利用电动车充电站进行连接和计算,以增强车辆网络安全。我们的电动车特定的SOAR架构使得能够在接近电动车的地方进行实时分析和自动响应网络安全威胁,减少了蜂窝网络的延迟、带宽和干扰限制。我们的实验结果表明,通过基础设施和部署计算密集型应用程序的能力,我们在延迟、稳定性和可扩展性方面取得了显著的改进,否则这些是在个体车辆资源约束内不可行的。
更新时间: 2025-03-21 09:48:29
领域: cs.CR,cs.SY,eess.SY
Enabling Versatile Controls for Video Diffusion Models
Despite substantial progress in text-to-video generation, achieving precise and flexible control over fine-grained spatiotemporal attributes remains a significant unresolved challenge in video generation research. To address these limitations, we introduce VCtrl (also termed PP-VCtrl), a novel framework designed to enable fine-grained control over pre-trained video diffusion models in a unified manner. VCtrl integrates diverse user-specified control signals-such as Canny edges, segmentation masks, and human keypoints-into pretrained video diffusion models via a generalizable conditional module capable of uniformly encoding multiple types of auxiliary signals without modifying the underlying generator. Additionally, we design a unified control signal encoding pipeline and a sparse residual connection mechanism to efficiently incorporate control representations. Comprehensive experiments and human evaluations demonstrate that VCtrl effectively enhances controllability and generation quality. The source code and pre-trained models are publicly available and implemented using the PaddlePaddle framework at http://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/ppvctrl.
Updated: 2025-03-21 09:48:00
标题: 实现视频扩散模型的多功能控制
摘要: 尽管在文本到视频生成方面取得了重大进展,但在视频生成研究中,实现对细粒度时空属性的精确和灵活控制仍然是一个重要的未解决挑战。为了解决这些限制,我们引入了VCtrl(也称为PP-VCtrl),这是一个新颖的框架,旨在以统一的方式实现对预训练视频扩散模型的细粒度控制。VCtrl通过一个通用的条件模块,将各种用户指定的控制信号(如Canny边缘、分割掩模和人体关键点)整合到预训练视频扩散模型中,该条件模块能够统一地编码多种类型的辅助信号,而无需修改底层生成器。此外,我们设计了统一的控制信号编码管道和稀疏剩余连接机制,以有效地整合控制表示。全面的实验和人类评估表明,VCtrl有效地增强了可控性和生成质量。源代码和预训练模型已公开发布,并使用PaddlePaddle框架实现,网址为http://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/ppvctrl。
更新时间: 2025-03-21 09:48:00
领域: cs.CV,cs.AI
Token Dynamics: Towards Efficient and Dynamic Video Token Representation for Video Large Language Models
Token-based video representation has emerged as a promising approach for enabling large language models to interpret video content. However, existing token reduction techniques, such as token pruning and token merging, often disrupt essential spatial-temporal positional embeddings, failing to adequately balance computational efficiency with fewer tokens. Consequently, these methods result in relatively lengthy token sequences, limiting their applicability in scenarios requiring extreme token compression, such as video large language models. In this paper, we introduce the novel task of extreme short token reduction, aiming to represent extensive video sequences with a minimal number of tokens. To address this challenge, we propose Token Dynamics, a new video representation framework that dynamically reduces token count while preserving spatial-temporal coherence. Specifically, we disentangle video representations by separating visual embeddings from grid-level motion information, structuring them into: 1. a concise token base, created by clustering tokens that describe object-level content; 2. a token dynamics map, capturing detailed spatial-temporal motion patterns across grids. Furthermore, we introduce a cross-dynamics attention mechanism that integrates motion features into the token base without increasing token length, thereby maintaining compactness and spatial-temporal integrity. The experiments demonstrate a reduction of token count to merely 0.07% of the original tokens, with only a minor performance drop of 1.13%. Additionally, we propose two novel subtasks within extreme token reduction (fixed-length and adaptive-length compression), both effectively representing long token sequences for video-language tasks. Our method offers significantly lower theoretical complexity, fewer tokens, and enhanced throughput, thus providing an efficient solution for video LLMs.
Updated: 2025-03-21 09:46:31
标题: Token动态性:为视频大型语言模型提供高效且动态的视频Token表示
摘要: 基于标记的视频表示已经成为一种有前途的方法,可以使大型语言模型解释视频内容。然而,现有的标记减少技术,例如标记修剪和标记合并,经常会破坏必要的空间-时间位置嵌入,未能充分平衡计算效率和较少标记之间的关系。因此,这些方法导致相对较长的标记序列,限制了它们在需要极端标记压缩的场景中的适用性,例如视频大型语言模型。在本文中,我们引入了极端短标记减少的新任务,旨在用最少数量的标记表示广泛的视频序列。为了解决这一挑战,我们提出了Token Dynamics,一个新的视频表示框架,可以在保持空间-时间连贯性的同时动态减少标记数量。具体来说,我们通过将视觉嵌入与网格级运动信息分离来解开视频表示,将它们结构化为:1. 简明的标记基础,通过聚类描述对象级内容的标记创建;2. 标记动态图,捕捉跨网格的详细空间-时间运动模式。此外,我们引入了一个跨动态注意机制,将运动特征整合到标记基础中,而不增加标记长度,从而保持紧凑和空间-时间完整性。实验表明,标记数量减少到原始标记的仅0.07%,性能仅下降1.13%。此外,我们提出了极端标记减少中的两个新的子任务(固定长度和自适应长度压缩),都可以有效地表示视频语言任务的长标记序列。我们的方法提供了显著更低的理论复杂性、更少的标记和增强的吞吐量,为视频LLM提供了高效的解决方案。
更新时间: 2025-03-21 09:46:31
领域: cs.CV,cs.AI,cs.CL,cs.LG
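A minimal sketch of building a "token base" by clustering frame-level visual tokens, assuming the token embeddings are already extracted; the token dynamics map and the cross-dynamics attention are not reproduced:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_token_base(tokens: np.ndarray, n_base: int = 64):
    """Collapse (n_frames * n_patches, d) visual tokens into n_base cluster-centre tokens."""
    km = KMeans(n_clusters=n_base, n_init=10, random_state=0).fit(tokens)
    return km.cluster_centers_, km.labels_     # base tokens plus an assignment per original token

tokens = np.random.default_rng(8).normal(size=(16 * 64, 256))   # 16 frames x 64 patches, 256-d
base, assignment = build_token_base(tokens, n_base=64)
print(base.shape, assignment.shape)            # 64 base tokens summarising 1024 originals
```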
Real-Time Diffusion Policies for Games: Enhancing Consistency Policies with Q-Ensembles
Diffusion models have shown impressive performance in capturing complex and multi-modal action distributions for game agents, but their slow inference speed prevents practical deployment in real-time game environments. While consistency models offer a promising approach for one-step generation, they often suffer from training instability and performance degradation when applied to policy learning. In this paper, we present CPQE (Consistency Policy with Q-Ensembles), which combines consistency models with Q-ensembles to address these challenges. CPQE leverages uncertainty estimation through Q-ensembles to provide more reliable value function approximations, resulting in better training stability and improved performance compared to classic double Q-network methods. Our extensive experiments across multiple game scenarios demonstrate that CPQE achieves inference speeds of up to 60 Hz -- a significant improvement over state-of-the-art diffusion policies that operate at only 20 Hz -- while maintaining comparable performance to multi-step diffusion approaches. CPQE consistently outperforms state-of-the-art consistency model approaches, showing both higher rewards and enhanced training stability throughout the learning process. These results indicate that CPQE offers a practical solution for deploying diffusion-based policies in games and other real-time applications where both multi-modal behavior modeling and rapid inference are critical requirements.
Updated: 2025-03-21 09:45:59
标题: 面向游戏的实时扩散策略:用Q集成增强一致性策略
摘要: 扩散模型在捕捉复杂和多模态动作分布方面表现出色,但其慢速推理速度阻碍了在实时游戏环境中的实际部署。尽管一致性模型为一步生成提供了一种有前途的方法,但在应用于策略学习时往往会遭受训练不稳定和性能下降的困扰。本文介绍了CPQE(一致性策略与Q集成),它将一致性模型与Q集成相结合,以解决这些挑战。CPQE通过Q集成的不确定性估计来提供更可靠的值函数近似,从而实现比经典双Q网络方法更好的训练稳定性和性能改进。我们在多个游戏场景中进行了大量实验,结果表明CPQE实现了高达60 Hz的推理速度,这是对现有20 Hz扩散策略的显著改进,同时保持了与多步扩散方法相媲美的性能。CPQE始终优于现有技术的一致性模型方法,表现出更高的奖励和在整个学习过程中增强的训练稳定性。这些结果表明CPQE为在游戏和其他实时应用中部署基于扩散的策略提供了实际解决方案,其中多模态行为建模和快速推理都是关键要求。
更新时间: 2025-03-21 09:45:59
领域: cs.AI
GeoT: Geometry-guided Instance-dependent Transition Matrix for Semi-supervised Tooth Point Cloud Segmentation
Achieving meticulous segmentation of tooth point clouds from intra-oral scans stands as an indispensable prerequisite for various orthodontic applications. Given the labor-intensive nature of dental annotation, a significant amount of data remains unlabeled, driving increasing interest in semi-supervised approaches. One primary challenge of existing semi-supervised medical segmentation methods lies in noisy pseudo labels generated for unlabeled data. To address this challenge, we propose GeoT, the first framework that employs instance-dependent transition matrix (IDTM) to explicitly model noise in pseudo labels for semi-supervised dental segmentation. Specifically, to handle the extensive solution space of IDTM arising from tens of thousands of dental points, we introduce tooth geometric priors through two key components: point-level geometric regularization (PLGR) to enhance consistency between point adjacency relationships in 3D and IDTM spaces, and class-level geometric smoothing (CLGS) to leverage the fixed spatial distribution of tooth categories for optimal IDTM estimation. Extensive experiments performed on the public Teeth3DS dataset and private dataset demonstrate that our method can make full utilization of unlabeled data to facilitate segmentation, achieving performance comparable to fully supervised methods with only $20\%$ of the labeled data.
Updated: 2025-03-21 09:43:57
标题: GeoT:几何引导的实例相关转移矩阵用于半监督牙齿点云分割
摘要: 实现来自口腔扫描的牙齿点云的细致分割是各种正畸应用的不可或缺的先决条件。鉴于牙科标注的劳动密集性,大量数据仍然未标记,这推动了对半监督方法的日益兴趣。现有半监督医学分割方法的一个主要挑战在于生成未标记数据的噪声伪标签。为了解决这一挑战,我们提出了GeoT,这是第一个采用实例相关过渡矩阵(IDTM)来明确建模半监督牙科分割中伪标签噪声的框架。具体来说,为了处理由成千上万的牙齿点引起的IDTM的广泛解空间,我们通过两个关键组件引入了牙齿几何先验:点级几何正则化(PLGR)以增强3D和IDTM空间中点邻接关系之间的一致性,以及类级几何平滑(CLGS)以利用牙齿类别的固定空间分布来进行最佳IDTM估计。在公共Teeth3DS数据集和私有数据集上进行了大量实验,证明我们的方法可以充分利用未标记数据来促进分割,仅使用20%的标记数据即可实现与完全监督方法相媲美的性能。
更新时间: 2025-03-21 09:43:57
领域: cs.CV,cs.AI
Assessing Consistency and Reproducibility in the Outputs of Large Language Models: Evidence Across Diverse Finance and Accounting Tasks
This study provides the first comprehensive assessment of consistency and reproducibility in Large Language Model (LLM) outputs in finance and accounting research. We evaluate how consistently LLMs produce outputs given identical inputs through extensive experimentation with 50 independent runs across five common tasks: classification, sentiment analysis, summarization, text generation, and prediction. Using three OpenAI models (GPT-3.5-turbo, GPT-4o-mini, and GPT-4o), we generate over 3.4 million outputs from diverse financial source texts and data, covering MD&As, FOMC statements, finance news articles, earnings call transcripts, and financial statements. Our findings reveal substantial but task-dependent consistency, with binary classification and sentiment analysis achieving near-perfect reproducibility, while complex tasks show greater variability. More advanced models do not consistently demonstrate better consistency and reproducibility, with task-specific patterns emerging. LLMs significantly outperform expert human annotators in consistency and maintain high agreement even where human experts significantly disagree. We further find that simple aggregation strategies across 3-5 runs dramatically improve consistency. Simulation analysis reveals that despite measurable inconsistency in LLM outputs, downstream statistical inferences remain remarkably robust. These findings address concerns about what we term "G-hacking," the selective reporting of favorable outcomes from multiple Generative AI runs, by demonstrating that such risks are relatively low for finance and accounting tasks.
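The aggregation idea mentioned in the abstract above can be illustrated with a trivial sketch: majority voting over repeated categorical outputs and averaging over repeated numeric outputs. The exact strategies used in the study are not reproduced here, and all names and example values are hypothetical.

from collections import Counter
from statistics import mean

def aggregate_labels(runs):
    """Majority vote over repeated categorical LLM outputs (e.g. sentiment labels)."""
    label, freq = Counter(runs).most_common(1)[0]
    return label, freq / len(runs)  # winning label and its agreement rate

def aggregate_scores(runs):
    """Average repeated numeric LLM outputs (e.g. predicted values)."""
    return mean(runs)

# Usage with hypothetical outputs from repeated runs of the same prompt:
label, agreement = aggregate_labels(["positive", "positive", "negative"])
avg_score = aggregate_scores([0.42, 0.40, 0.45])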
Updated: 2025-03-21 09:43:37
标题: 评估大型语言模型在不同金融和会计任务中输出的一致性和可重复性:跨领域证据
摘要: 这项研究首次全面评估了金融和会计研究中大型语言模型(LLM)输出的一致性和可重复性。我们通过对50次独立运行进行广泛实验,评估LLMs在相同输入条件下产生输出的一致性。实验涵盖了五个常见任务:分类、情感分析、摘要、文本生成和预测。我们使用三个OpenAI模型(GPT-3.5-turbo、GPT-4o-mini和GPT-4o),从不同金融来源文本和数据中生成了超过340万个输出,涵盖了MD&A、FOMC声明、财经新闻文章、盈利电话会议记录和财务报表。我们的研究结果显示,虽然任务依赖性较大,但在二元分类和情感分析方面实现了接近完美的可重复性,而复杂任务显示出更大的变异性。更先进的模型并不总是表现出更好的一致性和可重复性,出现了任务特定的模式。LLMs在一致性方面明显优于专家人工标注者,并在人类专家明显分歧的情况下仍保持高度一致。我们进一步发现,通过对3-5次运行进行简单聚合策略,可以显著改善一致性。模拟分析显示,尽管LLM输出存在可测量的不一致性,下游统计推断仍然非常稳健。这些发现解决了我们所称的“G-hacking”对金融和会计任务的有利结果进行选择性报告的担忧,表明这种风险在金融和会计任务中相对较低。
更新时间: 2025-03-21 09:43:37
领域: q-fin.GN,cs.AI,cs.CE,cs.CL,cs.LG
ARFlow: Human Action-Reaction Flow Matching with Physical Guidance
Human action-reaction synthesis, a fundamental challenge in modeling causal human interactions, plays a critical role in applications ranging from virtual reality to social robotics. While diffusion-based models have demonstrated promising performance, they exhibit two key limitations for interaction synthesis: reliance on complex noise-to-reaction generators with intricate conditional mechanisms, and frequent physical violations in generated motions. To address these issues, we propose Action-Reaction Flow Matching (ARFlow), a novel framework that establishes direct action-to-reaction mappings, eliminating the need for complex conditional mechanisms. Our approach introduces two key innovations: an x1-prediction method that directly outputs human motions instead of velocity fields, enabling explicit constraint enforcement; and a training-free, gradient-based physical guidance mechanism that effectively prevents body penetration artifacts during sampling. Extensive experiments on NTU120 and Chi3D datasets demonstrate that ARFlow not only outperforms existing methods in terms of Fr\'echet Inception Distance and motion diversity but also significantly reduces body collisions, as measured by our new Intersection Volume and Intersection Frequency metrics.
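As a rough sketch of the x1-prediction idea described above (assuming, purely for illustration, a linear interpolation path and a model that takes the noisy sample and the time step; neither detail is taken from the paper), a flow-matching-style training loss can be written as follows.

import torch
import torch.nn.functional as F

def x1_prediction_loss(model, x0, x1):
    """One training step where the network predicts the clean target x1 directly,
    rather than a velocity field, from a point on an (assumed linear) path."""
    t = torch.rand(x0.size(0), *([1] * (x0.dim() - 1)), device=x0.device)
    xt = (1 - t) * x0 + t * x1          # point on the interpolation path
    pred_x1 = model(xt, t)              # hypothetical model signature
    return F.mse_loss(pred_x1, x1)

Predicting x1 directly is what makes it straightforward to impose constraints or physical guidance on the predicted motion during sampling, which is the property the abstract highlights.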
Updated: 2025-03-21 09:41:24
标题: ARFlow:具有物理引导的人类动作-反应流匹配
摘要: 人类行为-反应综合是建模因果人类互动中的一个基本挑战,在从虚拟现实到社交机器人等应用中发挥关键作用。虽然基于扩散的模型已经展示出有希望的性能,但它们在交互综合方面存在两个关键限制:依赖复杂的噪声到反应生成器和复杂的条件机制,并且生成的动作经常会违反物理规则。为了解决这些问题,我们提出了一种名为Action-Reaction Flow Matching(ARFlow)的新框架,它建立了直接的行为到反应映射,消除了复杂条件机制的需求。我们的方法引入了两个关键创新:一个直接输出人类动作而不是速度场的x1-预测方法,实现了显式约束执行;以及一个免训练的基于梯度的物理引导机制,有效地防止在采样过程中发生身体穿透现象。在NTU120和Chi3D数据集上进行的大量实验表明,ARFlow不仅在Frechet Inception Distance和动作多样性方面优于现有方法,还显著减少了身体碰撞,这是通过我们的新Intersection Volume和Intersection Frequency指标来衡量的。
更新时间: 2025-03-21 09:41:24
领域: cs.CV,cs.AI
Governance of Ledger-Anchored Decentralized Identifiers
A Decentralized Identifier (DID) empowers an entity to prove control over a unique and self-issued identifier without relying on any identity provider. The public key material for the proof is encoded into an associated DID document (DDO). This is preferably shared via a distributed ledger because it guarantees algorithmically that everyone has access to the latest state of any tamper-proof DDO while only the entities in control of a DID are able to update theirs. Yet, it is possible to grant deputies the authority to update the DDO on behalf of the DID owner. However, the DID specification leaves largely open how authorizations over a DDO are managed and enforced among multiple deputies. This article investigates what it means to govern a DID and discusses various forms in which a DID can be controlled by potentially more than one entity. It also presents a prototype of a DID-conform identifier management system where a selected set of governance policies are deployed as Smart Contracts. The article highlights the critical role of governance for the trustworthy and flexible deployment of ledger-anchored DIDs across various domains.
Updated: 2025-03-21 09:41:12
标题: 分类账锚定去中心化标识符的治理
摘要: 去中心化标识符(DID)赋予实体证明对唯一和自行发行的标识符的控制权,而无需依赖任何身份提供者。用于证明的公钥材料被编码到关联的DID文档(DDO)中。这最好通过分布式账本共享,因为它可以算法地保证每个人都能访问任何防篡改的DDO的最新状态,但只有控制DID的实体才能更新自己的状态。然而,可以授权代表DID所有者更新DDO。然而,DID规范在如何管理和强制执行DDO上的授权方面基本上是开放的。本文研究了管理DID的意义,并讨论了DID可能由多个实体控制的各种形式。它还提出了一个符合DID标准的标识符管理系统的原型,其中部署了一组选定的治理政策作为智能合约。文章强调了治理在可信和灵活地部署跨各个领域的账本锚定的DID中的关键作用。
更新时间: 2025-03-21 09:41:12
领域: cs.NI,cs.CR
Solving Drone Routing Problems with Quantum Computing: A Hybrid Approach Combining Quantum Annealing and Gate-Based Paradigms
This paper presents a novel hybrid approach to solving real-world drone routing problems by leveraging the capabilities of quantum computing. The proposed method, coined Quantum for Drone Routing (Q4DR), integrates the two most prominent paradigms in the field: quantum gate-based computing, through the Eclipse Qrisp programming language; and quantum annealers, by means of D-Wave System's devices. The algorithm is divided into two different phases: an initial clustering phase executed using a Quantum Approximate Optimization Algorithm (QAOA), and a routing phase employing quantum annealers. The efficacy of Q4DR is demonstrated through three use cases of increasing complexity, each incorporating real-world constraints such as asymmetric costs, forbidden paths, and itinerant charging points. This research contributes to the growing body of work in quantum optimization, showcasing the practical applications of quantum computing in logistics and route planning.
Updated: 2025-03-21 09:35:28
标题: 用量子计算解决无人机路由问题:将量子退火和基于门的范式相结合的混合方法
摘要: 本文提出了一种新颖的混合方法,利用量子计算的能力来解决现实世界中的无人机路径问题。所提出的方法被称为Quantum for Drone Routing (Q4DR),将该领域中两种最突出的范式集成在一起:通过Eclipse Qrisp编程语言的量子门基础计算和通过D-Wave System设备的量子退火器。该算法分为两个不同阶段:使用量子近似优化算法(QAOA)执行的初始聚类阶段和利用量子退火器进行路径规划阶段。Q4DR的有效性通过三个逐渐增加复杂性的用例进行了展示,每个用例都包含了诸如非对称成本、禁止路径和流动充电点等现实约束条件。这项研究为量子优化领域不断增长的研究工作做出了贡献,展示了量子计算在物流和路径规划中的实际应用。
更新时间: 2025-03-21 09:35:28
领域: quant-ph,cs.AI,cs.ET
Human-in-the-Loop Generation of Adversarial Texts: A Case Study on Tibetan Script
DNN-based language models perform excellently on various tasks, but even SOTA LLMs are susceptible to textual adversarial attacks. Adversarial texts play crucial roles in multiple subfields of NLP. However, current research has the following issues. (1) Most textual adversarial attack methods target rich-resourced languages. How do we generate adversarial texts for less-studied languages? (2) Most textual adversarial attack methods are prone to generating invalid or ambiguous adversarial texts. How do we construct high-quality adversarial robustness benchmarks? (3) New language models may be immune to part of previously generated adversarial texts. How do we update adversarial robustness benchmarks? To address the above issues, we introduce HITL-GAT, a system based on a general approach to human-in-the-loop generation of adversarial texts. HITL-GAT contains four stages in one pipeline: victim model construction, adversarial example generation, high-quality benchmark construction, and adversarial robustness evaluation. Additionally, we utilize HITL-GAT to make a case study on Tibetan script which can be a reference for the adversarial research of other less-studied languages.
Updated: 2025-03-21 09:32:39
标题: 《人为干预生成对抗性文本:以藏文为例的案例研究》
摘要: 基于深度神经网络的语言模型在各种任务上表现出色,但即使是最先进的LLM也容易受到文本对抗攻击的影响。对抗文本在自然语言处理的多个子领域起着至关重要的作用。然而,目前的研究存在以下问题:(1)大多数文本对抗攻击方法针对资源丰富的语言。我们如何为研究较少的语言生成对抗性文本?(2)大多数文本对抗攻击方法容易生成无效或模糊的对抗性文本。我们如何构建高质量的对抗鲁棒性基准?(3)新的语言模型可能对先前生成的部分对抗性文本免疫。我们如何更新对抗鲁棒性基准?为了解决以上问题,我们引入了HITL-GAT,这是一个基于人机协同生成对抗性文本的通用方法系统。HITL-GAT包含一个管道中的四个阶段:受害模型构建、对抗性示例生成、高质量基准构建和对抗性鲁棒性评估。此外,我们利用HITL-GAT在藏文上进行案例研究,这可以作为其他少研究语言对抗性研究的参考。
更新时间: 2025-03-21 09:32:39
领域: cs.CL,cs.CR,cs.HC
Uncertainty-Driven Modeling of Microporosity and Permeability in Clastic Reservoirs Using Random Forest
Predicting microporosity and permeability in clastic reservoirs is a challenge in reservoir quality assessment, especially in formations where direct measurements are difficult or expensive. These reservoir properties are fundamental in determining a reservoir's capacity for fluid storage and transmission, yet conventional methods for evaluating them, such as Mercury Injection Capillary Pressure (MICP) and Scanning Electron Microscopy (SEM), are resource-intensive. The aim of this study is to develop a cost-effective machine learning model to predict complex reservoir properties using readily available field data and basic laboratory analyses. A Random Forest classifier was employed, utilizing key geological parameters such as porosity, grain size distribution, and spectral gamma-ray (SGR) measurements. An uncertainty analysis was applied to account for natural variability, expanding the dataset, and enhancing the model's robustness. The model achieved a high level of accuracy in predicting microporosity (93%) and permeability levels (88%). By using easily obtainable data, this model reduces the reliance on expensive laboratory methods, making it a valuable tool for early-stage exploration, especially in remote or offshore environments. The integration of machine learning with uncertainty analysis provides a reliable and cost-effective approach for evaluating key reservoir properties in siliciclastic formations. This model offers a practical solution to improve reservoir quality assessments, enabling more informed decision-making and optimizing exploration efforts.
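The abstract names the inputs (porosity, grain-size distribution, spectral gamma-ray) and the model family (a Random Forest classifier); the scikit-learn sketch below is a minimal illustration of that setup with hypothetical column names and a placeholder CSV path, not the study's actual data or tuning.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical feature table: one row per core sample.
df = pd.read_csv("samples.csv")  # placeholder path
features = ["porosity", "median_grain_size", "sorting", "sgr_total", "sgr_k", "sgr_th", "sgr_u"]
X, y = df[features], df["microporosity_class"]  # e.g. low / medium / high

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))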
Updated: 2025-03-21 09:05:04
标题: 随机森林在碎屑岩储层微孔隙度和渗透率不确定性建模中的应用
摘要: 在岩性储层中预测微孔隙度和渗透率是储层质量评估中的一项挑战,特别是在直接测量困难或昂贵的地层中。这些储层特性对确定储层的流体储存和传输能力至关重要,然而传统评估方法,如汞注入毛细管压力(MICP)和扫描电子显微镜(SEM),资源密集。本研究旨在开发一种成本效益的机器学习模型,利用现有的现场数据和基础实验室分析来预测复杂的储层特性。采用了随机森林分类器,利用关键的地质参数,如孔隙度、颗粒大小分布和光谱伽马射线(SGR)测量。应用不确定性分析来考虑自然变异,扩展数据集,增强模型的鲁棒性。该模型在预测微孔隙度(93%)和渗透率水平(88%)方面达到了高水平的准确性。通过使用易获得的数据,该模型减少了对昂贵实验室方法的依赖,使其成为早期勘探的有价值的工具,特别是在偏远或海上环境中。机器学习与不确定性分析的整合提供了一种可靠且成本效益的方法,用于评估硅质岩层中关键的储层特性。该模型提供了一个实用的解决方案,以改进储层质量评估,促进更加明智的决策,并优化勘探工作。
更新时间: 2025-03-21 09:05:04
领域: physics.geo-ph,cs.LG
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
The objective of this study is to generate high-quality speech from silent talking face videos, a task also known as video-to-speech synthesis. A significant challenge in video-to-speech synthesis lies in the substantial modality gap between silent video and multi-faceted speech. In this paper, we propose a novel video-to-speech system that effectively bridges this modality gap, significantly enhancing the quality of synthesized speech. This is achieved by learning of hierarchical representations from video to speech. Specifically, we gradually transform silent video into acoustic feature spaces through three sequential stages -- content, timbre, and prosody modeling. In each stage, we align visual factors -- lip movements, face identity, and facial expressions -- with corresponding acoustic counterparts to ensure the seamless transformation. Additionally, to generate realistic and coherent speech from the visual representations, we employ a flow matching model that estimates direct trajectories from a simple prior distribution to the target speech distribution. Extensive experiments demonstrate that our method achieves exceptional generation quality comparable to real utterances, outperforming existing methods by a significant margin.
Updated: 2025-03-21 09:02:38
标题: 从面孔到声音:学习用于高质量视频转语音的层次表示
摘要: 本研究的目标是从无声讲话面部视频中生成高质量的语音,这也被称为视频到语音合成任务。视频到语音合成中的一个重要挑战在于无声视频和多方面语音之间存在重大的模态差距。在本文中,我们提出了一种有效地弥合这种模态差距的新型视频到语音系统,显著提高了合成语音的质量。这是通过从视频到语音的层次表示学习实现的。具体来说,我们通过三个连续阶段 -- 内容、音色和韵律建模,逐渐将无声视频转换为声学特征空间。在每个阶段,我们将视觉因素 -- 唇部运动、面部身份和面部表情 -- 与相应的声学对应物对齐,以确保无缝转换。此外,为了从视觉表示中生成逼真和连贯的语音,我们采用了一种流匹配模型,该模型估计从简单的先验分布到目标语音分布的直接轨迹。大量实验证明,我们的方法实现了出色的生成质量,可与真实话语媲美,并且在很大程度上优于现有方法。
更新时间: 2025-03-21 09:02:38
领域: eess.AS,cs.AI,cs.CV,cs.SD
V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms
The recent exponential growth of Large Language Models (LLMs) has relied on GPU-based systems. However, CPUs are emerging as a flexible and lower-cost alternative, especially when targeting inference and reasoning workloads. RISC-V is rapidly gaining traction in this area, given its open and vendor-neutral ISA. However, the RISC-V hardware for LLM workloads and the corresponding software ecosystem are not fully mature and streamlined, given the requirement of domain-specific tuning. This paper aims at filling this gap, focusing on optimizing LLM inference on the Sophon SG2042, the first commercially available many-core RISC-V CPU with vector processing capabilities. On two recent state-of-the-art LLMs optimized for reasoning, DeepSeek R1 Distill Llama 8B and DeepSeek R1 Distill QWEN 14B, we achieve 4.32/2.29 token/s for token generation and 6.54/3.68 token/s for prompt processing, with a speedup of up to 2.9x/3.0x compared to our baseline.
Updated: 2025-03-21 09:00:19
标题: V-Seek:在开放硬件服务器级RISC-V平台上加速LLM推理
摘要: 最近,大型语言模型(LLMs)的指数增长依赖于基于GPU的系统。然而,CPU正逐渐成为一种灵活且成本较低的替代选择,特别是在针对推理和推断工作负载时。由于其开放和供应商中立的ISA,RISC-V在这一领域迅速获得了关注。然而,针对LLM工作负载的RISC-V硬件和相应的软件生态系统尚未完全成熟和优化,因为需要特定领域的调整。本文旨在填补这一空白,重点优化LLM在Sophon SG2042上的推理,这是第一款具有矢量处理能力的商用多核RISC-V CPU。在针对推理进行优化的两个最新LLM模型DeepSeek R1 Distill Llama 8B和DeepSeek R1 Distill QWEN 14B上,我们实现了4.32/2.29个标记/秒的标记生成和6.54/3.68个标记/秒的提示处理,与我们的基准相比,速度提高了2.9倍/3.0倍。
更新时间: 2025-03-21 09:00:19
领域: cs.LG,cs.PF
Neural-Guided Equation Discovery
Deep learning approaches are becoming increasingly attractive for equation discovery. We show the advantages and disadvantages of using neural-guided equation discovery by giving an overview of recent papers and the results of experiments using our modular equation discovery system MGMT ($\textbf{M}$ulti-Task $\textbf{G}$rammar-Guided $\textbf{M}$onte-Carlo $\textbf{T}$ree Search for Equation Discovery). The system uses neural-guided Monte-Carlo Tree Search (MCTS) and supports both supervised and reinforcement learning, with a search space defined by a context-free grammar. We summarize seven desirable properties of equation discovery systems, emphasizing the importance of embedding tabular data sets for such learning approaches. Using the modular structure of MGMT, we compare seven architectures (among them RNNs, CNNs, and Transformers) for embedding tabular datasets, trained on the auxiliary task of contrastive learning for tabular data, on an equation discovery task. For almost all combinations of modules, supervised learning outperforms reinforcement learning. Moreover, our experiments indicate an advantage of using grammar rules as the action space instead of tokens. Two adaptations of MCTS -- risk-seeking MCTS and AmEx-MCTS -- can improve equation discovery with that kind of search.
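Neural guidance in MCTS is commonly injected through a PUCT-style selection rule, in which a learned prior over actions (here, grammar rules) biases exploration; the sketch below shows that generic rule only, not MGMT's exact selection mechanism, and the child-node attributes are assumptions.

import math

def puct_select(children, parent_visits, c_puct=1.5):
    """Generic PUCT selection: exploit the mean value Q and explore in proportion
    to the network prior P over actions (e.g. grammar rules) and visit counts."""
    def score(child):
        q = child.total_value / child.visits if child.visits else 0.0
        u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
        return q + u
    return max(children, key=score)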
Updated: 2025-03-21 08:55:51
标题: 神经引导的方程式发现
摘要: 深度学习方法在方程式发现领域变得越来越具吸引力。我们通过概述最近的论文以及使用我们的模块化方程式发现系统MGMT(多任务语法导向蒙特卡洛树搜索方程式发现)的实验结果,展示了使用神经引导方程式发现的优势和劣势。该系统使用神经引导的蒙特卡洛树搜索(MCTS),支持监督学习和强化学习,并通过上下文无关语法定义搜索空间。我们总结了方程式发现系统的七个理想属性,强调将表格数据集嵌入到这种学习方法中的重要性。利用MGMT的模块化结构,我们比较了七种架构(包括RNN、CNN和Transformer)在表格数据集嵌入辅助对比学习任务上的性能,以用于方程式发现任务。在几乎所有模块组合中,监督学习胜过强化学习。此外,我们的实验表明,将语法规则作为动作空间而不是标记具有优势。两种修改的MCTS -- 寻求风险的MCTS和AmEx-MCTS -- 可以改进使用这种搜索方式的方程式发现。
更新时间: 2025-03-21 08:55:51
领域: cs.AI,I.2.6; I.1.1; G.3
CleanStack: A New Dual-Stack for Defending Against Stack-Based Memory Corruption Attacks
Stack-based memory corruption vulnerabilities have long been exploited by attackers to execute arbitrary code or perform unauthorized memory operations. Various defense mechanisms have been introduced to mitigate stack memory errors, but they typically focus on specific attack types, incur substantial performance overhead, or suffer from compatibility limitations. In this paper, we present CleanStack, an efficient, highly compatible, and comprehensive stack protection mechanism. CleanStack isolates stack objects influenced by external input from other safe stack objects, thereby preventing attackers from modifying return addresses via controlled stack objects. Additionally, by randomizing the placement of tainted stack objects within the Unclean Stack, CleanStack mitigates non-control-data attacks by preventing attackers from predicting the stack layout. A key component of CleanStack is the identification of tainted stack objects. We analyze both static program analysis and heuristic methods for this purpose. To maximize compatibility, we adopt a heuristic approach and implement CleanStack within the LLVM compiler framework, applying it to SPEC CPU2017 benchmarks and a real-world application. Our security evaluation demonstrates that CleanStack significantly reduces the exploitability of stack-based memory errors by providing a dual-stack system with isolation and randomization. Performance evaluation results indicate that CleanStack incurs an execution overhead of only 1.73% on the SPEC CPU2017 benchmark while introducing a minimal memory overhead of just 0.04%. Compared to existing stack protection techniques, CleanStack achieves an optimal balance between protection coverage, runtime overhead, and compatibility, making it one of the most comprehensive and efficient stack security solutions to date.
Updated: 2025-03-21 08:55:17
标题: CleanStack:一种新的双栈用于防御基于堆栈的内存破坏攻击
摘要: 基于堆栈的内存破坏漏洞长期以来一直被攻击者利用来执行任意代码或执行未经授权的内存操作。已经引入了各种防御机制来减轻堆栈内存错误,但它们通常专注于特定的攻击类型,会产生大量的性能开销,或者受到兼容性限制的影响。在本文中,我们提出了CleanStack,一个高效、高度兼容且全面的堆栈保护机制。CleanStack将受外部输入影响的堆栈对象与其他安全堆栈对象隔离,从而防止攻击者通过受控堆栈对象修改返回地址。此外,通过在不洁净堆栈内随机放置受污染的堆栈对象,CleanStack减轻了非控制数据攻击,防止攻击者预测堆栈布局。CleanStack的一个关键组成部分是识别受污染的堆栈对象。我们分析了静态程序分析和启发式方法用于此目的。为了最大限度地提高兼容性,我们采用了启发式方法,并在LLVM编译器框架中实现了CleanStack,将其应用于SPEC CPU2017基准测试和一个真实世界的应用程序。我们的安全评估表明,CleanStack通过提供具有隔离和随机化功能的双堆栈系统,显著降低了基于堆栈的内存错误的利用程度。性能评估结果表明,CleanStack在SPEC CPU2017基准测试中仅产生1.73%的执行开销,同时引入了仅为0.04%的最小内存开销。与现有的堆栈保护技术相比,CleanStack在保护范围、运行时开销和兼容性之间取得了最佳平衡,使其成为迄今为止最全面和高效的堆栈安全解决方案之一。
更新时间: 2025-03-21 08:55:17
领域: cs.CR
Bypassing orthogonalization in the quantum DPP sampler
Given an $n\times r$ matrix $X$ of rank $r$, consider the problem of sampling $r$ integers $\mathtt{C}\subset \{1, \dots, n\}$ with probability proportional to the squared determinant of the rows of $X$ indexed by $\mathtt{C}$. The distribution of $\mathtt{C}$ is called a projection determinantal point process (DPP). The vanilla classical algorithm to sample a DPP works in two steps, an orthogonalization in $\mathcal{O}(nr^2)$ and a sampling step of the same cost. The bottleneck of recent quantum approaches to DPP sampling remains that preliminary orthogonalization step. For instance, (Kerenidis and Prakash, 2022) proposed an algorithm with the same $\mathcal{O}(nr^2)$ orthogonalization, followed by a $\mathcal{O}(nr)$ classical step to find the gates in a quantum circuit. The classical $\mathcal{O}(nr^2)$ orthogonalization thus still dominates the cost. Our first contribution is to reduce preprocessing to normalizing the columns of $X$, obtaining $\mathsf{X}$ in $\mathcal{O}(nr)$ classical operations. We show that a simple circuit inspired by the formalism of Kerenidis et al., 2022 samples a DPP of a type we had never encountered in applications, which is different from our target DPP. Plugging this circuit into a rejection sampling routine, we recover our target DPP after an expected $1/\det \mathsf{X}^\top\mathsf{X} = 1/a$ preparations of the quantum circuit. Using amplitude amplification, our second contribution is to boost the acceptance probability from $a$ to $1-a$ at the price of a circuit depth of $\mathcal{O}(r\log n/\sqrt{a})$ and $\mathcal{O}(\log n)$ extra qubits. Prepending a fast, sketching-based classical approximation of $a$, we obtain a pipeline to sample a projection DPP on a quantum computer, where the former $\mathcal{O}(nr^2)$ preprocessing bottleneck has been replaced by the $\mathcal{O}(nr)$ cost of normalizing the columns and the cost of our approximation of $a$.
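As a small classical companion to the abstract above, the numpy sketch below performs the $\mathcal{O}(nr)$ column normalization and computes the acceptance probability $a = \det(\mathsf{X}^\top\mathsf{X})$, which governs the expected number $1/a$ of circuit preparations in the rejection-sampling routine; it illustrates the classical quantities only, not the quantum circuit, and the matrix is random for illustration.

import numpy as np

rng = np.random.default_rng(0)
n, r = 200, 5
X = rng.normal(size=(n, r))              # rank-r input matrix (illustrative)

Xn = X / np.linalg.norm(X, axis=0)       # normalize columns: the O(nr) preprocessing
a = np.linalg.det(Xn.T @ Xn)             # acceptance probability of the rejection sampler
print("acceptance probability a =", a)
print("expected circuit preparations 1/a =", 1.0 / a)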
Updated: 2025-03-21 08:46:34
标题: 绕过正交化的量子DPP采样器
摘要: 给定一个秩为$r$的$n\times r$矩阵$X$,考虑采样$r$个整数$\mathtt{C}\subset \{1, \dots, n\}$,其概率与由$\mathtt{C}$索引的$X$的行的行列式的平方成正比的问题。$\mathtt{C}$的分布被称为投影行列式点过程(DPP)。采样DPP的经典算法分为两步,一个在$\mathcal{O}(nr^2)$的正交化步骤和一个具有相同成本的采样步骤。最近量子方法中采样DPP的瓶颈仍然是初步的正交化步骤。例如,(Kerenidis and Prakash, 2022)提出了一个具有相同$\mathcal{O}(nr^2)$正交化的算法,然后是一个$\mathcal{O}(nr)$的经典步骤来找到量子电路中的门。因此,经典$\mathcal{O}(nr^2)$的正交化仍然主导成本。我们的第一个贡献是将预处理减少到对$X$的列进行归一化,获得$\mathsf{X}$在$\mathcal{O}(nr)$的经典操作中。我们展示了一个受Kerenidis等人,2022年形式主义启发的简单电路,可以采样我们在应用中从未遇到过的一种DPP类型,这种类型不同于我们的目标DPP。将这个电路插入一个拒绝采样例程中,我们在期望的$1/\det \mathsf{X}^\top\mathsf{X}=1/a$量子电路准备的情况下恢复我们的目标DPP。利用振幅放大,我们的第二个贡献是将接受概率从$a$提高到$1-a$,代价是电路深度为$\mathcal{O}(r\log n/\sqrt{a})$和额外的$\mathcal{O}(\log n)$个量子比特。在快速、基于草图的经典近似$a$之前,我们获得了在量子计算机上采样投影DPP的流水线,其中前一个$\mathcal{O}(nr^2)$的预处理瓶颈已被替换为对列进行归一化的$\mathcal{O}(nr)$成本和我们对$a$的近似成本。
更新时间: 2025-03-21 08:46:34
领域: quant-ph,cs.LG,stat.CO
Model-free front-to-end training of a large high performance laser neural network
Artificial neural networks (ANNs) have become ubiquitous and revolutionized many applications ranging from computer vision to medical diagnoses. However, they offer a fundamentally connectionist and distributed approach to computing, in stark contrast to classical computers that use the von Neumann architecture. This distinction has sparked renewed interest in developing unconventional hardware to support more efficient implementations of ANNs, rather than merely emulating them on traditional systems. Photonics stands out as a particularly promising platform, providing scalability, high speed, energy efficiency, and the ability for parallel information processing. However, fully realized autonomous optical neural networks (ONNs) with in-situ learning capabilities are still rare. In this work, we demonstrate a fully autonomous and parallel ONN based on a multimode vertical cavity surface emitting laser (VCSEL) and built from off-the-shelf components. Our ONN is highly efficient and is scalable both in network size and inference bandwidth towards the GHz range. High-performance, hardware-compatible optimization algorithms are necessary in order to minimize reliance on external von Neumann computers and fully exploit the potential of ONNs. As such, we present and extensively study several algorithms which are broadly compatible with a wide range of systems. We then apply these algorithms to optimize our ONN, and benchmark them using the MNIST dataset. We show that our ONN can achieve high accuracy and convergence efficiency, even under limited hardware resources. Crucially, we compare these different algorithms in terms of scaling and of optimization efficiency, measured by convergence time, which matters when working with limited external resources. Our work provides some guidance for the design of future ONNs as well as a simple and flexible way to train them.
Updated: 2025-03-21 08:43:02
标题: 无模型前向端到端训练大型高性能激光神经网络
摘要: 人工神经网络(ANNs)已经变得无处不在,并且革新了许多应用,从计算机视觉到医学诊断。然而,它们提供了一种基本上是连接主义和分布式计算方法,与使用冯·诺伊曼架构的经典计算机形成鲜明对比。这种区别引发了对开发非传统硬件以支持更高效实现ANNs的兴趣,而不仅仅是在传统系统上模拟它们。光子技术是一种特别有前途的平台,具有可扩展性、高速度、能量效率和并行信息处理能力。然而,具有现场学习能力的实现完全的光学神经网络(ONNs)仍然很少见。在这项工作中,我们展示了使用商用组件的多模垂直腔面发射激光器(VCSEL)构建的完全自主和并行ONN。我们的ONN非常高效,并且在网络规模和推理带宽方面具有可扩展性,可达到GHz范围。高性能硬件兼容的优化算法是必要的,以减少对外部冯·诺伊曼计算机的依赖,充分发挥ONNs的潜力。因此,我们提出并广泛研究了几种与各种系统兼容的算法。然后,我们应用这些算法来优化我们的ONN,并使用MNIST数据集进行基准测试。我们展示了我们的ONN可以在有限的硬件资源下实现高准确度和收敛效率。至关重要的是,我们比较了这些不同算法在尺度和收敛时间方面的优化效率,这在使用有限外部资源时至关重要。我们的研究为未来ONNs的设计提供了一些指导,并提供了一种简单灵活的训练方式。
更新时间: 2025-03-21 08:43:02
领域: cs.LG,cs.ET
Sparse Additive Contextual Bandits: A Nonparametric Approach for Online Decision-making with High-dimensional Covariates
Personalized services are central to today's digital landscape, where online decision-making is commonly formulated as contextual bandit problems. Two key challenges emerge in modern applications: high-dimensional covariates and the need for nonparametric models to capture complex reward-covariate relationships. We address these challenges by developing a contextual bandit algorithm based on sparse additive reward models in reproducing kernel Hilbert spaces. We establish statistical properties of the doubly penalized method applied to random regions, introducing novel analyses under bandit feedback. Our algorithm achieves sublinear cumulative regret over the time horizon $T$ while scaling logarithmically with covariate dimensionality $d$. Notably, we provide the first regret upper bound with logarithmic growth in $d$ for nonparametric contextual bandits with high-dimensional covariates. We also establish a lower bound, with the gap to the upper bound vanishing as smoothness increases. Extensive numerical experiments demonstrate our algorithm's superior performance in high-dimensional settings compared to existing approaches.
Updated: 2025-03-21 08:33:28
标题: 稀疏加法上下文臂:一种用于高维协变量在线决策的非参数方法
摘要: 个性化服务是当今数字领域的核心,其中在线决策通常被制定为上下文匪徒问题。现代应用中出现了两个关键挑战:高维协变量和需要非参数模型来捕捉复杂的奖励-协变量关系。我们通过在再生核希尔伯特空间中基于稀疏附加奖励模型开发了一种基于上下文匪徒的算法来解决这些挑战。我们建立了应用于随机区域的双重惩罚方法的统计特性,在匪徒反馈下引入了新颖的分析。我们的算法在时间范围T内实现了次线性的累积遗憾,同时与协变量维度d的对数比例。值得注意的是,我们为高维协变量的非参数上下文匪徒提供了首个遗憾上限,其在d上呈对数增长。我们还建立了一个下限,随着光滑度的增加,上限与下限之间的差距会消失。广泛的数值实验表明,与现有方法相比,我们的算法在高维环境中表现出优越性能。
更新时间: 2025-03-21 08:33:28
领域: stat.ML,cs.LG,stat.ME
On-Sensor Convolutional Neural Networks with Early-Exits
Tiny Machine Learning (TinyML) is a novel research field aiming at integrating Machine Learning (ML) within embedded devices with limited memory, computation, and energy. Recently, a new branch of TinyML has emerged, focusing on integrating ML directly into the sensors to further reduce the power consumption of embedded devices. Interestingly, despite their state-of-the-art performance in many tasks, none of the current solutions in the literature aims to optimize the implementation of Convolutional Neural Networks (CNNs) operating directly into sensors. In this paper, we introduce for the first time in the literature the optimized design and implementation of Depth-First CNNs operating on the Intelligent Sensor Processing Unit (ISPU) within an Inertial Measurement Unit (IMU) by STMicroelectronics. Our approach partitions the CNN between the ISPU and the microcontroller (MCU) and employs an Early-Exit mechanism to stop the computations on the IMU when enough confidence about the results is achieved, hence significantly reducing power consumption. When using a NUCLEO-F411RE board, this solution achieved an average current consumption of 4.8 mA, marking an 11% reduction compared to the regular inference pipeline on the MCU, while having equal accuracy.
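The Early-Exit idea described above can be sketched in a few lines of PyTorch: an intermediate classifier head is evaluated first and, if its softmax confidence exceeds a threshold, the remaining layers are skipped. The tiny architecture, channel counts, and the 0.9 threshold below are illustrative assumptions, not the ISPU deployment.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitCNN(nn.Module):
    def __init__(self, n_classes=4, threshold=0.9):
        super().__init__()
        self.threshold = threshold
        self.stage1 = nn.Sequential(nn.Conv1d(6, 16, 5, padding=2), nn.ReLU(), nn.MaxPool1d(2))
        self.exit1 = nn.Linear(16, n_classes)   # early-exit head (conceptually, on the sensor)
        self.stage2 = nn.Sequential(nn.Conv1d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool1d(2))
        self.exit2 = nn.Linear(32, n_classes)   # final head (conceptually, on the MCU)

    def forward(self, x):                       # x: (batch, 6, time), e.g. a 6-axis IMU window
        h1 = self.stage1(x)
        logits1 = self.exit1(h1.mean(dim=-1))   # global average pooling + classifier
        conf = F.softmax(logits1, dim=-1).max(dim=-1).values
        if bool((conf >= self.threshold).all()):  # confident enough: stop early
            return logits1
        h2 = self.stage2(h1)
        return self.exit2(h2.mean(dim=-1))

# Usage on a hypothetical window of IMU data:
model = EarlyExitCNN()
out = model(torch.randn(1, 6, 128))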
Updated: 2025-03-21 08:31:07
标题: 具有早期退出的传感器卷积神经网络
摘要: 微型机器学习(TinyML)是一个新兴研究领域,旨在将机器学习(ML)集成到内存、计算和能源有限的嵌入式设备中。最近,TinyML出现了一个新分支,专注于将ML直接集成到传感器中,以进一步降低嵌入式设备的功耗。有趣的是,尽管现有方案在许多任务中表现出最先进的性能,但文献中尚无方案旨在优化直接在传感器中运行的卷积神经网络(CNN)的实现。在本文中,我们首次在文献中提出了在STMicroelectronics惯性测量单元(IMU)内的智能传感器处理单元(ISPU)上运行的深度优先CNN的优化设计与实现。我们的方法将CNN划分到ISPU和微控制器(MCU)之间,并采用早期退出机制,在对结果有足够置信度时停止IMU上的计算,从而显著降低功耗。在使用NUCLEO-F411RE开发板时,该方案实现了4.8 mA的平均电流消耗,与MCU上的常规推理流程相比降低了11%,同时保持相同的精度。
更新时间: 2025-03-21 08:31:07
领域: cs.LG,cs.AI
Interpretable Machine Learning for Oral Lesion Diagnosis through Prototypical Instances Identification
Decision-making processes in healthcare can be highly complex and challenging. Machine Learning tools offer significant potential to assist in these processes. However, many current methodologies rely on complex models that are not easily interpretable by experts. This underscores the need to develop interpretable models that can provide meaningful support in clinical decision-making. When approaching such tasks, humans typically compare the situation at hand to a few key examples and representative cases imprinted in their memory. Using an approach which selects such exemplary cases and grounds its predictions on them could contribute to obtaining high-performing interpretable solutions to such problems. To this end, we evaluate PivotTree, an interpretable prototype selection model, on an oral lesion detection problem, specifically trying to detect the presence of neoplastic, aphthous and traumatic ulcerated lesions from oral cavity images. We demonstrate the efficacy of using such method in terms of performance and offer a qualitative and quantitative comparison between exemplary cases and ground-truth prototypes selected by experts.
Updated: 2025-03-21 08:25:32
标题: 口腔病变诊断的可解释机器学习:通过典型实例识别
摘要: 卫生保健领域的决策过程可能非常复杂和具有挑战性。机器学习工具提供了在这些过程中提供帮助的显著潜力。然而,许多当前的方法依赖于专家难以解释的复杂模型。这凸显了开发可以在临床决策中提供有意义支持的可解释模型的需求。在处理这类任务时,人类通常将手头的情况与存储在记忆中的一些关键例子和代表性案例进行比较。采用选择这些典型案例并基于它们进行预测的方法可以有助于获得解决这类问题的高性能可解释解决方案。为此,我们评估了PivotTree,一个可解释的原型选择模型,在口腔病变检测问题上的应用,具体尝试从口腔图像中检测出恶性、口腔溃疡和外伤性溃疡病变的存在。我们展示了使用这种方法在性能方面的有效性,并提供了专家选择的典型案例和地面真实原型之间的定性和定量比较。
更新时间: 2025-03-21 08:25:32
领域: cs.AI
Network reconstruction via the minimum description length principle
A fundamental problem associated with the task of network reconstruction from dynamical or behavioral data consists in determining the most appropriate model complexity in a manner that prevents overfitting, and produces an inferred network with a statistically justifiable number of edges. The status quo in this context is based on $L_{1}$ regularization combined with cross-validation. However, besides its high computational cost, this commonplace approach unnecessarily ties the promotion of sparsity with weight "shrinkage". This combination forces a trade-off between the bias introduced by shrinkage and the network sparsity, which often results in substantial overfitting even after cross-validation. In this work, we propose an alternative nonparametric regularization scheme based on hierarchical Bayesian inference and weight quantization, which does not rely on weight shrinkage to promote sparsity. Our approach follows the minimum description length (MDL) principle, and uncovers the weight distribution that allows for the most compression of the data, thus avoiding overfitting without requiring cross-validation. The latter property renders our approach substantially faster to employ, as it requires a single fit to the complete data. As a result, we have a principled and efficient inference scheme that can be used with a large variety of generative models, without requiring the number of edges to be known in advance. We also demonstrate that our scheme yields systematically increased accuracy in the reconstruction of both artificial and empirical networks. We highlight the use of our method with the reconstruction of interaction networks between microbial communities from large-scale abundance samples involving in the order of $10^{4}$ to $10^{5}$ species, and demonstrate how the inferred model can be used to predict the outcome of interventions in the system.
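For readers less familiar with the principle invoked above: in its generic two-part form (stated here in generic notation, not the paper's), the MDL criterion selects the model $M$ minimizing the total description length $\Sigma(M) = -\log P(D \mid M) - \log P(M)$, i.e. the code length of the data $D$ under the model plus the code length of the model itself; quantizing continuous weights is one standard way to make the second term well defined.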
Updated: 2025-03-21 08:18:30
标题: 通过最小描述长度原则进行网络重建
摘要: 从动态或行为数据重建网络的基本问题是以一种防止过拟合并产生具有统计合理边数的推断网络的方式确定最适当的模型复杂性。在这种情况下,现状是基于结合交叉验证的$L_{1}$正则化。然而,除了其高计算成本外,这种普遍方法不必要地将稀疏性的提升与权重“缩减”联系起来。这种组合强迫在缩减引入的偏差和网络稀疏性之间进行权衡,这往往会导致即使经过交叉验证后也会出现相当严重的过拟合。在这项工作中,我们提出了一种基于分层贝叶斯推断和权重量化的替代非参数正则化方案,该方案不依赖于权重缩减来促进稀疏性。我们的方法遵循最小描述长度(MDL)原则,揭示了允许数据最大压缩的权重分布,从而避免过拟合而不需要交叉验证。后一属性使我们的方法在使用时更快,因为它只需要对完整数据进行一次拟合。因此,我们拥有一个原则性和高效的推断方案,可以与各种生成模型一起使用,而无需提前知道边数。我们还证明了我们的方案在重建人工和经验网络方面的准确性得到了系统性增加。我们强调了我们的方法在涉及$10^{4}$到$10^{5}$种微生物群落大规模丰度样本之间的相互作用网络重建中的应用,并演示了推断模型如何用于预测系统中干预的结果。
更新时间: 2025-03-21 08:18:30
领域: stat.ML,cs.LG,cs.SI,physics.data-an,q-bio.PE
Rude Humans and Vengeful Robots: Examining Human Perceptions of Robot Retaliatory Intentions in Professional Settings
Humans and robots are increasingly working in personal and professional settings. In workplace settings, humans and robots may work together as colleagues, potentially leading to social expectations, or violation thereof. Extant research has primarily sought to understand social interactions and expectations in personal rather than professional settings, and none of these studies have examined negative outcomes arising from violations of social expectations. This paper reports the results of a 2x3 online experiment that used a unique first-person perspective video to immerse participants in a collaborative workplace setting. The results are nuanced and reveal that while robots are expected to act in accordance with social expectations despite human behavior, there are benefits for robots perceived as being the bigger person in the face of human rudeness. Theoretical and practical implications are provided which discuss the import of these findings for the design of social robots.
Updated: 2025-03-21 08:12:40
标题: 粗鲁的人类和报复性的机器人:探究人类对机器人在专业环境中报复意图的看法
摘要: 人类和机器人越来越多地在个人和专业环境中工作。在工作环境中,人类和机器人可能作为同事一起工作,可能会导致社会期望或其违反。现有研究主要旨在理解个人而非专业环境中的社会互动和期望,而这些研究中没有一项研究考虑到违反社会期望所引发的负面结果。本文报告了一项2x3在线实验的结果,该实验使用了独特的第一人称视角视频,让参与者沉浸在一个合作的工作环境中。结果是微妙的,并显示出,尽管人类行为,机器人被期望按照社会期望行事,但对于被认为是在面对人类粗鲁时表现更大度的机器人是有益的。文中提供了理论和实践意义,讨论了这些发现对社交机器人设计的重要性。
更新时间: 2025-03-21 08:12:40
领域: cs.RO,cs.AI,cs.HC
Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
Tool-augmented large language models (LLMs) leverage tools, often in the form of APIs, to improve their reasoning capabilities on complex tasks. This enables them to act as intelligent agents interacting with the real world. The recently introduced ToolLLaMA model by Qin et al. [2023] utilizes the depth-first search-based decision tree (DFSDT) mechanism for multi-step reasoning with $16000+$ real-world APIs, effectively enhancing the performance of tool-augmented LLMs compared to traditional chain reasoning mechanisms. However, their approach only employs successful paths from decision trees (also called inference trees) for supervised fine-tuning (SFT), missing out on the potential learning opportunities from failed paths. Inspired by this, we propose an inference trajectory optimization framework based on preference learning to address this limitation. We first introduce a novel method for constructing step-wise preference data from tree-like expert trajectories, which leverages the previously ignored failed explorations in the decision trees. In the subsequent training phase, we first fine-tune the LLM with successful tool-usage expert trajectories and then apply direct preference optimization (DPO) with the preference data to update the LLM's policy, resulting in our ToolPrefer-LLaMA (TP-LLaMA) model. This approach not only enhances the utilization of original expert data but also broadens the learning space of the model. Our experiments demonstrate that by obtaining insights from errors in inference trees, TP-LLaMA significantly outperforms the baselines across almost all test scenarios by a large margin and exhibits better generalization capabilities with unseen APIs. At the same time, TP-LLaMA has also demonstrated superior reasoning efficiency compared to the baselines, making it more suitable for complex tool-usage reasoning tasks.
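The preference-optimization step described above builds on the standard DPO objective; a minimal sketch of that objective, computed from per-trajectory log-probabilities of the chosen (successful) and rejected (failed) expert paths under the policy and a frozen reference model, follows. It is the generic DPO loss, not the authors' training code, and the tensor values are hypothetical.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss from per-sequence log-probabilities (summed over tokens)."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Usage with hypothetical log-probs for one (successful, failed) trajectory pair:
loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.8]),
                torch.tensor([-13.0]), torch.tensor([-15.2]))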
Updated: 2025-03-21 08:12:07
标题: 推进工具增强型大型语言模型:整合来自推理树错误的见解
摘要: 工具增强的大型语言模型(LLMs)利用工具,通常以API的形式,来提高它们在复杂任务上的推理能力。这使它们能够作为智能代理与真实世界进行交互。秦等人于2023年引入的ToolLLaMA模型利用基于深度优先搜索的决策树(DFSDT)机制进行多步推理,使用16000多个真实世界的API,有效地提升了与传统链式推理机制相比的工具增强的LLMs的性能。然而,他们的方法仅使用决策树(也称为推理树)中的成功路径进行监督微调(SFT),而忽略了来自失败路径的潜在学习机会。受此启发,我们提出了一个基于偏好学习的推理轨迹优化框架来解决这个限制。我们首先介绍了一种新方法,从类似树状的专家轨迹构建逐步偏好数据,利用了之前被忽略的决策树中的失败探索。在随后的训练阶段,我们首先用成功的工具使用专家轨迹微调LLM,然后应用直接偏好优化(DPO)与偏好数据来更新LLM的策略,从而形成我们的ToolPrefer-LLaMA(TP-LLaMA)模型。这种方法不仅增强了原始专家数据的利用,还扩展了模型的学习空间。我们的实验表明,通过从推理树中的错误中获取见解,TP-LLaMA在几乎所有测试场景中都显著优于基准线,并且在使用未见过的API时表现出更好的泛化能力。与基准线相比,TP-LLaMA还表现出更优越的推理效率,使其更适用于复杂的工具使用推理任务。
更新时间: 2025-03-21 08:12:07
领域: cs.CL,cs.AI,cs.LG
A Unified Framework for Real-Time Failure Handling in Robotics Using Vision-Language Models, Reactive Planner and Behavior Trees
Robotic systems often face execution failures due to unexpected obstacles, sensor errors, or environmental changes. Traditional failure recovery methods rely on predefined strategies or human intervention, making them less adaptable. This paper presents a unified failure recovery framework that combines Vision-Language Models (VLMs), a reactive planner, and Behavior Trees (BTs) to enable real-time failure handling. Our approach includes pre-execution verification, which checks for potential failures before execution, and reactive failure handling, which detects and corrects failures during execution by verifying existing BT conditions, adding missing preconditions and, when necessary, generating new skills. The framework uses a scene graph for structured environmental perception and an execution history for continuous monitoring, enabling context-aware and adaptive failure handling. We evaluate our framework through real-world experiments with an ABB YuMi robot on tasks like peg insertion, object sorting, and drawer placement, as well as in AI2-THOR simulator. Compared to using pre-execution and reactive methods separately, our approach achieves higher task success rates and greater adaptability. Ablation studies highlight the importance of VLM-based reasoning, structured scene representation, and execution history tracking for effective failure recovery in robotics.
Updated: 2025-03-21 08:10:48
标题: 一个统一的框架:利用视觉-语言模型、反应式规划器和行为树在机器人中实时处理故障
摘要: 机器人系统经常面临由于意外障碍物、传感器误差或环境变化而导致的执行失败。传统的故障恢复方法依赖于预定义的策略或人为干预,使其缺乏适应性。本文提出了一个统一的故障恢复框架,结合了视觉语言模型(VLMs)、反应式规划器和行为树(BTs),以实现实时故障处理。我们的方法包括预执行验证,检查执行前的潜在故障,并采用反应式故障处理,通过验证现有的BT条件、添加缺失的先决条件,并在必要时生成新的技能来检测和纠正执行过程中的故障。该框架使用场景图进行结构化环境感知,使用执行历史进行持续监控,实现上下文感知和自适应故障处理。我们通过在ABB YuMi机器人上进行如插销、物体分类和抽屉放置等任务的真实世界实验以及在AI2-THOR模拟器中进行评估我们的框架。与单独使用预执行和反应式方法相比,我们的方法实现了更高的任务成功率和更大的适应性。消融研究突出了基于VLM的推理、结构化场景表示和执行历史跟踪对于机器人技术中有效的故障恢复的重要性。
更新时间: 2025-03-21 08:10:48
领域: cs.RO,cs.AI
TEMPO: Temporal Preference Optimization of Video LLMs via Difficulty Scheduling and Pre-SFT Alignment
Video Large Language Models (Video LLMs) have achieved significant success by leveraging a two-stage paradigm: pretraining on large-scale video-text data for vision-language alignment, followed by supervised fine-tuning (SFT) for task-specific capabilities. However, existing approaches struggle with temporal reasoning due to weak temporal correspondence in the data and reliance on the next-token prediction paradigm during training. To address these limitations, we propose TEMPO (TEMporal Preference Optimization), a systematic framework that enhances Video LLMs' temporal reasoning capabilities through Direct Preference Optimization (DPO). To facilitate this, we introduce an automated preference data generation pipeline that systematically constructs preference pairs by selecting videos that are rich in temporal information, designing video-specific perturbation strategies, and finally evaluating model responses on clean and perturbed video inputs. Our temporal alignment features two key innovations: curriculum learning, which progressively increases perturbation difficulty to improve model robustness and adaptability; and ``Pre-SFT Alignment'', applying preference optimization before instruction tuning to prioritize fine-grained temporal comprehension. Extensive experiments demonstrate that our approach consistently improves Video LLM performance across multiple benchmarks with a relatively small set of self-generated DPO data. We further analyze the transferability of DPO data across architectures and the role of difficulty scheduling in optimization. Our findings highlight TEMPO as a scalable and efficient complement to SFT-based methods, paving the way for developing reliable Video LLMs.
Updated: 2025-03-21 08:00:29
标题: TEMPO:通过困难调度和预SFT对视频LLMs的时间偏好优化
摘要: 视频大型语言模型(Video LLMs)通过利用两阶段范式取得了显著的成功:在大规模视频文本数据上进行预训练以实现视觉语言对齐,然后通过监督微调(SFT)来实现特定任务能力。然而,现有方法在时间推理方面存在困难,因为数据中存在弱时间对应性,并且在训练过程中依赖于下一个标记预测范式。为了解决这些限制,我们提出了TEMPO(TEMporal Preference Optimization),这是一个通过直接偏好优化(DPO)增强Video LLMs时间推理能力的系统框架。为了实现这一点,我们引入了一个自动化偏好数据生成流水线,通过选择丰富于时间信息的视频,设计视频特定的扰动策略,并最终评估模型在清洁和扰动视频输入上的响应,系统地构建偏好对。我们的时间对齐具有两个关键创新:课程学习逐渐增加扰动难度以提高模型的鲁棒性和适应性;以及“Pre-SFT Alignment”,在微调指导之前应用偏好优化,以优先考虑细粒度时间理解。广泛的实验证明,我们的方法通过相对较小数量的自动生成的DPO数据,在多个基准测试中持续改进了Video LLM的性能。我们进一步分析了DPO数据在架构之间的可转移性以及优化中困难调度的作用。我们的研究结果突出了TEMPO作为SFT方法的可扩展和高效补充,为开发可靠的Video LLMs铺平了道路。
更新时间: 2025-03-21 08:00:29
领域: cs.CV,cs.AI
Neuromorphic Attitude Estimation and Control
The real-world application of small drones is mostly hampered by energy limitations. Neuromorphic computing promises extremely energy-efficient AI for autonomous flight but is still challenging to train and deploy on real robots. To reap the maximal benefits from neuromorphic computing, it is necessary to perform all autonomy functions end-to-end on a single neuromorphic chip, from low-level attitude control to high-level navigation. This research presents the first neuromorphic control system using a spiking neural network (SNN) to effectively map a drone's raw sensory input directly to motor commands. We apply this method to low-level attitude estimation and control for a quadrotor, deploying the SNN on a tiny Crazyflie. We propose a modular SNN, separately training and then merging estimation and control sub-networks. The SNN is trained with imitation learning, using a flight dataset of sensory-motor pairs. Post-training, the network is deployed on the Crazyflie, issuing control commands from sensor inputs at 500Hz. Furthermore, for the training procedure we augmented training data by flying a controller with additional excitation and time-shifting the target data to enhance the predictive capabilities of the SNN. On the real drone, the perception-to-control SNN tracks attitude commands with an average error of 3.0 degrees, compared to 2.7 degrees for the regular flight stack. We also show the benefits of the proposed learning modifications for reducing the average tracking error and reducing oscillations. Our work shows the feasibility of performing neuromorphic end-to-end control, laying the basis for highly energy-efficient and low-latency neuromorphic autopilots.
Updated: 2025-03-21 07:57:38
标题: 神经形态态度估计与控制
摘要: 小型无人机在现实世界中的应用主要受到能量限制的影响。神经形态计算承诺为自主飞行提供极其节能的人工智能,但在真实机器人上训练和部署仍具有挑战性。为了充分利用神经形态计算的最大优势,需要在单个神经形态芯片上执行所有自主功能,从低级姿态控制到高级导航。本研究提出了第一个使用尖峰神经网络(SNN)的神经形态控制系统,有效地将无人机的原始感知输入直接映射到电机命令。我们将这种方法应用于四轴飞行器的低级姿态估计和控制,在小型Crazyflie上部署了SNN。我们提出了一个模块化的SNN,分别训练然后合并估计和控制子网络。SNN通过模仿学习进行训练,使用飞行数据集的感知-动作对。在训练后,网络被部署到Crazyflie上,从传感器输入发出控制命令,频率为500Hz。此外,在训练过程中,我们通过使用额外的激励飞行控制器和将目标数据进行时间转移来增强SNN的预测能力,扩充了训练数据。在真实的无人机上,感知-控制SNN以平均误差3.0度跟踪姿态命令,而常规飞行堆栈的误差为2.7度。我们还展示了提出的学习修改对减少平均跟踪误差和减少振荡的好处。我们的工作展示了执行神经形态端到端控制的可行性,为高度节能和低延迟的神经形态自动驾驶仪奠定了基础。
更新时间: 2025-03-21 07:57:38
领域: cs.RO,cs.LG,cs.NE
MerGen: Micro-electrode recording synthesis using a generative data-driven approach
The analysis of electrophysiological data is crucial for certain surgical procedures such as deep brain stimulation, which has been adopted for the treatment of a variety of neurological disorders. During the procedure, auditory analysis of these signals helps the clinical team to infer the neuroanatomical location of the stimulation electrode and thus optimize clinical outcomes. This task is complex and requires an expert, who in turn requires significant training. In this paper, we propose a generative neural network, called MerGen, capable of simulating de novo electrophysiological recordings, with a view to providing a realistic learning tool for clinician trainees to identify these signals. We demonstrate that the generated signals are perceptually indistinguishable from real signals by experts in the field, and that it is even possible to condition the generation efficiently to provide a didactic simulator adapted to a particular surgical scenario. The efficacy of this conditioning is demonstrated, comparing it to intra-observer and inter-observer variability amongst experts. We also demonstrate the use of this network for data augmentation in automatic signal classification, which can play a role in decision-making support in the operating theatre.
Updated: 2025-03-21 07:54:29
标题: MerGen:使用生成式数据驱动方法的微电极记录综合
摘要: 电生理数据分析对于某些手术程序至关重要,比如深部脑刺激术,该术式已被用于治疗各种神经系统疾病。在手术过程中,对这些信号的听觉分析帮助临床团队推断刺激电极的神经解剖位置,从而优化临床结果。这项任务复杂,需要专家进行,而专家又需要经过重要的训练。在本文中,我们提出了一种生成式神经网络,称为MerGen,能够模拟全新的电生理记录,为临床实习生提供一个真实的学习工具,帮助他们识别这些信号。我们证明生成的信号在专家眼中与真实信号无法区分,甚至可以有效地进行生成条件,提供一个适用于特定手术场景的教学模拟器。我们展示了这种条件化的有效性,将其与专家间的内观者和外观者差异进行对比。我们还展示了这个网络用于自动信号分类数据增强的用途,这可以在手术室中对决策支持起到作用。
更新时间: 2025-03-21 07:54:29
领域: cs.LG
GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation
Effectively manipulating articulated objects in household scenarios is a crucial step toward achieving general embodied artificial intelligence. Mainstream research in 3D vision has primarily focused on manipulation through depth perception and pose detection. However, in real-world environments, these methods often face challenges due to imperfect depth perception, such as with transparent lids and reflective handles. Moreover, they generally lack the diversity in part-based interactions required for flexible and adaptable manipulation. To address these challenges, we introduced a large-scale part-centric dataset for articulated object manipulation that features both photo-realistic material randomization and detailed annotations of part-oriented, scene-level actionable interaction poses. We evaluated the effectiveness of our dataset by integrating it with several state-of-the-art methods for depth estimation and interaction pose prediction. Additionally, we proposed a novel modular framework that delivers superior and robust performance for generalizable articulated object manipulation. Our extensive experiments demonstrate that our dataset significantly improves the performance of depth perception and actionable interaction pose prediction in both simulation and real-world scenarios. More information and demos can be found at: https://pku-epic.github.io/GAPartManip/.
Updated: 2025-03-21 07:52:16
标题: GAPartManip:一个用于材料不可知关节物体操作的大规模部件中心数据集
摘要: 在家庭场景中有效地操作关节对象是实现通用具有体现的人工智能的关键一步。主流的3D视觉研究主要集中在通过深度感知和姿势检测进行操作。然而,在现实环境中,这些方法通常面临挑战,因为深度感知不完善,例如透明盖子和反光把手。此外,它们通常缺乏灵活和适应性操作所需的基于部分的互动多样性。为了解决这些挑战,我们引入了一个大规模的基于部分的关节对象操作数据集,其中包括逼真材料随机化和场景级别可操作互动姿势的详细注释。我们通过将其与几种最先进的深度估计和互动姿势预测方法集成,评估了我们数据集的有效性。另外,我们提出了一个新颖的模块化框架,为通用关节对象操作提供了卓越和稳健的性能。我们的广泛实验表明,我们的数据集显著提高了深度感知和可操作互动姿势预测在模拟和现实场景中的性能。更多信息和演示可以在以下网址找到:https://pku-epic.github.io/GAPartManip/。
更新时间: 2025-03-21 07:52:16
领域: cs.RO,cs.AI
Bridging Social Psychology and LLM Reasoning: Conflict-Aware Meta-Review Generation via Cognitive Alignment
The rapid growth of scholarly submissions has overwhelmed traditional peer review systems, driving the need for intelligent automation to preserve scientific rigor. While large language models (LLMs) show promise in automating manuscript critiques, their ability to synthesize high-stakes meta-reviews, which require conflict-aware reasoning and consensus derivation, remains underdeveloped. Existing methods fail to effectively handle conflicting viewpoints within differing opinions, and often introduce additional cognitive biases, such as anchoring effects and conformity bias. To overcome these limitations, we propose the Cognitive Alignment Framework (CAF), a dual-process architecture that transforms LLMs into adaptive scientific arbitrators. By operationalizing Kahneman's dual-process theory, CAF introduces a three-step cognitive pipeline: review initialization, incremental integration, and cognitive alignment. Empirical validation shows that CAF outperforms existing LLM-based methods, with sentiment consistency gains reaching up to 19.47\% and content consistency improving by as much as 12.95\%.
Updated: 2025-03-21 07:36:18
标题: 建立社会心理学与LLM推理的桥梁:通过认知对齐实现冲突感知元回顾生成
摘要: 学术投稿的快速增长已经超出了传统的同行评审系统的承受范围,这驱使智能自动化的需求以保持科学严谨性。虽然大型语言模型(LLMs)在自动化手稿评论方面表现出潜力,但它们在综合高风险的元评论方面的能力,这需要冲突意识的推理和共识派生,仍然不够完善。现有的方法未能有效处理不同意见中的矛盾观点,通常引入额外的认知偏见,如锚定效应和一致性偏见。为了克服这些局限性,我们提出了认知对齐框架(CAF),这是一种双过程架构,将LLMs转化为自适应的科学仲裁者。通过操作化卡尼曼的双过程理论,CAF引入了一个三步认知管道:审查初始化、逐步整合和认知对齐。经验证实验证明,CAF优于现有的基于LLM的方法,情感一致性的增益达到了高达19.47\%,内容一致性提高了多达12.95\%。
更新时间: 2025-03-21 07:36:18
领域: cs.AI
RustEvo^2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation
Large Language Models (LLMs) have become pivotal tools for automating code generation in software development. However, these models face significant challenges in producing version-aware code for rapidly evolving languages like Rust, where frequent Application Programming Interfaces (API) changes across versions lead to compatibility issues and correctness errors. Existing benchmarks lack systematic evaluation of how models navigate API transitions, relying on labor-intensive manual curation and offering limited version-specific insights. To address this gap, we present RustEvo, a novel framework for constructing dynamic benchmarks that evaluate the ability of LLMs to adapt to evolving Rust APIs. RustEvo automates dataset creation by synthesizing 588 API changes (380 from Rust standard libraries, 208 from 15 third-party crates) into programming tasks mirroring real-world challenges. These tasks cover four API evolution categories: Stabilizations, Signature Changes, Behavioral Changes, and Deprecations, reflecting their actual distribution in the Rust ecosystem. Experiments on state-of-the-art (SOTA) LLMs reveal significant performance variations: models achieve a 65.8% average success rate on stabilized APIs but only 38.0% on behavioral changes, highlighting difficulties in detecting semantic shifts without signature alterations. Knowledge cutoff dates strongly influence performance, with models scoring 56.1% on before-cutoff APIs versus 32.5% on after-cutoff tasks. Retrieval-Augmented Generation (RAG) mitigates this gap, improving success rates by 13.5% on average for APIs released after model training. Our findings underscore the necessity of our evolution-aware benchmarks to advance the adaptability of LLMs in fast-paced software ecosystems. The framework and the benchmarks are publicly released at https://github.com/SYSUSELab/RustEvo.
Updated: 2025-03-21 07:33:59
标题: RustEvo^2:基于LLM的Rust代码生成中API演进的演化基准
摘要: 大型语言模型(LLMs)已成为自动化软件开发中代码生成的关键工具。然而,这些模型在为像Rust这样快速发展的语言生成版本感知代码时面临着重大挑战,Rust经常发生的应用程序编程接口(API)变化导致兼容性问题和正确性错误。现有的基准测试缺乏对模型如何在API转换中导航的系统性评估,依赖于繁重的手工策划并提供有限的特定版本见解。为了填补这一空白,我们提出了RustEvo,一个新颖的框架,用于构建动态基准测试,评估LLMs适应不断发展的Rust API的能力。RustEvo通过合成588个API更改(来自Rust标准库的380个,来自15个第三方箱的208个)来自动化数据集创建,形成反映现实挑战的编程任务。这些任务涵盖了四个API演变类别:稳定性、签名更改、行为更改和弃用,反映了它们在Rust生态系统中的实际分布。 对最先进(SOTA)的LLMs进行实验揭示了显著的性能差异:模型在稳定的API上平均成功率为65.8%,但在行为更改上仅为38.0%,突出了在没有签名更改的情况下难以检测语义变化的困难。知识截止日期强烈影响性能,模型在截止日期之前的API上得分为56.1%,而在截止日期之后的任务上为32.5%。检索增强生成(RAG)缓解了这一差距,平均提高了对模型训练后发布的API的成功率13.5%。我们的发现强调了我们的演进感知基准的必要性,以推动LLMs在快节奏软件生态系统中的适应能力。该框架和基准已在https://github.com/SYSUSELab/RustEvo上公开发布。
更新时间: 2025-03-21 07:33:59
领域: cs.SE,cs.AI
When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO
In recent years, the field of image generation has witnessed significant advancements, particularly in fine-tuning methods that align models with universal human preferences. This paper explores the critical role of preference data in the training process of diffusion models, particularly in the context of Diffusion-DPO and its subsequent adaptations. We investigate the complexities surrounding universal human preferences in image generation, highlighting the subjective nature of these preferences and the challenges posed by minority samples in preference datasets. Through pilot experiments, we demonstrate the existence of minority samples and their detrimental effects on model performance. We propose Adaptive-DPO -- a novel approach that incorporates a minority-instance-aware metric into the DPO objective. This metric, which includes intra-annotator confidence and inter-annotator stability, distinguishes between majority and minority samples. We introduce an Adaptive-DPO loss function which improves the DPO loss in two ways: enhancing the model's learning of majority labels while mitigating the negative impact of minority samples. Our experiments demonstrate that this method effectively handles both synthetic minority data and real-world preference data, paving the way for more effective training methodologies in image generation tasks.
Updated: 2025-03-21 07:33:44
标题: 当偏好不同:将扩散模型与少数族裔意识自适应DPO对齐
摘要: 最近几年,图像生成领域取得了重大进展,特别是在微调方法方面,使模型与普遍人类偏好相一致。本文探讨了偏好数据在扩散模型训练过程中的关键作用,特别是在Diffusion-DPO及其后续适应情况下。我们调查了图像生成中普遍人类偏好的复杂性,突出了这些偏好的主观性以及偏好数据集中少数样本带来的挑战。通过试验,我们展示了少数样本的存在及其对模型性能的有害影响。我们提出了自适应-DPO——一种将少数实例感知度量融入DPO目标的新方法。这个度量包括内部注释者置信度和注释者间稳定性,区分了多数样本和少数样本。我们引入了一个自适应-DPO损失函数,以两种方式改进DPO损失:增强模型对多数标签的学习,同时减轻少数样本的负面影响。我们的实验表明,这种方法有效处理了合成少数数据和真实偏好数据,为图像生成任务中更有效的训练方法铺平了道路。
更新时间: 2025-03-21 07:33:44
领域: cs.CV,cs.AI
Aligning Text to Image in Diffusion Models is Easier Than You Think
While recent advancements in generative modeling have significantly improved text-image alignment, some residual misalignment between text and image representations still remains. Although many approaches have attempted to address this issue by fine-tuning models using various reward models, etc., we revisit the challenge from the perspective of representation alignment-an approach that has gained popularity with the success of REPresentation Alignment (REPA). We first argue that conventional text-to-image (T2I) diffusion models, typically trained on paired image and text data (i.e., positive pairs) by minimizing score matching or flow matching losses, is suboptimal from the standpoint of representation alignment. Instead, a better alignment can be achieved through contrastive learning that leverages both positive and negative pairs. To achieve this efficiently even with pretrained models, we introduce a lightweight contrastive fine tuning strategy called SoftREPA that uses soft text tokens. This approach improves alignment with minimal computational overhead by adding fewer than 1M trainable parameters to the pretrained model. Our theoretical analysis demonstrates that our method explicitly increases the mutual information between text and image representations, leading to enhanced semantic consistency. Experimental results across text-to-image generation and text-guided image editing tasks validate the effectiveness of our approach in improving the semantic consistency of T2I generative models.
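The contrastive objective over positive and negative text-image pairs mentioned above can be illustrated with a generic symmetric InfoNCE loss on a batch of embeddings; the sketch below shows that generic loss (the temperature and all names are assumptions), not the soft-text-token mechanism of SoftREPA itself.

import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE: diagonal (paired) entries are positives, off-diagonal are negatives."""
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.t() / temperature       # (batch, batch) similarities
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Usage with hypothetical 8-sample batches of 512-dimensional embeddings:
loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))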
Updated: 2025-03-21 07:28:43
Categories: cs.CV,cs.AI,cs.LG
Malliavin-Bismut Score-based Diffusion Models
We introduce a new framework that employs Malliavin calculus to derive explicit expressions for the score function -- i.e., the gradient of the log-density -- associated with solutions to stochastic differential equations (SDEs). Our approach integrates classical integration-by-parts techniques with modern tools, such as Bismut's formula and Malliavin calculus, to address linear and nonlinear SDEs. In doing so, we establish a rigorous connection between the Malliavin derivative, its adjoint (the Malliavin divergence or the Skorokhod integral), Bismut's formula, and diffusion generative models, thus providing a systematic method for computing $\nabla \log p_t(x)$. For the linear case, we present a detailed study proving that our formula is equivalent to the actual score function derived from the solution of the Fokker--Planck equation for linear SDEs. Additionally, we derive a closed-form expression for $\nabla \log p_t(x)$ for nonlinear SDEs with state-independent diffusion coefficients. These advancements provide fresh theoretical insights into the smoothness and structure of probability densities and practical implications for score-based generative modelling, including the design and analysis of new diffusion models. Moreover, our findings promote the adoption of the robust Malliavin calculus framework in machine learning research. These results directly apply to various pure and applied mathematics fields, such as generative modelling, the study of SDEs driven by fractional Brownian motion, and the Fokker--Planck equations associated with nonlinear SDEs.
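For intuition, the score $\nabla \log p_t(x)$ of a linear SDE can be written in closed form. The snippet below does this for a 1D Ornstein-Uhlenbeck process $dX_t = -\theta X_t\,dt + \sigma\,dW_t$ started at $x_0$, whose marginal is Gaussian; this is a standard textbook check, not the paper's Malliavin-based derivation, and the parameter values are arbitrary.

```python
import numpy as np

def ou_score(x, t, x0=1.0, theta=0.5, sigma=1.0):
    """Score d/dx log p_t(x) for the OU process dX = -theta*X dt + sigma dW.

    The marginal p_t is Gaussian with mean x0*exp(-theta*t) and variance
    sigma^2/(2*theta) * (1 - exp(-2*theta*t)), so the score is -(x - mean)/variance.
    """
    mean = x0 * np.exp(-theta * t)
    var = sigma**2 / (2 * theta) * (1 - np.exp(-2 * theta * t))
    return -(x - mean) / var

# The score is linear in x for a Gaussian marginal, as expected.
xs = np.linspace(-2.0, 2.0, 5)
print(ou_score(xs, t=1.0))
```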
Updated: 2025-03-21 07:27:10
Categories: cs.LG,math.PR
Understanding Social Support Needs in Questions: A Hybrid Approach Integrating Semi-Supervised Learning and LLM-based Data Augmentation
Patients are increasingly turning to online health Q&A communities for social support to improve their well-being. However, when the support they receive does not align with their specific needs, it may prove ineffective or even detrimental. This necessitates a model capable of identifying the social support needs expressed in questions. However, training such a model is challenging due to the scarcity and class imbalance of labeled data. To overcome these challenges, we follow the computational design science paradigm to develop a novel framework, Hybrid Approach for SOcial Support need classification (HA-SOS). HA-SOS integrates an answer-enhanced semi-supervised learning approach, a text data augmentation technique leveraging large language models (LLMs) with a reliability- and diversity-aware sample selection mechanism, and a unified training process to automatically label social support needs in questions. Extensive empirical evaluations demonstrate that HA-SOS significantly outperforms existing question classification models and alternative semi-supervised learning approaches. This research contributes to the literature on social support, question classification, semi-supervised learning, and text data augmentation. In practice, our HA-SOS framework helps online Q&A platform managers and answerers better understand users' social support needs, enabling them to provide timely, personalized answers and interventions.
Updated: 2025-03-21 07:25:16
Categories: cs.CY,cs.AI,cs.CL
A New Segment Routing method with Swap Node Selection Strategy Based on Deep Reinforcement Learning for Software Defined Network
The existing segment routing (SR) methods need to determine the routing first and then use path segmentation approaches to select swap nodes to form a segment routing path (SRP). They require re-segmentation of the path whenever the routing changes. Furthermore, they do not consider the flow table issuance time, which prevents them from maximizing the speed of flow table issuance. To address these issues, this paper establishes an optimization model that simultaneously forms routing strategies and path segmentation strategies, selecting appropriate swap nodes to reduce flow table issuance time. It also designs an intelligent segment routing algorithm based on deep reinforcement learning (DRL-SR) to solve the proposed model. First, a traffic matrix is designed as the state space for the deep reinforcement learning agent; this matrix includes multiple QoS performance indicators, the flow table issuance time overhead, and the SR label stack depth. Second, the action selection strategy and corresponding reward function are designed, where the agent selects the next node considering the routing; in addition, an action selection strategy that decides whether a newly added node is chosen as a swap node, together with its corresponding reward function, is designed to account for the time cost of the controller issuing the flow table to that swap node. Finally, a series of experiments shows that, compared with the existing methods, the designed segment routing optimization model and the intelligent solution algorithm (DRL-SR) reduce the time overhead required to complete the segment routing establishment task while optimizing performance metrics such as throughput, delay, and packet loss.
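One way to picture the reward design is as a weighted combination of QoS terms plus a penalty on flow table issuance time whenever a node is chosen as a swap node. The sketch below is a hypothetical reward function with made-up weights and field names; the paper's exact formulation may differ.

```python
def step_reward(delay_ms, loss_rate, throughput_mbps,
                chose_swap_node, issuance_time_ms,
                w_delay=0.4, w_loss=0.3, w_tput=0.3, w_issue=0.05):
    """Hypothetical per-step reward for a segment-routing RL agent.

    QoS terms reward low delay/loss and high throughput; selecting a swap node
    additionally incurs a penalty proportional to the time the controller needs
    to issue the flow table to that node.
    """
    qos = (-w_delay * delay_ms
           - w_loss * loss_rate
           + w_tput * throughput_mbps)
    issue_penalty = w_issue * issuance_time_ms if chose_swap_node else 0.0
    return qos - issue_penalty

# Two candidate next hops: picking the swap node trades QoS for issuance cost.
print(step_reward(12.0, 0.01, 95.0, chose_swap_node=False, issuance_time_ms=0.0))
print(step_reward(10.0, 0.01, 98.0, chose_swap_node=True, issuance_time_ms=35.0))
```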
Updated: 2025-03-21 07:24:09
Categories: cs.AI
A physics-informed transformer neural operator for learning generalized solutions of initial boundary value problems
Initial boundary value problems arise commonly in applications with engineering and natural systems governed by nonlinear partial differential equations (PDEs). Operator learning is an emerging field for solving these equations by using a neural network to learn a map between infinite dimensional input and output function spaces. These neural operators are trained using a combination of data (observations or simulations) and PDE-residuals (physics-loss). A major drawback of existing neural approaches is the requirement to retrain with new initial/boundary conditions, and the necessity for a large amount of simulation data for training. We develop a physics-informed transformer neural operator (named PINTO) that efficiently generalizes to unseen initial and boundary conditions, trained in a simulation-free setting using only physics loss. The main innovation lies in our new iterative kernel integral operator units, implemented using cross-attention, to transform the PDE solution's domain points into an initial/boundary condition-aware representation vector, enabling efficient learning of the solution function for new scenarios. The PINTO architecture is applied to simulate the solutions of important equations used in engineering applications: advection, Burgers, and steady and unsteady Navier-Stokes equations (three flow scenarios). For these five test cases, we show that the relative errors during testing under challenging conditions of unseen initial/boundary conditions are only one-fifth to one-third of other leading physics informed operator learning methods. Moreover, our PINTO model is able to accurately solve the advection and Burgers equations at time steps that are not included in the training collocation points. The code is available at https://github.com/quest-lab-iisc/PINTO
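The physics-loss idea can be illustrated on the 1D advection equation $u_t + c\,u_x = 0$: automatic differentiation gives the PDE residual at collocation points, which is minimized alongside an initial-condition term. The sketch below uses a generic fully connected network rather than the paper's cross-attention operator units; the network sizes, initial condition, and collocation counts are assumptions.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))  # u(x, t) approximated by an MLP

def advection_residual(x, t, c=1.0):
    """PDE residual u_t + c*u_x for the learned solution (simulation-free loss)."""
    x = x.requires_grad_(True)
    t = t.requires_grad_(True)
    u = net(torch.cat([x, t], dim=1))
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    return u_t + c * u_x

# Physics loss at random collocation points plus a simple initial-condition loss.
x, t = torch.rand(256, 1), torch.rand(256, 1)
phys_loss = advection_residual(x, t).pow(2).mean()
x0 = torch.rand(128, 1)
ic_loss = (net(torch.cat([x0, torch.zeros_like(x0)], dim=1))
           - torch.sin(2 * torch.pi * x0)).pow(2).mean()
loss = phys_loss + ic_loss
loss.backward()
print(float(loss))
```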
Updated: 2025-03-21 07:14:56
Categories: cs.LG,physics.comp-ph,35C05
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving
Multimodal scientific problems (MSPs) involve complex issues that require the integration of multiple modalities, such as text and diagrams, presenting a significant challenge in artificial intelligence. While progress has been made in addressing traditional scientific problems, MSPs still face two primary issues: the challenge of multi-modal comprehensive reasoning in scientific problem-solving and the lack of reflective and rethinking capabilities. To address these issues, we introduce a Multi-Agent framework based on the Big Seven Personality and Socratic guidance (MAPS). This framework employs seven distinct agents that leverage feedback mechanisms and the Socratic method to guide the resolution of MSPs. To tackle the first issue, we propose a progressive four-agent solving strategy, where each agent focuses on a specific stage of the problem-solving process. For the second issue, we introduce a Critic agent, inspired by Socratic questioning, which prompts critical thinking and stimulates autonomous learning. We conduct extensive experiments on the EMMA, Olympiad, and MathVista datasets, achieving promising results that outperform the current SOTA model by 15.84% across all tasks. Meanwhile, the additional analytical experiments also verify the model's progress as well as generalization ability.
Updated: 2025-03-21 07:13:45
Categories: cs.AI
Deep Learning for Human Locomotion Analysis in Lower-Limb Exoskeletons: A Comparative Study
Wearable robotics for lower-limb assistance have become a pivotal area of research, aiming to enhance mobility for individuals with physical impairments or augment the performance of able-bodied users. Accurate and adaptive control systems are essential to ensure seamless interaction between the wearer and the robotic device, particularly when navigating diverse and dynamic terrains. Despite recent advances in neural networks for time series analysis, no prior attempts have addressed the classification of ground conditions into five classes together with the subsequent estimation of ramp slope and stair height. In this respect, this paper presents an experimental comparison of eight deep neural network backbones for predicting high-level locomotion parameters across diverse terrains. All the models are trained on the publicly available CAMARGO 2021 dataset. IMU-only inputs matched or outperformed IMU+EMG inputs, promoting a cost-effective and efficient design. Indeed, using three IMU sensors, the LSTM achieved high terrain classification accuracy (0.94 +- 0.04) and precise ramp slope estimation (1.95 +- 0.58°), while the CNN-LSTM achieved precise stair height estimation (15.65 +- 7.40 mm). As a further contribution, SHAP analysis justified sensor reduction without performance loss, ensuring a lightweight setup. The system operates with ~2 ms inference time, supporting real-time applications. The code is available at https://github.com/cosbidev/Human-Locomotion-Identification.
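As a rough sketch of the kind of model compared in the study, below is a small LSTM classifier that maps a window of IMU samples to one of five terrain classes. The window length, channel count, and layer sizes are illustrative assumptions, not the benchmarked configuration.

```python
import torch
import torch.nn as nn

class TerrainLSTM(nn.Module):
    """Window of IMU samples -> one of five terrain classes (sketch)."""
    def __init__(self, n_channels=18, hidden=64, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):              # x: [batch, time, channels]
        _, (h, _) = self.lstm(x)       # final hidden state summarizes the window
        return self.head(h[-1])

model = TerrainLSTM()
window = torch.randn(4, 100, 18)       # 4 windows of 100 timesteps, 18 IMU channels
logits = model(window)
loss = nn.functional.cross_entropy(logits, torch.randint(0, 5, (4,)))
loss.backward()
print(logits.shape)                    # torch.Size([4, 5])
```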
Updated: 2025-03-21 07:12:44
Categories: cs.RO,cs.AI,F.2.2, I.2.7
TeMP-TraG: Edge-based Temporal Message Passing in Transaction Graphs
Transaction graphs, which represent financial and trade transactions between entities such as bank accounts and companies, can reveal patterns indicative of financial crimes like money laundering and fraud. However, effective detection of such cases requires node and edge classification methods capable of addressing the unique challenges of transaction graphs, including rich edge features, multigraph structures and temporal dynamics. To tackle these challenges, we propose TeMP-TraG, a novel graph neural network mechanism that incorporates temporal dynamics into message passing. TeMP-TraG prioritises more recent transactions when aggregating node messages, enabling better detection of time-sensitive patterns. We demonstrate that TeMP-TraG improves four state-of-the-art graph neural networks by 6.19% on average. Our results highlight TeMP-TraG as an advancement in leveraging transaction graphs to combat financial crime.
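A minimal way to fold recency into message passing is to weight each incoming edge message by an exponential decay of its transaction age before aggregating at the destination node. The snippet below shows that aggregation step in isolation; the decay form and all names are assumptions, and the full TeMP-TraG mechanism is more involved.

```python
import torch

def recency_weighted_aggregate(edge_messages, edge_times, dst_index, n_nodes,
                               now, decay=0.1):
    """Aggregate edge messages into node states, favouring recent transactions.

    edge_messages: [E, D] message per edge, edge_times: [E] timestamps,
    dst_index: [E] destination node of each edge.
    """
    # Exponential recency weight: newer transactions get weights closer to 1.
    w = torch.exp(-decay * (now - edge_times)).unsqueeze(-1)        # [E, 1]
    out = torch.zeros(n_nodes, edge_messages.shape[1])
    norm = torch.zeros(n_nodes, 1)
    out.index_add_(0, dst_index, w * edge_messages)                 # weighted sum
    norm.index_add_(0, dst_index, w)
    return out / norm.clamp_min(1e-8)                               # weighted mean

msgs = torch.randn(6, 8)
times = torch.tensor([1.0, 2.0, 3.0, 9.0, 9.5, 10.0])
dst = torch.tensor([0, 0, 1, 1, 2, 2])
print(recency_weighted_aggregate(msgs, times, dst, n_nodes=3, now=10.0).shape)
```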
Updated: 2025-03-21 07:10:27
Categories: cs.LG
Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation
As large language models (LLMs) have shown great success in many tasks, they are used in various applications. While a lot of work has focused on the efficiency of single-LLM applications (e.g., offloading, request scheduling, parallelism strategy selection), multi-LLM applications receive less attention, particularly in offline inference scenarios. In this work, we aim to improve the offline end-to-end inference efficiency of multi-LLM applications in the single-node multi-GPU environment. The problem involves two key decisions: (1) determining which LLMs to run concurrently each time (we may not run all the models at the same time), and (2) selecting a parallelism strategy for each LLM. This problem is NP-hard. Naive solutions may not work well because the running time for a model to complete a set of requests depends on the request workload and the selected parallelism strategy, and such solutions lack an accurate model of the running time. As the LLM output lengths are unknown before running, to estimate the model running time we propose a sampling-then-simulation method which first estimates the output lengths by sampling from an empirical cumulative distribution function obtained in advance from a large dataset, and then simulates the LLM inference process accordingly. Based on the simulation, we estimate the per-iteration latencies to obtain the total latency. A greedy method is proposed to optimize the scheduling of the LLMs in the application across the GPUs. We then propose a framework, SamuLLM, which contains two phases: planning, which calls the greedy method for an application, and running, which runs the application and dynamically adjusts the model scheduling based on runtime information. Experiments on 3 applications and a mixed application show that SamuLLM can achieve 1.0-2.4$\times$ end-to-end speedups compared to the competitors.
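The sampling-then-simulation idea can be sketched in a few lines: draw output lengths from an empirical distribution collected offline, then step through a simple iteration-level simulation that accumulates per-iteration latency until every request has produced its sampled number of tokens. The latency model below, a constant per-iteration cost plus a batch-size term, is a deliberately crude stand-in for the real simulator, and all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Empirical output lengths collected offline (stand-in data); sampling from them
# approximates drawing from the empirical cumulative distribution function.
observed_lengths = rng.integers(16, 512, size=10_000)

def estimate_batch_latency(n_requests, base_ms=2.0, per_seq_ms=0.05):
    """Simulate continuous-batching decoding and return estimated total latency."""
    remaining = rng.choice(observed_lengths, size=n_requests)  # sampled output lengths
    total_ms = 0.0
    while remaining.size:
        # One decode iteration: cost grows mildly with the active batch size.
        total_ms += base_ms + per_seq_ms * remaining.size
        remaining -= 1
        remaining = remaining[remaining > 0]   # finished requests leave the batch
    return total_ms

print(f"estimated latency for 32 requests: {estimate_batch_latency(32):.1f} ms")
```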
Updated: 2025-03-21 06:56:35
Categories: cs.DC,cs.LG
Analysis and Fully Memristor-based Reservoir Computing for Temporal Data Classification
Reservoir computing (RC) offers a neuromorphic framework that is particularly effective for processing spatiotemporal signals. Known for its temporal processing prowess, RC significantly lowers training costs compared to conventional recurrent neural networks. A key component in its hardware deployment is the ability to generate dynamic reservoir states. Our research introduces a novel dual-memory RC system, integrating a short-term memory via a WOx-based memristor, capable of achieving 16 distinct states encoded over 4 bits, and a long-term memory component using a TiOx-based memristor within the readout layer. We thoroughly examine both memristor types and leverage the RC system to process temporal data sets. The performance of the proposed RC system is validated through two benchmark tasks: isolated spoken digit recognition with incomplete inputs and Mackey-Glass time series prediction. The system delivered an impressive 98.84% accuracy in digit recognition and sustained a low normalized root mean square error (NRMSE) of 0.036 in the time series prediction task, underscoring its capability. This study illuminates the adeptness of memristor-based RC systems in managing intricate temporal challenges, laying the groundwork for further innovations in neuromorphic computing.
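For readers unfamiliar with reservoir computing, the essential structure is a fixed random recurrent "reservoir" (here simulated in software, standing in for the memristor dynamics) followed by a trained linear readout, typically fit with ridge regression. The sketch below shows that pattern on a toy one-step-ahead prediction task; it is not a model of the WOx/TiOx devices, and all sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def reservoir_states(inputs, n_res=100, leak=0.3):
    """Drive a fixed random reservoir with a 1D input sequence (echo-state style)."""
    w_in = rng.uniform(-0.5, 0.5, size=n_res)
    w_res = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    w_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(w_res)))   # scale spectral radius
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = (1 - leak) * x + leak * np.tanh(w_in * u + w_res @ x)
        states.append(x.copy())
    return np.array(states)

# Toy task: one-step-ahead prediction of a sine wave via a ridge-regression readout.
u = np.sin(0.2 * np.arange(500))
S, y = reservoir_states(u[:-1]), u[1:]
ridge = 1e-6
w_out = np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]), S.T @ y)
print("train NRMSE:", np.sqrt(np.mean((S @ w_out - y) ** 2)) / np.std(y))
```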
Updated: 2025-03-21 06:52:25
Categories: cs.NE,cs.AI
Catastrophic Failure of LLM Unlearning via Quantization
Large language models (LLMs) have shown remarkable proficiency in generating text, benefiting from extensive training on vast textual corpora. However, LLMs may also acquire unwanted behaviors from the diverse and sensitive nature of their training data, which can include copyrighted and private content. Machine unlearning has been introduced as a viable solution to remove the influence of such problematic content without the need for costly and time-consuming retraining. This process aims to erase specific knowledge from LLMs while preserving as much model utility as possible. Despite the effectiveness of current unlearning methods, little attention has been given to whether existing unlearning methods for LLMs truly achieve forgetting or merely hide the knowledge, which current unlearning benchmarks fail to detect. This paper reveals that applying quantization to models that have undergone unlearning can restore the "forgotten" information. To thoroughly evaluate this phenomenon, we conduct comprehensive experiments using various quantization techniques across multiple precision levels. We find that for unlearning methods with utility constraints, the unlearned model retains an average of 21% of the intended forgotten knowledge in full precision, which significantly increases to 83% after 4-bit quantization. ... Our code is available at: https://github.com/zzwjames/FailureLLMUnlearning.
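The quantization referred to here is standard post-training weight quantization. The toy snippet below shows 4-bit round-to-nearest quantization of a weight tensor with a per-tensor scale, which is enough to see how small weight perturbations introduced by unlearning can be rounded away; it is illustrative only and unrelated to the authors' evaluation code.

```python
import torch

def quantize_rtn(w, n_bits=4):
    """Symmetric per-tensor round-to-nearest quantization (toy sketch)."""
    qmax = 2 ** (n_bits - 1) - 1                   # 7 for signed 4-bit
    scale = w.abs().max() / qmax
    codes = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return codes, codes * scale                    # integer codes, dequantized weights

torch.manual_seed(0)
w = torch.randn(1000)
w_edited = w + 0.01 * torch.randn(1000)            # small post-unlearning update
codes_a, _ = quantize_rtn(w)
codes_b, _ = quantize_rtn(w_edited)
print("fraction of 4-bit codes unchanged:", (codes_a == codes_b).float().mean().item())
```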
Updated: 2025-03-21 06:37:37
Categories: cs.CL,cs.AI
Bias Testing and Mitigation in LLM-based Code Generation
As the adoption of LLMs becomes more widespread in software coding ecosystems, a pressing issue has emerged: does the generated code contain social bias and unfairness, such as those related to age, gender, and race? This issue concerns the integrity, fairness, and ethical foundation of software applications that depend on the code generated by these models but are underexplored in the literature. This paper presents a novel bias testing framework that is specifically designed for code generation tasks. Based on this framework, we conduct an extensive empirical study on the biases in code generated by five widely studied LLMs (i.e., PALM-2-CodeChat-bison, Claude-instant-1, GPT-3.5-turbo, GPT-4-turbo, and GPT-4). Our findings reveal that biases are prevalent. For example, 13.47% to 49.10% of the codes generated by these LLMs have biased behaviors towards gender. Moreover, we study five bias mitigation prompt strategies that are commonly used in current code generation scenarios, i.e., zero-shot, one-shot, few-shot, and two Chain-of-Thought (CoT) prompts, with and without provided feedback-driven refinement. Our evaluation results illustrate that using direct prompt engineering strategies has limited effectiveness in mitigating bias, but our test execution feedback can help to reduce the ratio of code biases to a large extent (e.g., from 59.88% to 4.79% for GPT-4).
Updated: 2025-03-21 06:36:33
Categories: cs.SE,cs.AI
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization
The basic question-answering format of large language models involves inputting a prompt and receiving a response, and the quality of the prompt directly impacts the effectiveness of the response. Automated Prompt Optimization (APO) aims to break free from the cognitive biases of manually designed prompts and explores a broader design space for prompts. However, existing APO methods suffer from two key issues: the limited flexibility of fixed templates and inefficient search in the prompt space. To this end, we propose a Multi-Agent framework Incorporating Socratic guidance (MARS), which utilizes multi-agent fusion technology for automatic planning, with gradual continuous optimization and evaluation. Specifically, MARS comprises seven agents, each with distinct functionalities, which autonomously use the Planner to devise an optimization path that ensures flexibility. Additionally, it employs a Teacher-Critic-Student Socratic dialogue pattern to iteratively optimize the prompts while conducting an effective search. We conduct extensive experiments on various datasets to validate the effectiveness of our method, and perform additional analytical experiments to assess the model's advancement as well as its interpretability.
Updated: 2025-03-21 06:19:55
Categories: cs.CL,cs.AI
AVA: Attentive VLM Agent for Mastering StarCraft II
We introduce Attentive VLM Agent (AVA), a multimodal StarCraft II agent that aligns artificial agent perception with the human gameplay experience. Traditional frameworks such as SMAC rely on abstract state representations that diverge significantly from human perception, limiting the ecological validity of agent behavior. Our agent addresses this limitation by incorporating RGB visual inputs and natural language observations that more closely simulate human cognitive processes during gameplay. The AVA architecture consists of three integrated components: (1) a vision-language model enhanced with specialized self-attention mechanisms for strategic unit targeting and battlefield assessment, (2) a retrieval-augmented generation system that leverages domain-specific StarCraft II knowledge to inform tactical decisions, and (3) a dynamic role-based task distribution system that enables coordinated multi-agent behavior. The experimental evaluation in our proposed AVACraft environment, which contains 21 multimodal StarCraft II scenarios, demonstrates that AVA powered by foundation models (specifically Qwen-VL and GPT-4o) can execute complex tactical maneuvers without explicit training, achieving comparable performance to traditional MARL methods that require substantial training iterations. This work establishes a foundation for developing human-aligned StarCraft II agents and advances the broader research agenda of multimodal game AI. Our implementation is available at https://github.com/camel-ai/VLM-Play-StarCraft2.
Updated: 2025-03-21 06:14:36
Categories: cs.AI,cs.MA
Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification
Multi-label classification is crucial for comprehensive image understanding, yet acquiring accurate annotations is challenging and costly. To address this, a recent study suggests exploiting unsupervised multi-label classification leveraging CLIP, a powerful vision-language model. Despite CLIP's proficiency, it suffers from view-dependent predictions and inherent bias, limiting its effectiveness. We propose a novel method that addresses these issues by leveraging multiple views near target objects, guided by Class Activation Mapping (CAM) of the classifier, and debiasing pseudo-labels derived from CLIP predictions. Our Classifier-guided CLIP Distillation (CCD) enables selecting multiple local views without extra labels and debiasing predictions to enhance classification performance. Experimental results validate our method's superiority over existing techniques across diverse datasets. The code is available at https://github.com/k0u-id/CCD.
Updated: 2025-03-21 06:12:14
Categories: cs.CV,cs.AI
Lie Detector: Unified Backdoor Detection via Cross-Examination Framework
Institutions with limited data and computing resources often outsource model training to third-party providers in a semi-honest setting, assuming adherence to prescribed training protocols with pre-defined learning paradigm (e.g., supervised or semi-supervised learning). However, this practice can introduce severe security risks, as adversaries may poison the training data to embed backdoors into the resulting model. Existing detection approaches predominantly rely on statistical analyses, which often fail to maintain universally accurate detection accuracy across different learning paradigms. To address this challenge, we propose a unified backdoor detection framework in the semi-honest setting that exploits cross-examination of model inconsistencies between two independent service providers. Specifically, we integrate central kernel alignment to enable robust feature similarity measurements across different model architectures and learning paradigms, thereby facilitating precise recovery and identification of backdoor triggers. We further introduce backdoor fine-tuning sensitivity analysis to distinguish backdoor triggers from adversarial perturbations, substantially reducing false positives. Extensive experiments demonstrate that our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over SoTA baselines across supervised, semi-supervised, and autoregressive learning tasks, respectively. Notably, it is the first to effectively detect backdoors in multimodal large language models, further highlighting its broad applicability and advancing secure deep learning.
Updated: 2025-03-21 06:12:06
Categories: cs.LG,cs.CV
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
Vision-Language Models (VLMs) have enabled a variety of real-world applications. The large parameter size of VLMs brings large memory and computation overhead which poses significant challenges for deployment. Post-Training Quantization (PTQ) is an effective technique to reduce the memory and computation overhead. Existing PTQ methods mainly focus on large language models (LLMs), without considering the differences across other modalities. In this paper, we discover that there is a significant difference in sensitivity between language and vision tokens in large VLMs. Therefore, treating tokens from different modalities equally, as in existing PTQ methods, may over-emphasize the insensitive modalities, leading to significant accuracy loss. To deal with the above issue, we propose a simple yet effective method, Modality-Balanced Quantization (MBQ), for large VLMs. Specifically, MBQ incorporates the different sensitivities across modalities during the calibration process to minimize the reconstruction loss for better quantization parameters. Extensive experiments show that MBQ can significantly improve task accuracy by up to 4.4% and 11.6% under W3 and W4A8 quantization for 7B to 70B VLMs, compared to SOTA baselines. Additionally, we implement a W3 GPU kernel that fuses the dequantization and GEMV operators, achieving a 1.4x speedup on LLaVA-onevision-7B on the RTX 4090. The code is available at https://github.com/thu-nics/MBQ.
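The core of modality-balanced calibration can be pictured as choosing quantization parameters that minimize a reconstruction error in which vision-token and language-token errors are weighted by their measured sensitivities. Below is a toy grid search over a clipping threshold under such a weighted objective; the weights, search space, and all names are assumptions for illustration and do not reproduce the MBQ procedure.

```python
import torch

torch.manual_seed(0)

def fake_quant(x, clip, n_bits=4):
    """Symmetric fake-quantization with a chosen clipping threshold."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = clip / qmax
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

# Calibration activations split by modality (stand-ins for real VLM tokens).
vision_acts = torch.randn(512, 64) * 2.0     # vision tokens: wider range
text_acts = torch.randn(128, 64)             # language tokens: more sensitive

def weighted_error(clip, w_text=4.0, w_vision=1.0):
    # Language-token error is up-weighted, reflecting its higher sensitivity.
    e_v = (fake_quant(vision_acts, clip) - vision_acts).pow(2).mean()
    e_t = (fake_quant(text_acts, clip) - text_acts).pow(2).mean()
    return w_vision * e_v + w_text * e_t

clips = torch.linspace(0.5, 8.0, 16)
best = min(clips, key=lambda c: weighted_error(c).item())
print(f"selected clipping threshold: {best.item():.2f}")
```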
Updated: 2025-03-21 06:01:23
Categories: cs.CV,cs.AI
Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
Knowledge distillation can be a cost-effective technique to distill knowledge in Large Language Models, if the teacher output logits can be pre-computed and cached. However, successfully applying this to pre-training remains largely unexplored. In this work, we prove that naive approaches for sparse knowledge distillation such as caching Top-K probabilities, while intuitive, provide biased estimates of teacher probability distribution to the student, resulting in suboptimal performance and calibration. We propose an importance-sampling-based method `Random Sampling Knowledge Distillation', which provides unbiased estimates, preserves the gradient in expectation, and requires storing significantly sparser logits. Our method enables faster training of student models with marginal overhead (<10%) compared to cross-entropy based training, while maintaining competitive performance compared to full distillation, across a range of model sizes from 300M to 3B.
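To see why sampling can give an unbiased sparse distillation signal, note that the full cross-entropy sum over the vocabulary of p_v * (-log q_v) is an expectation under the teacher distribution, so averaging -log q_v over token indices sampled from the teacher is an unbiased estimate that only needs a few stored indices per position. The snippet below checks this numerically; it illustrates the general principle rather than the paper's exact estimator.

```python
import torch

torch.manual_seed(0)
V = 32_000                                   # vocabulary size
p = torch.softmax(torch.randn(V), dim=0)     # teacher distribution
q = torch.softmax(torch.randn(V), dim=0)     # student distribution

full_ce = -(p * q.log()).sum()               # exact cross-entropy H(p, q)

def sampled_ce(k):
    idx = torch.multinomial(p, k, replacement=True)   # store only k sampled ids
    return -q.log()[idx].mean()                        # unbiased: E_p[-log q_v]

estimates = torch.stack([sampled_ce(64) for _ in range(200)])
print(f"exact: {full_ce.item():.3f}  sampled mean: {estimates.mean().item():.3f} "
      f"+- {estimates.std().item():.3f}")
```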
Updated: 2025-03-21 05:58:18
Categories: cs.LG,cs.AI,cs.CL,68T50,I.2.7
Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models
As diffusion models become increasingly popular, the misuse of copyrighted and private images has emerged as a major concern. One promising solution to mitigate this issue is identifying the contribution of specific training samples in generative models, a process known as data attribution. Existing data attribution methods for diffusion models typically quantify the contribution of a training sample by evaluating the change in diffusion loss when the sample is included or excluded from the training process. However, we argue that the diffusion loss cannot represent such a contribution accurately because of how it is calculated. Specifically, these approaches measure the divergence between predicted and ground truth distributions, which leads to an indirect comparison between the predicted distributions and cannot capture the differences between model behaviors. To address these issues, we aim to measure the direct comparison between predicted distributions with an attribution score that analyses training sample importance, which is achieved by the Diffusion Attribution Score (DAS). Underpinned by rigorous theoretical analysis, we elucidate the effectiveness of DAS. Additionally, we explore strategies to accelerate DAS calculations, facilitating its application to large-scale diffusion models. Our extensive experiments across various datasets and diffusion models demonstrate that DAS significantly surpasses previous benchmarks in terms of the linear data-modelling score, establishing new state-of-the-art performance. Code is available at https://github.com/Jinxu-Lin/DAS.
Updated: 2025-03-21 05:57:29
Categories: cs.LG,cs.AI
Nonparametric Factor Analysis and Beyond
Nearly all identifiability results in unsupervised representation learning inspired by, e.g., independent component analysis, factor analysis, and causal representation learning, rely on assumptions of additive independent noise or noiseless regimes. In contrast, we study the more general case where noise can take arbitrary forms, depend on latent variables, and be non-invertibly entangled within a nonlinear function. We propose a general framework for identifying latent variables in the nonparametric noisy settings. We first show that, under suitable conditions, the generative model is identifiable up to certain submanifold indeterminacies even in the presence of non-negligible noise. Furthermore, under the structural or distributional variability conditions, we prove that latent variables of the general nonlinear models are identifiable up to trivial indeterminacies. Based on the proposed theoretical framework, we have also developed corresponding estimation methods and validated them in various synthetic and real-world settings. Interestingly, our estimate of the true GDP growth from alternative measurements suggests more insightful information on the economies than official reports. We expect our framework to provide new insight into how both researchers and practitioners deal with latent variables in real-world scenarios.
Updated: 2025-03-21 05:45:03
Categories: cs.LG,math.ST,stat.ML,stat.TH
Hierarchy-Boosted Funnel Learning for Identifying Semiconductors with Ultralow Lattice Thermal Conductivity
Data-driven machine learning (ML) has demonstrated tremendous potential in material property predictions. However, the scarcity of materials data with costly property labels in the vast chemical space presents a significant challenge for ML in efficiently predicting properties and uncovering structure-property relationships. Here, we propose a novel hierarchy-boosted funnel learning (HiBoFL) framework, which is successfully applied to identify semiconductors with ultralow lattice thermal conductivity ($\kappa_\mathrm{L}$). By training on only a few hundred materials targeted by unsupervised learning from a pool of hundreds of thousands, we achieve efficient and interpretable supervised predictions of ultralow $\kappa_\mathrm{L}$, thereby circumventing large-scale brute-force \textit{ab initio} calculations without clear objectives. As a result, we provide a list of candidates with ultralow $\kappa_\mathrm{L}$ for potential thermoelectric applications and discover a new factor that significantly influences structural anharmonicity. This HiBoFL framework offers a novel practical pathway for accelerating the discovery of functional materials.
Updated: 2025-03-21 05:13:53
Categories: cond-mat.mtrl-sci,cs.LG
In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI
The widespread deployment of general-purpose AI (GPAI) systems introduces significant new risks. Yet the infrastructure, practices, and norms for reporting flaws in GPAI systems remain seriously underdeveloped, lagging far behind more established fields like software security. Based on a collaboration between experts from the fields of software security, machine learning, law, social science, and policy, we identify key gaps in the evaluation and reporting of flaws in GPAI systems. We call for three interventions to advance system safety. First, we propose using standardized AI flaw reports and rules of engagement for researchers in order to ease the process of submitting, reproducing, and triaging flaws in GPAI systems. Second, we propose GPAI system providers adopt broadly-scoped flaw disclosure programs, borrowing from bug bounties, with legal safe harbors to protect researchers. Third, we advocate for the development of improved infrastructure to coordinate distribution of flaw reports across the many stakeholders who may be impacted. These interventions are increasingly urgent, as evidenced by the prevalence of jailbreaks and other flaws that can transfer across different providers' GPAI systems. By promoting robust reporting and coordination in the AI ecosystem, these proposals could significantly improve the safety, security, and accountability of GPAI systems.
Updated: 2025-03-21 05:09:46
Categories: cs.AI
PRIOT: Pruning-Based Integer-Only Transfer Learning for Embedded Systems
On-device transfer learning is crucial for adapting a common backbone model to the unique environment of each edge device. Tiny microcontrollers, such as the Raspberry Pi Pico, are key targets for on-device learning but often lack floating-point units, necessitating integer-only training. Dynamic computation of quantization scale factors, which is adopted in former studies, incurs high computational costs. Therefore, this study focuses on integer-only training with static scale factors, which is challenging with existing training methods. We propose a new training method named PRIOT, which optimizes the network by pruning selected edges rather than updating weights, allowing effective training with static scale factors. The pruning pattern is determined by the edge-popup algorithm, which trains a parameter named score assigned to each edge instead of the original parameters and prunes the edges with low scores before inference. Additionally, we introduce a memory-efficient variant, PRIOT-S, which only assigns scores to a small fraction of edges. We implement PRIOT and PRIOT-S on the Raspberry Pi Pico and evaluate their accuracy and computational costs using a tiny CNN model on the rotated MNIST dataset and the VGG11 model on the rotated CIFAR-10 dataset. Our results demonstrate that PRIOT improves accuracy by 8.08 to 33.75 percentage points over existing methods, while PRIOT-S reduces memory footprint with minimal accuracy loss.
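The edge-popup mechanism can be sketched as follows: weights stay fixed, each weight gets a trainable score, and the forward pass keeps only the top-scoring fraction of edges, using a straight-through estimator so the scores still receive gradients. This is a generic sketch of the published edge-popup idea with the integer-only and microcontroller concerns omitted; layer sizes and the keep ratio are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PopupLinear(nn.Module):
    """Linear layer trained by pruning edges via scores, not by updating weights."""
    def __init__(self, d_in, d_out, keep=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.1,
                                   requires_grad=False)      # frozen weights
        self.score = nn.Parameter(torch.randn(d_out, d_in) * 0.01)
        self.keep = keep

    def forward(self, x):
        k = int(self.keep * self.score.numel())
        threshold = self.score.flatten().kthvalue(self.score.numel() - k + 1).values
        mask = (self.score >= threshold).float()
        # Straight-through: use the hard mask forward, pass gradients to the scores.
        mask = mask + self.score - self.score.detach()
        return F.linear(x, self.weight * mask)

layer = PopupLinear(16, 4)
out = layer(torch.randn(8, 16))
out.sum().backward()
print(layer.score.grad.abs().sum() > 0, layer.weight.grad)   # scores learn, weights stay frozen
```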
Updated: 2025-03-21 05:07:57
Categories: cs.LG
MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering
Understanding the relationship between textual news and time-series evolution is a critical yet under-explored challenge in applied data science. While multimodal learning has gained traction, existing multimodal time-series datasets fall short in evaluating cross-modal reasoning and complex question answering, which are essential for capturing complex interactions between narrative information and temporal patterns. To bridge this gap, we introduce Multimodal Time Series Benchmark (MTBench), a large-scale benchmark designed to evaluate large language models (LLMs) on time series and text understanding across financial and weather domains. MTbench comprises paired time series and textual data, including financial news with corresponding stock price movements and weather reports aligned with historical temperature records. Unlike existing benchmarks that focus on isolated modalities, MTbench provides a comprehensive testbed for models to jointly reason over structured numerical trends and unstructured textual narratives. The richness of MTbench enables formulation of diverse tasks that require a deep understanding of both text and time-series data, including time-series forecasting, semantic and technical trend analysis, and news-driven question answering (QA). These tasks target the model's ability to capture temporal dependencies, extract key insights from textual context, and integrate cross-modal information. We evaluate state-of-the-art LLMs on MTbench, analyzing their effectiveness in modeling the complex relationships between news narratives and temporal patterns. Our findings reveal significant challenges in current models, including difficulties in capturing long-term dependencies, interpreting causality in financial and weather trends, and effectively fusing multimodal information.
Updated: 2025-03-21 05:04:53
Categories: cs.CL,cs.AI
Large Language Models and Causal Inference in Collaboration: A Survey
Causal inference has shown potential in enhancing the predictive accuracy, fairness, robustness, and explainability of Natural Language Processing (NLP) models by capturing causal relationships among variables. The emergence of generative Large Language Models (LLMs) has significantly impacted various NLP domains, particularly through their advanced reasoning capabilities. This survey focuses on evaluating and improving LLMs from a causal view in the following areas: understanding and improving the LLMs' reasoning capacity, addressing fairness and safety issues in LLMs, complementing LLMs with explanations, and handling multimodality. Meanwhile, LLMs' strong reasoning capacities can in turn contribute to the field of causal inference by aiding causal relationship discovery and causal effect estimations. This review explores the interplay between causal inference frameworks and LLMs from both perspectives, emphasizing their collective potential to further the development of more advanced and equitable artificial intelligence systems.
Updated: 2025-03-21 04:57:45
Categories: cs.CL,cs.AI
Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models
Language models pretrained on text-only corpora often struggle with tasks that require auditory commonsense knowledge. Previous work addresses this problem by augmenting the language model to retrieve knowledge from external audio databases. This approach has several limitations, such as the potential lack of relevant audio in databases and the high costs associated with constructing and querying the databases. To address these issues, we propose Imagine to Hear, a novel approach that dynamically generates auditory knowledge using generative models. Our framework detects multiple audio-related textual spans from the given prompt and generates corresponding auditory knowledge. We develop several mechanisms to efficiently process multiple auditory knowledge, including a CLAP-based rejection sampler and a language-audio fusion module. Our experiments show that our method achieves state-of-the-art performance on AuditoryBench without relying on external databases, highlighting the effectiveness of our generation-based approach.
Updated: 2025-03-21 04:56:22
Categories: cs.CL,cs.AI,cs.LG,cs.SD,eess.AS
Causal Inference via Style Bias Deconfounding for Domain Generalization
Deep neural networks (DNNs) often struggle with out-of-distribution data, limiting their reliability in diverse real-world applications. To address this issue, domain generalization methods have been developed to learn domain-invariant features from single or multiple training domains, enabling generalization to unseen testing domains. However, existing approaches usually overlook the impact of style frequency within the training set. This oversight predisposes models to capture spurious visual correlations caused by style confounding factors, rather than learning truly causal representations, thereby undermining inference reliability. In this work, we introduce Style Deconfounding Causal Learning (SDCL), a novel causal inference-based framework designed to explicitly address style as a confounding factor. Our approach begins with constructing a structural causal model (SCM) tailored to the domain generalization problem and applies a backdoor adjustment strategy to account for style influence. Building on this foundation, we design a style-guided expert module (SGEM) that adaptively clusters style distributions during training, capturing the global confounding style. Additionally, a backdoor causal learning module (BDCL) performs causal interventions during feature extraction, ensuring fair integration of global confounding styles into sample predictions and effectively reducing style bias. The SDCL framework is highly versatile and can be seamlessly integrated with state-of-the-art data augmentation techniques. Extensive experiments across diverse natural and medical image recognition tasks validate its efficacy, demonstrating superior performance in both multi-domain and the more challenging single-domain generalization scenarios.
Updated: 2025-03-21 04:52:31
Categories: cs.CV,cs.AI
Towards LLM Guardrails via Sparse Representation Steering
Large Language Models (LLMs) have demonstrated remarkable performance in natural language generation tasks, yet their uncontrolled outputs pose significant ethical and safety risks. Recently, representation engineering methods have shown promising results in steering model behavior by modifying the rich semantic information encoded in activation vectors. However, due to the difficulty of precisely disentangling semantic directions within high-dimensional representation space, existing approaches suffer from three major limitations: lack of fine-grained control, quality degradation of generated content, and poor interpretability. To address these challenges, we propose a sparse encoding-based representation engineering method, named SRE, which decomposes polysemantic activations into a structured, monosemantic feature space. By leveraging sparse autoencoding, our approach isolates and adjusts only task-specific sparse feature dimensions, enabling precise and interpretable steering of model behavior while preserving content quality. We validate our method on three critical domains, i.e., safety, fairness, and truthfulness using the open-source LLM Gemma-2-2B-it. Experimental results show that SRE achieves superior controllability while maintaining the overall quality of generated content (i.e., controllability and quality), demonstrating its effectiveness as a fine-grained and interpretable activation steering framework.
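The building block behind this kind of steering is a sparse autoencoder trained on activation vectors with an L1 penalty, after which selected latent features can be scaled before decoding back into the residual stream. The sketch below shows the autoencoder and a single steering step on random data; the feature index, penalty weight, and sizes are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder for activations with an L1 sparsity penalty."""
    def __init__(self, d_model=256, d_feat=1024):
        super().__init__()
        self.enc = nn.Linear(d_model, d_feat)
        self.dec = nn.Linear(d_feat, d_model)

    def forward(self, h):
        z = F.relu(self.enc(h))         # sparse, (ideally) monosemantic features
        return self.dec(z), z

sae = SparseAutoencoder()
h = torch.randn(32, 256)                # stand-in for LLM hidden states
recon, z = sae(h)
loss = F.mse_loss(recon, h) + 1e-3 * z.abs().mean()   # reconstruction + sparsity
loss.backward()

# Steering: amplify one chosen feature direction and decode back to activations.
with torch.no_grad():
    _, z = sae(h)
    z[:, 7] += 5.0                      # index 7 is an arbitrary illustrative feature
    h_steered = sae.dec(z)
print(h_steered.shape)
```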
Updated: 2025-03-21 04:50:25
Categories: cs.CR,cs.CL
Physics-Informed Neural Network Surrogate Models for River Stage Prediction
This work investigates the feasibility of using Physics-Informed Neural Networks (PINNs) as surrogate models for river stage prediction, aiming to reduce computational cost while maintaining predictive accuracy. Our primary contribution demonstrates that PINNs can successfully approximate HEC-RAS numerical solutions when trained on a single river, achieving strong predictive accuracy with generally low relative errors, though some river segments exhibit higher deviations. By integrating the governing Saint-Venant equations into the learning process, the proposed PINN-based surrogate model enforces physical consistency and significantly improves computational efficiency compared to HEC-RAS. We evaluate the model's performance in terms of accuracy and computational speed, demonstrating that it closely approximates HEC-RAS predictions while enabling real-time inference. These results highlight the potential of PINNs as effective surrogate models for single-river hydrodynamics, offering a promising alternative for computationally efficient river stage forecasting. Future work will explore techniques to enhance PINN training stability and robustness across a more generalized multi-river model.
Updated: 2025-03-21 04:48:22
Categories: cs.LG,cs.AI
Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation
6D object pose estimation aims at determining an object's translation, rotation, and scale, typically from a single RGBD image. Recent advancements have expanded this estimation from instance-level to category-level, allowing models to generalize across unseen instances within the same category. However, this generalization is limited by the narrow range of categories covered by existing datasets, such as NOCS, which also tend to overlook common real-world challenges like occlusion. To tackle these challenges, we introduce Omni6D, a comprehensive RGBD dataset featuring a wide range of categories and varied backgrounds, elevating the task to a more realistic context. 1) The dataset comprises an extensive spectrum of 166 categories, 4688 instances adjusted to the canonical pose, and over 0.8 million captures, significantly broadening the scope for evaluation. 2) We introduce a symmetry-aware metric and conduct systematic benchmarks of existing algorithms on Omni6D, offering a thorough exploration of new challenges and insights. 3) Additionally, we propose an effective fine-tuning approach that adapts models from previous datasets to our extensive vocabulary setting. We believe this initiative will pave the way for new insights and substantial progress in both the industrial and academic fields, pushing forward the boundaries of general 6D pose estimation.
Updated: 2025-03-21 04:47:17
标题: Omni6D:用于类别级别6D物体姿态估计的大词汇量3D物体数据集
摘要: 6D物体姿态估计旨在从单个RGBD图像中确定物体的平移、旋转和比例尺度。最近的进展已经将这种估计从实例级别扩展到类别级别,允许模型在同一类别内未见实例上进行泛化。然而,这种泛化受现有数据集(如NOCS)覆盖的类别范围狭窄的限制,这些数据集也倾向于忽视常见的现实世界挑战,比如遮挡。为了应对这些挑战,我们介绍了Omni6D,一个包含各种类别和不同背景的全面RGBD数据集,将任务提升到更加现实的背景中。1)该数据集包含一个广泛的166个类别的充分光谱,调整到规范姿态的4688个实例,以及超过80万个捕获,显著扩大了评估的范围。2)我们引入了一种对称感知度量,并在Omni6D上对现有算法进行系统化基准测试,提供了对新挑战和见解的彻底探索。3)此外,我们提出了一种有效的微调方法,将模型从先前的数据集适应到我们的广泛词汇设置中。我们相信这一举措将为工业和学术领域的新见解和实质性进展铺平道路,推动6D姿态估计的普遍边界。
更新时间: 2025-03-21 04:47:17
领域: cs.CV,cs.AI,I.2
AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots
This paper presents AlignBot, a novel framework designed to optimize VLM-powered customized task planning for household robots by effectively aligning with user reminders. In domestic settings, aligning task planning with user reminders poses significant challenges due to the limited quantity, diversity, and multimodal nature of the reminders. To address these challenges, AlignBot employs a fine-tuned LLaVA-7B model, functioning as an adapter for GPT-4o. This adapter model internalizes diverse forms of user reminders-such as personalized preferences, corrective guidance, and contextual assistance-into structured instruction-formatted cues that prompt GPT-4o in generating customized task plans. Additionally, AlignBot integrates a dynamic retrieval mechanism that selects task-relevant historical successes as prompts for GPT-4o, further enhancing task planning accuracy. To validate the effectiveness of AlignBot, experiments are conducted in real-world household environments, which are constructed within the laboratory to replicate typical household settings. A multimodal dataset with over 1,500 entries derived from volunteer reminders is used for training and evaluation. The results demonstrate that AlignBot significantly improves customized task planning, outperforming existing LLM- and VLM-powered planners by interpreting and aligning with user reminders, achieving 86.8% success rate compared to the vanilla GPT-4o baseline at 21.6%, reflecting a 65% improvement and over four times greater effectiveness. Supplementary materials are available at: https://yding25.com/AlignBot/
Updated: 2025-03-21 04:40:24
标题: AlignBot:通过微调将VLM动力定制的任务规划与用户提醒对齐,用于家庭机器人
摘要: 这篇论文介绍了AlignBot,这是一个新颖的框架,旨在通过有效与用户提醒对齐,优化基于VLM的定制任务规划,用于家庭机器人。在家庭环境中,将任务规划与用户提醒对齐面临重大挑战,因为提醒的数量有限、多样化且多模式。为了解决这些挑战,AlignBot采用了一个经过微调的LLaVA-7B模型,作为GPT-4o的适配器。这个适配器模型将各种形式的用户提醒(如个性化偏好、纠正指导和情境帮助)内部化为结构化的指令格式化提示,促使GPT-4o生成定制的任务计划。此外,AlignBot集成了一个动态检索机制,选择与任务相关的历史成功作为GPT-4o的提示,进一步提高任务规划的准确性。为验证AlignBot的有效性,在真实家庭环境中进行了实验,在实验室中构建了典型家庭设置。使用来自志愿者提醒的超过1500条条目的多模式数据集进行训练和评估。结果表明,AlignBot显着改善了定制任务规划,优于现有的LLM和VLM动力规划器,通过解释和对齐用户提醒,成功率达到86.8%,而基线GPT-4o为21.6%,反映出65%的改进和超过四倍的效果。补充材料可在以下网址找到:https://yding25.com/AlignBot/
更新时间: 2025-03-21 04:40:24
领域: cs.RO,cs.AI,cs.IR
Early-MFC: Enhanced Flow Correlation Attacks on Tor via Multi-view Triplet Networks with Early Network Traffic
Flow correlation attacks are efficient network attacks that aim to expose users of anonymous network services such as Tor. Conducting such attacks during the early stages of network communication is particularly critical for scenarios demanding rapid decision-making, such as cybercrime detection or financial fraud prevention. Although recent studies have made progress in flow correlation attack techniques, research specifically addressing flow correlation with early network traffic remains limited. Moreover, due to factors such as model complexity, training costs, and real-time requirements, existing technologies cannot be directly applied to flow correlation with early network traffic. In this paper, we propose a flow correlation attack with early network traffic, named Early-MFC, based on multi-view triplet networks. The proposed approach extracts multi-view traffic features from the payload at the transport layer and the Inter-Packet Delay. It then integrates the multi-view flow information, converting the extracted features into shared embeddings. By leveraging techniques such as metric learning and contrastive learning, the method optimizes the embedding space by ensuring that similar flows are mapped closer together while dissimilar flows are positioned farther apart. Finally, Bayesian decision theory is applied to determine flow correlation, enabling high-accuracy flow correlation with early network traffic. Furthermore, we investigate flow correlation attacks under extra-early network traffic conditions. To address this challenge, we propose Early-MFC+, which utilizes payload data to construct embedded feature representations, ensuring robust performance even with minimal packet availability.
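A minimal sketch of the multi-view triplet idea, assuming one encoder per view (transport-layer payload bytes and inter-packet delays) fused into a shared embedding; the feature dimensions, fusion layer, and margin are placeholders, not Early-MFC's architecture.

```python
# Sketch of a multi-view flow embedding trained with a triplet margin loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewEncoder(nn.Module):
    """Fuses a payload-byte view and an inter-packet-delay view into one embedding."""
    def __init__(self, payload_dim=256, ipd_dim=64, emb_dim=128):
        super().__init__()
        self.payload_net = nn.Sequential(nn.Linear(payload_dim, 128), nn.ReLU())
        self.ipd_net = nn.Sequential(nn.Linear(ipd_dim, 128), nn.ReLU())
        self.fuse = nn.Linear(256, emb_dim)

    def forward(self, payload, ipd):
        z = torch.cat([self.payload_net(payload), self.ipd_net(ipd)], dim=-1)
        return F.normalize(self.fuse(z), dim=-1)

encoder = MultiViewEncoder()
triplet = nn.TripletMarginLoss(margin=0.3)

# anchor = entry-side flow, positive = the correlated exit-side flow, negative = an unrelated flow
anchor = encoder(torch.randn(32, 256), torch.randn(32, 64))
positive = encoder(torch.randn(32, 256), torch.randn(32, 64))
negative = encoder(torch.randn(32, 256), torch.randn(32, 64))
loss = triplet(anchor, positive, negative)
loss.backward()
```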
Updated: 2025-03-21 04:36:51
标题: 早期MFC:通过早期网络流量的多视图三元组网络增强对Tor的流量相关攻击
摘要: 流量相关攻击是一种高效的网络攻击,旨在揭露那些使用匿名网络服务(如Tor)的人。在网络通信的早期阶段进行这种攻击对于需要快速决策的场景尤为关键,比如网络犯罪检测或金融欺诈预防。尽管最近的研究在流量相关攻击技术方面取得了进展,但专门处理早期网络流量相关的研究仍然有限。此外,由于模型复杂性、训练成本和实时需求等因素,现有技术无法直接应用于早期网络流量相关。本文提出了一种基于多视图三元网络的早期网络流量相关攻击,称为Early-MFC。所提出的方法从传输层的负载和包间延迟中提取多视图流量特征。然后整合多视图流信息,将提取的特征转换为共享嵌入。通过利用度量学习和对比学习等技术,该方法通过确保相似的流被映射在一起,而不相似的流被定位在较远的位置来优化嵌入空间。最后,采用贝叶斯决策理论确定流量相关性,实现了对早期网络流量的高准确度流量相关。此外,我们还研究了在额外早期网络流量条件下的流量相关攻击。为了解决这一挑战,我们提出了Early-MFC+,利用负载数据构建嵌入特征表示,确保即使数据包可用性较少,也能实现稳健的性能。
更新时间: 2025-03-21 04:36:51
领域: cs.CR
An Accelerated Bregman Algorithm for ReLU-based Symmetric Matrix Decomposition
Symmetric matrix decomposition is an active research area in machine learning. This paper focuses on exploiting the low-rank structure of non-negative and sparse symmetric matrices via the rectified linear unit (ReLU) activation function. We propose the ReLU-based nonlinear symmetric matrix decomposition (ReLU-NSMD) model, introduce an accelerated alternating partial Bregman (AAPB) method for its solution, and present the algorithm's convergence results. Our algorithm leverages the Bregman proximal gradient framework to overcome the challenge of estimating the global $L$-smooth constant in the classic proximal gradient algorithm. Numerical experiments on synthetic and real datasets validate the effectiveness of our model and algorithm.
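One plausible reading of the ReLU-NSMD objective for a non-negative, sparse, symmetric matrix $X$ is $\min_W \|X - \mathrm{ReLU}(WW^\top)\|_F^2$; the sketch below minimizes this with a plain autograd loop as a baseline, not the paper's accelerated alternating partial Bregman (AAPB) solver.

```python
# Naive autograd baseline for ReLU-based symmetric matrix decomposition (not the AAPB method).
import torch

def relu_nsmd_loss(X, W):
    return torch.linalg.norm(X - torch.relu(W @ W.T)) ** 2

n, r = 50, 5
W_true = torch.rand(n, r)
X = torch.relu(W_true @ W_true.T)          # non-negative, low-rank, symmetric target

W = torch.rand(n, r, requires_grad=True)
opt = torch.optim.Adam([W], lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = relu_nsmd_loss(X, W)
    loss.backward()
    opt.step()
print(float(relu_nsmd_loss(X, W)))
```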
Updated: 2025-03-21 04:32:53
标题: 一个加速的 Bregman 算法用于基于 ReLU 的对称矩阵分解
摘要: 对称矩阵分解是机器学习中一个活跃的研究领域。本文着重利用非负稀疏对称矩阵的低秩结构,通过修正线性单元(ReLU)激活函数。我们提出了基于ReLU的非线性对称矩阵分解(ReLU-NSMD)模型,介绍了一种加速交替部分Bregman(AAPB)方法来解决它的问题,并展示了算法的收敛结果。我们的算法利用Bregman近端梯度框架来克服在经典近端梯度算法中估计全局$L$-平滑常数的挑战。对合成和真实数据集的数值实验验证了我们模型和算法的有效性。
更新时间: 2025-03-21 04:32:53
领域: cs.LG,math.OC
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
3D affordance segmentation aims to link human instructions to touchable regions of 3D objects for embodied manipulations. Existing efforts typically adhere to single-object, single-affordance paradigms, where each affordance type or explicit instruction strictly corresponds to a specific affordance region and are unable to handle long-horizon tasks. Such a paradigm cannot actively reason about complex user intentions that often imply sequential affordances. In this paper, we introduce the Sequential 3D Affordance Reasoning task, which extends the traditional paradigm by reasoning from cumbersome user intentions and then decomposing them into a series of segmentation maps. Toward this, we construct the first instruction-based affordance segmentation benchmark that includes reasoning over both single and sequential affordances, comprising 180K instruction-point cloud pairs. Based on the benchmark, we propose our model, SeqAfford, to unlock the 3D multi-modal large language model with additional affordance segmentation abilities, which ensures reasoning with world knowledge and fine-grained affordance grounding in a cohesive framework. We further introduce a multi-granular language-point integration module to endow 3D dense prediction. Extensive experimental evaluations show that our model excels over well-established methods and exhibits open-world generalization with sequential reasoning abilities.
Updated: 2025-03-21 04:31:01
标题: SeqAfford:通过多模态大型语言模型进行顺序3D可负担性推理
摘要: 3D affordance segmentation旨在将人类指令与3D物体的可触摸区域联系起来,以进行体现操作。现有的努力通常遵循单个对象、单个affordance范例,其中每种affordance类型或明确指令严格对应于特定的affordance区域,并且无法处理长期任务。这种范例不能积极推理复杂的用户意图,这些意图通常暗示顺序affordances。在本文中,我们介绍了顺序3D affordance推理任务,该任务通过从繁琐的用户意图进行推理,然后将其分解为一系列分割图,从而扩展了传统的范例。为此,我们构建了第一个基于指令的affordance分割基准,其中包括对单个和顺序affordances的推理,包括18万个指令-点云对。基于这个基准,我们提出了我们的模型SeqAfford,以解锁3D多模态大语言模型,具有额外的affordance分割能力,从而确保在一个连贯的框架中使用世界知识进行推理和细粒度的affordance基础。我们进一步引入了一个多粒度语言-点集成模块,以赋予3D密集预测。广泛的实验评估表明,我们的模型优于已建立的方法,并展示了具有顺序推理能力的开放世界泛化能力。
更新时间: 2025-03-21 04:31:01
领域: cs.CV,cs.AI
Preferential Multi-Objective Bayesian Optimization for Drug Discovery
Despite decades of advancements in automated ligand screening, large-scale drug discovery remains resource-intensive and requires post-processing hit selection, a step where chemists manually select a few promising molecules based on their chemical intuition. This creates a major bottleneck in the virtual screening process for drug discovery, demanding experts to repeatedly balance complex trade-offs among drug properties across a vast pool of candidates. To improve the efficiency and reliability of this process, we propose a novel human-centered framework named CheapVS that allows chemists to guide the ligand selection process by providing preferences regarding the trade-offs between drug properties via pairwise comparison. Our framework combines preferential multi-objective Bayesian optimization with a docking model for measuring binding affinity to capture human chemical intuition for improving hit identification. Specifically, on a library of 100K chemical candidates targeting EGFR and DRD2, CheapVS outperforms state-of-the-art screening methods in identifying drugs within a limited computational budget. Notably, our method can recover up to 16/37 EGFR and 37/58 DRD2 known drugs while screening only 6% of the library, showcasing its potential to significantly advance drug discovery.
Updated: 2025-03-21 04:27:06
标题: 优先多目标贝叶斯优化在药物发现中的应用
摘要: 尽管自动配体筛选取得了几十年的进展,但大规模药物发现仍然需要耗费资源,并需要进行后续处理的命中选择,即化学家根据其化学直觉手动选择几种有前途的分子。这在药物发现的虚拟筛选过程中造成了一个主要瓶颈,需要专家在广泛的候选人群中不断平衡药物性质之间的复杂权衡。为了提高这一过程的效率和可靠性,我们提出了一个名为CheapVS的新颖以人为中心的框架,允许化学家通过通过成对比较提供有关药物性质之间权衡的偏好来引导配体选择过程。我们的框架结合了优先多目标贝叶斯优化和用于测量结合亲和力的对接模型,以捕捉人类化学直觉以改进命中识别。具体而言,在针对EGFR和DRD2的10万个化学候选人库上,CheapVS在有限的计算预算内优于最先进的筛选方法,值得注意的是,我们的方法在仅筛选库中的6%时可恢复高达16/37个EGFR和37/58个DRD2已知药物,展示了其显著推进药物发现的潜力。
更新时间: 2025-03-21 04:27:06
领域: cs.LG,cs.HC,q-bio.BM
A Flexible Fairness Framework with Surrogate Loss Reweighting for Addressing Sociodemographic Disparities
This paper presents a new algorithmic fairness framework called $\boldsymbol{\alpha}$-$\boldsymbol{\beta}$ Fair Machine Learning ($\boldsymbol{\alpha}$-$\boldsymbol{\beta}$ FML), designed to optimize fairness levels across sociodemographic attributes. Our framework employs a new family of surrogate loss functions, paired with loss reweighting techniques, allowing precise control over fairness-accuracy trade-offs through tunable hyperparameters $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$. To efficiently solve the learning objective, we propose Parallel Stochastic Gradient Descent with Surrogate Loss (P-SGD-S) and establish convergence guarantees for both convex and nonconvex loss functions. Experimental results demonstrate that our framework improves overall accuracy while reducing fairness violations, offering a smooth trade-off between standard empirical risk minimization and strict minimax fairness. Results across multiple datasets confirm its adaptability, ensuring fairness improvements without excessive performance degradation.
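The exact $\boldsymbol{\alpha}$-$\boldsymbol{\beta}$ surrogate family is specific to the paper, but the reweighting idea can be sketched generically: per-group losses are reweighted by a tunable exponent so the objective interpolates between plain empirical risk and worst-group (minimax) risk. The exponent, groups, and base loss below are illustrative placeholders.

```python
# Illustrative group-reweighted objective interpolating between average and worst-group loss.
import torch
import torch.nn.functional as F

def reweighted_group_loss(per_sample_loss, group_ids, alpha=2.0, num_groups=2):
    group_losses = []
    for g in range(num_groups):
        mask = group_ids == g
        if mask.any():
            group_losses.append(per_sample_loss[mask].mean())
    group_losses = torch.stack(group_losses)
    # alpha = 1 recovers the plain average; larger alpha up-weights the worse-off group,
    # approaching minimax fairness as alpha grows.
    weights = group_losses.detach() ** (alpha - 1)
    weights = weights / weights.sum()
    return (weights * group_losses).sum()

logits = torch.randn(64, requires_grad=True)
labels = torch.randint(0, 2, (64,)).float()
groups = torch.randint(0, 2, (64,))
per_sample = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
loss = reweighted_group_loss(per_sample, groups, alpha=2.0)
loss.backward()
```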
Updated: 2025-03-21 04:10:14
标题: 一个灵活的公平框架:利用替代损失重加权解决社会人口统计差异
摘要: 这篇论文提出了一种新的算法公平性框架,称为$\boldsymbol{\alpha}$-$\boldsymbol{\beta}$ 公平机器学习($\boldsymbol{\alpha}$-$\boldsymbol{\beta}$ FML),旨在优化社会人口属性的公平水平。我们的框架采用一组新的替代损失函数,配合损失重新加权技术,通过调节超参数$\boldsymbol{\alpha}$和$\boldsymbol{\beta}$,可以精确控制公平性-准确性权衡。为了高效解决学习目标,我们提出了带有替代损失的并行随机梯度下降(P-SGD-S),并为凸和非凸损失函数都建立了收敛保证。实验结果表明,我们的框架提高了整体准确性,同时减少了公平性违规,提供了标准经验风险最小化和严格极小化公平之间的平滑权衡。多个数据集上的结果证实了其适应性,确保在不过度降低性能的情况下改善公平性。
更新时间: 2025-03-21 04:10:14
领域: cs.LG
Safe and Reliable Diffusion Models via Subspace Projection
Large-scale text-to-image (T2I) diffusion models have revolutionized image generation, enabling the synthesis of highly detailed visuals from textual descriptions. However, these models may inadvertently generate inappropriate content, such as copyrighted works or offensive images. While existing methods attempt to eliminate specific unwanted concepts, they often fail to ensure complete removal, allowing the concept to reappear in subtle forms. For instance, a model may successfully avoid generating images in Van Gogh's style when explicitly prompted with 'Van Gogh', yet still reproduce his signature artwork when given the prompt 'Starry Night'. In this paper, we propose SAFER, a novel and efficient approach for thoroughly removing target concepts from diffusion models. At a high level, SAFER is inspired by the observed low-dimensional structure of the text embedding space. The method first identifies a concept-specific subspace $S_c$ associated with the target concept c. It then projects the prompt embeddings onto the complementary subspace of $S_c$, effectively erasing the concept from the generated images. Since concepts can be abstract and difficult to fully capture using natural language alone, we employ textual inversion to learn an optimized embedding of the target concept from a reference image. This enables more precise subspace estimation and enhances removal performance. Furthermore, we introduce a subspace expansion strategy to ensure comprehensive and robust concept erasure. Extensive experiments demonstrate that SAFER consistently and effectively erases unwanted concepts from diffusion models while preserving generation quality.
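The projection step can be sketched compactly: estimate an orthonormal basis for the concept subspace $S_c$ (here via SVD of embeddings of concept-evoking prompts, which is one simple choice) and subtract the component of each prompt embedding lying in that subspace. The embedding source, subspace rank, and prompts are placeholders.

```python
# Sketch of concept erasure by projection onto the orthogonal complement of a concept subspace.
import torch

def concept_subspace(concept_embeddings, rank):
    """Return an orthonormal basis V (d x rank) spanning the concept-specific subspace S_c."""
    centered = concept_embeddings - concept_embeddings.mean(dim=0, keepdim=True)
    _, _, Vh = torch.linalg.svd(centered, full_matrices=False)
    return Vh[:rank].T                      # top right-singular vectors

def erase_concept(prompt_emb, V):
    """Project onto the complement of S_c:  e - V V^T e."""
    return prompt_emb - prompt_emb @ V @ V.T

d = 768
concept_prompts = torch.randn(32, d)        # embeddings of prompts evoking the target concept
V = concept_subspace(concept_prompts, rank=4)
prompt = torch.randn(1, d)
print(erase_concept(prompt, V).shape)
```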
Updated: 2025-03-21 04:09:25
标题: 通过子空间投影实现安全可靠的扩散模型
摘要: 大规模文本到图像(T2I)扩散模型已经彻底改变了图像生成,使得可以从文本描述中合成高度详细的视觉效果。然而,这些模型可能会意外地生成不当内容,如受版权保护的作品或冒犯性图片。虽然现有方法试图消除特定不受欢迎的概念,但它们通常无法确保完全消除,允许该概念以微妙的形式重新出现。例如,当明确提示“Van Gogh”时,模型可能成功地避免生成梵高风格的图像,但当给出提示“星夜”时,仍然会再现他的标志性作品。在本文中,我们提出了一种新颖高效的方法SAFER,用于彻底从扩散模型中删除目标概念。在高层次上,SAFER受到了文本嵌入空间低维结构的启发。该方法首先确定与目标概念c相关联的概念特定子空间Sc。然后,将提示嵌入投影到Sc的补充子空间上,有效地从生成的图像中擦除概念。由于概念可能抽象且难以仅使用自然语言完全捕捉,我们采用文本反演来从参考图像中学习目标概念的优化嵌入。这使得更精确的子空间估计和增强的擦除性能成为可能。此外,我们引入了一种子空间扩展策略,以确保全面和强大的概念擦除。大量实验证明,SAFER能够一致有效地从扩散模型中擦除不受欢迎的概念,同时保持生成质量。
更新时间: 2025-03-21 04:09:25
领域: cs.CV,cs.LG
The Deployment of End-to-End Audio Language Models Should Take into Account the Principle of Least Privilege
We are at a turning point for language models that accept audio input. The latest end-to-end audio language models (Audio LMs) process speech directly instead of relying on a separate transcription step. This shift preserves detailed information, such as intonation or the presence of multiple speakers, that would otherwise be lost in transcription. However, it also introduces new safety risks, including the potential misuse of speaker identity cues and other sensitive vocal attributes, which could have legal implications. In this position paper, we urge a closer examination of how these models are built and deployed. We argue that the principle of least privilege should guide decisions on whether to deploy cascaded or end-to-end models. Specifically, evaluations should assess (1) whether end-to-end modeling is necessary for a given application; and (2) the appropriate scope of information access. Finally, we highlight related gaps in current audio LM benchmarks and identify key open research questions, both technical and policy-related, that must be addressed to enable the responsible deployment of end-to-end Audio LMs.
Updated: 2025-03-21 04:03:59
标题: 端到端音频语言模型的部署应考虑最小特权原则
摘要: 我们正处于接受音频输入的语言模型的转折点。最新的端到端音频语言模型(Audio LMs)直接处理语音,而不依赖于单独的转录步骤。这种转变保留了详细信息,比如语调或多个说话者的存在,否则这些信息在转录中可能会丢失。然而,这也引入了新的安全风险,包括潜在的滥用说话者身份线索和其他敏感的声音属性,这可能会产生法律影响。在这篇立场论文中,我们敦促更仔细地审查这些模型是如何构建和部署的。我们认为最小权限原则应该指导是否部署级联或端到端模型的决策。具体来说,评估应该评估(1)端到端建模对于特定应用是否必要;以及(2)信息访问的适当范围。最后,我们强调当前音频LM基准测试中相关的差距,并确定必须解决的关键开放式研究问题,无论是技术还是政策相关的,以便负责任地部署端到端的音频LM。
更新时间: 2025-03-21 04:03:59
领域: cs.SD,cs.AI,cs.CL,cs.CY,eess.AS
Specialized Foundation Models Struggle to Beat Supervised Baselines
Following its success for vision and text, the "foundation model" (FM) paradigm -- pretraining large models on massive data, then fine-tuning on target tasks -- has rapidly expanded to domains in the sciences, engineering, healthcare, and beyond. Has this achieved what the original FMs accomplished, i.e. the supplanting of traditional supervised learning in their domains? To answer we look at three modalities -- genomics, satellite imaging, and time series -- with multiple recent FMs and compare them to a standard supervised learning workflow: model development, hyperparameter tuning, and training, all using only data from the target task. Across these three specialized domains, we find that it is consistently possible to train simple supervised models -- no more complicated than a lightly modified wide ResNet or UNet -- that match or even outperform the latest foundation models. Our work demonstrates that the benefits of large-scale pretraining have yet to be realized in many specialized areas, reinforces the need to compare new FMs to strong, well-tuned baselines, and introduces two new, easy-to-use, open-source, and automated workflows for doing so.
Updated: 2025-03-21 03:59:29
标题: 专门的基础模型努力超越监督基线
摘要: 随着其在视觉和文本领域的成功,"基础模型"(FM)范式——在大规模数据上预训练大型模型,然后在目标任务上进行微调——迅速扩展到了科学、工程、医疗保健等领域以及其他领域。这是否取得了原始FM所取得的成就,即在其领域中取代传统的监督学习?为了回答这个问题,我们分析了三种模态——基因组学、卫星成像和时间序列——以及多个最近的FM,并将它们与标准的监督学习工作流程进行了比较:模型开发、超参数调整和训练,所有这些都仅使用来自目标任务的数据。在这三个专业领域中,我们发现,一直可以一致地训练简单的监督模型——不比轻微修改的宽ResNet或UNet复杂——这些模型与甚至胜过最新的基础模型。我们的工作表明,大规模预训练的好处尚未在许多专业领域中实现,强调了将新的FM与经过强化和调整的基线进行比较的必要性,并引入了两种新的、易于使用的、开源的、自动化的工作流程来实现这一点。
更新时间: 2025-03-21 03:59:29
领域: cs.LG,cs.AI,cs.CV,q-bio.GN
Efficient and Expressive Public Key Authenticated Encryption with Keyword Search in Multi-user Scenarios
Public key authenticated encryption with keyword search (PAEKS) represents a significant advancement in secure and searchable data sharing for public network systems, such as medical systems. It can effectively mitigate the risk of keyword guessing attacks (KGA), which is a critical issue in public key encryption with keyword search (PEKS). However, in scenarios with a large number of users, the enforced point-to-point access control necessitates that the data sender encrypt the same keyword using the public keys of multiple receivers to create indexes, while the data receiver must also generate trapdoors whose size is linear in the number of senders in the system. The burden on users aiming for efficient data sharing is considerable, as the overheads increase linearly with the number of users. Furthermore, the majority of current PAEKS schemes lack expressive search functions, including conjunctions, disjunctions, or any monotone boolean formulas, which are prevalent in practical applications. To tackle the abovementioned challenges, we propose an efficient and expressive PAEKS scheme. For efficiency, an auxiliary server is integrated to assist users in generating indexes and trapdoors. Users encrypt with their respective private keys along with the public keys of the servers, facilitating secure and searchable data sharing while significantly minimizing overhead. Additionally, a linear secret sharing scheme (LSSS) is employed to implement expressive search, including monotone boolean queries. We also obfuscate the mapping relationship associated with the LSSS matrix to the keywords, thereby enhancing privacy protection. Security analysis alongside theoretical and experimental evaluations of our scheme illustrates its practicality and efficiency in multi-user data sharing scenarios.
Updated: 2025-03-21 03:51:43
标题: 多用户场景中高效且表达力强的具有关键词搜索功能的公钥认证加密系统
摘要: 公钥认证加密与关键字搜索(PAEKS)代表了在公共网络系统中进行安全和可搜索数据共享的重大进展,例如医疗系统。它可以有效地减轻关键字猜测攻击(KGA)的风险,这是公钥加密与关键字搜索(PEKS)中的一个关键问题。然而,在具有大量用户的情况下,强制点对点访问控制要求数据发送者使用多个接收者的公钥对相同关键字进行加密以创建索引,而数据接收者还必须生成与系统中发送者数量成正比的陷门。对于希望进行高效数据共享的用户来说,负担相当重,因为开销随用户数量线性增加。此外,目前大多数PAEKS方案缺乏表达能力强的搜索功能,包括合取、析取或任何单调布尔公式,在实际应用中普遍存在。为了解决上述挑战,我们提出了一个高效而具有表现力的PAEKS方案。在效率方面,集成了一个辅助服务器来协助用户生成索引和陷门。用户使用各自的私钥以及服务器的公钥进行加密,从而促进安全和可搜索的数据共享,同时显著减少开销。此外,采用LSSS来实现具有表现力的搜索,包括单调布尔查询。我们还对与LSSS矩阵相关的映射关系进行混淆处理,从而增强隐私保护。我们的方案的安全分析以及理论和实验评估说明了其在多用户数据共享场景中的实用性和效率。
更新时间: 2025-03-21 03:51:43
领域: cs.CR
Examining Two Hop Reasoning Through Information Content Scaling
Prior work has found that transformers have an inconsistent ability to learn to answer latent two-hop questions -- questions of the form "Who is Bob's mother's boss?" We study why this is the case by examining how transformers' capacity to learn datasets of two-hop questions and answers (two-hop QA) scales with their size, motivated by prior work on transformer knowledge capacity for simple factual memorization. We find that capacity scaling and generalization both support the hypothesis that latent two-hop QA requires transformers to learn each fact twice, while two-hop QA with chain of thought does not. We also show that with appropriate dataset parameters, it is possible to "trap" very small models in a regime where they memorize answers to two-hop questions independently, even though they would perform better if they could learn to answer them with function composition. Our findings show that measurement of capacity scaling can complement existing interpretability methods, though there are challenges in using it for this purpose.
Updated: 2025-03-21 03:49:12
标题: 通过信息内容缩放检验双跳推理
摘要: 先前的研究发现,变压器模型在学习回答潜在的两跳问题(如“鲍勃的母亲的老板是谁?”)方面能力不一致。我们通过研究变压器模型学习数据集中的两跳问题和答案(两跳问答)的能力如何随其规模增大而变化来探究这一现象,这得益于先前关于变压器知识容量在简单事实记忆方面的研究。我们发现,容量扩展和泛化都支持这样一个假设:潜在的两跳问答需要变压器模型学习每个事实两次,而带有思维链的两跳问答则不需要。我们还展示,在适当的数据集参数下,即使非常小的模型也可以被“困”在一个只独立记忆两跳问题答案的区域,尽管如果它们能够学会通过函数组合来回答这些问题,它们的表现会更好。我们的发现表明,容量扩展的测量可以补充现有的可解释性方法,尽管在这方面使用它仍然存在挑战。
更新时间: 2025-03-21 03:49:12
领域: cs.AI,cs.LG
Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees
We study high-probability convergence in online learning, in the presence of heavy-tailed noise. To combat the heavy tails, a general framework of nonlinear SGD methods is considered, subsuming several popular nonlinearities like sign, quantization, component-wise and joint clipping. In our work the nonlinearity is treated in a black-box manner, allowing us to establish unified guarantees for a broad range of nonlinear methods. For symmetric noise and non-convex costs we establish convergence of gradient norm-squared, at a rate $\widetilde{\mathcal{O}}(t^{-1/4})$, while for the last iterate of strongly convex costs we establish convergence to the population optima, at a rate $\mathcal{O}(t^{-\zeta})$, where $\zeta \in (0,1)$ depends on noise and problem parameters. Further, if the noise is a (biased) mixture of symmetric and non-symmetric components, we show convergence to a neighbourhood of stationarity, whose size depends on the mixture coefficient, nonlinearity and noise. Compared to the state of the art, which only considers clipping and requires unbiased noise with bounded $p$-th moments, $p \in (1,2]$, we provide guarantees for a broad class of nonlinearities, without any assumptions on noise moments. While the rate exponents in the state of the art depend on noise moments and vanish as $p \rightarrow 1$, our exponents are constant and strictly better whenever $p < 6/5$ for non-convex and $p < 8/7$ for strongly convex costs. Experiments validate our theory, showing that clipping is not always the optimal nonlinearity, further underlining the value of a general framework.
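Because the nonlinearity is treated as a black box, the update is simply $x_{t+1} = x_t - \eta\,\mathcal{N}(g_t)$ for a bounded map $\mathcal{N}$. The sketch below shows three of the covered choices (sign, component-wise clipping, joint clipping); the step size and clipping level are illustrative.

```python
# Sketch of nonlinear SGD: apply a bounded nonlinearity to the stochastic gradient before stepping.
import torch

def nonlinear_sgd_step(params, grads, lr=1e-2, mode="clip", tau=1.0):
    with torch.no_grad():
        for p, g in zip(params, grads):
            if mode == "sign":
                g = torch.sign(g)
            elif mode == "clip":                               # component-wise clipping
                g = torch.clamp(g, min=-tau, max=tau)
            elif mode == "joint_clip":                         # clip the whole gradient norm
                g = g * torch.clamp(tau / (g.norm() + 1e-12), max=1.0)
            p -= lr * g

w = torch.randn(10, requires_grad=True)
x, y = torch.randn(128, 10), torch.randn(128)
loss = ((x @ w - y) ** 2).mean()
loss.backward()
nonlinear_sgd_step([w], [w.grad], mode="clip", tau=0.5)
```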
Updated: 2025-03-21 03:44:04
标题: 非线性随机梯度下降和重尾噪声:一个统一框架和高概率保证
摘要: 我们研究在线学习中在存在重尾噪声的情况下的高概率收敛性。为了对抗重尾,我们考虑了一般非线性SGD方法的框架,包括了几种流行的非线性方法,如符号、量化、分量和联合剪切。在我们的工作中,非线性被以黑盒的方式处理,使我们能够为广泛的非线性方法建立统一的保证。对于对称噪声和非凸成本,我们建立了梯度范数平方的收敛性,速率为$\widetilde{\mathcal{O}}(t^{-1/4})$,而对于强凸成本的最后迭代,我们建立了收敛到最优解的速率为$\mathcal{O}(t^{-\zeta})$,其中$\zeta \in (0,1)$取决于噪声和问题参数。此外,如果噪声是对称和非对称成分的(有偏)混合物,我们展示了收敛到一个静止点附近的速率,其大小取决于混合系数、非线性和噪声。与仅考虑剪切并要求具有有界$p$-阶矩的无偏噪声的最新技术相比,$p \in (1,2]$,我们为广泛的非线性提供了保证,而不对噪声矩做任何假设。尽管最新技术中的速率指数取决于噪声矩,并且当$p \rightarrow 1$时消失,我们的指数是常数,并且对于非凸成本而言,当$p < 6/5$时,对于强凸成本而言,当$p < 8/7$时,我们的指数是严格更好的。实验证实了我们的理论,表明剪切并非始终是最佳非线性方法,进一步强调了一般框架的价值。
更新时间: 2025-03-21 03:44:04
领域: cs.LG,math.OC
Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting
Spatial relation hallucinations pose a persistent challenge in large vision-language models (LVLMs), leading to generate incorrect predictions about object positions and spatial configurations within an image. To address this issue, we propose a constraint-aware prompting framework designed to reduce spatial relation hallucinations. Specifically, we introduce two types of constraints: (1) bidirectional constraint, which ensures consistency in pairwise object relations, and (2) transitivity constraint, which enforces relational dependence across multiple objects. By incorporating these constraints, LVLMs can produce more spatially coherent and consistent outputs. We evaluate our method on three widely-used spatial relation datasets, demonstrating performance improvements over existing approaches. Additionally, a systematic analysis of various bidirectional relation analysis choices and transitivity reference selections highlights greater possibilities of our methods in incorporating constraints to mitigate spatial relation hallucinations.
Updated: 2025-03-21 03:39:57
标题: 通过意识约束提示减轻多模式空间关系中的幻觉
摘要: 空间关系幻觉在大型视觉语言模型(LVLMs)中构成了持久性挑战,导致生成关于图像中物体位置和空间配置的不正确预测。为解决这一问题,我们提出了一个设计用于减少空间关系幻觉的约束感知提示框架。具体而言,我们引入了两种类型的约束:(1)双向约束,确保成对物体关系的一致性,以及(2)传递性约束,强化多个物体之间的关系依赖。通过整合这些约束,LVLMs可以产生更具空间连贯性和一致性的输出。我们在三个广泛使用的空间关系数据集上评估了我们的方法,显示出相对于现有方法的性能改进。此外,对各种双向关系分析选择和传递性参考选择的系统分析突显了我们的方法在整合约束以减轻空间关系幻觉方面的更大可能性。
更新时间: 2025-03-21 03:39:57
领域: cs.CL,cs.AI,cs.CV
Selective Aggregation for Low-Rank Adaptation in Federated Learning
We investigate LoRA in federated learning through the lens of the asymmetry analysis of the learned $A$ and $B$ matrices. In doing so, we uncover that $A$ matrices are responsible for learning general knowledge, while $B$ matrices focus on capturing client-specific knowledge. Based on this finding, we introduce Federated Share-A Low-Rank Adaptation (FedSA-LoRA), which employs two low-rank trainable matrices $A$ and $B$ to model the weight update, but only $A$ matrices are shared with the server for aggregation. Moreover, we delve into the relationship between the learned $A$ and $B$ matrices in other LoRA variants, such as rsLoRA and VeRA, revealing a consistent pattern. Consequently, we extend our FedSA-LoRA method to these LoRA variants, resulting in FedSA-rsLoRA and FedSA-VeRA. In this way, we establish a general paradigm for integrating LoRA with FL, offering guidance for future work on subsequent LoRA variants combined with FL. Extensive experimental results on natural language understanding and generation tasks demonstrate the effectiveness of the proposed method. Our code is available at https://github.com/Pengxin-Guo/FedSA-LoRA.
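The aggregation rule itself is small: every client keeps its $B$ matrix local and uploads only $A$, and the server averages the $A$ matrices. A minimal sketch follows; client weights, layer names, and shapes are placeholders.

```python
# Sketch of FedSA-LoRA-style aggregation: average only the LoRA A matrices across clients.
import torch

def aggregate_A(client_As, client_weights=None):
    """client_As: list of dicts {layer_name: A tensor}; returns the aggregated A per layer."""
    n = len(client_As)
    if client_weights is None:
        client_weights = [1.0 / n] * n
    return {name: sum(w * As[name] for w, As in zip(client_weights, client_As))
            for name in client_As[0]}

r, d = 8, 512                                            # LoRA rank and layer width (placeholders)
clients = [{"layer0.A": torch.randn(r, d)} for _ in range(2)]
local_B = [torch.zeros(d, r) for _ in range(2)]          # B matrices never leave the clients
global_A = aggregate_A(clients)
# after the round, each client's effective update is delta_W = B_i @ A_shared
print((local_B[0] @ global_A["layer0.A"]).shape)
```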
Updated: 2025-03-21 03:33:21
标题: 在联邦学习中用于低秩适应的选择性聚合
摘要: 我们通过学习$A$和$B$矩阵的不对称性分析来研究联邦学习中的LoRA。在这个过程中,我们发现$A$矩阵负责学习一般知识,而$B$矩阵专注于捕捉特定客户的知识。基于这一发现,我们引入了联邦分享低秩适应(FedSA-LoRA)方法,该方法使用两个低秩可训练矩阵$A$和$B$来建模权重更新,但只有$A$矩阵与服务器共享以进行聚合。此外,我们深入研究了其他LoRA变种中学习到的$A$和$B$矩阵之间的关系,如rsLoRA和VeRA,揭示了一致的模式。因此,我们将我们的FedSA-LoRA方法扩展到这些LoRA变种,得到FedSA-rsLoRA和FedSA-VeRA。通过这种方式,我们建立了一个将LoRA与FL集成的通用范式,为未来在结合FL的后续LoRA变种的工作提供指导。对自然语言理解和生成任务的大量实验结果证明了我们提出方法的有效性。我们的代码可在https://github.com/Pengxin-Guo/FedSA-LoRA上找到。
更新时间: 2025-03-21 03:33:21
领域: cs.LG
STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving
A fundamental challenge in formal theorem proving by LLMs is the lack of high-quality training data. Although reinforcement learning or expert iteration partially mitigates this issue by alternating between LLM generating proofs and finetuning them on correctly generated ones, performance quickly plateaus due to the scarcity of correct proofs (sparse rewards). To keep improving the models with limited data, we draw inspiration from mathematicians, who continuously develop new results, partly by proposing novel conjectures or exercises (which are often variants of known results) and attempting to solve them. We design the Self-play Theorem Prover (STP) that simultaneously takes on two roles, conjecturer and prover, each providing training signals to the other. The conjecturer is trained iteratively on previously generated conjectures that are barely provable by the current prover, which incentivizes it to generate increasingly challenging conjectures over time. The prover attempts to prove the conjectures with standard expert iteration. We evaluate STP with both Lean and Isabelle formal verifiers. With 51.3 billion tokens generated during the training in Lean, STP proves 28.5% of the statements in the LeanWorkbook dataset, doubling the previous best result of 13.2% achieved through expert iteration. The final model achieves state-of-the-art performance among whole-proof generation methods on miniF2F-test (65.0%, pass@3200), Proofnet-test (23.9%, pass@3200) and PutnamBench (8/644, pass@3200). We release our code, model, and dataset at this URL: https://github.com/kfdong/STP.
Updated: 2025-03-21 03:27:55
标题: STP:具有迭代猜测和证明的自我对弈LLM定理证明器
摘要: 通过LLMs进行形式定理证明的一个基本挑战是缺乏高质量的训练数据。尽管强化学习或专家迭代在LLM生成证明并对生成正确证明的证明进行微调之间部分地缓解了这个问题,但由于正确证明的稀缺性(奖励稀疏),性能很快就会达到平台。为了继续改进有限数据的模型,我们从数学家那里汲取灵感,他们通过提出新的猜想或练习(通常是已知结果的变体)并尝试解决它们来不断发展新的结果。我们设计了自我对弈定理证明器(STP),它同时扮演两个角色,猜想者和证明者,每个角色都向另一个提供训练信号。猜想者在先前生成的几乎无法被当前证明器证明的猜想上进行迭代训练,这激励它随着时间推移生成越来越具有挑战性的猜想。证明者尝试使用标准的专家迭代来证明这些猜想。我们使用Lean和Isabelle形式验证器对STP进行评估。在Lean训练期间生成的513亿个令牌中,STP证明了LeanWorkbook数据集中28.5%的陈述,将之前通过专家迭代实现的13.2%最佳结果翻倍。最终模型在miniF2F-test(65.0%,pass@3200)、Proofnet-test(23.9%,pass@3200)和PutnamBench(8/644,pass@3200)上实现了最先进的整体证明生成方法性能。我们在以下网址发布了我们的代码、模型和数据集:https://github.com/kfdong/STP。
更新时间: 2025-03-21 03:27:55
领域: cs.LG,cs.AI,cs.LO
MODL: Multilearner Online Deep Learning
Online deep learning tackles the challenge of learning from data streams by balancing two competing goals: fast learning and deep learning. However, existing research primarily emphasizes deep learning solutions, which are more adept at handling the ``deep'' aspect than the ``fast'' aspect of online learning. In this work, we introduce an alternative paradigm through a hybrid multilearner approach. We begin by developing a fast online logistic regression learner, which operates without relying on backpropagation. It leverages closed-form recursive updates of model parameters, efficiently addressing the fast learning component of the online learning challenge. This approach is further integrated with a cascaded multilearner design, where shallow and deep learners are co-trained in a cooperative, synergistic manner to solve the online learning problem. We demonstrate that this approach achieves state-of-the-art performance on standard online learning datasets. We make our code available: https://github.com/AntonValk/MODL
Updated: 2025-03-21 03:21:40
标题: MODL:多学习者在线深度学习
摘要: 在线深度学习通过平衡两个竞争性目标(快速学习和深度学习)来解决从数据流中学习的挑战。然而,现有研究主要强调深度学习解决方案,这些解决方案更擅长处理“深度”方面,而不是“快速”方面的在线学习。在这项工作中,我们通过混合多学习者方法引入了一种替代范式。我们首先开发了一个快速在线逻辑回归学习器,它在不依赖反向传播的情况下运行。它利用模型参数的封闭形式递归更新,有效地解决了在线学习挑战的快速学习组件。这种方法进一步与级联多学习者设计集成,浅层和深层学习者以协作、协同的方式共同训练,以解决在线学习问题。我们展示了这种方法在标准在线学习数据集上实现了最先进的性能。我们提供我们的代码:https://github.com/AntonValk/MODL
更新时间: 2025-03-21 03:21:40
领域: cs.LG,cs.AI
Separation capacity of linear reservoirs with random connectivity matrix
A natural hypothesis for the success of reservoir computing in generic tasks is the ability of the untrained reservoir to map different input time series to separable reservoir states - a property we term separation capacity. We provide a rigorous mathematical framework to quantify this capacity for random linear reservoirs, showing that it is fully characterised by the spectral properties of the generalised matrix of moments of the random reservoir connectivity matrix. Our analysis focuses on reservoirs with Gaussian connectivity matrices, both symmetric and i.i.d., although the techniques extend naturally to broader classes of random matrices. In the symmetric case, the generalised matrix of moments is a Hankel matrix. Using classical estimates from random matrix theory, we establish that separation capacity deteriorates over time and that, for short inputs, optimal separation in large reservoirs is achieved when the matrix entries are scaled with a factor $\rho_T/\sqrt{N}$, where $N$ is the reservoir dimension and $\rho_T$ depends on the maximum input length. In the i.i.d. case, we establish that optimal separation with large reservoirs is consistently achieved when the entries of the reservoir matrix are scaled with the exact factor $1/\sqrt{N}$, which aligns with common implementations of reservoir computing. We further give upper bounds on the quality of separation as a function of the length of the time series. We complement this analysis with an investigation of the likelihood of this separation and its consistency under different architectural choices.
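A quick numerical illustration of the i.i.d. case: build a linear reservoir whose connectivity entries are scaled by $1/\sqrt{N}$, drive it with two different input sequences, and compare the resulting states. Input dimension, sequence length, and the crude separation ratio below are illustrative.

```python
# Sketch: separation of a random linear reservoir with i.i.d. Gaussian entries scaled by 1/sqrt(N).
import numpy as np

def reservoir_state(W, w_in, inputs):
    x = np.zeros(W.shape[0])
    for u in inputs:                        # linear reservoir: x_{t+1} = W x_t + w_in * u_t
        x = W @ x + w_in * u
    return x

rng = np.random.default_rng(0)
N, T = 500, 10
W = rng.standard_normal((N, N)) / np.sqrt(N)    # the 1/sqrt(N) scaling from the i.i.d. case
w_in = rng.standard_normal(N)

u1, u2 = rng.standard_normal(T), rng.standard_normal(T)
x1, x2 = reservoir_state(W, w_in, u1), reservoir_state(W, w_in, u2)
print(np.linalg.norm(x1 - x2) / np.linalg.norm(u1 - u2))   # crude separation ratio
```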
Updated: 2025-03-21 03:21:08
标题: 随机连接矩阵下线性蓄水池的分离能力
摘要: 在通用任务中,库容计算成功的一个自然假设是未经训练的储水池能够将不同的输入时间序列映射到可分离的储水池状态 - 这一性质我们称之为分离容量。我们提供了一个严格的数学框架来量化随机线性储水池的这种容量,表明它完全由随机储水池连接矩阵的广义矩矩阵的谱特性所描述。我们的分析集中在具有高斯连接矩阵的储水池上,既对称又独立同分布,尽管这些技术自然地扩展到更广泛的随机矩阵类别。在对称情况下,广义矩阵的矩是一个Hankel矩阵。利用随机矩阵理论的经典估计,我们建立了分离容量随时间恶化的事实,并且对于短输入,在大型储水池中,当矩阵条目按照一个因子$\rho_T/\sqrt{N}$进行缩放时,最佳分离是实现的,其中$N$是储水池维度,$\rho_T$取决于最大输入长度。在独立同分布的情况下,我们建立了在大型储水池中,通过将储水池矩阵的条目按照恰好的因子$1/\sqrt{N}$进行缩放,可以一致地实现最佳分离,这与库容计算的常见实现一致。我们进一步给出了分离质量的上限,作为时间序列长度的函数。我们通过对不同结构选择下的分离可能性及其一致性进行调查,来补充这一分析。
更新时间: 2025-03-21 03:21:08
领域: stat.ML,cs.LG,math.PR,68T07, 60B20, 37M10
Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment
We propose V-Droid, a mobile GUI task automation agent. Unlike previous mobile agents that utilize Large Language Models (LLMs) as generators to directly generate actions at each step, V-Droid employs LLMs as verifiers to evaluate candidate actions before making final decisions. To realize this novel paradigm, we introduce a comprehensive framework for constructing verifier-driven mobile agents: the discretized action space construction coupled with the prefilling-only workflow to accelerate the verification process, the pair-wise progress preference training to significantly enhance the verifier's decision-making capabilities, and the scalable human-agent joint annotation scheme to efficiently collect the necessary data at scale. V-Droid sets a new state-of-the-art task success rate across several public mobile task automation benchmarks: 59.5% on AndroidWorld, 38.3% on AndroidLab, and 49% on MobileAgentBench, surpassing existing agents by 9.5%, 2.1%, and 9%, respectively. Furthermore, V-Droid achieves an impressively low latency of 0.7 seconds per step, making it the first mobile agent capable of delivering near-real-time, effective decision-making capabilities.
Updated: 2025-03-21 03:19:57
标题: 推动移动GUI代理:一种验证驱动方法用于实际部署
摘要: 我们提出V-Droid,一个移动GUI任务自动化代理。与以往利用大型语言模型(LLMs)作为生成器直接在每个步骤生成动作的移动代理不同,V-Droid将LLMs作为验证器,在做出最终决策之前评估候选动作。为了实现这一新颖的范式,我们引入了一个全面的框架来构建以验证器驱动的移动代理:离散化动作空间构建与仅填充工作流相结合,以加速验证过程,成对进展偏好训练以显著增强验证器的决策能力,以及可扩展的人-代理共同注释方案,以高效地收集所需数据。V-Droid在几个公共移动任务自动化基准测试中创造了一个新的最先进的任务成功率:在AndroidWorld上为59.5%,在AndroidLab上为38.3%,在MobileAgentBench上为49%,分别超过现有代理9.5%,2.1%和9%。此外,V-Droid每步的延迟时间令人印象深刻,为0.7秒,使其成为第一个能够提供接近实时、有效决策能力的移动代理。
更新时间: 2025-03-21 03:19:57
领域: cs.AI
Neural Representation for Wireless Radiation Field Reconstruction: A 3D Gaussian Splatting Approach
Wireless channel modeling plays a pivotal role in designing, analyzing, and optimizing wireless communication systems. Nevertheless, developing an effective channel modeling approach has been a long-standing challenge. This issue has been escalated due to denser network deployment, larger antenna arrays, and broader bandwidth in next-generation networks. To address this challenge, we put forth WRF-GS, a novel framework for channel modeling based on wireless radiation field (WRF) reconstruction using 3D Gaussian splatting (3D-GS). WRF-GS employs 3D Gaussian primitives and neural networks to capture the interactions between the environment and radio signals, enabling efficient WRF reconstruction and visualization of the propagation characteristics. The reconstructed WRF can then be used to synthesize the spatial spectrum for comprehensive wireless channel characterization. While WRF-GS demonstrates remarkable effectiveness, it faces limitations in capturing high-frequency signal variations caused by complex multipath effects. To overcome these limitations, we propose WRF-GS+, an enhanced framework that integrates electromagnetic wave physics into the neural network design. WRF-GS+ leverages deformable 3D Gaussians to model both static and dynamic components of the WRF, significantly improving its ability to characterize signal variations. In addition, WRF-GS+ enhances the splatting process by simplifying the 3D-GS modeling process and improving computational efficiency. Experimental results demonstrate that both WRF-GS and WRF-GS+ outperform baselines for spatial spectrum synthesis, including ray tracing and other deep-learning approaches. Notably, WRF-GS+ achieves state-of-the-art performance in the received signal strength indication (RSSI) and channel state information (CSI) prediction tasks, surpassing existing methods by more than 0.7 dB and 3.36 dB, respectively.
Updated: 2025-03-21 03:12:04
标题: 无线辐射场重建的神经表示:一种3D高斯投射方法
摘要: 无线信道建模在设计、分析和优化无线通信系统中起着关键作用。然而,开发有效的信道建模方法一直是一个长期存在的挑战。由于下一代网络中网络部署更密集、天线阵列更大、带宽更宽,这个问题已经升级。为了解决这一挑战,我们提出了基于无线辐射场(WRF)重建的3D高斯弹射(3D-GS)的信道建模新框架WRF-GS。WRF-GS利用3D高斯原语和神经网络来捕捉环境和无线信号之间的相互作用,实现高效的WRF重建和传播特性的可视化。重建的WRF可以用于综合无线信道特性的空间谱合成。虽然WRF-GS表现出显著的有效性,但在捕捉由复杂多径效应引起的高频信号变化方面存在局限性。为了克服这些限制,我们提出了增强框架WRF-GS+,将电磁波物理学引入神经网络设计中。WRF-GS+利用可变形的3D高斯模型来建模WRF的静态和动态组件,显著提高了其对信号变化的表征能力。此外,WRF-GS+通过简化3D-GS建模过程和提高计算效率来增强弹射过程。实验结果表明,无论是WRF-GS还是WRF-GS+都优于空间谱合成的基线,包括射线跟踪和其他深度学习方法。值得注意的是,WRF-GS+在接收信号强度指示(RSSI)和信道状态信息(CSI)预测任务中实现了最先进的性能,分别超过现有方法0.7 dB和3.36 dB。
更新时间: 2025-03-21 03:12:04
领域: cs.NI,cs.AI,cs.LG
Label Unbalance in High-frequency Trading
In financial trading, return prediction is one of the foundations of a successful trading system. With the rapid development of deep learning in areas such as graphics processing and natural language, it has also demonstrated a significant edge in handling financial data. While the success of deep learning relies on a huge amount of labeled samples, labeling each time/event as profitable or unprofitable under transaction costs, especially in the high-frequency trading world, suffers from a serious label imbalance issue. In this paper, we adopt a rigorous end-to-end deep learning framework with comprehensive label imbalance adjustment methods and succeed in predicting high-frequency returns in the Chinese futures market. The code for our method is publicly available at https://github.com/RS2002/Label-Unbalance-in-High-Frequency-Trading .
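One standard adjustment in this setting is to reweight the loss by inverse class frequency so the rare profitable ticks are not drowned out; the paper combines several adjustment methods, and the sketch below shows only this weighting idea on synthetic labels.

```python
# Sketch of inverse-frequency class weighting for a heavily imbalanced profit/no-profit label.
import torch
import torch.nn as nn

labels = torch.zeros(10_000, dtype=torch.long)
labels[:500] = 1                                     # imbalanced: few profitable ticks
counts = torch.bincount(labels, minlength=2).float()
class_weights = counts.sum() / (2 * counts)          # inverse-frequency weights
print(class_weights)

criterion = nn.CrossEntropyLoss(weight=class_weights)
logits = torch.randn(10_000, 2, requires_grad=True)  # stand-in for model outputs
loss = criterion(logits, labels)
loss.backward()
```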
Updated: 2025-03-21 03:10:17
标题: 高频交易中的标签不平衡问题
摘要: 在金融交易中,收益预测是成功交易系统的基础之一。随着深度学习在图形处理、自然语言等各个领域的快速发展,它在处理金融数据方面也展现出显著优势。虽然深度学习的成功依赖于大量标记样本,即将每次/事件标记为有利可图或无利可图,在交易成本,特别是在高频交易领域,会遇到严重的标签不平衡问题。本文采用了严谨的端到端深度学习框架,结合全面的标签不平衡调整方法,在中国期货市场成功预测高频收益。我们的方法代码可以在https://github.com/RS2002/Label-Unbalance-in-High-Frequency-Trading上公开获取。
更新时间: 2025-03-21 03:10:17
领域: cs.LG,cs.AI,q-fin.CP
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
Traditional object detection methods operate under the closed-set assumption, where models can only detect a fixed number of objects predefined in the training set. Recent works on open vocabulary object detection (OVD) enable the detection of objects defined by an in-principle unbounded vocabulary, which reduces the cost of training models for specific tasks. However, OVD heavily relies on accurate prompts provided by an ``oracle'', which limits their use in critical applications such as driving scene perception. OVD models tend to misclassify near-out-of-distribution (NOOD) objects that have similar features to known classes, and ignore far-out-of-distribution (FOOD) objects. To address these limitations, we propose a framework that enables OVD models to operate in open world settings, by identifying and incrementally learning previously unseen objects. To detect FOOD objects, we propose Open World Embedding Learning (OWEL) and introduce the concept of Pseudo Unknown Embedding which infers the location of unknown classes in a continuous semantic space based on the information of known classes. We also propose Multi-Scale Contrastive Anchor Learning (MSCAL), which enables the identification of misclassified unknown objects by promoting the intra-class consistency of object embeddings at different scales. The proposed method achieves state-of-the-art performance on standard open world object detection and autonomous driving benchmarks while maintaining its open vocabulary object detection capability.
Updated: 2025-03-21 03:09:27
标题: 从开放词汇到开放世界:教导视觉语言模型检测新对象
摘要: 传统的目标检测方法基于封闭集假设运作,模型只能检测在训练集中预先定义的固定数量的对象。最近关于开放词汇目标检测(OVD)的研究使得能够检测由原则上无限词汇定义的对象,从而降低了为特定任务训练模型的成本。然而,OVD严重依赖于由“神谕”提供的准确提示,这限制了它们在关键应用如驾驶场景感知中的使用。OVD模型倾向于误分类具有类似特征的接近分布之外(NOOD)对象,并忽略远离分布之外(FOOD)对象。为了解决这些限制,我们提出了一个框架,使得OVD模型能够在开放世界环境中操作,通过识别和逐渐学习之前未见过的对象。为了检测FOOD对象,我们提出了开放世界嵌入学习(OWEL)并引入了伪未知嵌入的概念,根据已知类的信息在连续语义空间中推断未知类的位置。我们还提出了多尺度对比锚学习(MSCAL),通过促进不同尺度的对象嵌入的内类一致性,实现了对被误分类的未知对象的识别。所提出的方法在标准的开放世界目标检测和自动驾驶基准上实现了最先进的性能,同时保持了其开放词汇目标检测能力。
更新时间: 2025-03-21 03:09:27
领域: cs.CV,cs.AI
Harnessing Nonidealities in Analog In-Memory Computing Circuits: A Physical Modeling Approach for Neuromorphic Systems
Large-scale deep learning models are increasingly constrained by their immense energy consumption, limiting their scalability and applicability for edge intelligence. In-memory computing (IMC) offers a promising solution by addressing the von Neumann bottleneck inherent in traditional deep learning accelerators, significantly reducing energy consumption. However, the analog nature of IMC introduces hardware nonidealities that degrade model performance and reliability. This paper presents a novel approach to directly train physical models of IMC, formulated as ordinary-differential-equation (ODE)-based physical neural networks (PNNs). To enable the training of large-scale networks, we propose a technique called differentiable spike-time discretization (DSTD), which reduces the computational cost of ODE-based PNNs by up to 20 times in speed and 100 times in memory. We demonstrate that such large-scale networks enhance the learning performance by exploiting hardware nonidealities on the CIFAR-10 dataset. The proposed bottom-up methodology is validated through the post-layout SPICE simulations on the IMC circuit with nonideal characteristics using the sky130 process. The proposed PNN approach reduces the discrepancy between the model behavior and circuit dynamics by at least an order of magnitude. This work paves the way for leveraging nonideal physical devices, such as non-volatile resistive memories, for energy-efficient deep learning applications.
Updated: 2025-03-21 03:08:11
标题: 利用模拟内存计算电路中的非理想性:神经形态系统的物理建模方法
摘要: 大规模深度学习模型在很大程度上受到其巨大的能耗限制,这限制了它们的可扩展性和适用性于边缘智能。内存计算(IMC)通过解决传统深度学习加速器中固有的冯·诺伊曼瓶颈,显著降低能耗,提供了一种有希望的解决方案。然而,IMC的模拟性质引入了硬件非理想性,降低了模型性能和可靠性。本文提出了一种直接训练IMC物理模型的新方法,将其形式化为基于普通微分方程(ODE)的物理神经网络(PNNs)。为了实现大规模网络的训练,我们提出了一种称为可微分脉冲时间离散(DSTD)的技术,可以将基于ODE的PNNs的计算成本提高至少20倍的速度和100倍的内存。我们证明了这样的大规模网络通过在CIFAR-10数据集上利用硬件非理想性来增强学习性能。通过使用sky130工艺上的IMC电路进行后布局SPICE模拟验证了所提出的自下而上方法。所提出的PNN方法至少使模型行为与电路动态之间的差异减少一个数量级。这项工作为利用非理想物理设备,如非挥发性电阻性存储器,进行高效能的深度学习应用铺平了道路。
更新时间: 2025-03-21 03:08:11
领域: cs.LG
Semi-Supervised End-To-End Contrastive Learning For Time Series Classification
Time series classification is a critical task in various domains, such as finance, healthcare, and sensor data analysis. Unsupervised contrastive learning has garnered significant interest in learning effective representations from time series data with limited labels. The prevalent approach in existing contrastive learning methods consists of two separate stages: pre-training the encoder on unlabeled datasets and fine-tuning the well-trained model on a small-scale labeled dataset. However, such two-stage approaches suffer from several shortcomings, such as the inability of unsupervised pre-training contrastive loss to directly affect downstream fine-tuning classifiers, and the lack of exploiting the classification loss which is guided by valuable ground truth. In this paper, we propose an end-to-end model called SLOTS (Semi-supervised Learning fOr Time clasSification). SLOTS receives semi-labeled datasets, comprising a large number of unlabeled samples and a small proportion of labeled samples, and maps them to an embedding space through an encoder. We calculate not only the unsupervised contrastive loss but also measure the supervised contrastive loss on the samples with ground truth. The learned embeddings are fed into a classifier, and the classification loss is calculated using the available true labels. The unsupervised, supervised contrastive losses and classification loss are jointly used to optimize the encoder and classifier. We evaluate SLOTS by comparing it with ten state-of-the-art methods across five datasets. The results demonstrate that SLOTS is a simple yet effective framework. When compared to the two-stage framework, our end-to-end SLOTS utilizes the same input data, consumes a similar computational cost, but delivers significantly improved performance. We release code and datasets at https://anonymous.4open.science/r/SLOTS-242E.
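A compact sketch of the three-term objective on a semi-labeled batch: an unsupervised contrastive loss over two augmented views, a supervised contrastive loss on the labeled subset, and a classification loss, all optimized jointly. The loss implementations below are simplified stand-ins (e.g., the supervised term is a reduced SupCon) and the loss weights are omitted.

```python
# Sketch of a joint semi-supervised contrastive + classification objective (simplified).
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Unsupervised contrastive loss between two augmented views of the same series."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature
    return F.cross_entropy(logits, torch.arange(z1.size(0)))

def supervised_contrastive(z, y, temperature=0.1):
    """Pull together embeddings that share a ground-truth label (reduced SupCon)."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.T / temperature
    same = (y[:, None] == y[None, :]).float() - torch.eye(len(y))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(same * log_prob).sum() / same.sum().clamp(min=1)

B, D, C = 32, 64, 5
z_view1 = torch.randn(B, D, requires_grad=True)       # encoder outputs, view 1
z_view2 = torch.randn(B, D)                           # encoder outputs, view 2
labeled = torch.arange(8)                             # small labeled subset of the batch
y = torch.randint(0, C, (8,))
logits = torch.randn(8, C, requires_grad=True)        # classifier outputs on labeled samples

loss = (info_nce(z_view1, z_view2)
        + supervised_contrastive(z_view1[labeled], y)
        + F.cross_entropy(logits, y))
loss.backward()
```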
Updated: 2025-03-21 03:06:40
标题: 半监督式端到端对比学习用于时间序列分类
摘要: 时间序列分类是各个领域中的关键任务,如金融、医疗保健和传感器数据分析。无监督对比学习在从具有有限标签的时间序列数据中学习有效表示方面引起了很大兴趣。现有对比学习方法中普遍的方法包括两个独立阶段:在无标签数据集上对编码器进行预训练,然后在小规模标记数据集上对训练良好的模型进行微调。然而,这种两阶段方法存在几个缺点,例如无监督预训练对比损失无法直接影响下游微调分类器,以及缺乏利用由宝贵的真实标签指导的分类损失。在本文中,我们提出了一种名为SLOTS(用于时间分类的半监督学习)的端到端模型。SLOTS接收半标记数据集,包括大量未标记样本和少量已标记样本,并通过编码器将它们映射到嵌入空间。我们不仅计算无监督对比损失,还在具有地面真实值的样本上测量监督对比损失。学习到的嵌入被馈送到分类器,使用可用的真实标签计算分类损失。无监督、监督对比损失和分类损失共同用于优化编码器和分类器。我们通过将其与五个数据集上的十种最先进方法进行比较来评估SLOTS。结果表明,SLOTS是一个简单而有效的框架。与两阶段框架相比,我们的端到端SLOTS利用相同的输入数据,消耗类似的计算成本,但性能显著提高。我们在https://anonymous.4open.science/r/SLOTS-242E发布了代码和数据集。
更新时间: 2025-03-21 03:06:40
领域: cs.LG
Knowledge-aware contrastive heterogeneous molecular graph learning
Molecular representation learning is pivotal in predicting molecular properties and advancing drug design. Traditional methodologies, which predominantly rely on homogeneous graph encoding, are limited by their inability to integrate external knowledge and represent molecular structures across different levels of granularity. To address these limitations, we propose a paradigm shift by encoding molecular graphs into heterogeneous structures, introducing a novel framework: Knowledge-aware Contrastive Heterogeneous Molecular Graph Learning (KCHML). This approach leverages contrastive learning to enrich molecular representations with embedded external knowledge. KCHML conceptualizes molecules through three distinct graph views-molecular, elemental, and pharmacological-enhanced by heterogeneous molecular graphs and a dual message-passing mechanism. This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction. Extensive benchmarking demonstrates KCHML's superiority over state-of-the-art molecular property prediction models, underscoring its ability to capture intricate molecular features.
Updated: 2025-03-21 02:55:34
标题: 知识感知的对比异质分子图学习
摘要: 分子表示学习在预测分子属性和推进药物设计方面至关重要。传统方法主要依赖于同质图编码,受到无法集成外部知识和代表不同粒度级别的分子结构的限制。为了解决这些限制,我们提出了一种范式转变,将分子图编码为异质结构,引入一种新颖的框架:知识感知对比异质分子图学习(KCHML)。这种方法利用对比学习来丰富分子表示,嵌入外部知识。KCHML通过异质分子图和双向消息传递机制,将分子概念化为三种不同的图视图-分子、元素和药理学,为属性预测提供了全面的表示,同时也适用于药物-药物相互作用(DDI)预测等下游任务。广泛的基准测试表明,KCHML在分子属性预测模型方面优于最先进的模型,突出了其捕捉复杂分子特征的能力。
更新时间: 2025-03-21 02:55:34
领域: cs.LG,cs.AI
Enhancing Zero-Shot Image Recognition in Vision-Language Models through Human-like Concept Guidance
In zero-shot image recognition tasks, humans demonstrate remarkable flexibility in classifying unseen categories by composing known simpler concepts. However, existing vision-language models (VLMs), despite achieving significant progress through large-scale natural language supervision, often underperform in real-world applications because of sub-optimal prompt engineering and the inability to adapt effectively to target classes. To address these issues, we propose a Concept-guided Human-like Bayesian Reasoning (CHBR) framework. Grounded in Bayes' theorem, CHBR models the concept used in human image recognition as latent variables and formulates this task by summing across potential concepts, weighted by a prior distribution and a likelihood function. To tackle the intractable computation over an infinite concept space, we introduce an importance sampling algorithm that iteratively prompts large language models (LLMs) to generate discriminative concepts, emphasizing inter-class differences. We further propose three heuristic approaches involving Average Likelihood, Confidence Likelihood, and Test Time Augmentation (TTA) Likelihood, which dynamically refine the combination of concepts based on the test image. Extensive evaluations across fifteen datasets demonstrate that CHBR consistently outperforms existing state-of-the-art zero-shot generalization methods.
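The Bayes-rule scoring reduces to a weighted sum once concepts have been sampled: each class score is a concept-likelihood term averaged under importance weights. In the sketch below the image-concept similarities and weights are random placeholders standing in for CLIP-style scores over LLM-proposed concepts.

```python
# Sketch of concept-marginalized zero-shot scoring: sum class-concept scores under importance weights.
import numpy as np

rng = np.random.default_rng(0)
num_classes, num_concepts = 10, 16

# likelihood-style term: how well the image matches concept k proposed for class c
image_concept_score = rng.random((num_classes, num_concepts))
# importance weights over sampled concepts, normalized per class
concept_weight = rng.random((num_classes, num_concepts))
concept_weight /= concept_weight.sum(axis=1, keepdims=True)

class_score = (concept_weight * image_concept_score).sum(axis=1)   # marginalize over concepts
print(int(class_score.argmax()))
```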
Updated: 2025-03-21 02:55:26
标题: 通过类人概念引导增强视觉-语言模型中的零样本图像识别
摘要: 在零样本图像识别任务中,人类展示了通过组合已知简单概念对未见类别进行分类的显著灵活性。然而,尽管现有的视觉-语言模型(VLMs)通过大规模自然语言监督取得了显著进展,但在现实世界应用中往往表现不佳,原因是提示工程的次优性和无法有效适应目标类别。为了解决这些问题,我们提出了一个概念引导的类人贝叶斯推理(CHBR)框架。基于贝叶斯定理,CHBR将人类图像识别中使用的概念建模为潜在变量,并通过对概念空间加权求和,通过先验分布和似然函数进行公式化。为了解决在无限概念空间上的难以计算的问题,我们引入了一种重要性抽样算法,迭代提示大型语言模型(LLMs)生成具有区分性的概念,强调跨类别的差异。我们进一步提出了三种启发式方法,包括平均似然、置信似然和测试时间增强(TTA)似然,根据测试图像动态调整概念的组合。对十五个数据集的广泛评估表明,CHBR始终优于现有的最先进的零样本泛化方法。
更新时间: 2025-03-21 02:55:26
领域: cs.CV,cs.LG
When Debate Fails: Bias Reinforcement in Large Language Models
Large Language Models $($LLMs$)$ solve complex problems using training-free methods like prompt engineering and in-context learning, yet ensuring reasoning correctness remains challenging. While self-correction methods such as self-consistency and self-refinement aim to improve reliability, they often reinforce biases due to the lack of effective feedback mechanisms. Multi-Agent Debate $($MAD$)$ has emerged as an alternative, but we identify two key limitations: bias reinforcement, where debate amplifies model biases instead of correcting them, and lack of perspective diversity, as all agents share the same model and reasoning patterns, limiting true debate effectiveness. To systematically evaluate these issues, we introduce $\textit{MetaNIM Arena}$, a benchmark designed to assess LLMs in adversarial strategic decision-making, where dynamic interactions influence optimal decisions. To overcome MAD's limitations, we propose $\textbf{DReaMAD}$ $($$\textbf{D}$iverse $\textbf{Rea}$soning via $\textbf{M}$ulti-$\textbf{A}$gent $\textbf{D}$ebate with Refined Prompt$)$, a novel framework that $(1)$ refines LLM's strategic prior knowledge to improve reasoning quality and $(2)$ promotes diverse viewpoints within a single model by systematically modifying prompts, reducing bias. Empirical results show that $\textbf{DReaMAD}$ significantly improves decision accuracy, reasoning diversity, and bias mitigation across multiple strategic tasks, establishing it as a more effective approach for LLM-based decision-making.
Updated: 2025-03-21 02:51:30
标题: 辩论失败时:大型语言模型中的偏见强化
摘要: 大型语言模型(LLMs)利用无需训练的方法(如提示工程和上下文学习)解决复杂问题,但确保推理正确性仍具挑战性。虽然自我校正方法(如自一致性和自精化)旨在提高可靠性,但由于缺乏有效的反馈机制,它们常常强化偏见。多智能体辩论(MAD)已经成为一种替代方案,但我们确定了两个关键限制:偏见强化,即辩论放大模型偏见而非纠正它们,以及缺乏视角多样性,因为所有智能体共享相同的模型和推理模式,限制了真正的辩论效果。为了系统评估这些问题,我们引入了MetaNIM Arena,一个旨在评估LLMs在对抗性战略决策中的竞争性决策的基准,其中动态交互影响最佳决策。为了克服MAD的限制,我们提出了DReaMAD(Diverse Reasoning via Multi-Agent Debate with Refined Prompt),这是一个新颖的框架,通过以下两点实现:(1)优化LLM的战略先验知识以提高推理质量;(2)通过系统修改提示,在单一模型内促进多样化观点,减少偏见。实证结果显示,DReaMAD显著提高了决策准确性、推理多样性和偏见缓解,在多个战略任务中建立了它作为LLM决策的更有效方法。
更新时间: 2025-03-21 02:51:30
领域: cs.LG,cs.CL
Knowledge Graph Embeddings: A Comprehensive Survey on Capturing Relation Properties
Knowledge Graph Embedding (KGE) techniques play a pivotal role in transforming symbolic Knowledge Graphs (KGs) into numerical representations, thereby enhancing various deep learning models for knowledge-augmented applications. Unlike entities, relations in KGs are the carriers of semantic meaning, and their accurate modeling is crucial for the performance of KGE models. Firstly, we address the complex mapping properties inherent in relations, such as one-to-one, one-to-many, many-to-one, and many-to-many mappings. We provide a comprehensive summary of relation-aware mapping-based models, models that utilize specific representation spaces, tensor decomposition-based models, and neural network-based models. Next, focusing on capturing various relation patterns like symmetry, asymmetry, inversion, and composition, we review models that employ modified tensor decomposition, those based on modified relation-aware mappings, and those that leverage rotation operations. Subsequently, considering the implicit hierarchical relations among entities, we introduce models that incorporate auxiliary information, models based on hyperbolic spaces, and those that utilize the polar coordinate system. Finally, in response to more complex scenarios such as sparse and dynamic KGs, this paper discusses potential future research directions. We explore innovative ideas such as integrating multimodal information into KGE, enhancing relation pattern modeling with rules, and developing models to capture relation characteristics in dynamic KGE settings.
Updated: 2025-03-21 02:50:43
标题: 知识图谱嵌入:捕捉关系属性的全面调查
摘要: 知识图谱嵌入(KGE)技术在将符号知识图谱(KGs)转化为数值表示方面发挥着关键作用,从而增强了各种基于知识增强的深度学习模型的性能。与实体不同,知识图谱中的关系是语义含义的载体,其准确建模对于KGE模型的性能至关重要。首先,我们解决了关系中固有的复杂映射属性,如一对一、一对多、多对一和多对多映射。我们提供了关系感知映射模型、利用特定表示空间的模型、基于张量分解的模型和基于神经网络的模型的全面总结。接下来,专注于捕捉各种关系模式,如对称、非对称、倒置和组合,我们回顾了采用修改后的张量分解、基于修改后的关系感知映射的模型以及利用旋转操作的模型。随后,考虑到实体之间的隐性层次关系,我们介绍了融合辅助信息的模型、基于双曲空间的模型以及利用极坐标系的模型。最后,针对稀疏和动态知识图谱等更复杂场景,本文讨论了潜在的未来研究方向。我们探索了将多模态信息整合到KGE中、利用规则增强关系模式建模以及开发模型以捕获动态KGE设置中的关系特征等创新思路。
更新时间: 2025-03-21 02:50:43
领域: cs.LG,cs.AI,cs.CL,I.2.7
Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
In recent years, data-driven reinforcement learning (RL), also known as offline RL, has gained significant attention. However, the role of data sampling techniques in offline RL has been overlooked despite its potential to enhance online RL performance. Recent research suggests that applying sampling techniques directly to state-transitions does not consistently improve performance in offline RL. Therefore, in this study, we propose a memory technique, (Prioritized) Trajectory Replay (TR/PTR), which extends the sampling perspective to trajectories for more comprehensive information extraction from limited data. TR enhances learning efficiency by backward sampling of trajectories that optimizes the use of subsequent state information. Building on TR, we construct a weighted critic target to avoid sampling unseen actions in offline training, and Prioritized Trajectory Replay (PTR), which enables more efficient trajectory sampling, prioritized by various trajectory priority metrics. We demonstrate the benefits of integrating TR and PTR with existing offline RL algorithms on D4RL. In summary, our research emphasizes the significance of trajectory-based data sampling techniques in enhancing the efficiency and performance of offline RL algorithms.
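A minimal sketch of trajectory-level replay: store whole trajectories, sample one with probability proportional to a priority (trajectory return is used here, one of several metrics the paper considers), and iterate its transitions backward so later-state information is available when updating earlier ones.

```python
# Sketch of a prioritized trajectory replay buffer with backward iteration.
import random

class PrioritizedTrajectoryReplay:
    def __init__(self):
        self.trajectories, self.priorities = [], []

    def add(self, trajectory, priority):
        self.trajectories.append(trajectory)
        self.priorities.append(max(priority, 1e-6))

    def sample(self):
        traj = random.choices(self.trajectories, weights=self.priorities, k=1)[0]
        return list(reversed(traj))       # backward order: later transitions come first

buffer = PrioritizedTrajectoryReplay()
# a trajectory is a list of (state, action, reward, next_state) tuples; toy data below
traj = [((i,), 0, float(i), (i + 1,)) for i in range(5)]
buffer.add(traj, priority=sum(r for _, _, r, _ in traj))
for state, action, reward, next_state in buffer.sample():
    pass  # e.g. run a TD backup here; next_state's value estimate is already refreshed
```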
Updated: 2025-03-21 02:41:44
标题: 优先轨迹回放:一种用于数据驱动强化学习的回放记忆
摘要: 近年来,基于数据驱动的增强学习(RL),也被称为离线RL,引起了广泛的关注。然而,尽管数据采样技术在离线RL中有潜力提升在线RL性能,但其作用却被忽视了。最近的研究表明,将采样技术直接应用于状态转换并不能始终提高离线RL的性能。因此,在这项研究中,我们提出了一种记忆技术,(优先)轨迹重放(TR/PTR),将采样视角扩展到轨迹,以从有限数据中提取更全面的信息。TR通过对轨迹进行向后采样来增强学习效率,优化使用后续状态信息。在TR的基础上,我们构建了加权评论家目标,以避免在离线训练中对未见过的动作进行采样,以及优先轨迹重放(PTR),通过各种轨迹优先度指标对轨迹进行更有效的采样。我们展示了将TR和PTR与现有离线RL算法集成在D4RL上的好处。总之,我们的研究强调了基于轨迹的数据采样技术在增强离线RL算法的效率和性能方面的重要性。
更新时间: 2025-03-21 02:41:44
领域: cs.LG,cs.AI
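As an illustration of the trajectory-level sampling idea summarized in the PTR abstract above, the following minimal Python sketch (hypothetical, not the authors' implementation) stores whole trajectories, samples them with probability proportional to a simple priority metric (here the trajectory return), and replays each sampled trajectory backward so later transitions are processed first:

    import numpy as np

    class PrioritizedTrajectoryReplay:
        """Stores whole trajectories and samples them by a priority metric."""

        def __init__(self, priority_fn=None):
            self.trajectories = []   # each trajectory: list of (s, a, r, s_next, done)
            self.priorities = []
            # default priority: trajectory return (one of several possible metrics)
            self.priority_fn = priority_fn or (lambda traj: sum(step[2] for step in traj))

        def add(self, trajectory):
            self.trajectories.append(trajectory)
            self.priorities.append(self.priority_fn(trajectory))

        def sample(self, batch_size):
            p = np.asarray(self.priorities, dtype=np.float64)
            p = p - p.min() + 1e-6          # shift so all priorities are positive
            p = p / p.sum()
            idx = np.random.choice(len(self.trajectories), size=batch_size, p=p)
            # backward replay: yield transitions from the end of each trajectory first
            for i in idx:
                for transition in reversed(self.trajectories[i]):
                    yield transition

A buffer like this would replace uniform transition sampling in an offline RL learner; the priority metric and the backward ordering are the two knobs the abstract emphasizes.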
Online Selective Conformal Prediction: Errors and Solutions
In online selective conformal inference, data arrives sequentially, and prediction intervals are constructed only when an online selection rule is met. Since online selections may break the exchangeability between the selected test datum and the rest of the data, one must correct for this by suitably selecting the calibration data. In this paper, we evaluate existing calibration selection strategies and pinpoint some fundamental errors in the associated claims that guarantee selection-conditional coverage and control of the false coverage rate (FCR). To address these shortcomings, we propose novel calibration selection strategies that provably preserve the exchangeability of the calibration data and the selected test datum. Consequently, we demonstrate that online selective conformal inference with these strategies guarantees both selection-conditional coverage and FCR control. Our theoretical findings are supported by experimental evidence examining tradeoffs between valid methods.
Updated: 2025-03-21 02:37:28
标题: 在线选择性一致性预测:错误及解决方案
摘要: 在线选择性符合推理中,数据是顺序到达的,只有当在线选择规则满足时才构建预测区间。由于在线选择可能会破坏所选测试数据与其他数据之间的可交换性,因此必须通过适当选择校准数据来纠正这一点。在本文中,我们评估现有的校准选择策略,并指出一些相关声明中的基本错误,这些声明保证了选择条件覆盖和误覆盖率(FCR)的控制。为了解决这些缺陷,我们提出了一些新颖的校准选择策略,可以证明地保持校准数据和所选测试数据之间的可交换性。因此,我们证明了采用这些策略的在线选择性符合推理可以保证选择条件覆盖和FCR控制。我们的理论发现得到了实验证据的支持,检验了有效方法之间的权衡。
更新时间: 2025-03-21 02:37:28
领域: stat.ML,cs.LG
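To make the setting of the abstract above concrete, here is a small illustrative sketch of split conformal prediction with an online selection rule. It only shows where a calibration-selection strategy plugs in; it does not reproduce the corrected strategies proposed in the paper, and all function names are placeholders:

    import numpy as np

    def conformal_interval(cal_residuals, y_pred, alpha=0.1):
        """Split-conformal interval from absolute residuals of calibration points."""
        n = len(cal_residuals)
        q = np.quantile(cal_residuals, min(1.0, (1 - alpha) * (n + 1) / n))
        return y_pred - q, y_pred + q

    def online_selective_conformal(stream, model, select, choose_calibration, alpha=0.1):
        """stream yields (x, y); `select` is the online selection rule;
        `choose_calibration` returns a list of past residuals allowed as calibration --
        the step whose validity the paper analyzes and corrects."""
        history = []                                   # (x, prediction, |residual|) triples
        for x, y in stream:
            y_pred = model(x)
            if select(x, y_pred):                      # interval reported only when selected
                cal = choose_calibration(history, x)
                if cal:
                    yield x, conformal_interval(np.asarray(cal), y_pred, alpha)
            history.append((x, y_pred, abs(y - y_pred)))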
Autoencoded UMAP-Enhanced Clustering for Unsupervised Learning
We propose a novel approach to unsupervised learning by constructing a non-linear embedding of the data into a low-dimensional space followed by any conventional clustering algorithm. The embedding promotes clusterability of the data and comprises two mappings: the encoder of an autoencoder neural network and the output of the UMAP algorithm. The autoencoder is trained with a composite loss function that incorporates both a conventional data-reconstruction term as a regularization component and a clustering-promoting component built using spectral graph theory. The two embeddings and the subsequent clustering are integrated into a three-stage unsupervised learning framework, referred to as Autoencoded UMAP-Enhanced Clustering (AUEC). When applied to MNIST data, AUEC significantly outperforms state-of-the-art techniques in terms of clustering accuracy.
Updated: 2025-03-21 02:34:36
标题: Autoencoded UMAP增强聚类用于无监督学习
摘要: 我们提出了一种新颖的无监督学习方法,通过将数据构建成非线性嵌入到低维空间,然后使用任何常规聚类算法。这种嵌入促进了数据的聚类能力,由两个映射组成:自动编码器神经网络的编码器和UMAP算法的输出。自动编码器使用一个复合损失函数进行训练,该损失函数包含传统数据重构作为正则化组件和使用谱图理论构建的促进聚类的组件。这两个嵌入和随后的聚类被整合到一个三阶段无监督学习框架中,称为自动编码UMAP增强聚类(AUEC)。当应用于MNIST数据时,AUEC在聚类准确性方面明显优于现有技术。
更新时间: 2025-03-21 02:34:36
领域: cs.LG,68Q32, 68T07, 68T09, 68T10
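A condensed, hypothetical sketch of the three-stage AUEC pipeline described above: an autoencoder trained with reconstruction plus a clustering-promoting term, followed by UMAP on the encoder output and a conventional clustering algorithm. The clustering-promoting term is left as a plug-in hook (the paper's spectral-graph component is not reproduced here), and umap-learn and scikit-learn are assumed to be available:

    import torch
    import torch.nn as nn
    import umap                                   # umap-learn package
    from sklearn.cluster import KMeans

    class AE(nn.Module):
        def __init__(self, d_in, d_code=32):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(), nn.Linear(256, d_code))
            self.dec = nn.Sequential(nn.Linear(d_code, 256), nn.ReLU(), nn.Linear(256, d_in))

        def forward(self, x):
            z = self.enc(x)
            return z, self.dec(z)

    def auec(x, n_clusters=10, epochs=50, lam=0.1, cluster_loss=lambda z: z.new_zeros(())):
        """x: (n, d) float tensor. `cluster_loss` is where a spectral-graph
        clustering-promoting term would plug in."""
        model = AE(x.shape[1])
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(epochs):                              # stage 1: train the autoencoder
            z, x_hat = model(x)
            loss = ((x - x_hat) ** 2).mean() + lam * cluster_loss(z)
            opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            z = model.enc(x).numpy()
        emb = umap.UMAP(n_components=2).fit_transform(z)     # stage 2: UMAP embedding
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(emb)   # stage 3: clustering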
DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation
Nonprehensile manipulation is crucial for handling objects that are too thin, large, or otherwise ungraspable in unstructured environments. While conventional planning-based approaches struggle with complex contact modeling, learning-based methods have recently emerged as a promising alternative. However, existing learning-based approaches face two major limitations: they heavily rely on multi-view cameras and precise pose tracking, and they fail to generalize across varying physical conditions, such as changes in object mass and table friction. To address these challenges, we propose the Dynamics-Adaptive World Action Model (DyWA), a novel framework that enhances action learning by jointly predicting future states while adapting to dynamics variations based on historical trajectories. By unifying the modeling of geometry, state, physics, and robot actions, DyWA enables more robust policy learning under partial observability. Compared to baselines, our method improves the success rate by 31.5% using only single-view point cloud observations in the simulation. Furthermore, DyWA achieves an average success rate of 68% in real-world experiments, demonstrating its ability to generalize across diverse object geometries, adapt to varying table friction, and remain robust in challenging scenarios such as half-filled water bottles and slippery surfaces.
Updated: 2025-03-21 02:29:52
标题: DyWA: 通用非抓取操作的动态自适应世界行为模型
摘要: 非抓握操纵对于处理在非结构化环境中过于细、大或无法抓取的物体至关重要。虽然传统的基于规划的方法在复杂接触建模方面存在困难,但学习型方法最近已经成为一种有前途的替代方案。然而,现有的基于学习的方法面临两个主要限制:它们严重依赖多视角摄像头和精确姿态跟踪,并且无法在不同的物理条件下进行泛化,例如物体质量和桌面摩擦的变化。为了解决这些挑战,我们提出了动态自适应世界行动模型(DyWA),这是一个新颖的框架,通过联合预测未来状态并根据历史轨迹调整动态变化来增强行动学习。通过统一几何、状态、物理和机器人行动建模,DyWA在部分可观察性下实现了更健壮的策略学习。与基线相比,我们的方法在模拟中仅使用单视点云观测就将成功率提高了31.5%。此外,DyWA在真实世界实验中实现了平均成功率为68%,展示了其在泛化到不同物体几何形状、适应不同桌面摩擦和挑战性场景(如半满水瓶和滑动表面)方面的能力。
更新时间: 2025-03-21 02:29:52
领域: cs.RO,cs.AI
BEAC: Imitating Complex Exploration and Task-oriented Behaviors for Invisible Object Nonprehensile Manipulation
Applying imitation learning (IL) to nonprehensile manipulation tasks of invisible objects under partial observation, such as excavating buried rocks, is challenging. The demonstrator must make complex action decisions, such as exploring to find the object and task-oriented actions to complete the task, while estimating its hidden state, which can cause inconsistent action demonstrations and high cognitive load. For these problems, work in human cognitive science suggests that promoting the use of pre-designed, simple exploration rules for the demonstrator may alleviate the problems of action inconsistency and high cognitive load. Therefore, when performing imitation learning from demonstrations that use such exploration rules, it is important to accurately imitate not only the demonstrator's task-oriented behavior but also his/her mode-switching behavior (exploratory or task-oriented behavior) under partial observation. Based on the above considerations, this paper proposes a novel imitation learning framework called Belief Exploration-Action Cloning (BEAC), which has a switching policy structure between a pre-designed exploration policy and a task-oriented action policy trained on belief states estimated from past history. In simulation and real-robot experiments, we confirmed that our proposed method achieved the best task performance and higher mode- and action-prediction accuracies, while reducing the cognitive load of the demonstration, as indicated by a user study.
Updated: 2025-03-21 02:26:14
标题: BEAC:模拟复杂的探索和面向任务的行为,用于无形物体非抓握式操纵
摘要: 将模仿学习(IL)应用于非抓取操纵任务如挖掘隐藏物体时存在挑战,因为只能观察到部分情况,比如挖掘埋藏的岩石。演示者必须做出复杂的行动决策,如探索以找到物体和完成任务所需的动作,同时估计其隐藏状态,可能导致不一致的行动演示和高认知负荷问题。针对这些问题,人类认知科学的研究表明,促进演示者使用预先设计的简单探索规则可能缓解行动不一致和高认知负荷的问题。因此,在使用这些探索规则进行演示的模仿学习时,重要的是不仅准确模仿演示者的任务导向行为,还要模仿他/她在部分观察下的模式切换行为(探索性或任务导向行为)。基于上述考虑,本文提出了一种新颖的模仿学习框架称为信念探索-行为克隆(BEAC),其在估计信念状态的基础上在过去历史上训练出的预先设计的探索策略和任务导向行动策略之间具有切换策略结构。在模拟和真实机器人实验中,我们验证了我们提出的方法实现了最佳任务表现,更高的模式和行动预测准确性,同时通过用户研究减少了演示中所指示的认知负荷。
更新时间: 2025-03-21 02:26:14
领域: cs.RO,cs.LG
Auto-Regressive Diffusion for Generating 3D Human-Object Interactions
Text-driven Human-Object Interaction (Text-to-HOI) generation is an emerging field with applications in animation, video games, virtual reality, and robotics. A key challenge in HOI generation is maintaining interaction consistency in long sequences. Existing Text-to-Motion-based approaches, such as discrete motion tokenization, cannot be directly applied to HOI generation due to limited data in this domain and the complexity of the modality. To address the problem of interaction consistency in long sequences, we propose an autoregressive diffusion model (ARDHOI) that predicts the next continuous token. Specifically, we introduce a Contrastive Variational Autoencoder (cVAE) to learn a physically plausible space of continuous HOI tokens, thereby ensuring that generated human-object motions are realistic and natural. For generating sequences autoregressively, we develop a Mamba-based context encoder to capture and maintain consistent sequential actions. Additionally, we implement an MLP-based denoiser to generate the subsequent token conditioned on the encoded context. Our model has been evaluated on the OMOMO and BEHAVE datasets, where it outperforms existing state-of-the-art methods in terms of both performance and inference speed. This makes ARDHOI a robust and efficient solution for text-driven HOI tasks.
Updated: 2025-03-21 02:25:59
标题: 自回归扩散用于生成3D人-物互动
摘要: 文本驱动的人-物互动(文本到HOI)生成是一个新兴领域,在动画、视频游戏、虚拟现实和机器人领域有应用。HOI生成中的一个关键挑战是在长序列中保持交互一致性。现有的基于文本到动作的方法,如离散动作标记化,由于在这个领域数据有限且模态复杂,无法直接应用于HOI生成。为了解决长序列中交互一致性的问题,我们提出了一个自回归扩散模型(ARDHOI),用于预测下一个连续标记。具体来说,我们引入了对比变分自编码器(cVAE)来学习一个连续HOI标记的物理可信空间,从而确保生成的人-物动作是真实和自然的。为了自回归生成序列,我们开发了一个基于Mamba的上下文编码器来捕捉和维护一致的顺序动作。此外,我们实现了一个基于MLP的去噪器,以在编码的上下文条件下生成后续标记。我们的模型已在OMOMO和BEHAVE数据集上进行了评估,在性能和推理速度方面均优于现有的最先进方法。这使得ARDHOI成为一个稳健和高效的解决方案,用于文本驱动的HOI任务。
更新时间: 2025-03-21 02:25:59
领域: cs.GR,cs.AI,cs.CV
Causally Aligned Curriculum Learning
A pervasive challenge in Reinforcement Learning (RL) is the "curse of dimensionality" which is the exponential growth in the state-action space when optimizing a high-dimensional target task. The framework of curriculum learning trains the agent in a curriculum composed of a sequence of related and more manageable source tasks. The expectation is that when some optimal decision rules are shared across source tasks and the target task, the agent could more quickly pick up the necessary skills to behave optimally in the environment, thus accelerating the learning process. However, this critical assumption of invariant optimal decision rules does not necessarily hold in many practical applications, specifically when the underlying environment contains unobserved confounders. This paper studies the problem of curriculum RL through causal lenses. We derive a sufficient graphical condition characterizing causally aligned source tasks, i.e., the invariance of optimal decision rules holds. We further develop an efficient algorithm to generate a causally aligned curriculum, provided with qualitative causal knowledge of the target task. Finally, we validate our proposed methodology through experiments in discrete and continuous confounded tasks with pixel observations.
Updated: 2025-03-21 02:20:38
标题: 因果对齐的课程学习
摘要: 在强化学习(RL)中一个普遍的挑战是“维度诅咒”,即在优化高维目标任务时,状态-动作空间呈指数增长。课程学习框架通过训练代理人在由一系列相关且更易管理的源任务组成的课程中。期望当一些最优决策规则在源任务和目标任务之间共享时,代理人可以更快地掌握在环境中行为最优的必要技能,从而加速学习过程。然而,在许多实际应用中,特别是当底层环境包含未观察到的混淆因素时,这一关键假设不一定成立。本文通过因果透镜研究了课程RL的问题。我们推导出一个充分的图形条件,描述因果对齐的源任务,即最优决策规则的不变性成立。我们进一步开发了一种有效算法,通过提供目标任务的定性因果知识来生成一个因果对齐的课程。最后,我们通过在具有像素观察的离散和连续混淆任务中的实验验证了我们提出的方法论。
更新时间: 2025-03-21 02:20:38
领域: cs.LG,cs.AI
ChatBEV: A Visual Language Model that Understands BEV Maps
Traffic scene understanding is essential for intelligent transportation systems and autonomous driving, ensuring safe and efficient vehicle operation. While recent advancements in VLMs have shown promise for holistic scene understanding, the application of VLMs to traffic scenarios, particularly using BEV maps, remains under explored. Existing methods often suffer from limited task design and narrow data amount, hindering comprehensive scene understanding. To address these challenges, we introduce ChatBEV-QA, a novel BEV VQA benchmark containing over 137k questions, designed to encompass a wide range of scene understanding tasks, including global scene understanding, vehicle-lane interactions, and vehicle-vehicle interactions. This benchmark is constructed using a novel data collection pipeline that generates scalable and informative VQA data for BEV maps. We further fine-tune a specialized vision-language model, ChatBEV, enabling it to interpret diverse question prompts and extract relevant context-aware information from BEV maps. Additionally, we propose a language-driven traffic scene generation pipeline, where ChatBEV facilitates map understanding and text-aligned navigation guidance, significantly enhancing the generation of realistic and consistent traffic scenarios. The dataset, code and the fine-tuned model will be released.
Updated: 2025-03-21 02:17:52
标题: ChatBEV:一种能理解BEV地图的视觉语言模型
摘要: 交通场景理解对于智能交通系统和自动驾驶至关重要,确保车辆操作安全高效。虽然近年来VLM的进展显示出对整体场景理解的潜力,但将VLM应用于交通场景,特别是使用BEV地图,仍然未被充分探索。现有方法往往受限于任务设计有限和数据量狭窄,阻碍了全面的场景理解。为了解决这些挑战,我们引入了ChatBEV-QA,一个新颖的BEV VQA基准,包含超过137k个问题,旨在涵盖各种场景理解任务,包括全局场景理解、车道交互和车辆之间的交互。该基准是使用一种新颖的数据收集流程构建的,生成了可扩展且信息丰富的BEV地图VQA数据。我们进一步对专门的视觉语言模型ChatBEV进行微调,使其能够解释各种问题提示并从BEV地图中提取相关的上下文感知信息。此外,我们提出了一种语言驱动的交通场景生成流程,其中ChatBEV促进地图理解和文本对齐导航指导,显著提升了真实和一致交通场景的生成。数据集、代码和经过微调的模型将发布。
更新时间: 2025-03-21 02:17:52
领域: cs.CV,cs.AI
A Learnability Analysis on Neuro-Symbolic Learning
This paper analyzes the learnability of neuro-symbolic (NeSy) tasks within hybrid systems. We show that the learnability of NeSy tasks can be characterized by their derived constraint satisfaction problems (DCSPs). Specifically, a task is learnable if the corresponding DCSP has a unique solution; otherwise, it is unlearnable. For learnable tasks, we establish error bounds by exploiting the clustering property of the hypothesis space. Additionally, we analyze the asymptotic error for general NeSy tasks, showing that the expected error scales with the disagreement among solutions. Our results offer a principled approach to determining learnability and provide insights into the design of new algorithms.
Updated: 2025-03-21 02:16:11
标题: 《关于神经符号学习的可学习性分析》
摘要: 本文分析了混合系统中神经符号(NeSy)任务的可学习性。我们展示了NeSy任务的可学习性可以通过它们的推导约束满足问题(DCSPs)来表征。具体来说,如果相应的DCSP有唯一解,则任务是可学习的;否则,它是不可学习的。对于可学习的任务,我们通过利用假设空间的聚类特性建立了误差界限。此外,我们分析了一般NeSy任务的渐近误差,展示了期望误差与解之间的分歧成比例增长。我们的结果提供了一种确定可学习性的原则方法,并为新算法的设计提供了见解。
更新时间: 2025-03-21 02:16:11
领域: cs.AI,cs.LG
Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation
Despite the growing interest in the Mamba architecture as a potential replacement for the Transformer architecture, parameter-efficient fine-tuning (PEFT) approaches for Mamba remain largely unexplored. In our study, we introduce two key insight-driven strategies for PEFT in the Mamba architecture: (1) While state-space models (SSMs) have been regarded as the cornerstone of the Mamba architecture, and thus expected to play the primary role in transfer learning, our findings reveal that Projectors -- not SSMs -- are the predominant contributors to transfer learning. (2) Based on our observation, we propose a novel PEFT method specialized to the Mamba architecture: Projector-targeted Diagonal-centric Linear Transformation (ProDiaL). ProDiaL focuses on optimizing only the pretrained Projectors for new tasks through diagonal-centric linear transformation matrices, without directly fine-tuning the Projector weights. This targeted approach allows efficient task adaptation, utilizing less than 1% of the total parameters, and exhibits strong performance across both vision and language Mamba models, highlighting its versatility and effectiveness.
Updated: 2025-03-21 02:08:19
标题: 通过投影目标对角中心线性变换实现参数高效的Mamba调整
摘要: 尽管Mamba架构作为Transformer架构的潜在替代方案引起了越来越多的关注,但Mamba架构的参数高效微调(PEFT)方法仍然大部分未被探索。在我们的研究中,我们引入了两种关键的基于洞察力的Mamba架构PEFT策略:(1)尽管状态空间模型(SSMs)被认为是Mamba架构的基石,然后被期望在迁移学习中发挥主要作用,但我们的研究结果表明,投影仪而不是SSMs是迁移学习的主要贡献者。(2)根据我们的观察,我们提出了一种专门针对Mamba架构的新颖PEFT方法:Projector-targeted Diagonal-centric Linear Transformation(ProDiaL)。ProDiaL专注于通过对角线为中心的线性变换矩阵仅优化新任务的预训练投影仪,而不是直接微调投影仪权重。这种有针对性的方法允许高效的任务适应,利用总参数的不到1%,在视觉和语言Mamba模型中展现出强大的性能,突显了其多功能性和有效性。
更新时间: 2025-03-21 02:08:19
领域: cs.LG,cs.AI
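A rough sketch of the idea as stated in the ProDiaL abstract above: the pretrained projector weights stay frozen, and only a small transformation applied to them is learned. The exact parameterization of the "diagonal-centric" matrix below (a learnable diagonal scaling of the frozen weight) is an illustrative guess, not the paper's definition:

    import torch
    import torch.nn as nn

    class DiagonalAdaptedLinear(nn.Module):
        """Wraps a frozen pretrained linear projector with a learnable diagonal
        transformation: y = (D @ W) x + b, training only the diagonal of D."""

        def __init__(self, pretrained_linear: nn.Linear):
            super().__init__()
            self.weight = pretrained_linear.weight.detach()    # frozen W
            self.bias = pretrained_linear.bias.detach() if pretrained_linear.bias is not None else None
            out_dim = self.weight.shape[0]
            # learnable diagonal, initialized to identity so training starts from W
            self.diag = nn.Parameter(torch.ones(out_dim))

        def forward(self, x):
            w_eff = self.diag.unsqueeze(1) * self.weight       # D @ W without materializing D
            return nn.functional.linear(x, w_eff, self.bias)

Wrapping every projector in a pretrained block this way keeps the trainable parameter count at one scalar per output channel, which is in the spirit of the "<1% of total parameters" figure quoted in the abstract.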
Mirror Descent and Novel Exponentiated Gradient Algorithms Using Trace-Form Entropies and Deformed Logarithms
In this paper, we propose and investigate a wide class of Mirror Descent (MD) updates and associated novel Generalized Exponentiated Gradient (GEG) algorithms by exploiting various trace-form entropies and the associated deformed logarithms and their inverses, the deformed (generalized) exponential functions. The proposed algorithms can be considered as an extension of entropic MD and a generalization of multiplicative updates. The literature now contains over fifty mathematically well-defined generalized entropies, so it is impossible to exploit all of them in one research paper; we therefore focus on a few of the most popular entropies and associated logarithms, such as the Tsallis, Kaniadakis, and Sharma-Taneja-Mittal entropies, and some of their extensions, such as the Tempesta or Kaniadakis-Scarfone entropies. The shape and properties of the deformed logarithms and their inverses are tuned by one or more hyperparameters. By learning these hyperparameters, we can adapt to the distribution of the training data and to the specific geometry of the optimization problem, leading to potentially faster convergence and better performance. Using generalized entropies and the associated deformed logarithms in the Bregman divergence, used as a regularization term, provides new insight into exponentiated gradient descent updates.
Updated: 2025-03-21 02:07:53
标题: 镜像下降和新的指数梯度算法:使用迹形式熵和变形对数
摘要: 在本文中,我们提出并研究了一类广泛的镜像下降更新(MD)和相关的新型广义指数梯度(GEG)算法,通过利用各种迹形式熵和相关的变形对数及其逆变形(广义)指数函数。所提出的算法可以被看作是熵MD的扩展和乘法更新的泛化。在文献中,现在存在超过50种数学上定义良好的广义熵,因此不可能在一篇研究论文中利用所有这些熵。因此,我们专注于一些最受欢迎的选择的熵和相关对数,如Tsallis、Kaniadakis和Sharma-Taneja-Mittal,以及一些它们的扩展,如Tempesta或Kaniadakis-Scarfone熵。变形对数及其逆的形状和特性通过一个或多个超参数进行调整。通过学习这些超参数,我们可以适应训练数据的分布,这可以设计成优化问题的特定几何形状,从而导致潜在更快的收敛和更好的性能。在Bregman散度中使用广义熵和相关的变形对数作为正则项,为指数梯度下降更新提供了一些新的见解。
更新时间: 2025-03-21 02:07:53
领域: cs.LG,cs.AI
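A small numerical sketch of one family mentioned in the abstract above, the Tsallis-deformed logarithm and exponential, and the generalized exponentiated-gradient (GEG) update they induce. The hyperparameter q controls the deformation and q -> 1 recovers the ordinary multiplicative (EG) update; the simplex renormalization step is chosen here for simplicity and is not claimed to be the paper's exact scheme:

    import numpy as np

    def log_q(x, q):
        """Tsallis deformed logarithm; reduces to log(x) as q -> 1."""
        return np.log(x) if np.isclose(q, 1.0) else (x ** (1 - q) - 1) / (1 - q)

    def exp_q(x, q):
        """Tsallis deformed exponential, the inverse of log_q."""
        if np.isclose(q, 1.0):
            return np.exp(x)
        base = np.maximum(1 + (1 - q) * x, 0.0)   # cutoff keeps the result real and nonnegative
        return base ** (1 / (1 - q))

    def geg_step(w, grad, eta, q):
        """Mirror-descent / GEG step in the deformed-log mirror map:
        w_new = exp_q(log_q(w) - eta * grad), then renormalized to the simplex."""
        w_new = exp_q(log_q(w, q) - eta * grad, q)
        return w_new / w_new.sum()

    # q = 1 gives the classical exponentiated-gradient (multiplicative) update
    w = np.full(4, 0.25)
    grad = np.array([0.3, -0.1, 0.2, -0.4])
    print(geg_step(w, grad, eta=0.5, q=1.0))
    print(geg_step(w, grad, eta=0.5, q=0.7))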
"The Diagram is like Guardrails": Structuring GenAI-assisted Hypotheses Exploration with an Interactive Shared Representation
Data analysis encompasses a spectrum of tasks, from high-level conceptual reasoning to lower-level execution. While AI-powered tools increasingly support execution tasks, there remains a need for intelligent assistance in conceptual tasks. This paper investigates the design of an ordered node-link tree interface augmented with AI-generated information hints and visualizations, as a potential shared representation for hypothesis exploration. Through a design probe (n=22), participants generated diagrams averaging 21.82 hypotheses. Our findings showed that the node-link diagram acts as "guardrails" for hypothesis exploration, facilitating structured workflows, providing comprehensive overviews, and enabling efficient backtracking. The AI-generated information hints, particularly visualizations, aided users in transforming abstract ideas into data-backed concepts while reducing cognitive load. We further discuss how node-link diagrams can support both parallel exploration and iterative refinement in hypothesis formulation, potentially enhancing the breadth and depth of human-AI collaborative data analysis.
Updated: 2025-03-21 02:01:37
标题: "图表就像护栏":通过交互式共享表征结构化GenAI辅助的假设探索
摘要: 数据分析涵盖了从高层次概念推理到较低层次执行的一系列任务。虽然人工智能支持执行任务的工具越来越多,但在概念任务中仍然需要智能辅助。本文研究了一种增强了人工智能生成信息提示和可视化的有序节点链接树界面的设计,作为假设探索的潜在共享表示。通过一个设计探针(n=22),参与者平均生成了21.82个假设的图表。我们的发现显示,节点链接图表作为假设探索的“护栏”,促进了结构化工作流程,提供了全面的概述,并实现了有效的回溯。人工智能生成的信息提示,特别是可视化,帮助用户将抽象的想法转化为有数据支持的概念,同时减轻了认知负荷。我们进一步讨论了节点链接图表如何支持假设制定中的并行探索和迭代优化,可能提高人工智能协作数据分析的广度和深度。
更新时间: 2025-03-21 02:01:37
领域: cs.HC,cs.AI
HunyuanProver: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving
We introduce HunyuanProver, a language model finetuned from Hunyuan 7B for interactive automatic theorem proving with LEAN4. To alleviate the data sparsity issue, we design a scalable framework to iteratively synthesize data at low cost. Besides, guided tree search algorithms are designed to enable effective "system 2 thinking" of the prover. HunyuanProver achieves state-of-the-art (SOTA) performance on major benchmarks. Specifically, it achieves a pass rate of 68.4% on the miniF2F-test, compared to the current SOTA result of 65.9%. It proves 4 IMO statements (imo_1960_p2, imo_1962_p2, imo_1964_p2 and imo_1983_p6) in the miniF2F-test. To benefit the community, we will open-source a dataset of 30k synthesized instances, where each instance contains the original question in natural language, the converted statement by autoformalization, and the proof by HunyuanProver.
Updated: 2025-03-21 02:00:37
标题: 混元证明器:用于自动定理证明的可扩展数据合成框架和引导树搜索
摘要: 我们介绍了HunyuanProver,这是一个从Hunyuan 7B微调而来的语言模型,用于与LEAN4进行交互式自动定理证明。为了缓解数据稀疏问题,我们设计了一个可扩展的框架,用低成本来迭代合成数据。此外,我们设计了引导树搜索算法,以实现证明者的有效“系统2思维”。HunyuanProver在主要基准测试中取得了最先进的(SOTA)性能。具体来说,在miniF2F-test中,与当前SOTA结果的65.9%相比,它实现了68.4%的通过率。它在miniF2F-test中证明了4个IMO陈述(imo_1960_p2、imo_1962_p2、imo_1964_p2和imo_1983_p6)。为了造福社区,我们将开源一个包含30k个合成实例的数据集,每个实例包含自然语言中的原始问题、自动形式化转换的陈述以及HunyuanProver的证明。
更新时间: 2025-03-21 02:00:37
领域: cs.AI,cs.CL
KARMA: Augmenting Embodied AI Agents with Long-and-short Term Memory Systems
Embodied AI agents responsible for executing interconnected, long-sequence household tasks often face difficulties with in-context memory, leading to inefficiencies and errors in task execution. To address this issue, we introduce KARMA, an innovative memory system that integrates long-term and short-term memory modules, enhancing large language models (LLMs) for planning in embodied agents through memory-augmented prompting. KARMA distinguishes between long-term and short-term memory, with long-term memory capturing comprehensive 3D scene graphs as representations of the environment, while short-term memory dynamically records changes in objects' positions and states. This dual-memory structure allows agents to retrieve relevant past scene experiences, thereby improving the accuracy and efficiency of task planning. Short-term memory employs strategies for effective and adaptive memory replacement, ensuring the retention of critical information while discarding less pertinent data. Compared to state-of-the-art embodied agents enhanced with memory, our memory-augmented embodied AI agent improves success rates by 1.3x and 2.3x in Composite Tasks and Complex Tasks within the AI2-THOR simulator, respectively, and enhances task execution efficiency by 3.4x and 62.7x. Furthermore, we demonstrate that KARMA's plug-and-play capability allows for seamless deployment on real-world robotic systems, such as mobile manipulation platforms. Through this plug-and-play memory system, KARMA significantly enhances the ability of embodied agents to generate coherent and contextually appropriate plans, making the execution of complex household tasks more efficient. The experimental videos from the work can be found at https://youtu.be/4BT7fnw9ehs. Our code is available at https://github.com/WZX0Swarm0Robotics/KARMA/tree/master.
Updated: 2025-03-21 01:58:00
标题: KARMA:利用长期和短期记忆系统增强具身体的人工智能代理
摘要: 负责执行相互关联、长序列家庭任务的具体化AI代理往往面临着在上下文记忆中遇到困难,导致任务执行中的低效和错误。为了解决这个问题,我们引入了KARMA,这是一个创新的记忆系统,它整合了长期记忆和短期记忆模块,通过记忆增强提示增强大型语言模型(LLMs)来规划具体化代理。KARMA区分长期记忆和短期记忆,长期记忆捕获环境的综合3D场景图作为表示,而短期记忆动态记录对象位置和状态的变化。这种双重记忆结构使代理能够检索相关的过去场景经验,从而提高任务规划的准确性和效率。短期记忆采用有效和自适应的记忆替换策略,确保保留关键信息同时丢弃不太相关的数据。与使用记忆增强的最新具体化代理相比,我们的记忆增强具体化AI代理在AI2-THOR模拟器中的复合任务和复杂任务的成功率分别提高了1.3倍和2.3倍,并将任务执行效率提高了3.4倍和62.7倍。此外,我们展示了KARMA的即插即用功能,可以无缝部署到现实世界的机器人系统,如移动操纵平台。通过这种即插即用的记忆系统,KARMA显著提高了具体化代理生成连贯和具有上下文适应性的计划的能力,使复杂家庭任务的执行更加高效。该工作的实验视频可在https://youtu.be/4BT7fnw9ehs找到。我们的代码可在https://github.com/WZX0Swarm0Robotics/KARMA/tree/master获取。
更新时间: 2025-03-21 01:58:00
领域: cs.RO,cs.AI
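A toy, assumption-laden sketch of the dual-memory bookkeeping described in the KARMA abstract above: a long-term store holding scene-graph facts and a bounded short-term store holding recent object state changes, both serialized into a prompt prefix for the planning LLM. The data structures and prompt format are illustrative only:

    from collections import deque

    class DualMemory:
        """Long-term memory: static scene-graph facts.
        Short-term memory: a bounded buffer of recent object position/state changes."""

        def __init__(self, short_term_size=20):
            self.scene_graph = {}                          # e.g. {"mug": {"on": "counter"}}
            self.recent_changes = deque(maxlen=short_term_size)

        def observe_scene(self, obj, relations):
            self.scene_graph[obj] = relations              # long-term, comprehensive

        def observe_change(self, obj, new_state):
            self.recent_changes.append((obj, new_state))   # short-term, dynamic

        def as_prompt_prefix(self):
            long_term = "; ".join(f"{o}: {r}" for o, r in self.scene_graph.items())
            short_term = "; ".join(f"{o} -> {s}" for o, s in self.recent_changes)
            return (f"Known scene graph: {long_term}\n"
                    f"Recent changes: {short_term}\n"
                    f"Plan the next action given the task and the memory above.")

    mem = DualMemory()
    mem.observe_scene("mug", {"on": "counter"})
    mem.observe_change("mug", "held by gripper")
    print(mem.as_prompt_prefix())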
Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study
Reasoning capabilities have significantly improved the performance of vision-language models (VLMs) in domains such as mathematical problem-solving, coding, and visual question-answering. However, their impact on real-world applications remains unclear. This paper presents the first empirical study on the effectiveness of reasoning-enabled VLMs in mobile GUI agents, a domain that requires interpreting complex screen layouts, understanding user instructions, and executing multi-turn interactions. We evaluate two pairs of commercial models--Gemini 2.0 Flash and Claude 3.7 Sonnet--comparing their base and reasoning-enhanced versions across two static benchmarks (ScreenSpot and AndroidControl) and one interactive environment (AndroidWorld). We surprisingly find the Claude 3.7 Sonnet reasoning model achieves state-of-the-art performance on AndroidWorld. However, reasoning VLMs generally offer marginal improvements over non-reasoning models on static benchmarks and even degrade performance in some agent setups. Notably, reasoning and non-reasoning VLMs fail on different sets of tasks, suggesting that reasoning does have an impact, but its benefits and drawbacks counterbalance each other. We attribute these inconsistencies to the limitations of benchmarks and VLMs. Based on the findings, we provide insights for further enhancing mobile GUI agents in terms of benchmarks, VLMs, and their adaptability in dynamically invoking reasoning VLMs. The experimental data are publicly available at https://github.com/LlamaTouch/VLM-Reasoning-Traces.
Updated: 2025-03-21 01:52:43
标题: 链式推理是否有助于移动GUI代理?一项实证研究
摘要: 推理能力显著提升了视觉语言模型(VLMs)在数学问题解决、编码和视觉问答等领域的表现。然而,它们对现实世界应用的影响仍不清楚。本文首次对启用推理的VLMs在移动GUI代理中的有效性进行了实证研究,这是一个需要解释复杂屏幕布局、理解用户指令并执行多轮交互的领域。我们评估了两对商业模型--Gemini 2.0 Flash和Claude 3.7 Sonnet--在两个静态基准测试(ScreenSpot和AndroidControl)和一个交互环境(AndroidWorld)上比较它们的基础版本和推理增强版本。我们惊讶地发现,Claude 3.7 Sonnet 推理模型在AndroidWorld上实现了最先进的性能。然而,推理VLMs通常在静态基准测试中仅提供边际改进,甚至在某些代理设置中会降低性能。值得注意的是,推理和非推理VLMs在不同任务集上表现失败,这表明推理确实产生影响,但其优势和劣势相互抵消。我们将这些不一致性归因于基准测试和VLMs的限制。根据研究结果,我们提供了进一步增强移动GUI代理的见解,涉及基准测试、VLMs以及在动态调用推理VLMs方面的适应性。实验数据可在https://github.com/LlamaTouch/VLM-Reasoning-Traces 上公开获取。
更新时间: 2025-03-21 01:52:43
领域: cs.AI
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
Given unlabelled datasets containing both old and new categories, generalized category discovery (GCD) aims to accurately discover new classes while correctly classifying old classes. Current GCD methods only use a single visual modality of information, resulting in a poor classification of visually similar classes. As a different modality, text information can provide complementary discriminative information, which motivates us to introduce it into the GCD task. However, the lack of class names for unlabelled data makes it impractical to utilize text information. To tackle this challenging problem, in this paper, we propose a Text Embedding Synthesizer (TES) to generate pseudo text embeddings for unlabelled samples. Specifically, our TES leverages the property that CLIP can generate aligned vision-language features, converting visual embeddings into tokens of the CLIP's text encoder to generate pseudo text embeddings. Besides, we employ a dual-branch framework, through the joint learning and instance consistency of different modality branches, visual and semantic information mutually enhance each other, promoting the interaction and fusion of visual and text knowledge. Our method unlocks the multi-modal potentials of CLIP and outperforms the baseline methods by a large margin on all GCD benchmarks, achieving new state-of-the-art. Our code is available at: https://github.com/enguangW/GET.
Updated: 2025-03-21 01:50:55
标题: 获取:解锁CLIP的多模态潜力,用于广义类别发现
摘要: 鉴于包含旧类别和新类别的未标记数据集,广义类别发现(GCD)旨在准确发现新类别同时正确分类旧类别。当前的GCD方法仅使用单一的视觉信息模态,导致对视觉上相似类别的分类效果不佳。作为另一种模态,文本信息可以提供互补的辨别信息,这激发了我们将其引入到GCD任务中的动机。然而,未标记数据缺乏类别名称,使得利用文本信息不切实际。为了解决这一具有挑战性的问题,在本文中,我们提出了一个文本嵌入合成器(TES),用于为未标记样本生成伪文本嵌入。具体而言,我们的TES利用了CLIP可以生成对齐的视觉-语言特征的属性,将视觉嵌入转换为CLIP文本编码器的标记,以生成伪文本嵌入。此外,我们采用了双分支框架,通过不同模态分支的联合学习和实例一致性,视觉和语义信息相互增强,促进视觉和文本知识的交互和融合。我们的方法释放了CLIP的多模态潜力,并在所有GCD基准测试中大幅优于基准方法,实现了新的最先进水平。我们的代码可在以下链接找到:https://github.com/enguangW/GET。
更新时间: 2025-03-21 01:50:55
领域: cs.CV,cs.AI,cs.CL,cs.LG
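A heavily simplified, hypothetical sketch of the TES idea as described in the abstract above: a small learnable map turns a CLIP visual embedding into a short sequence of pseudo-tokens in the text encoder's embedding space, which a frozen text encoder then turns into a pseudo text embedding. The module shapes and the wrapper around the text encoder are assumptions for illustration:

    import torch
    import torch.nn as nn

    class TextEmbeddingSynthesizer(nn.Module):
        """Learnable map from a visual embedding to a short sequence of pseudo
        token embeddings in the text encoder's input space (shapes are guesses)."""

        def __init__(self, d_visual=768, d_token=512, n_tok=4):
            super().__init__()
            self.n_tok, self.d_token = n_tok, d_token
            self.proj = nn.Linear(d_visual, n_tok * d_token)

        def forward(self, visual_emb):                        # (B, d_visual)
            tok = self.proj(visual_emb)
            return tok.view(-1, self.n_tok, self.d_token)     # (B, n_tok, d_token)

    def pseudo_text_embedding(tes, frozen_text_encoder, visual_emb):
        """`frozen_text_encoder` is assumed to accept token embeddings directly
        (e.g. CLIP's text transformer with its token-lookup step bypassed) and to
        return a pooled text feature aligned with the visual space."""
        return frozen_text_encoder(tes(visual_emb))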
A Comprehensive Survey of Time Series Forecasting: Architectural Diversity and Open Challenges
Time series forecasting is a critical task that provides key information for decision-making across various fields. Recently, various fundamental deep learning architectures such as MLPs, CNNs, RNNs, and GNNs have been developed and applied to solve time series forecasting problems. However, the structural limitations caused by the inductive biases of each deep learning architecture constrained their performance. Transformer models, which excel at handling long-term dependencies, have become significant architectural components for time series forecasting. However, recent research has shown that alternatives such as simple linear layers can outperform Transformers. These findings have opened up new possibilities for using diverse architectures. In this context of exploration into various models, the architectural modeling of time series forecasting has now entered a renaissance. This survey not only provides a historical context for time series forecasting but also offers comprehensive and timely analysis of the movement toward architectural diversification. By comparing and re-examining various deep learning models, we uncover new perspectives and presents the latest trends in time series forecasting, including the emergence of hybrid models, diffusion models, Mamba models, and foundation models. By focusing on the inherent characteristics of time series data, we also address open challenges that have gained attention in time series forecasting, such as channel dependency, distribution shift, causality, and feature extraction. This survey explores vital elements that can enhance forecasting performance through diverse approaches. These contributions lead to lowering the entry barriers for newcomers to the field of time series forecasting, while also offering seasoned researchers broad perspectives, new opportunities, and deep insights.
Updated: 2025-03-21 01:49:26
标题: 时间序列预测的全面调查:架构多样性和开放挑战
摘要: 时间序列预测是一个关键任务,为各个领域的决策提供关键信息。最近,各种基本的深度学习架构,如MLPs、CNNs、RNNs和GNNs已经被开发并应用于解决时间序列预测问题。然而,由于每种深度学习架构的归纳偏差所造成的结构限制限制了它们的性能。Transformer模型擅长处理长期依赖性,已成为时间序列预测的重要架构组件。然而,最近的研究表明,简单的线性层可以胜过Transformer。这些发现为使用多样化的架构开辟了新的可能性。在对各种模型进行探索的背景下,时间序列预测的架构建模现已进入复兴期。本调查不仅提供了时间序列预测的历史背景,还提供了对架构多样化趋势的全面和及时的分析。通过比较和重新审视各种深度学习模型,我们揭示了时间序列预测中的新视角,并呈现了最新的趋势,包括混合模型、扩散模型、Mamba模型和基础模型的出现。通过关注时间序列数据的固有特征,我们还讨论了时间序列预测中备受关注的开放性挑战,如通道依赖性、分布转移、因果关系和特征提取。本调查探索了通过多样化方法增强预测性能的关键要素。这些贡献有助于降低新人进入时间序列预测领域的门槛,同时为资深研究人员提供广阔的视角、新机会和深刻的见解。
更新时间: 2025-03-21 01:49:26
领域: cs.LG,cs.AI
TCProF: Time-Complexity Prediction SSL Framework
Time complexity is a theoretic measure to determine the amount of time the algorithm needs for its execution. In reality, developers write algorithms into code snippets within limited resources, making the calculation of a code's time complexity a fundamental task. However, determining the precise time complexity of a code is theoretically undecidable. In response, recent advancements have leaned toward deploying datasets for code time complexity prediction and initiating preliminary experiments for this challenge. We investigate the challenge in low-resource scenarios where only a few labeled instances are given for training. Remarkably, we are the first to introduce TCProF: a Time-Complexity Prediction SSL Framework as an effective solution for code time complexity prediction in low-resource settings. TCProF significantly boosts performance by integrating our augmentation, symbolic modules, and a co-training mechanism, achieving a more than 60% improvement over self-training approaches. We further provide an extensive comparative analysis between TCProF, ChatGPT, and Gemini-Pro, offering a detailed evaluation of our approach. Our code is at https://github.com/peer0/few-shot-tc.
Updated: 2025-03-21 01:48:59
标题: TCProF: 时间复杂度预测SSL框架
摘要: 时间复杂度是一个理论上的度量,用于确定算法在执行过程中所需的时间量。在现实中,开发人员将算法编写成代码片段,但受限于资源,使得计算代码时间复杂度成为一项基本任务。然而,确定代码的精确时间复杂度在理论上是不可判定的。为此,最近的进展倾向于使用数据集来预测代码的时间复杂度,并为此挑战启动初步实验。我们研究了在资源有限的情况下的挑战,仅提供了少量标记实例进行训练。值得注意的是,我们是第一个引入TCProF的团队:一种低资源环境下代码时间复杂度预测的SSL框架,作为代码时间复杂度预测的有效解决方案。TCProF通过整合我们的增强、符号模块和共训练机制显著提高了性能,比自我训练方法提高了60%以上。我们进一步对TCProF、ChatGPT和Gemini-Pro进行了广泛的比较分析,提供了对我们方法的详细评估。我们的代码位于https://github.com/peer0/few-shot-tc。
更新时间: 2025-03-21 01:48:59
领域: cs.SE,cs.AI,68T50,I.2.7
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
Recent developments in Large Language Models pre-trained on extensive corpora have shown significant success in various natural language processing tasks with minimal fine-tuning. This success offers new promise for robotics, which has long been constrained by the high cost of action-labeled data. We ask: given the abundant video data containing interaction-related knowledge available as a rich "corpus", can a similar generative pre-training approach be effectively applied to enhance robot learning? The key challenge is to identify an effective representation for autoregressive pre-training that benefits robot manipulation tasks. Inspired by the way humans learn new skills through observing dynamic environments, we propose that effective robotic learning should emphasize motion-related knowledge, which is closely tied to low-level actions and is hardware-agnostic, facilitating the transfer of learned motions to actual robot actions. To this end, we introduce Moto, which converts video content into latent Motion Token sequences by a Latent Motion Tokenizer, learning a bridging "language" of motion from videos in an unsupervised manner. We pre-train Moto-GPT through motion token autoregression, enabling it to capture diverse visual motion knowledge. After pre-training, Moto-GPT demonstrates the promising ability to produce semantically interpretable motion tokens, predict plausible motion trajectories, and assess trajectory rationality through output likelihood. To transfer learned motion priors to real robot actions, we implement a co-fine-tuning strategy that seamlessly bridges latent motion token prediction and real robot control. Extensive experiments show that the fine-tuned Moto-GPT exhibits superior robustness and efficiency on robot manipulation benchmarks, underscoring its effectiveness in transferring knowledge from video data to downstream visual manipulation tasks.
Updated: 2025-03-21 01:45:21
标题: Moto:潜在运动令牌作为学习机器人从视频中操作的桥梁语言
摘要: 最近在大型语言模型上预训练的最新发展在各种自然语言处理任务中取得了显着成功,而且只需进行最少的微调。这一成功为机器人技术带来了新的希望,长期以来,机器人技术受到动作标记数据成本高昂的限制。我们提出一个问题:鉴于丰富的包含互动相关知识的视频数据作为一个丰富的“语料库”可用,是否可以有效地应用类似的生成式预训练方法来增强机器人学习?关键挑战是确定一种有利于机器人操作任务的自回归预训练的有效表示。受到人类通过观察动态环境学习新技能的方式的启发,我们提出有效的机器人学习应该强调与低级动作紧密相关的运动知识,这种知识与硬件无关,有助于将学习到的运动转移到实际机器人动作中。为此,我们引入了Moto,通过隐性运动令牌生成器将视频内容转换为潜在的运动令牌序列,以无监督的方式从视频中学习运动的桥接“语言”。我们通过运动令牌自回归对Moto-GPT进行预训练,使其能够捕捉多样化的视觉运动知识。在预训练之后,Moto-GPT展示了产生语义可解释的运动令牌、预测合理的运动轨迹,并通过输出概率评估轨迹的合理性的能力。为了将学习到的运动先验知识转移到真实机器人动作中,我们实施了一种联合微调策略,无缝地连接了隐性运动令牌预测和真实机器人控制。大量实验表明,微调后的Moto-GPT在机器人操作基准测试中表现出卓越的鲁棒性和效率,突显了其在将知识从视频数据转移到下游视觉操作任务中的有效性。
更新时间: 2025-03-21 01:45:21
领域: cs.RO,cs.AI,cs.CL,cs.CV,cs.LG
CoBRA: A Universal Strategyproof Confirmation Protocol for Quorum-based Proof-of-Stake Blockchains
We present a formal analysis of quorum-based State Machine Replication (SMR) protocols in Proof-of-Stake (PoS) systems under a hybrid threat model comprising honest, Byzantine, and rational validators. Our analysis of traditional quorum-based protocols establishes two fundamental impossibility results: (1) in partially synchronous networks, no quorum-based protocol can achieve SMR when rational and Byzantine validators comprise more than $1/3$ of participants, and (2) in synchronous networks, SMR remains impossible when rational and Byzantine validators comprise $2/3$ or more of participants. To overcome these limitations, we propose two complementary solutions in our hybrid model. First, we introduce a protocol that enforces a bound on the volume of the total transacted amount that is finalized within any time window $\Delta$ and prove that this bound is necessary for secure SMR protocols in our model. Second, we present the \emph{strongest chain rule}, which enables efficient finalization of transactions when the majority of honest participants provably support the SMR execution. Through empirical analysis of Ethereum and Cosmos networks, we demonstrate that validator participation consistently exceeds the required ${5}/{6}$ threshold, establishing the practical feasibility of our solution in production PoS systems.
Updated: 2025-03-21 01:39:29
标题: CoBRA:一种适用于基于法定人数的权益证明区块链的通用防策略确认协议
摘要: 我们在混合威胁模型下,对基于法定人数的共识状态机复制(SMR)协议在权益证明(PoS)系统中进行了正式分析,该模型包括诚实、拜占庭和理性验证者。我们对传统的基于法定人数的协议进行了分析,得出了两个基本的不可能结果:(1)在部分同步网络中,当理性和拜占庭验证者占参与者的$1/3$以上时,任何基于法定人数的协议都无法实现SMR;(2)在同步网络中,当理性和拜占庭验证者占参与者的$2/3$或更多时,SMR仍然是不可能的。 为了克服这些限制,在我们的混合模型中提出了两种互补的解决方案。首先,我们引入了一种协议,强制对任何时间窗口$\Delta$内完成的总交易金额进行限制,并证明这一限制对于我们模型中安全的SMR协议是必要的。其次,我们提出了最强链规则,当大多数诚实参与者明显支持SMR执行时,可以实现交易的有效最终确认。通过对以太坊和宇宙网络的实证分析,我们证明验证者参与度始终超过所需的${5}/{6}$阈值,从而证明了我们的解决方案在生产PoS系统中的实用可行性。
更新时间: 2025-03-21 01:39:29
领域: cs.CR,cs.DC
Learning Part Knowledge to Facilitate Category Understanding for Fine-Grained Generalized Category Discovery
Generalized Category Discovery (GCD) aims to classify unlabeled data containing both seen and novel categories. Although existing methods perform well on generic datasets, they struggle in fine-grained scenarios. We attribute this difficulty to their reliance on contrastive learning over global image features to automatically capture discriminative cues, which fails to capture the subtle local differences essential for distinguishing fine-grained categories. Therefore, in this paper, we propose incorporating part knowledge to address fine-grained GCD, which introduces two key challenges: the absence of annotations for novel classes complicates the extraction of the part features, and global contrastive learning prioritizes holistic feature invariance, inadvertently suppressing discriminative local part patterns. To address these challenges, we propose PartGCD, including 1) Adaptive Part Decomposition, which automatically extracts class-specific semantic parts via Gaussian Mixture Models, and 2) Part Discrepancy Regularization, enforcing explicit separation between part features to amplify fine-grained local part distinctions. Experiments demonstrate state-of-the-art performance across multiple fine-grained benchmarks while maintaining competitiveness on generic datasets, validating the effectiveness and robustness of our approach.
Updated: 2025-03-21 01:37:51
标题: 学习部分知识以促进对于细粒度广义类别发现的类别理解
摘要: 广义类别发现(GCD)旨在对包含已知和新颖类别的未标记数据进行分类。虽然现有方法在通用数据集上表现良好,但在细粒度场景中表现不佳。我们将这一困难归因于它们依赖于对全局图像特征进行对比学习,以自动捕获区分性线索,这种方法未能捕获区分细粒度类别所必需的微妙局部差异。因此,在本文中,我们提出将部分知识纳入以解决细粒度GCD问题,这引入了两个关键挑战:新颖类别的缺乏注释使得提取部分特征变得复杂,并且全局对比学习优先考虑整体特征的不变性,无意中抑制了区分性的局部部分模式。为了解决这些挑战,我们提出了PartGCD,包括1)自适应部分分解,通过高斯混合模型自动提取类别特定语义部分,和2)部分差异规范化,强制在部分特征之间进行显式分离以放大细粒度局部部分差异。 实验表明,在多个细粒度基准测试中展现了最先进的性能,同时在通用数据集上保持竞争力,验证了我们方法的有效性和鲁棒性。
更新时间: 2025-03-21 01:37:51
领域: cs.CV,cs.AI,cs.LG
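A rough illustration of the "adaptive part decomposition" step described in the PartGCD abstract above: local patch features of images assigned (or pseudo-assigned) to a class are pooled and fitted with a Gaussian Mixture Model, and each mixture component is read as one semantic part. The feature extractor, the number of parts, and the discrepancy term below are placeholders, not the paper's exact formulation:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def decompose_parts(patch_features, n_parts=4):
        """patch_features: (num_patches, dim) local features pooled over the images of
        one (possibly pseudo-labeled) class. Returns per-part prototypes and a part
        index for every patch."""
        gmm = GaussianMixture(n_components=n_parts, covariance_type="diag", random_state=0)
        part_ids = gmm.fit_predict(patch_features)
        return gmm.means_, part_ids                    # class-specific part prototypes

    def part_discrepancy(part_means):
        """Stand-in for the part-discrepancy regularizer: penalize cosine similarity
        between part prototypes so parts stay distinct."""
        p = part_means / np.linalg.norm(part_means, axis=1, keepdims=True)
        sim = p @ p.T
        return (sim - np.diag(np.diag(sim))).mean()

    feats = np.random.randn(500, 128).astype(np.float32)
    means, ids = decompose_parts(feats)
    print(part_discrepancy(means))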
SPINE: Online Semantic Planning for Missions with Incomplete Natural Language Specifications in Unstructured Environments
As robots become increasingly capable, users will want to describe high-level missions and have robots infer the relevant details. Because pre-built maps are difficult to obtain in many realistic settings, accomplishing such missions will require the robot to map and plan online. While many semantic planning methods operate online, they are typically designed for well specified missions such as object search or exploration. Recently, Large Language Models (LLMs) have demonstrated powerful contextual reasoning abilities over a range of robotic tasks described in natural language. However, existing LLM-enabled planners typically do not consider online planning or complex missions; rather, relevant subtasks and semantics are provided by a pre-built map or a user. We address these limitations via SPINE, an online planner for missions with incomplete mission specifications provided in natural language. The planner uses an LLM to reason about subtasks implied by the mission specification and then realizes these subtasks in a receding horizon framework. Tasks are automatically validated for safety and refined online with new map observations. We evaluate SPINE in simulation and real-world settings with missions that require multiple steps of semantic reasoning and exploration in cluttered outdoor environments of over 20,000m$^2$. Compared to baselines that use existing LLM-enabled planning approaches, our method is over twice as efficient in terms of time and distance, requires less user interactions, and does not require a full map. Additional resources are provided at https://zacravichandran.github.io/SPINE.
Updated: 2025-03-21 01:34:48
标题: SPINE:用于在非结构化环境中使用不完整自然语言规范进行任务的在线语义规划
摘要: 随着机器人能力的不断提升,用户希望能够描述高级任务,并让机器人推断相关细节。由于在许多现实环境中很难获取预先构建的地图,完成这些任务将需要机器人在线地进行地图绘制和规划。虽然许多语义规划方法在线操作,但它们通常被设计用于明确定义的任务,如物体搜索或探索。最近,大型语言模型(LLMs)展示了在自然语言描述的一系列机器人任务上强大的上下文推理能力。然而,现有的LLM启用规划器通常不考虑在线规划或复杂任务;相反,相关子任务和语义是由预先构建的地图或用户提供的。我们通过SPINE来解决这些限制,这是一个用于以自然语言提供不完整任务规范的在线规划器。该规划器使用LLM来推理任务规范中隐含的子任务,然后在一个递进视野框架中实现这些子任务。任务会自动进行安全验证,并通过新的地图观察在线进行细化。我们在模拟和现实环境中评估了SPINE,这些环境需要在超过20,000平方米的杂乱户外环境中进行多步语义推理和探索的任务。与使用现有LLM启用规划方法的基线相比,我们的方法在时间和距离方面效率提高了一倍以上,需要更少的用户交互,并且不需要完整地图。更多资源请访问https://zacravichandran.github.io/SPINE。
更新时间: 2025-03-21 01:34:48
领域: cs.RO,cs.AI
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models
Tool learning can further broaden the usage scenarios of large language models (LLMs). However, most existing methods either need to finetune the model, so that it can only use tools seen in the training data, or add tool demonstrations into the prompt, which is less efficient. In this paper, we present a new tool learning method, Chain-of-Tools. It makes full use of the powerful semantic representation capability of frozen LLMs to finish tool calling in CoT reasoning with a huge and flexible tool pool that may contain unseen tools. In particular, to validate the effectiveness of our approach in the massive unseen tool scenario, we construct a new dataset, SimpleToolQuestions. We conduct experiments on two numerical reasoning benchmarks (GSM8K-XL and FuncQA) and two knowledge-based question answering benchmarks (KAMEL and SimpleToolQuestions). Experimental results show that our approach performs better than the baseline. We also identify dimensions of the model output that are critical in tool selection, enhancing the model's interpretability. Our code and data are available at: https://github.com/fairyshine/Chain-of-Tools .
Updated: 2025-03-21 01:26:12
标题: 工具链:利用冻结语言模型中的大规模未见工具进行CoT推理
摘要: 工具学习可以进一步扩大大型语言模型(LLMs)的使用场景。然而,大多数现有方法要么需要微调模型以便只能使用在训练数据中看到的工具,要么在提示中添加工具演示,效率较低。在本文中,我们提出了一种新的工具学习方法Chain-of-Tools。它充分利用了冻结LLMs的强大语义表示能力,在CoT推理中完成工具调用,使用一个巨大而灵活的工具池,其中可能包含未见过的工具。特别是,为了验证我们的方法在大规模未见工具场景中的有效性,我们构建了一个新的数据集SimpleToolQuestions。我们在两个数值推理基准(GSM8K-XL和FuncQA)和两个基于知识的问题回答基准(KAMEL和SimpleToolQuestions)上进行实验。实验结果显示我们的方法表现优于基准线。我们还确定了模型输出中对工具选择至关重要的维度,增强了模型的可解释性。我们的代码和数据可在以下链接找到:https://github.com/fairyshine/Chain-of-Tools。
更新时间: 2025-03-21 01:26:12
领域: cs.CL,cs.AI
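One way to picture the mechanism described in the Chain-of-Tools abstract above, as a hypothetical sketch: tool descriptions are encoded once with the frozen model's representation, and at each potential tool-call point in the chain of thought the representation of the current reasoning step is matched against the pool by similarity, so unseen tools only need a description to become usable. The encoder and scoring here are placeholders, not the paper's architecture:

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def select_tool(step_repr, tool_pool):
        """tool_pool: list of (tool_name, tool_repr) pairs, where tool_repr comes from
        encoding the tool's description with the same frozen model."""
        name, _ = max(tool_pool, key=lambda t: cosine(step_repr, t[1]))
        return name

    # Toy usage with random vectors standing in for frozen-LLM representations
    rng = np.random.default_rng(0)
    pool = [("calculator", rng.normal(size=64)), ("search", rng.normal(size=64))]
    print(select_tool(rng.normal(size=64), pool))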
Scientific Machine Learning Seismology
Scientific machine learning (SciML) is an interdisciplinary research field that integrates machine learning, particularly deep learning, with physics theory to understand and predict complex natural phenomena. By incorporating physical knowledge, SciML reduces the dependency on observational data, which is often limited in the natural sciences. In this article, the fundamental concepts of SciML, its applications in seismology, and prospects are described. Specifically, two popular methods are mainly discussed: physics-informed neural networks (PINNs) and neural operators (NOs). PINNs can address both forward and inverse problems by incorporating governing laws into the loss functions. The use of PINNs is expanding into areas such as simultaneous solutions of differential equations, inference in underdetermined systems, and regularization based on physics. These research directions would broaden the scope of deep learning in natural sciences. NOs are models designed for operator learning, which deals with relationships between infinite-dimensional spaces. NOs show promise in modeling the time evolution of complex systems based on observational or simulation data. Since large amounts of data are often required, combining NOs with physics-informed learning holds significant potential. Finally, SciML is considered from a broader perspective beyond deep learning: statistical (or mathematical) frameworks that integrate observational data with physical principles to model natural phenomena. In seismology, mathematically rigorous Bayesian statistics has been developed over the past decades, whereas more flexible and scalable deep learning has only emerged recently. Both approaches can be considered as part of SciML in a broad sense. Theoretical and practical insights in both directions would advance SciML methodologies and thereby deepen our understanding of earthquake phenomena.
Updated: 2025-03-21 01:24:11
标题: 科学机器学习地震学
摘要: 科学机器学习(SciML)是一个跨学科研究领域,将机器学习,特别是深度学习,与物理理论相结合,以理解和预测复杂的自然现象。通过融入物理知识,SciML减少了对观测数据的依赖,而在自然科学中观测数据往往是有限的。本文描述了SciML的基本概念,其在地震学中的应用和前景。具体而言,主要讨论了两种流行的方法:物理信息神经网络(PINNs)和神经算子(NOs)。PINNs可以通过将控制规律融入损失函数来解决正向和反向问题。PINNs的应用范围正在扩大,包括微分方程的同时解决,欠定系统的推断和基于物理的正则化。这些研究方向将拓展深度学习在自然科学中的应用范围。NOs是为算子学习设计的模型,处理无限维空间之间的关系。NOs在基于观测或模拟数据建模复杂系统的时间演化方面表现出潜力。由于通常需要大量数据,将NOs与基于物理的学习相结合具有重要潜力。最后,从更广泛的角度考虑SciML,超越深度学习:将观测数据与物理原理结合起来建模自然现象的统计(或数学)框架。在地震学中,数学严谨的贝叶斯统计方法在过去几十年里得到了发展,而更加灵活和可扩展的深度学习则是最近才出现。这两种方法可以在广义上被视为SciML的一部分。在这两个方向上的理论和实践见解将推动SciML方法论的发展,从而加深我们对地震现象的理解。
更新时间: 2025-03-21 01:24:11
领域: physics.geo-ph,cs.LG,physics.comp-ph
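As a concrete illustration of the PINN idea discussed in the abstract above, here is a minimal sketch for the 1-D wave equation u_tt = c^2 u_xx, the kind of governing law a seismological PINN would embed in its loss. The network size, collocation points, and constant wave speed are simplifications; data and initial/boundary terms are omitted:

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                        nn.Linear(64, 64), nn.Tanh(),
                        nn.Linear(64, 1))

    def wave_residual(net, x, t, c=1.0):
        """Residual of u_tt - c^2 * u_xx at collocation points (x, t)."""
        x = x.clone().requires_grad_(True)
        t = t.clone().requires_grad_(True)
        u = net(torch.stack([x, t], dim=1))
        u_x, u_t = torch.autograd.grad(u.sum(), (x, t), create_graph=True)
        u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
        u_tt = torch.autograd.grad(u_t.sum(), t, create_graph=True)[0]
        return u_tt - c ** 2 * u_xx

    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    x_col, t_col = torch.rand(256), torch.rand(256)
    for _ in range(100):
        loss_pde = wave_residual(net, x_col, t_col).pow(2).mean()
        # observational data and initial/boundary-condition terms would be added here
        opt.zero_grad(); loss_pde.backward(); opt.step()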
Generative Modeling of Class Probability for Multi-Modal Representation Learning
Multi-modal understanding plays a crucial role in artificial intelligence by enabling models to jointly interpret inputs from different modalities. However, conventional approaches such as contrastive learning often struggle with modality discrepancies, leading to potential misalignments. In this paper, we propose a novel class anchor alignment approach that leverages class probability distributions for multi-modal representation learning. Our method, Class-anchor-ALigned generative Modeling (CALM), encodes class anchors as prompts to generate and align class probability distributions for each modality, enabling more effective alignment. Furthermore, we introduce a cross-modal probabilistic variational autoencoder to model uncertainty in the alignment, enhancing the ability to capture deeper relationships between modalities and data variations. Extensive experiments on four benchmark datasets demonstrate that our approach significantly outperforms state-of-the-art methods, especially in out-of-domain evaluations. This highlights its superior generalization capabilities in multi-modal representation learning.
Updated: 2025-03-21 01:17:44
标题: 多模态表示学习的类概率生成建模
摘要: 多模态理解在人工智能中起着至关重要的作用,通过使模型能够共同解释来自不同模态的输入。然而,传统方法如对比学习通常难以处理模态差异,导致潜在的不对齐。在本文中,我们提出了一种新颖的类锚对齐方法,利用类概率分布进行多模态表示学习。我们的方法,Class-anchor-ALigned generative Modeling (CALM),将类锚编码为提示,以生成和对齐每种模态的类概率分布,实现更有效的对齐。此外,我们引入了一个跨模态概率变分自动编码器来模拟对齐中的不确定性,增强了捕捉模态之间更深层关系和数据变化的能力。在四个基准数据集上的大量实验表明,我们的方法明显优于最先进的方法,尤其是在域外评估中。这突显了我们的方法在多模态表示学习中具有优越的泛化能力。
更新时间: 2025-03-21 01:17:44
领域: cs.LG,cs.AI
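A simplified sketch of the class-anchor alignment idea in the CALM abstract above: each modality produces a probability distribution over class anchors, and an alignment loss pulls the two distributions together for paired samples. The symmetric KL used here is a stand-in for the paper's generative alignment, and the uncertainty-modeling VAE component is omitted:

    import torch
    import torch.nn.functional as F

    def class_distribution(features, class_anchors, temperature=0.07):
        """Softmax over similarities to class anchors -> a class probability vector."""
        f = F.normalize(features, dim=-1)
        a = F.normalize(class_anchors, dim=-1)
        return F.softmax(f @ a.t() / temperature, dim=-1)

    def alignment_loss(feat_a, feat_b, class_anchors):
        """Symmetric KL between the two modalities' class distributions."""
        p = class_distribution(feat_a, class_anchors)
        q = class_distribution(feat_b, class_anchors)
        kl = lambda x, y: (x * (x.clamp_min(1e-8).log() - y.clamp_min(1e-8).log())).sum(-1)
        return (kl(p, q) + kl(q, p)).mean() / 2

    anchors = torch.randn(10, 64)                 # one anchor per class
    img_feat, txt_feat = torch.randn(8, 64), torch.randn(8, 64)
    print(alignment_loss(img_feat, txt_feat, anchors).item())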
Physics-Informed Deep B-Spline Networks for Dynamical Systems
Physics-informed machine learning provides an approach to combining data and governing physics laws for solving complex partial differential equations (PDEs). However, efficiently solving PDEs with varying parameters and changing initial conditions and boundary conditions (ICBCs) with theoretical guarantees remains an open challenge. We propose a hybrid framework that uses a neural network to learn B-spline control points to approximate solutions to PDEs with varying system and ICBC parameters. The proposed network can be trained efficiently as one can directly specify ICBCs without imposing losses, calculate physics-informed loss functions through analytical formulas, and requires only learning the weights of B-spline functions as opposed to both weights and basis as in traditional neural operator learning methods. We provide theoretical guarantees that the proposed B-spline networks serve as universal approximators for the set of solutions of PDEs with varying ICBCs under mild conditions and establish bounds on the generalization errors in physics-informed learning. We also demonstrate in experiments that the proposed B-spline network can solve problems with discontinuous ICBCs and outperforms existing methods, and is able to learn solutions of 3D dynamics with diverse initial conditions.
Updated: 2025-03-21 01:15:40
标题: 物理信息的深度B样条网络用于动力系统
摘要: 物理学知识驱动的机器学习提供了一种将数据和物理规律相结合的方法,用于解决复杂的偏微分方程(PDEs)。然而,高效地解决具有不同参数和不断变化的初始条件和边界条件(ICBCs)的PDEs,并且具有理论保证,仍然是一个开放的挑战。我们提出了一个混合框架,使用神经网络学习B样条控制点,以逼近具有不同系统和ICBC参数的PDEs的解。所提出的网络可以高效地训练,因为可以直接指定ICBCs而无需施加损失,通过分析公式计算物理知识驱动的损失函数,并且只需要学习B样条函数的权重,而不像传统神经算子学习方法那样需要学习权重和基础。我们提供了理论保证,即所提出的B样条网络在温和条件下作为PDEs解集的通用逼近器,并建立了在物理知识驱动学习中的泛化误差上的界限。我们还在实验中展示,所提出的B样条网络可以解决具有不连续ICBCs的问题,并且优于现有方法,并且能够学习具有多样化初始条件的3D动力学解。
更新时间: 2025-03-21 01:15:40
领域: cs.LG,cs.SY,eess.SY
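A schematic sketch of the core idea in the abstract above: a small network maps a problem parameter to B-spline control points, and the solution is evaluated from a fixed B-spline basis, so derivatives (and hence physics losses) have closed forms. The 1-D domain, the toy ODE u' = k*cos(k*x) with u(0)=0, and the use of SciPy's BSpline are all illustrative assumptions:

    import numpy as np
    import torch
    import torch.nn as nn
    from scipy.interpolate import BSpline

    def basis_and_derivative(x, knots, degree):
        """Evaluate every B-spline basis function and its derivative at points x."""
        n = len(knots) - degree - 1
        B, dB = np.zeros((len(x), n)), np.zeros((len(x), n))
        for i in range(n):
            c = np.zeros(n); c[i] = 1.0
            spl = BSpline(knots, c, degree, extrapolate=False)
            B[:, i] = np.nan_to_num(spl(x))
            dB[:, i] = np.nan_to_num(spl.derivative()(x))
        return torch.tensor(B, dtype=torch.float32), torch.tensor(dB, dtype=torch.float32)

    degree, x = 3, np.linspace(0, 1, 64)
    knots = np.concatenate([[0.0] * degree, np.linspace(0, 1, 8), [1.0] * degree])
    B, dB = basis_and_derivative(x, knots, degree)

    # Tiny network: problem parameter k -> control points of the approximate solution
    n_ctrl = B.shape[1]
    net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, n_ctrl))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    xs = torch.tensor(x, dtype=torch.float32)

    for _ in range(300):
        k = torch.rand(1) * 3 + 1                 # sample a problem parameter
        ctrl = net(k)                             # control points for this parameter
        u, du = B @ ctrl, dB @ ctrl               # analytic spline evaluation
        residual = du - k * torch.cos(k * xs)     # physics (ODE) residual
        loss = residual.pow(2).mean() + u[0] ** 2 # plus initial condition u(0) = 0
        opt.zero_grad(); loss.backward(); opt.step()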
Debugging and Runtime Analysis of Neural Networks with VLMs (A Case Study)
Debugging of Deep Neural Networks (DNNs), particularly vision models, is very challenging due to the complex and opaque decision-making processes in these networks. In this paper, we explore multi-modal Vision-Language Models (VLMs), such as CLIP, to automatically interpret the opaque representation space of vision models using natural language. This, in turn, enables a semantic analysis of model behavior using human-understandable concepts, without requiring costly human annotations. Key to our approach is the notion of a semantic heatmap, which succinctly captures the statistical properties of DNNs in terms of the concepts discovered with the VLM and which is computed off-line using a held-out data set. We show the utility of semantic heatmaps for fault localization -- an essential step in debugging -- in vision models. Our proposed technique helps localize the fault in the network (encoder vs head) and also highlights the responsible high-level concepts, by leveraging novel differential heatmaps, which summarize the semantic differences between the correct and incorrect behaviour of the analyzed DNN. We further propose a lightweight runtime analysis to detect and filter out defects at runtime, thus improving the reliability of the analyzed DNNs. The runtime analysis works by measuring and comparing the similarity between the heatmap computed for a new (unseen) input and the heatmaps computed a priori for correct vs incorrect DNN behavior. We consider two types of defects: misclassifications and vulnerabilities to adversarial attacks. We demonstrate the debugging and runtime analysis on a case study involving a complex ResNet-based classifier trained on the RIVAL10 dataset.
Updated: 2025-03-21 01:12:57
标题: 神经网络的调试和运行时分析与VLMs(案例研究)
摘要: 深度神经网络(DNNs)的调试,特别是视觉模型,由于这些网络中复杂且不透明的决策过程而非常具有挑战性。在本文中,我们探索了多模态视觉语言模型(VLMs),例如CLIP,以自动解释视觉模型的不透明表示空间,利用自然语言。这反过来,通过使用人类可理解的概念,实现了对模型行为的语义分析,而无需昂贵的人类注释。我们方法的关键在于语义热图的概念,它简洁地捕获了DNN的统计属性,以VLM发现的概念为基础,并且使用离线计算的保留数据集。我们展示了语义热图在视觉模型中故障定位的实用性--这是调试中的一个基本步骤。我们提出的技术有助于定位网络(编码器与头部)中的故障,并且通过利用新颖的差异热图,突出显示分析的DNN的正确和不正确行为之间的语义差异,从而总结出负责的高级概念。我们进一步提出了一种轻量级的运行时分析来在运行时检测和过滤缺陷,从而提高了分析的DNN的可靠性。运行时分析通过测量和比较为新(未曾见过)输入计算的热图和先前为正确与不正确的DNN行为计算的热图之间的相似性来工作。我们考虑两种类型的缺陷:错误分类和对敌对攻击的脆弱性。我们在一个涉及基于复杂ResNet的分类器在RIVAL10数据集上训练的案例研究中展示了调试和运行时分析。
更新时间: 2025-03-21 01:12:57
领域: cs.SE,cs.AI,cs.LG
Enhancing Subsequent Video Retrieval via Vision-Language Models (VLMs)
The rapid growth of video content demands efficient and precise retrieval systems. While vision-language models (VLMs) excel in representation learning, they often struggle with adaptive, time-sensitive video retrieval. This paper introduces a novel framework that combines vector similarity search with graph-based data structures. By leveraging VLM embeddings for initial retrieval and modeling contextual relationships among video segments, our approach enables adaptive query refinement and improves retrieval accuracy. Experiments demonstrate its precision, scalability, and robustness, offering an effective solution for interactive video retrieval in dynamic environments.
Updated: 2025-03-21 01:11:14
标题: 通过视觉-语言模型(VLMs)提升后续视频检索
摘要: 视频内容的快速增长要求高效精确的检索系统。虽然视觉-语言模型(VLMs)在表示学习方面表现出色,但在适应性、时间敏感的视频检索方面经常遇到困难。本文介绍了一个将向量相似性搜索与基于图的数据结构相结合的新框架。通过利用VLM嵌入进行初始检索,并对视频段之间的上下文关系进行建模,我们的方法实现了自适应查询改进并提高了检索准确性。实验证明了其精度、可扩展性和稳健性,为动态环境中的交互式视频检索提供了有效解决方案。
更新时间: 2025-03-21 01:11:14
领域: cs.CV,cs.AI,cs.IR
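A compact sketch of the combination described in the abstract above, under assumed components: segment embeddings from a VLM are searched by cosine similarity, and a graph over segments (e.g. temporal adjacency or shared entities) expands the initial hits to context-related neighbors. The graph construction and embedding model are placeholders:

    import numpy as np

    def retrieve(query_emb, segment_embs, graph, top_k=3, hops=1):
        """segment_embs: (n, d) VLM embeddings of video segments.
        graph: dict mapping a segment index to related segment indices."""
        sims = segment_embs @ query_emb / (
            np.linalg.norm(segment_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8)
        hits = set(np.argsort(-sims)[:top_k])           # vector-similarity stage
        frontier = set(hits)
        for _ in range(hops):                           # graph expansion adds contextual neighbors
            frontier = {n for i in frontier for n in graph.get(i, [])} - hits
            hits |= frontier
        return sorted(hits, key=lambda i: -sims[i])

    embs = np.random.randn(6, 16)
    graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
    print(retrieve(np.random.randn(16), embs, graph))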
Inteligencia Artificial para la conservación y uso sostenible de la biodiversidad, una visión desde Colombia (Artificial Intelligence for conservation and sustainable use of biodiversity, a view from Colombia)
The rise of artificial intelligence (AI) and the aggravating biodiversity crisis have resulted in a research area where AI-based computational methods are being developed to act as allies in conservation, and the sustainable use and management of natural resources. While important general guidelines have been established globally regarding the opportunities and challenges that this interdisciplinary research offers, it is essential to generate local reflections from the specific contexts and realities of each region. Hence, this document aims to analyze the scope of this research area from a perspective focused on Colombia and the Neotropics. In this paper, we summarize the main experiences and debates that took place at the Humboldt Institute between 2023 and 2024 in Colombia. To illustrate the variety of promising opportunities, we present current uses such as automatic species identification from images and recordings, species modeling, and in silico bioprospecting, among others. From the experiences described above, we highlight limitations, challenges, and opportunities for successfully implementing AI in conservation efforts and the sustainable management of biological resources in the Neotropics. The result aims to be a guide for researchers, decision makers, and biodiversity managers, facilitating the understanding of how artificial intelligence can be effectively integrated into conservation and sustainable use strategies. Furthermore, it also seeks to open a space for dialogue on the development of policies that promote the responsible and ethical adoption of AI in local contexts, ensuring that its benefits are harnessed without compromising biodiversity or the cultural and ecosystemic values inherent in Colombia and the Neotropics.
Updated: 2025-03-21 01:10:08
标题: 人工智能在哥伦比亚生物多样性保护和可持续利用中的视角
摘要: 人工智能(AI)的兴起和加剧的生物多样性危机已经导致一个研究领域的出现,即正在开发基于AI的计算方法作为保护、可持续利用和管理自然资源的盟友。虽然全球已经建立了重要的一般指导方针,涉及这一跨学科研究所提供的机遇和挑战,但从各个地区的特定背景和现实生成本地反思也是至关重要的。因此,本文旨在从哥伦比亚和新热带地区的视角分析这一研究领域的范围。在本文中,我们总结了2023年至2024年间在哥伦比亚洪堡研究所进行的主要经验和辩论。为了展示多种有前景的机会,我们介绍了当前的应用,如从图像和记录中进行自动物种识别、物种建模和体外生物探索等。从上述描述的经验中,我们强调了在新热带地区成功实施AI以促进保护工作和生物资源可持续管理的限制、挑战和机会。该结果旨在成为研究人员、决策者和生物多样性管理者的指南,促进对人工智能如何有效融入保护和可持续利用战略的理解。此外,它还旨在为推动在当地环境中促进负责任和道德采用AI的政策发展开辟一个对话空间,确保其好处得到利用而不危及哥伦比亚和新热带地区固有的生物多样性或文化和生态系统价值观。
更新时间: 2025-03-21 01:10:08
领域: cs.CY,cs.AI
On Explaining (Large) Language Models For Code Using Global Code-Based Explanations
In recent years, Language Models for Code (LLM4Code) have significantly changed the landscape of software engineering (SE) on downstream tasks, such as code generation, by making software development more efficient. Therefore, a growing interest has emerged in further evaluating these Language Models to homogenize the quality assessment of generated code. Because the current evaluation process can over-rely on accuracy-based metrics, practitioners often seek methods to interpret LLM4Code outputs beyond canonical benchmarks. While the majority of research reports on code generation effectiveness in terms of expected ground truth, scant attention has been paid to LLMs' explanations. In essence, the decision-making process to generate code is hard to interpret. To bridge this evaluation gap, we introduce code rationales (Code$Q$), a technique with rigorous mathematical underpinning, to identify subsets of tokens that can explain individual code predictions. We conducted a thorough Exploratory Analysis to demonstrate the method's applicability and a User Study to understand the usability of code-based explanations. Our evaluation demonstrates that Code$Q$ is a powerful interpretability method to explain how (less) meaningful input concepts (i.e., natural language particle `at') highly impact output generation. Moreover, participants of this study highlighted Code$Q$'s ability to show a causal relationship between the input and output of the model with readable and informative explanations on code completion and test generation tasks. Additionally, Code$Q$ also helps to uncover the model's rationale, facilitating comparison with a human rationale to promote a fair level of trust and distrust in the model.
Updated: 2025-03-21 01:00:45
标题: 关于使用全局基于代码的解释来解释(大型)语言模型对代码的影响
摘要: 近年来,用于代码的语言模型(LLM4Code)已经显著改变了软件工程(SE)领域,通过使软件开发更加高效,对下游任务(如代码生成)产生了影响。因此,人们对进一步评估这些语言模型以使生成的代码的质量评估更加一致产生了越来越浓厚的兴趣。由于当前的评估过程在基于准确度的度量上可能会出现显著的过度反应,从业者经常寻求超越规范基准的方法来解释LLM4Code的输出。尽管大多数研究报告关于代码生成效果方面的期望地面真相,但对LLMs的解释却受到了极少关注。实质上,生成代码的决策过程很难解释。为弥补这一评估差距,我们引入了代码原理(Code$Q),这是一种具有严格数学基础的技术,用于识别可以解释单个代码预测的令牌子集。我们进行了彻底的探索性分析,以展示该方法的适用性,并进行了用户研究,以了解基于代码的解释的可用性。我们的评估表明,Code$Q$是一种强大的可解释性方法,可以解释输入概念(即自然语言粒子“at”)如何(较少)地影响输出生成。此外,本研究的参与者强调了Code$Q$展示输入与模型输出之间因果关系的能力,提供了关于代码完成和测试生成任务的可读和信息丰富的解释。此外,Code$Q还有助于揭示模型的原理,促进与人类原理的比较,以促进对模型的公平程度的信任和不信任。
更新时间: 2025-03-21 01:00:45
领域: cs.SE,cs.LG
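The abstract above does not spell out Code$Q$'s algorithm, so the following is only an illustrative sketch of the general idea of extracting a token-level rationale: rank input tokens by how much the model's confidence drops when each token is ablated. The `score_fn` callable and the toy scorer are hypothetical stand-ins for querying an actual LLM4Code model.

```python
# Illustrative sketch only; the abstract does not specify Code$Q$'s algorithm.
# It ranks input tokens by their influence on a prediction via leave-one-out
# ablation. `score_fn` is a hypothetical callable returning the model's
# confidence in its original prediction given a token list.
from typing import Callable, List, Tuple

def rationale_by_ablation(tokens: List[str],
                          score_fn: Callable[[List[str]], float],
                          top_k: int = 5) -> List[Tuple[str, float]]:
    """Rank tokens by the drop in prediction score when each one is removed."""
    base = score_fn(tokens)
    drops = []
    for i, tok in enumerate(tokens):
        ablated = tokens[:i] + tokens[i + 1:]   # leave-one-out ablation
        drops.append((tok, base - score_fn(ablated)))
    # Tokens whose removal hurts the score most act as a rationale set.
    return sorted(drops, key=lambda t: t[1], reverse=True)[:top_k]

if __name__ == "__main__":
    # Toy usage with a dummy scorer that rewards the presence of 'at'.
    dummy = lambda toks: 1.0 if "at" in toks else 0.4
    print(rationale_by_ablation(["open", "file", "at", "path"], dummy, top_k=2))
```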
Privacy Ethics Alignment in AI: A Stakeholder-Centric Based Framework for Ethical AI
The increasing integration of Artificial Intelligence (AI) in digital ecosystems has reshaped privacy dynamics, particularly for young digital citizens navigating data-driven environments. This study explores evolving privacy concerns across three key stakeholder groups, digital citizens (ages 16-19), parents/educators, and AI professionals, and assesses differences in data ownership, trust, transparency, parental mediation, education, and risk-benefit perceptions. Employing a grounded theory methodology, this research synthesizes insights from 482 participants through structured surveys, qualitative interviews, and focus groups. The findings reveal distinct privacy expectations: Young users emphasize autonomy and digital freedom, while parents and educators advocate for regulatory oversight and AI literacy programs. AI professionals, in contrast, prioritize the balance between ethical system design and technological efficiency. The data further highlights gaps in AI literacy and transparency, emphasizing the need for comprehensive, stakeholder-driven privacy frameworks that accommodate diverse user needs. Using comparative thematic analysis, this study identifies key tensions in privacy governance and develops the novel Privacy-Ethics Alignment in AI (PEA-AI) model, which structures privacy decision-making as a dynamic negotiation between stakeholders. By systematically analyzing themes such as transparency, user control, risk perception, and parental mediation, this research provides a scalable, adaptive foundation for AI governance, ensuring that privacy protections evolve alongside emerging AI technologies and youth-centric digital interactions.
Updated: 2025-03-21 00:54:33
标题: AI中的隐私伦理对齐:以利益相关者为中心的道德AI框架
摘要: 人工智能在数字生态系统中的增加整合已经重塑了隐私动态,特别是对于年轻的数字公民在数据驱动环境中的导航。本研究探讨了数字公民(年龄在16-19岁之间)、父母/教育者和人工智能专业人员三个关键利益相关者群体之间不断演变的隐私关注,并评估了在数据所有权、信任、透明度、家长调解、教育和风险-利益感知方面的差异。采用扎根理论方法,本研究通过结构化调查、定性访谈和焦点小组综合了482名参与者的见解。研究结果显示出明显的隐私期望:年轻用户强调自主权和数字自由,而父母和教育者主张监管监督和人工智能素养计划。相比之下,人工智能专业人员优先考虑道德系统设计和技术效率之间的平衡。数据进一步突显了人工智能素养和透明度方面的差距,强调了需要全面的、以利益相关者为驱动的隐私框架,以满足不同用户需求。通过比较主题分析,本研究确定了隐私治理中的关键张力,并制定了新颖的隐私-人工智能伦理对齐(PEA-AI)模型,将隐私决策定位为各利益相关者之间的动态协商。通过系统地分析透明度、用户控制、风险感知和家长调解等主题,本研究为人工智能治理提供了可扩展、适应性的基础,确保隐私保护与新兴人工智能技术和以青少年为中心的数字互动同步演进。
更新时间: 2025-03-21 00:54:33
领域: cs.CY,cs.AI
Dynamic Attention Mechanism in Spatiotemporal Memory Networks for Object Tracking
Mainstream visual object tracking frameworks predominantly rely on template matching paradigms. Their performance heavily depends on the quality of template features, which becomes increasingly challenging to maintain in complex scenarios involving target deformation, occlusion, and background clutter. While existing spatiotemporal memory-based trackers emphasize memory capacity expansion, they lack effective mechanisms for dynamic feature selection and adaptive fusion. To address this gap, we propose a Dynamic Attention Mechanism in Spatiotemporal Memory Network (DASTM) with two key innovations: 1) A differentiable dynamic attention mechanism that adaptively adjusts channel-spatial attention weights by analyzing spatiotemporal correlations between the templates and memory features; 2) A lightweight gating network that autonomously allocates computational resources based on target motion states, prioritizing high-discriminability features in challenging scenarios. Extensive evaluations on OTB-2015, VOT 2018, LaSOT, and GOT-10K benchmarks demonstrate our DASTM's superiority, achieving state-of-the-art performance in success rate, robustness, and real-time efficiency, thereby offering a novel solution for real-time tracking in complex environments.
Updated: 2025-03-21 00:48:31
标题: 时空记忆网络中的动态注意力机制用于目标跟踪
摘要: 主流的视觉目标跟踪框架主要依赖于模板匹配范式。它们的性能严重依赖于模板特征的质量,在涉及目标变形、遮挡和背景混杂等复杂场景中,维护模板特征变得越来越具有挑战性。虽然现有的基于时空记忆的跟踪器强调记忆容量的扩展,但它们缺乏动态特征选择和自适应融合的有效机制。为了填补这一空白,我们提出了一种具有两个关键创新的时空记忆网络中的动态注意机制(DASTM):1)一种可微分的动态注意机制,通过分析模板和记忆特征之间的时空相关性来自适应调整通道-空间注意权重;2)一种轻量级的门控网络,根据目标运动状态自主分配计算资源,优先考虑在挑战性场景中具有高可区分性的特征。在OTB-2015、VOT 2018、LaSOT和GOT-10K基准测试上进行了广泛评估,结果显示我们的DASTM在成功率、鲁棒性和实时效率方面表现优异,从而为复杂环境中的实时跟踪提供了一种新颖的解决方案。
更新时间: 2025-03-21 00:48:31
领域: cs.CV,cs.AI
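DASTM's architecture is not given in the abstract above; the sketch below only illustrates, under stated assumptions, the two named ingredients in standard PyTorch: a channel-spatial attention block applied to fused template/memory features, and a lightweight gating network that scales the memory contribution per sample. Module names and layer sizes are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Assumed form of the channel-spatial attention named in the abstract."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel_mlp = nn.Sequential(                 # squeeze-excitation style channel weights
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(), nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial_conv = nn.Sequential(                # single-channel spatial weight map
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_mlp(x)                       # reweight channels
        pooled = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * self.spatial_conv(pooled)              # reweight spatial locations

class GatedMemoryFusion(nn.Module):
    """Fuse template and memory features; a scalar gate damps unreliable memory."""
    def __init__(self, channels: int):
        super().__init__()
        self.attn = ChannelSpatialAttention(channels)
        self.gate = nn.Sequential(nn.Linear(channels, channels // 4), nn.ReLU(),
                                  nn.Linear(channels // 4, 1), nn.Sigmoid())

    def forward(self, template: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        g = self.gate(memory.mean(dim=(2, 3)))            # (B, 1) gate per sample
        fused = template + g[:, :, None, None] * memory   # gated memory contribution
        return self.attn(fused)
```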
Optimizing Cycle Life Prediction of Lithium-ion Batteries via a Physics-Informed Model
Accurately measuring the cycle lifetime of commercial lithium-ion batteries is crucial for performance and technology development. We introduce a novel hybrid approach combining a physics-based equation with a self-attention model to predict the cycle lifetimes of commercial lithium iron phosphate graphite cells via early-cycle data. After fitting capacity loss curves to this physics-based equation, we then use a self-attention layer to reconstruct entire battery capacity loss curves. Our model exhibits comparable performances to existing models while predicting more information: the entire capacity loss curve instead of cycle life. This provides more robustness and interpretability: our model does not need to be retrained for a different notion of end-of-life and is backed by physical intuition.
Updated: 2025-03-21 00:46:26
标题: 通过基于物理的模型优化锂离子电池的循环寿命预测
摘要: 准确测量商用锂离子电池的循环寿命对于性能和技术发展至关重要。我们引入了一种新颖的混合方法,将基于物理的方程与自注意力模型结合起来,通过早期循环数据来预测商用锂铁磷酸铁锂电池的循环寿命。在将容量损失曲线拟合到这个基于物理的方程之后,我们再使用自注意力层来重构整个电池容量损失曲线。我们的模型表现出与现有模型相当的性能,同时预测更多的信息:整个容量损失曲线而不是循环寿命。这提供了更强的稳健性和可解释性:我们的模型不需要针对不同的寿命终点概念进行重新训练,并且得到了物理直觉的支持。
更新时间: 2025-03-21 00:46:26
领域: cs.LG
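The abstract above does not state which physics-based equation is fitted, so this minimal sketch substitutes a common power-law capacity-fade model, Q(n) = 1 - a n^b, fitted to synthetic early-cycle data with scipy; the self-attention reconstruction step is omitted. The constants and the 80% end-of-life threshold are illustrative assumptions.

```python
# Minimal sketch, not the paper's pipeline: a power-law fade model is fitted to
# synthetic early-cycle capacity data, then extrapolated to an assumed 80% EOL.
import numpy as np
from scipy.optimize import curve_fit

def power_law_fade(cycle, a, b):
    """Fraction of capacity retained after `cycle` cycles."""
    return 1.0 - a * np.power(cycle, b)

# Synthetic 'early-cycle' measurements standing in for real cell data.
cycles = np.arange(1, 101)
capacity = power_law_fade(cycles, 2e-4, 1.3) + np.random.normal(0, 1e-3, cycles.size)

params, _ = curve_fit(power_law_fade, cycles, capacity, p0=(1e-4, 1.0))
a_fit, b_fit = params

# Extrapolate the fitted curve to estimate cycle life at an 80% capacity threshold.
eol_cycle = (0.2 / a_fit) ** (1.0 / b_fit)
print(f"fitted a={a_fit:.2e}, b={b_fit:.2f}, predicted 80% EOL around {eol_cycle:.0f} cycles")
```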
On the Robustness of Language Models for Tabular Question Answering
Large Language Models (LLMs), already shown to ace various text comprehension tasks, have also remarkably been shown to tackle table comprehension tasks without specific training. While previous research has explored LLM capabilities with tabular dataset tasks, our study assesses the influence of \textit{in-context learning}, \textit{model scale}, \textit{instruction tuning}, and \textit{domain biases} on Tabular Question Answering (TQA). We evaluate the robustness of LLMs on the Wikipedia-based \textbf{WTQ}, financial report-based \textbf{TAT-QA}, and scientific claims-based \textbf{SCITAB} TQA datasets, focusing on their ability to robustly interpret tabular data under various augmentations and perturbations. Our findings indicate that instructions significantly enhance performance, with recent models exhibiting greater robustness over earlier versions. However, data contamination and practical reliability issues persist, especially with \textbf{WTQ}. We highlight the need for improved methodologies, including structure-aware self-attention mechanisms and better handling of domain-specific tabular data, to develop more reliable LLMs for table comprehension.
Updated: 2025-03-21 00:31:06
标题: 关于表格问答语言模型的鲁棒性
摘要: 大型语言模型(LLMs)已经被证明能够出色地完成各种文本理解任务,同时也被显示出可以在没有特定训练的情况下处理表格理解任务。虽然先前的研究已经探讨了LLM在表格数据集任务中的能力,但我们的研究评估了\textit{上下文学习}、\textit{模型规模}、\textit{指导调整}和\textit{领域偏见}对表格问答(TQA)的影响。我们评估了LLMs在基于维基百科的\textbf{WTQ}、基于财务报告的\textbf{TAT-QA}和基于科学论断的\textbf{SCITAB}等TQA数据集上的鲁棒性,重点关注它们在各种增强和扰动下解释表格数据的能力。我们的研究结果表明,指导显著提高了性能,最近的模型表现出比早期版本更强的鲁棒性。然而,数据污染和实用性可靠性问题仍然存在,特别是在\textbf{WTQ}中。我们强调了需要改进的方法,包括结构感知的自注意机制和更好地处理领域特定的表格数据,以开发更可靠的用于表格理解的LLMs。
更新时间: 2025-03-21 00:31:06
领域: cs.CL,cs.AI
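The exact augmentations and perturbations used in the study are not listed in the abstract above, so the following sketch shows two generic ones (row shuffling and column renaming) and a simple answer-consistency score; `ask_llm` and the "Year" to "Season" rename are hypothetical placeholders for a real tabular QA model and perturbation set.

```python
# Illustrative robustness check, not the paper's protocol: perturb a table and
# measure how often the model's answer to the same question stays unchanged.
import random
from typing import Callable, Dict, List

Table = Dict[str, List[str]]   # column name -> column values

def shuffle_rows(table: Table, seed: int = 0) -> Table:
    n = len(next(iter(table.values())))
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return {col: [vals[i] for i in order] for col, vals in table.items()}

def rename_columns(table: Table, mapping: Dict[str, str]) -> Table:
    return {mapping.get(col, col): vals for col, vals in table.items()}

def robustness_score(table: Table, question: str,
                     ask_llm: Callable[[Table, str], str]) -> float:
    """Fraction of perturbed tables on which the model's answer is unchanged."""
    base = ask_llm(table, question)
    perturbed = [shuffle_rows(table),
                 rename_columns(table, {"Year": "Season"})]  # hypothetical header paraphrase
    return sum(ask_llm(t, question) == base for t in perturbed) / len(perturbed)
```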
Opportunities and Challenges of Frontier Data Governance With Synthetic Data
Synthetic data, or data generated by machine learning models, is increasingly emerging as a solution to the data access problem. However, its use introduces significant governance and accountability challenges, and potentially debases existing governance paradigms, such as compute and data governance. In this paper, we identify 3 key governance and accountability challenges that synthetic data poses - it can enable the increased emergence of malicious actors, spontaneous biases and value drift. We thus craft 3 technical mechanisms to address these specific challenges, finding applications for synthetic data towards adversarial training, bias mitigation and value reinforcement. These could not only counteract the risks of synthetic data, but serve as critical levers for governance of the frontier in the future.
Updated: 2025-03-21 00:30:17
标题: 前沿数据治理与合成数据的机遇与挑战
摘要: 合成数据,或者由机器学习模型生成的数据,越来越被视为解决数据访问问题的一种解决方案。然而,其使用引入了重大的治理和问责挑战,并可能削弱现有的治理范式,如计算和数据治理。在本文中,我们确定了合成数据所面临的3个关键治理和问责挑战 - 它可以促使恶意行为者的增加,自发偏见和价值漂移。因此,我们制定了3种技术机制来解决这些特定挑战,找到了合成数据在对抗性训练、偏见缓解和价值强化方面的应用。这些不仅可以抵消合成数据的风险,还可以作为未来前沿治理的关键杠杆。
更新时间: 2025-03-21 00:30:17
领域: cs.CY,cs.AI,cs.LG
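As a minimal, hedged illustration of the bias-mitigation mechanism named in the abstract above, the sketch below tops up under-represented classes with synthetic samples until each class matches the largest one; `generate` is a hypothetical callable wrapping whatever generative model produces the synthetic data.

```python
# Sketch only: rebalance a labeled dataset by adding synthetic samples to
# under-represented classes. `generate(label, k)` is an assumed interface that
# returns k synthetic examples for the given label.
from collections import Counter
from typing import Callable, List, Tuple

def rebalance_with_synthetic(data: List[Tuple[object, str]],
                             generate: Callable[[str, int], List[object]]
                             ) -> List[Tuple[object, str]]:
    counts = Counter(label for _, label in data)
    target = max(counts.values())                      # match the largest class
    augmented = list(data)
    for label, count in counts.items():
        deficit = target - count
        if deficit > 0:                                # only top up minority classes
            augmented.extend((x, label) for x in generate(label, deficit))
    return augmented
```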
Rethinking the Role of Spatial Mixing
Until quite recently, the backbone of nearly every state-of-the-art computer vision model has been the 2D convolution. At its core, a 2D convolution simultaneously mixes information across both the spatial and channel dimensions of a representation. Many recent computer vision architectures consist of sequences of isotropic blocks that disentangle the spatial and channel-mixing components. This separation of the operations allows us to more closely juxtapose the effects of spatial and channel mixing in deep learning. In this paper, we take an initial step towards garnering a deeper understanding of the roles of these mixing operations. Through our experiments and analysis, we discover that on both classical (ResNet) and cutting-edge (ConvMixer) models, we can reach nearly the same level of classification performance by leaving the spatial mixers at their random initializations. Furthermore, we show that models with random, fixed spatial mixing are naturally more robust to adversarial perturbations. Lastly, we show that this phenomenon extends past the classification regime, as such models can also decode pixel-shuffled images.
Updated: 2025-03-21 00:28:30
标题: 重新思考空间混合的作用
摘要: 直到最近,几乎每个最先进的计算机视觉模型的主干都是2D卷积。在其核心,2D卷积同时混合了表示的空间和通道维度上的信息。许多最近的计算机视觉架构由一系列各向同性块组成,这些块将空间和通道混合分离开来。这些操作的分离使我们能够更密切地对比深度学习中空间和通道混合的效果。在本文中,我们迈出了更深入了解这些混合操作作用的初始步骤。通过我们的实验和分析,我们发现在经典(ResNet)和尖端(ConvMixer)模型上,通过保持空间混合器处于随机初始化状态,我们可以达到几乎相同水平的分类性能。此外,我们展示了具有随机固定空间混合的模型自然更加抗干扰。最后,我们展示了这种现象不仅限于分类领域,这些模型还可以解码像素混乱的图像。
更新时间: 2025-03-21 00:28:30
领域: cs.CV,cs.LG
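The paper's code is not reproduced in the abstract above, but its central experiment, freezing the spatial mixer at its random initialization while training only the channel mixer, maps naturally onto a ConvMixer-style block. The sketch below is an assumption-laden illustration rather than the authors' implementation; the normalization layers and kernel size are guesses.

```python
import torch
import torch.nn as nn

class FrozenSpatialMixerBlock(nn.Module):
    """ConvMixer-style block whose spatial mixer is frozen at random init."""
    def __init__(self, dim: int, kernel_size: int = 9):
        super().__init__()
        # Depthwise conv performs spatial mixing; it is left at its random
        # initialization and never updated during training.
        self.spatial = nn.Conv2d(dim, dim, kernel_size, groups=dim,
                                 padding="same", bias=False)
        self.spatial.weight.requires_grad_(False)
        # Pointwise (1x1) conv performs channel mixing; only this part trains.
        self.channel = nn.Conv2d(dim, dim, kernel_size=1)
        self.act = nn.GELU()
        self.norm1 = nn.BatchNorm2d(dim)
        self.norm2 = nn.BatchNorm2d(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.norm1(self.act(self.spatial(x)))   # frozen spatial mixing, residual
        return self.norm2(self.act(self.channel(x)))    # trainable channel mixing
```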
Fast online node labeling with graph subsampling
Large data applications rely on storing data in massive, sparse graphs with millions to trillions of nodes. Graph-based methods, such as node prediction, aim for computational efficiency regardless of graph size. Techniques like localized approximate personalized PageRank (APPR) solve sparse linear systems with complexity independent of graph size, but their complexity instead scales with the maximum node degree, which can be much larger in practice than the average node degree for real-world large graphs. In this paper, we consider an \emph{online subsampled APPR method}, where messages are intentionally dropped at random. We use tools from graph sparsifiers and matrix linear algebra to give approximation bounds on the graph's spectral properties ($O(1/\epsilon^2)$ edges) and node classification performance (added $O(n\epsilon)$ overhead).
Updated: 2025-03-21 00:13:16
标题: 快速在线节点标记与图子采样
摘要: 大数据应用依赖于将数据存储在拥有数百万到数万亿节点的庞大稀疏图中。基于图的方法,如节点预测,旨在实现计算效率,而不考虑图的大小。类似局部近似个性化页面排名(APPR)的技术解决了与图大小无关的稀疏线性系统,但是在最大节点度方面,实际上可能比真实世界大型图的平均节点度要大得多。在本文中,我们考虑一种“在线子采样APPR方法”,其中消息被有意地随机丢弃。我们利用图稀疏化和矩阵线性代数工具给出了图的谱特性($O(1/\epsilon^2)$条边)和节点分类性能(额外$O(n\epsilon)$开销)的近似界限。
更新时间: 2025-03-21 00:13:16
领域: cs.DS,cs.LG
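The abstract above describes the method only at a high level, so the following is a hedged sketch rather than the paper's algorithm: a standard local push loop for approximate personalized PageRank in which each neighbor update ("message") is dropped independently with probability `drop_prob`, and surviving updates are rescaled by 1/(1 - drop_prob) so the push stays unbiased in expectation. Parameter defaults and the toy graph are illustrative.

```python
# Hedged sketch of the idea, not the paper's algorithm: push-style APPR with
# random message dropping and unbiased rescaling of the surviving messages.
import random
from collections import defaultdict

def subsampled_appr(adj, source, alpha=0.15, eps=1e-6, drop_prob=0.5, rng=None):
    """adj: dict node -> list of neighbors; returns an approximate PPR vector."""
    rng = rng or random.Random(0)
    p, r = defaultdict(float), defaultdict(float)
    r[source] = 1.0
    queue = [source]
    while queue:
        u = queue.pop()
        deg = len(adj[u])
        if deg == 0 or r[u] < eps * deg:
            continue
        p[u] += alpha * r[u]
        push = (1.0 - alpha) * r[u] / deg
        r[u] = 0.0
        for v in adj[u]:
            if rng.random() < drop_prob:        # intentionally drop this message
                continue
            r[v] += push / (1.0 - drop_prob)    # rescale so the update stays unbiased
            if r[v] >= eps * max(len(adj[v]), 1):
                queue.append(v)
    return dict(p)

# Toy usage on a 4-cycle graph.
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(subsampled_appr(graph, source=0))
```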