Arxiv Day: Article

Bio2Token: All-atom tokenization of any biomolecular structure with Mamba

Efficient encoding and representation of large 3D molecular structures with high fidelity is critical for biomolecular design applications. Despite this, many representation learning approaches restrict themselves to modeling smaller systems or use coarse-grained approximations of the systems, for example modeling proteins at the resolution of amino acid residues rather than at the level of individual atoms. To address this, we develop quantized auto-encoders that learn atom-level tokenizations of complete proteins, RNA and small molecule structures with reconstruction accuracies well below 1 Angstrom. We demonstrate that a simple Mamba state space model architecture is efficient compared to an SE(3)-invariant IPA architecture, reaches competitive accuracies and can scale to systems with almost 100,000 atoms. The learned structure tokens of bio2token may serve as the input for all-atom generative models in the future.

Updated: 2025-04-08 23:59:56

标题: Bio2Token：使用Mamba对任何生物分子结构进行全原子标记化

摘要: 高效编码和表示大型3D分子结构，保真度高对于生物分子设计应用至关重要。尽管如此，许多表示学习方法限制自己模拟较小的系统或使用系统的粗粒度近似，例如以氨基酸残基的分辨率建模蛋白质，而不是以个别原子的水平建模。为了解决这个问题，我们开发了量化自动编码器，学习完整蛋白质、RNA和小分子结构的原子级标记，重建精度远低于1埃。我们证明，与SE(3)-不变IPA体系结构相比，简单的Mamba状态空间模型体系结构高效，达到了竞争性精度，并且可以扩展到具有近10万个原子的系统。bio2token的学习结构标记可能成为未来全原子生成模型的输入。

更新时间: 2025-04-08 23:59:56

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.19110v3

Balancing Rigor and Utility: Mitigating Cognitive Biases in Large Language Models for Multiple-Choice Questions

This paper examines the role of cognitive biases in the decision-making processes of large language models (LLMs), challenging the conventional goal of eliminating all biases. When properly balanced, we show that certain cognitive biases can enhance decision-making efficiency through rational deviations and heuristic shortcuts. By introducing heuristic moderation and an abstention option, which allows LLMs to withhold responses when uncertain, we reduce error rates, improve decision accuracy, and optimize decision rates. Using the Balance Rigor and Utility (BRU) dataset, developed through expert collaboration, our findings demonstrate that targeted inspection of cognitive biases aligns LLM decisions more closely with human reasoning, enhancing reliability and suggesting strategies for future improvements. This approach offers a novel way to leverage cognitive biases to improve the practical utility of LLMs across various applications.

Updated: 2025-04-08 23:59:08

标题: 在大型语言模型中平衡严密性和实用性：减轻多项选择题中的认知偏见

摘要: 本文探讨了认知偏见在大型语言模型（LLMs）决策过程中的作用，挑战了消除所有偏见的传统目标。我们表明，当适当平衡时，某些认知偏见可以通过理性偏差和启发式快捷方式来增强决策效率。通过引入启发式调节和弃权选项，使LLMs在不确定时可以选择不做出响应，我们降低了错误率，提高了决策准确性，并优化了决策速度。利用通过专家合作开发的平衡严谨和实用性（BRU）数据集，我们的研究结果表明，有针对性地审查认知偏见可以使LLM的决策更贴近人类推理，增强可靠性，并提出未来改进的策略。这种方法提供了一种新颖的方式，利用认知偏见来改善LLMs在各种应用中的实用性。

更新时间: 2025-04-08 23:59:08

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.10999v4

Exploiting Meta-Learning-based Poisoning Attacks for Graph Link Prediction

Link prediction in graph data utilizes various algorithms and machine learning/deep learning models to predict potential relationships between graph nodes. This technique has found widespread use in numerous real-world applications, including recommendation systems, community networks, and biological structures. However, recent research has highlighted the vulnerability of link prediction models to adversarial attacks, such as poisoning and evasion attacks. Addressing the vulnerability of these models is crucial to ensure stable and robust performance in link prediction applications. While many works have focused on enhancing the robustness of the Graph Convolution Network (GCN) model, the Variational Graph Auto-Encoder (VGAE), a sophisticated model for link prediction, has not been thoroughly investigated in the context of graph adversarial attacks. To bridge this gap, this article proposes an unweighted graph poisoning attack approach using meta-learning techniques to undermine VGAE's link prediction performance. We conducted comprehensive experiments on diverse datasets to evaluate the proposed method and its parameters, comparing it with existing approaches in similar settings. Our results demonstrate that our approach significantly diminishes link prediction performance and outperforms other state-of-the-art methods.

Updated: 2025-04-08 23:36:29

标题: 利用基于元学习的投毒攻击进行图链接预测

摘要: 在图数据中的链路预测利用各种算法和机器学习/深度学习模型来预测图节点之间的潜在关系。这种技术已经在许多现实世界的应用中得到广泛使用，包括推荐系统、社区网络和生物结构。然而，最近的研究强调了链路预测模型对敌对攻击（如毒化和逃避攻击）的脆弱性。解决这些模型的脆弱性对于确保链路预测应用的稳定和强大性能至关重要。尽管许多作品都致力于增强图卷积网络（GCN）模型的稳健性，但用于链路预测的复杂模型变分图自动编码器（VGAE）在图敌对攻击的背景下尚未得到全面调查。为了弥补这一差距，本文提出了一种使用元学习技术的无权图毒化攻击方法，以削弱VGAE的链路预测性能。我们对多种数据集进行了全面实验，评估了提出的方法及其参数，并将其与类似设置中的现有方法进行比较。我们的结果表明，我们的方法显著降低了链路预测性能，并优于其他最先进的方法。

更新时间: 2025-04-08 23:36:29

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.06492v1

Optimizing Through Change: Bounds and Recommendations for Time-Varying Bayesian Optimization Algorithms

Time-Varying Bayesian Optimization (TVBO) is the go-to framework for optimizing a time-varying, expensive, noisy black-box function. However, most of the solutions proposed so far either rely on unrealistic assumptions on the nature of the objective function or do not offer any theoretical guarantees. We propose the first analysis that asymptotically bounds the cumulative regret of TVBO algorithms under mild and realistic assumptions only. In particular, we provide an algorithm-independent lower regret bound and an upper regret bound that holds for a large class of TVBO algorithms. Based on this analysis, we formulate recommendations for TVBO algorithms and show how an algorithm (BOLT) that follows them performs better than the state-of-the-art of TVBO through experiments on synthetic and real-world problems.

Updated: 2025-04-08 23:26:57

标题: 优化通过变化：时间变化的贝叶斯优化算法的界限和建议

摘要: 时间变化的贝叶斯优化（TVBO）是优化时间变化、昂贵、嘈杂的黑盒函数的首选框架。然而，迄今为止提出的大多数解决方案要么依赖于对目标函数性质的不切实际的假设，要么不提供任何理论保证。我们提出了第一个只在温和和现实假设下渐近地限制TVBO算法的累积遗憾的分析。具体地，我们提供了一个与算法无关的较低遗憾界和一个适用于大类TVBO算法的较高遗憾界。基于这一分析，我们提出了TVBO算法的建议，并展示了一个遵循这些建议的算法（BOLT）如何通过对合成和真实世界问题的实验优于TVBO的最新技术。

更新时间: 2025-04-08 23:26:57

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2501.18963v2

Medical-GAT: Cancer Document Classification Leveraging Graph-Based Residual Network for Scenarios with Limited Data

Accurate classification of cancer-related medical abstracts is crucial for healthcare management and research. However, obtaining large, labeled datasets in the medical domain is challenging due to privacy concerns and the complexity of clinical data. This scarcity of annotated data impedes the development of effective machine learning models for cancer document classification. To address this challenge, we present a curated dataset of 1,874 biomedical abstracts, categorized into thyroid cancer, colon cancer, lung cancer, and generic topics. Our research focuses on leveraging this dataset to improve classification performance, particularly in data-scarce scenarios. We introduce a Residual Graph Attention Network (R-GAT) with multiple graph attention layers that capture the semantic information and structural relationships within cancer-related documents. Our R-GAT model is compared with various techniques, including transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT), RoBERTa, and domain-specific models like BioBERT and Bio+ClinicalBERT. We also evaluated deep learning models (CNNs, LSTMs) and traditional machine learning models (Logistic Regression, SVM). Additionally, we explore ensemble approaches that combine deep learning models to enhance classification. Various feature extraction methods are assessed, including Term Frequency-Inverse Document Frequency (TF-IDF) with unigrams and bigrams, Word2Vec, and tokenizers from BERT and RoBERTa. The R-GAT model outperforms other techniques, achieving precision, recall, and F1 scores of 0.99, 0.97, and 0.98 for thyroid cancer; 0.96, 0.94, and 0.95 for colon cancer; 0.96, 0.99, and 0.97 for lung cancer; and 0.95, 0.96, and 0.95 for generic topics.

Updated: 2025-04-08 22:53:41

标题: 医学-GAT：基于图形残差网络的癌症文档分类，在数据有限的情况下发挥作用

摘要: 癌症相关医学摘要的准确分类对于卫生管理和研究至关重要。然而，由于隐私问题和临床数据的复杂性，医学领域中获取大规模标记数据集是具有挑战性的。这种标记数据的稀缺性阻碍了癌症文档分类的有效机器学习模型的发展。为了解决这一挑战，我们提出了一个包含1,874篇生物医学摘要的策划数据集，分为甲状腺癌、结肠癌、肺癌和通用主题。我们的研究重点是利用这个数据集来提高分类性能，特别是在数据稀缺的情况下。我们引入了一个具有多个图注意层的残差图注意网络（R-GAT），用于捕获癌症相关文档中的语义信息和结构关系。我们的R-GAT模型与各种技术进行比较，包括基于变压器的模型，如双向编码器来自变压器（BERT）、RoBERTa，以及领域特定模型，如BioBERT和Bio+ClinicalBERT。我们还评估了深度学习模型（CNNs、LSTMs）和传统机器学习模型（逻辑回归、SVM）。此外，我们探索了将深度学习模型结合起来以增强分类的集成方法。评估了各种特征提取方法，包括具有单词和二元词的词频-逆文档频率（TF-IDF）、Word2Vec以及来自BERT和RoBERTa的标记器。R-GAT模型胜过其他技术，对于甲状腺癌实现了0.99的精确度、0.97的召回率和0.98的F1分数；对于结肠癌，分别为0.96、0.94和0.95；对于肺癌，分别为0.96、0.99和0.97；对于通用主题，分别为0.95、0.96和0.95。

更新时间: 2025-04-08 22:53:41

领域: cs.AI

下载: http://arxiv.org/abs/2410.15198v4

Sparsified-Learning for Heavy-Tailed Locally Stationary Processes

Sparsified Learning is ubiquitous in many machine learning tasks. It aims to regularize the objective function by adding a penalization term that considers the constraints made on the learned parameters. This paper considers the problem of learning heavy-tailed LSP. We develop a flexible and robust sparse learning framework capable of handling heavy-tailed data with locally stationary behavior and propose concentration inequalities. We further provide non-asymptotic oracle inequalities for different types of sparsity, including $\ell_1$-norm and total variation penalization for the least square loss.

Updated: 2025-04-08 22:43:55

标题: 稀疏学习用于重尾局部平稳过程

摘要: 稀疏化学习在许多机器学习任务中是普遍存在的。它旨在通过添加考虑到对学习参数施加的约束的惩罚项来正则化目标函数。本文考虑了学习重尾长尾分布（LSP）的问题。我们开发了一个灵活而强大的稀疏学习框架，能够处理具有局部平稳行为的重尾数据，并提出了集中不等式。我们进一步为不同类型的稀疏性提供了非渐近的Oracle不等式，包括对最小二乘损失的$\ell_1$-范数和总变差惩罚。

更新时间: 2025-04-08 22:43:55

领域: stat.ML,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2504.06477v1

CDM-QTA: Quantized Training Acceleration for Efficient LoRA Fine-Tuning of Diffusion Model

Fine-tuning large diffusion models for custom applications demands substantial power and time, which poses significant challenges for efficient implementation on mobile devices. In this paper, we develop a novel training accelerator specifically for Low-Rank Adaptation (LoRA) of diffusion models, aiming to streamline the process and reduce computational complexity. By leveraging a fully quantized training scheme for LoRA fine-tuning, we achieve substantial reductions in memory usage and power consumption while maintaining high model fidelity. The proposed accelerator features flexible dataflow, enabling high utilization for irregular and variable tensor shapes during the LoRA process. Experimental results show up to 1.81x training speedup and 5.50x energy efficiency improvements compared to the baseline, with minimal impact on image generation quality.

Updated: 2025-04-08 22:40:29

标题: CDM-QTA：用于扩散模型的高效LoRA微调的量化训练加速

摘要: 将大型扩散模型进行微调以适用于定制应用程序需要大量的功率和时间，这对于在移动设备上高效实现提出了重大挑战。在本文中，我们开发了一种新型的训练加速器，专门用于扩散模型的低秩适应（LoRA），旨在简化流程并降低计算复杂性。通过利用LoRA微调的完全量化训练方案，我们实现了内存使用量和功耗的大幅降低，同时保持高模型保真度。所提出的加速器具有灵活的数据流，可以在LoRA过程中对不规则和可变的张量形状进行高效利用。实验结果显示，与基线相比，训练速度提高了最多1.81倍，能效提高了5.50倍，对图像生成质量影响最小。

更新时间: 2025-04-08 22:40:29

领域: cs.GR,cs.AI,cs.AR,cs.CV

下载: http://arxiv.org/abs/2504.07998v1

Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization

3D Gaussian Splatting (3DGS) has demonstrated remarkable effectiveness in 3D reconstruction, achieving high-quality results with real-time radiance field rendering. However, a key challenge is the substantial storage cost: reconstructing a single scene typically requires millions of Gaussian splats, each represented by 59 floating-point parameters, resulting in approximately 1 GB of memory. To address this challenge, we propose a compression method by building separate attribute codebooks and storing only discrete code indices. Specifically, we employ noise-substituted vector quantization technique to jointly train the codebooks and model features, ensuring consistency between gradient descent optimization and parameter discretization. Our method reduces the memory consumption efficiently (around $45\times$) while maintaining competitive reconstruction quality on standard 3D benchmark scenes. Experiments on different codebook sizes show the trade-off between compression ratio and image quality. Furthermore, the trained compressed model remains fully compatible with popular 3DGS viewers and enables faster rendering speed, making it well-suited for practical applications.

Updated: 2025-04-08 22:40:23

标题: 通过噪声替代向量量化压缩3D高斯点阵

摘要: 3D高斯喷洒（3DGS）在3D重建方面表现出显著的效果，实现了实时辐射场渲染的高质量结果。然而，一个关键挑战是巨大的存储成本：重建一个场景通常需要数百万个高斯喷洒，每个由59个浮点参数表示，导致大约1GB的内存消耗。为了解决这个挑战，我们提出了一种压缩方法，通过构建单独的属性码书并仅存储离散的码索引。具体来说，我们采用噪声替代矢量量化技术来共同训练码书和模型特征，确保梯度下降优化和参数离散化之间的一致性。我们的方法有效地减少了内存消耗（约45倍），同时在标准3D基准场景上保持了竞争性的重建质量。对不同码书大小的实验显示了压缩比和图像质量之间的权衡。此外，经过训练的压缩模型仍然与流行的3DGS查看器完全兼容，并实现更快的渲染速度，使其非常适合实际应用。

更新时间: 2025-04-08 22:40:23

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2504.03059v2

Deep Fair Learning: A Unified Framework for Fine-tuning Representations with Sufficient Networks

Ensuring fairness in machine learning is a critical and challenging task, as biased data representations often lead to unfair predictions. To address this, we propose Deep Fair Learning, a framework that integrates nonlinear sufficient dimension reduction with deep learning to construct fair and informative representations. By introducing a novel penalty term during fine-tuning, our method enforces conditional independence between sensitive attributes and learned representations, addressing bias at its source while preserving predictive performance. Unlike prior methods, it supports diverse sensitive attributes, including continuous, discrete, binary, or multi-group types. Experiments on various types of data structure show that our approach achieves a superior balance between fairness and utility, significantly outperforming state-of-the-art baselines.

Updated: 2025-04-08 22:24:22

标题: 深度公平学习：一个统一的框架，用于通过充分网络微调表示

摘要: 确保机器学习中的公平性是一项关键且具有挑战性的任务，因为偏见数据表征经常导致不公平的预测。为了解决这个问题，我们提出了深度公平学习（Deep Fair Learning）框架，该框架将非线性充分降维与深度学习结合，构建公平且信息丰富的表征。通过在微调过程中引入一种新颖的惩罚项，我们的方法强制实现了敏感属性和学习表征之间的条件独立性，解决了偏见问题的根源，同时保持了预测性能。与以往的方法不同，它支持各种敏感属性，包括连续、离散、二进制或多组类型。在各种数据结构上的实验表明，我们的方法在公平性和效用之间取得了卓越的平衡，显著优于最先进的基线方法。

更新时间: 2025-04-08 22:24:22

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2504.06470v1

ELOQ: Resources for Enhancing LLM Detection of Out-of-Scope Questions

Large Language Models (LLMs) are widely used in Conversational AI systems to generate responses to user inquiries. However, many natural questions lack well-defined answers. While existing studies primarily focus on question types such as false premises, they often overlook out-of-scope questions, where the provided document is semantically highly similar to the query but does not contain the required answer. In this paper, we propose a guided hallucination-based method to efficiently generate a diverse set of out-of-scope questions from a given document corpus. We then evaluate multiple LLMs based on their effectiveness in confusion detection and appropriate response generation. Furthermore, we introduce an improved method for detecting such out-of-scope questions, enhancing the reliability of LLM-based question-answering systems.

Updated: 2025-04-08 22:24:08

标题: ELOQ：增强LLM检测超出范围问题的资源

摘要: 大型语言模型（LLMs）广泛用于对话式人工智能系统中，用于生成用户查询的响应。然而，许多自然问题缺乏明确定义的答案。现有研究主要关注诸如错误前提之类的问题类型，但往往忽略了超出范围的问题，即所提供的文档在语义上与查询高度相似，但不包含所需答案。在本文中，我们提出了一种基于引导幻觉的方法，可以从给定的文档语料库中有效地生成多样化的超出范围问题。然后，我们评估多个LLMs的效果，评估其在混淆检测和适当响应生成方面的效果。此外，我们引入了一种改进的方法来检测此类超出范围的问题，增强了基于LLM的问答系统的可靠性。

更新时间: 2025-04-08 22:24:08

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2410.14567v3

AI-Assisted Transport of Radioactive Ion Beams

Beams of radioactive heavy ions allow researchers to study rare and unstable atomic nuclei, shedding light into the internal structure of exotic nuclei and on how chemical elements are formed in stars. However, the extraction and transport of radioactive beams rely on time-consuming expert-driven tuning methods, where hundreds of parameters are manually optimized. Here, we introduce a system that uses Artificial Intelligence (AI) to assist in the radioactive beam transport process. We apply our methodology to real-life scenarios showing advantages when compared with standard tuning methods. Our method can be extended to other radioactive beam facilities around the world to improve operational efficiency and enhance scientific output.

Updated: 2025-04-08 22:21:54

标题: 人工智能辅助下的放射性离子束输运

摘要: 放射性重离子束使研究人员能够研究稀有和不稳定的原子核，揭示了异性核的内部结构以及化学元素是如何在恒星中形成的。然而，提取和传输放射性束依赖于耗时的专家驱动的调整方法，其中需要手动优化数百个参数。在这里，我们介绍了一个使用人工智能（AI）来辅助放射性束传输过程的系统。我们将我们的方法应用到现实场景中，与标准调整方法相比显示出优势。我们的方法可以扩展到世界各地的其他放射性束设施，以提高运营效率并增强科学产出。

更新时间: 2025-04-08 22:21:54

领域: physics.acc-ph,cs.AI,nucl-ex

下载: http://arxiv.org/abs/2504.06469v1

Agent-Arena: A General Framework for Evaluating Control Algorithms

Robotic research is inherently challenging, requiring expertise in diverse environments and control algorithms. Adapting algorithms to new environments often poses significant difficulties, compounded by the need for extensive hyper-parameter tuning in data-driven methods. To address these challenges, we present Agent-Arena, a Python framework designed to streamline the integration, replication, development, and testing of decision-making policies across a wide range of benchmark environments. Unlike existing frameworks, Agent-Arena is uniquely generalised to support all types of control algorithms and is adaptable to both simulation and real-robot scenarios. Please see our GitHub repository https://github.com/halid1020/agent-arena-v0.

Updated: 2025-04-08 22:20:50

标题: Agent-Arena：一个用于评估控制算法的通用框架

摘要: 机器人研究本质上具有挑战性，需要在不同环境和控制算法方面具备专门知识。将算法调整到新环境通常会带来重大困难，这还会受到数据驱动方法中大量超参数调整的影响。为了解决这些挑战，我们提出了Agent-Arena，这是一个设计用于简化在各种基准环境中集成、复制、开发和测试决策策略的Python框架。与现有框架不同，Agent-Arena独特地通用，支持所有类型的控制算法，并适用于模拟和真实机器人场景。请查看我们的GitHub存储库https://github.com/halid1020/agent-arena-v0。

更新时间: 2025-04-08 22:20:50

领域: cs.RO,cs.AI,cs.SE

下载: http://arxiv.org/abs/2504.06468v1

PARDON: Privacy-Aware and Robust Federated Domain Generalization

Federated Learning (FL) shows promise in preserving privacy and enabling collaborative learning. However, most current solutions focus on private data collected from a single domain. A significant challenge arises when client data comes from diverse domains (i.e., domain shift), leading to poor performance on unseen domains. Existing Federated Domain Generalization approaches address this problem but assume each client holds data for an entire domain, limiting their practicality in real-world scenarios with domain-based heterogeneity and client sampling. In addition, certain methods enable information sharing among clients, raising privacy concerns as this information could be used to reconstruct sensitive private data. To overcome this, we introduce FISC, a novel FedDG paradigm designed to robustly handle more complicated domain distributions between clients while ensuring security. FISC enables learning across domains by extracting an interpolative style from local styles and employing contrastive learning. This strategy gives clients multi-domain representations and unbiased convergent targets. Empirical results on multiple datasets, including PACS, Office-Home, and IWildCam, show FISC outperforms state-of-the-art (SOTA) methods. Our method achieves accuracy on unseen domains, with improvements ranging from 3.64% to 57.22% on unseen domains. Our code is available at https://github.com/judydnguyen/PARDON-FedDG.

Updated: 2025-04-08 22:15:47

标题: PARDON：隐私感知和强大的联邦域泛化

摘要: 联邦学习（FL）在保护隐私和促进合作学习方面表现出潜力。然而，目前大多数解决方案侧重于来自单个域的私有数据。当客户数据来自不同领域（即领域转移）时，就会出现重大挑战，导致在未知领域上表现不佳。现有的联邦领域泛化方法解决了这个问题，但假设每个客户端持有整个领域的数据，限制了它们在具有基于领域的异质性和客户抽样的实际场景中的实用性。此外，某些方法允许客户端之间共享信息，引发隐私问题，因为这些信息可能被用来重建敏感私人数据。为了克服这一问题，我们引入了FISC，一种新颖的FedDG范例，旨在稳健地处理客户端之间更复杂的领域分布，同时确保安全性。FISC通过从本地样式中提取一种插值样式并采用对比学习来实现跨领域学习。这种策略为客户提供了多领域表示和无偏的收敛目标。在多个数据集上的实证结果，包括PACS、Office-Home和IWildCam，显示FISC优于最先进的方法。我们的方法在未知领域上取得了准确性，改进范围从3.64%到57.22%。我们的代码可在https://github.com/judydnguyen/PARDON-FedDG 上找到。

更新时间: 2025-04-08 22:15:47

领域: cs.LG,cs.CV,cs.DC

下载: http://arxiv.org/abs/2410.22622v2

Hyperparameter Optimization in Machine Learning

Hyperparameters are configuration variables controlling the behavior of machine learning algorithms. They are ubiquitous in machine learning and artificial intelligence and the choice of their values determines the effectiveness of systems based on these technologies. Manual hyperparameter search is often unsatisfactory and becomes infeasible when the number of hyperparameters is large. Automating the search is an important step towards advancing, streamlining, and systematizing machine learning, freeing researchers and practitioners alike from the burden of finding a good set of hyperparameters by trial and error. In this survey, we present a unified treatment of hyperparameter optimization, providing the reader with examples, insights into the state-of-the-art, and numerous links to further reading. We cover the main families of techniques to automate hyperparameter search, often referred to as hyperparameter optimization or tuning, including random and quasi-random search, bandit-, model-, population-, and gradient-based approaches. We further discuss extensions, including online, constrained, and multi-objective formulations, touch upon connections with other fields such as meta-learning and neural architecture search, and conclude with open questions and future research directions.

Updated: 2025-04-08 22:13:51

标题: 机器学习中的超参数优化

摘要: 超参数是控制机器学习算法行为的配置变量。它们在机器学习和人工智能中无处不在，其值的选择决定了基于这些技术的系统的有效性。手动超参数搜索通常不尽如人意，当超参数数量较多时变得不可行。自动化搜索是推进、简化和系统化机器学习的重要步骤，使研究人员和从业者摆脱通过试错找到一组好超参数的负担。在这份调查中，我们提供了超参数优化的统一处理，为读者提供示例、最新技术见解以及大量进一步阅读链接。我们涵盖了自动化超参数搜索的主要技术家族，通常被称为超参数优化或调整，包括随机和准随机搜索、赌博-、模型-、种群-和基于梯度的方法。我们进一步讨论了扩展，包括在线、受限和多目标形式，涉及与其他领域如元学习和神经结构搜索的关联，并最后探讨了未来研究方向和待解决问题。

更新时间: 2025-04-08 22:13:51

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.22854v2

Low-Rank Thinning

The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical sub-Gaussian thinning approaches that improve upon the best known guarantees for approximating attention in transformers, accelerating stochastic gradient training through reordering, and distinguishing distributions in near-linear time.

Updated: 2025-04-08 21:57:48

标题: 低秩稀疏

摘要: 在稀疏化中的目标是使用一小组代表性点来总结数据集。值得注意的是，像Kernel Halving和Compress这样的次高斯稀疏化算法可以在大幅减少摘要点数的同时与均匀子采样的质量相匹配。然而，现有的保证仅涵盖了一定范围的分布和基于内核的质量测量，并且受到悲观维度依赖的影响。为了解决这些缺陷，我们引入了一种新的次高斯稀疏化的低秩分析方法，适用于任何分布和任何内核，保证在内核或数据矩阵近似低秩时高质量的压缩。为了展示这些技术的广泛适用性，我们设计了实用的次高斯稀疏化方法，改进了用于近似注意力在变压器中的最佳保证，通过重新排序加速随机梯度训练，并在接近线性时间内区分分布。

更新时间: 2025-04-08 21:57:48

领域: stat.ML,cs.LG,math.OC,math.ST,stat.ME,stat.TH

下载: http://arxiv.org/abs/2502.12063v4

Federated Neural Architecture Search with Model-Agnostic Meta Learning

Federated Learning (FL) often struggles with data heterogeneity due to the naturally uneven distribution of user data across devices. Federated Neural Architecture Search (NAS) enables collaborative search for optimal model architectures tailored to heterogeneous data to achieve higher accuracy. However, this process is time-consuming due to extensive search space and retraining. To overcome this, we introduce FedMetaNAS, a framework that integrates meta-learning with NAS within the FL context to expedite the architecture search by pruning the search space and eliminating the retraining stage. Our approach first utilizes the Gumbel-Softmax reparameterization to facilitate relaxation of the mixed operations in the search space. We then refine the local search process by incorporating Model-Agnostic Meta-Learning, where a task-specific learner adapts both weights and architecture parameters (alphas) for individual tasks, while a meta learner adjusts the overall model weights and alphas based on the gradient information from task learners. Following the meta-update, we propose soft pruning using the same trick on search space to gradually sparsify the architecture, ensuring that the performance of the chosen architecture remains robust after pruning which allows for immediate use of the model without retraining. Experimental evaluations demonstrate that FedMetaNAS significantly accelerates the search process by more than 50\% with higher accuracy compared to FedNAS.

Updated: 2025-04-08 21:57:40

标题: 联邦式神经架构搜索与模型无关元学习

摘要: 联邦学习（FL）通常由于设备之间用户数据的自然不均匀分布而面临数据异构性的问题。联邦神经架构搜索（NAS）实现了针对异构数据的协作搜索，以获得更高的准确性。然而，由于广泛的搜索空间和重新训练，这一过程耗时。为了克服这一问题，我们引入了FedMetaNAS，这是一个将元学习与NAS相结合在FL环境中的框架，以加快架构搜索，通过剪枝搜索空间和消除重新训练阶段。我们的方法首先利用Gumbel-Softmax重参数化来促进在搜索空间中混合操作的放松。然后，通过结合模型无关元学习来完善本地搜索过程，其中一个任务特定的学习器调整任务的权重和架构参数（alphas），而一个元学习器根据来自任务学习器的梯度信息调整整体模型权重和alphas。在元更新之后，我们提出使用相同技巧对搜索空间进行软剪枝，逐渐稀疏化架构，确保选择的架构在剪枝后的性能仍然稳健，从而允许立即使用模型而无需重新训练。实验评估表明，与FedNAS相比，FedMetaNAS显著加快了搜索过程超过50\%，并且具有更高的准确性。

更新时间: 2025-04-08 21:57:40

领域: cs.LG,cs.AI,cs.DC

下载: http://arxiv.org/abs/2504.06457v1

Mitigating Adversarial Effects of False Data Injection Attacks in Power Grid

Deep Neural Networks have proven to be highly accurate at a variety of tasks in recent years. The benefits of Deep Neural Networks have also been embraced in power grids to detect False Data Injection Attacks (FDIA) while conducting critical tasks like state estimation. However, the vulnerabilities of DNNs along with the distinct infrastructure of the cyber-physical-system (CPS) can favor the attackers to bypass the detection mechanism. Moreover, the divergent nature of CPS engenders limitations to the conventional defense mechanisms for False Data Injection Attacks. In this paper, we propose a DNN framework with an additional layer that utilizes randomization to mitigate the adversarial effect by padding the inputs. The primary advantage of our method is when deployed to a DNN model it has a trivial impact on the model's performance even with larger padding sizes. We demonstrate the favorable outcome of the framework through simulation using the IEEE 14-bus, 30-bus, 118-bus, and 300-bus systems. Furthermore to justify the framework we select attack techniques that generate subtle adversarial examples that can bypass the detection mechanism effortlessly.

Updated: 2025-04-08 21:51:37

标题: 减轻电网中虚假数据注入攻击的敌对影响

摘要: 近年来，深度神经网络在各种任务中表现出了高度的准确性。深度神经网络的优势也被应用到电力网络中，用于检测虚假数据注入攻击（FDIA），同时执行关键任务如状态估计。然而，由于DNN的弱点以及网络物理系统（CPS）的独特基础设施，攻击者可能更容易绕过检测机制。此外，CPS的差异性也为虚假数据注入攻击的传统防御机制带来了限制。在本文中，我们提出了一个使用随机化填充输入的附加层的DNN框架，以减轻对抗效果。我们的方法的主要优势在于，即使使用更大的填充大小，将其部署到DNN模型中也对模型的性能几乎没有影响。我们通过模拟使用IEEE 14、30、118和300节点系统展示了该框架的有利结果。此外，为了证明该框架的有效性，我们选择生成微妙的对抗示例的攻击技术，这些示例可以轻松绕过检测机制。

更新时间: 2025-04-08 21:51:37

领域: cs.CR

下载: http://arxiv.org/abs/2301.12487v3

Can Large Language Models Replace Data Scientists in Biomedical Research?

Data science plays a critical role in biomedical research, but it requires professionals with expertise in coding and medical data analysis. Large language models (LLMs) have shown great potential in supporting medical tasks and performing well in general coding tests. However, existing evaluations fail to assess their capability in biomedical data science, particularly in handling diverse data types such as genomics and clinical datasets. To address this gap, we developed a benchmark of data science coding tasks derived from the analyses of 39 published studies. This benchmark comprises 293 coding tasks (128 in Python and 165 in R) performed on real-world TCGA-type genomics and clinical data. Our findings reveal that the vanilla prompting of LLMs yields suboptimal performances due to drawbacks in following input instructions, understanding target data, and adhering to standard analysis practices. Next, we benchmarked six cutting-edge LLMs and advanced adaptation methods, finding two methods to be particularly effective: chain-of-thought prompting, which provides a step-by-step plan for data analysis, which led to a 21% code accuracy improvement (56.6% versus 35.3%); and self-reflection, enabling LLMs to refine the buggy code iteratively, yielding an 11% code accuracy improvement (45.5% versus 34.3%). Building on these insights, we developed a platform that integrates LLMs into the data science workflow for medical professionals. In a user study with five medical professionals, we found that while LLMs cannot fully automate programming tasks, they significantly streamline the programming process. We found that 80% of their submitted code solutions were incorporated from LLM-generated code, with up to 96% reuse in some cases. Our analysis highlights the potential of LLMs to enhance data science efficiency in biomedical research when integrated into expert workflows.

Updated: 2025-04-08 21:48:54

标题: 大型语言模型能否取代生物医学研究中的数据科学家？

摘要: 数据科学在生物医学研究中发挥着至关重要的作用，但需要具备编码和医学数据分析专业知识的专业人士。大型语言模型(LLMs)在支持医学任务方面表现出巨大潜力，并在一般编码测试中表现良好。然而，现有评估未能评估它们在生物医学数据科学中的能力，特别是处理基因组学和临床数据集等多种数据类型。为了弥补这一差距，我们开发了一个基于对39篇发表研究的分析得出的数据科学编码任务基准。该基准包括对真实世界TCGA类型基因组学和临床数据进行的293个编码任务(128个Python和165个R)。我们的发现显示，LLMs的原始提示导致性能低下，原因在于无法正确遵循输入指令、理解目标数据和遵循标准分析实践。接下来，我们对六种尖端的LLMs和先进的适应方法进行了基准测试，发现两种方法特别有效：思维链提示，为数据分析提供逐步计划，导致代码准确性提高了21%(56.6%对35.3%)；自我反思，使LLMs能够迭代地完善有bug的代码，导致代码准确性提高了11%(45.5%对34.3%)。基于这些见解，我们开发了一个将LLMs整合到医学专业人员的数据科学工作流程中的平台。在与五名医学专业人员进行的用户研究中，我们发现虽然LLMs不能完全自动化编程任务，但它们显著简化了编程过程。我们发现他们提交的代码解决方案中有80%是从LLM生成的代码中提取的，在某些情况下甚至达到了96%的重用率。我们的分析突出了将LLMs整合到专家工作流程中时，提高生物医学研究中数据科学效率的潜力。

更新时间: 2025-04-08 21:48:54

领域: cs.AI,cs.CL,q-bio.GN,q-bio.QM

下载: http://arxiv.org/abs/2410.21591v2

Can you Finetune your Binoculars? Embedding Text Watermarks into the Weights of Large Language Models

The indistinguishability of AI-generated content from human text raises challenges in transparency and accountability. While several methods exist to watermark models behind APIs, embedding watermark strategies directly into model weights that are later reflected in the outputs of the model is challenging. In this study we propose a strategy to finetune a pair of low-rank adapters of a model, one serving as the text-generating model, and the other as the detector, so that a subtle watermark is embedded into the text generated by the first model and simultaneously optimized for detectability by the second. In this way, the watermarking strategy is fully learned end-to-end. This process imposes an optimization challenge, as balancing watermark robustness, naturalness, and task performance requires trade-offs. We discuss strategies on how to optimize this min-max objective and present results showing the effect of this modification to instruction finetuning.

Updated: 2025-04-08 21:34:02

标题: 你能微调你的双筒望远镜吗？将文本水印嵌入大型语言模型的权重

摘要: 人工智能生成的内容与人类文本难以区分，这引发了透明度和问责制方面的挑战。虽然存在多种方法可以在API后面对模型进行水印处理，但直接将水印策略嵌入到模型权重中，然后反映在模型输出中是具有挑战性的。在这项研究中，我们提出了一种策略，通过微调模型的一对低秩适配器，其中一个用作文本生成模型，另一个用作检测器，从而在第一个模型生成的文本中嵌入微妙的水印，并同时优化第二个模型检测到这种水印的能力。通过这种方式，水印策略可以完全学习端到端。这个过程带来了一个优化挑战，因为平衡水印的强度、自然性和任务性能需要进行权衡。我们讨论了如何优化这种极小-极大目标，并呈现了这种修改对指导微调的影响的结果。

更新时间: 2025-04-08 21:34:02

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.06446v1

Classifying Subjective Time Perception in a Multi-robot Control Scenario Using Eye-tracking Information

As automation and mobile robotics reshape work environments, rising expectations for productivity increase cognitive demands on human operators, leading to potential stress and cognitive overload. Accurately assessing an operator's mental state is critical for maintaining performance and well-being. We use subjective time perception, which can be altered by stress and cognitive load, as a sensitive, low-latency indicator of well-being and cognitive strain. Distortions in time perception can affect decision-making, reaction times, and overall task effectiveness, making it a valuable metric for adaptive human-swarm interaction systems. We study how human physiological signals can be used to estimate a person's subjective time perception in a human-swarm interaction scenario as example. A human operator needs to guide and control a swarm of small mobile robots. We obtain eye-tracking data that is classified for subjective time perception based on questionnaire data. Our results show that we successfully estimate a person's time perception from eye-tracking data. The approach can profit from individual-based pretraining using only 30 seconds of data. In future work, we aim for robots that respond to human operator needs by automatically classifying physiological data in a closed control loop.

Updated: 2025-04-08 21:30:18

标题: 使用眼动信息在多机器人控制场景中对主观时间感知进行分类

摘要: 随着自动化和移动机器人改变工作环境，对生产力不断增长的期望增加了人类操作员的认知压力，导致潜在的压力和认知超载。准确评估操作员的心理状态对于保持绩效和健康至关重要。我们使用主观时间知觉作为一种敏感、低延迟的健康和认知负荷指标，因为它可以受到压力和认知负荷的影响。时间感知的扭曲可能影响决策、反应时间和整体任务效果，使其成为自适应人-群体交互系统的宝贵指标。我们研究了如何利用人类生理信号来估计一个人在人-群体交互场景中的主观时间知觉。一个人操作员需要引导和控制一群小型移动机器人。我们获取了基于问卷数据分类的眼动数据，用于主观时间知觉。我们的结果表明，我们成功地从眼动数据中估计了一个人的时间知觉。这种方法可以从仅使用30秒的数据进行个体化预训练中获益。在未来的工作中，我们的目标是通过自动对生理数据进行分类来响应人类操作员的需求，形成一个闭合控制循环。

更新时间: 2025-04-08 21:30:18

领域: cs.RO,cs.HC,cs.LG

下载: http://arxiv.org/abs/2504.06442v1

Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning

Discovering efficient algorithms for solving complex problems has been an outstanding challenge in mathematics and computer science, requiring substantial human expertise over the years. Recent advancements in evolutionary search with large language models (LLMs) have shown promise in accelerating the discovery of algorithms across various domains, particularly in mathematics and optimization. However, existing approaches treat the LLM as a static generator, missing the opportunity to update the model with the signal obtained from evolutionary exploration. In this work, we propose to augment LLM-based evolutionary search by continuously refining the search operator - the LLM - through reinforcement learning (RL) fine-tuning. Our method leverages evolutionary search as an exploration strategy to discover improved algorithms, while RL optimizes the LLM policy based on these discoveries. Our experiments on three combinatorial optimization tasks - bin packing, traveling salesman, and the flatpack problem - show that combining RL and evolutionary search improves discovery efficiency of improved algorithms, showcasing the potential of RL-enhanced evolutionary strategies to assist computer scientists and mathematicians for more efficient algorithm design.

Updated: 2025-04-08 21:23:26

标题: 算法发现与LLMs：进化搜索遇上强化学习

摘要: 在数学和计算机科学领域，发现解决复杂问题的高效算法一直是一项重大挑战，多年来需要大量的人类专业知识。最近，在具有大型语言模型（LLMs）的进化搜索方面取得了一些进展，显示出在加速各个领域中的算法发现方面具有潜力，特别是在数学和优化领域。然而，现有方法将LLM视为静态生成器，错失了利用从进化探索中获得的信号更新模型的机会。在这项工作中，我们提出通过强化学习（RL）微调不断改进搜索算子 - LLM - 来增强基于LLM的进化搜索。我们的方法利用进化搜索作为一种探索策略，发现改进的算法，同时RL基于这些发现优化LLM策略。我们在三个组合优化任务上的实验 - 装箱、旅行推销员和平板拼图问题 - 表明将RL和进化搜索结合起来可以提高改进算法的发现效率，展示了RL增强的进化策略有助于计算机科学家和数学家更有效地设计算法的潜力。

更新时间: 2025-04-08 21:23:26

领域: cs.AI,cs.LG,cs.NE

下载: http://arxiv.org/abs/2504.05108v2

ChatGPT-4 in the Turing Test: A Critical Analysis

This paper critically examines the recent publication "ChatGPT-4 in the Turing Test" by Restrepo Echavarr\'ia (2025), challenging its central claims regarding the absence of minimally serious test implementations and the conclusion that ChatGPT-4 fails the Turing Test. The analysis reveals that the criticisms based on rigid criteria and limited experimental data are not fully justified. More importantly, the paper makes several constructive contributions that enrich our understanding of Turing Test implementations. It demonstrates that two distinct formats--the three-player and two-player tests--are both valid, each with unique methodological implications. The work distinguishes between absolute criteria (reflecting an optimal 50% identification rate in a three-player format) and relative criteria (which measure how closely a machine's performance approximates that of a human), offering a more nuanced evaluation framework. Furthermore, the paper clarifies the probabilistic underpinnings of both test types by modeling them as Bernoulli experiments--correlated in the three-player version and uncorrelated in the two-player version. This formalization allows for a rigorous separation between the theoretical criteria for passing the test, defined in probabilistic terms, and the experimental data that require robust statistical methods for proper interpretation. In doing so, the paper not only refutes key aspects of the criticized study but also lays a solid foundation for future research on objective measures of how closely an AI's behavior aligns with, or deviates from, that of a human being.

Updated: 2025-04-08 21:23:00

标题: ChatGPT-4在图灵测试中的表现：一项关键分析

摘要: 这篇论文对最近由Restrepo Echavarr\'ia (2025)发表的“ChatGPT-4在图灵测试中”的论文进行了批判性审查，挑战了关于最小严肃测试实施的缺失以及ChatGPT-4未能通过图灵测试的中心主张。分析表明，基于严格标准和有限实验数据的批评并不完全合理。更重要的是，这篇论文提出了几点有建设性的贡献，丰富了我们对图灵测试实施的理解。它展示了两种不同的格式--三人玩家和两人玩家测试--都是有效的，每种格式都具有独特的方法论含义。该研究区分了绝对标准（反映了在三人玩家格式中获得50%识别率的最佳状态）和相对标准（衡量机器的表现与人类表现的接近程度），提供了更加细致的评估框架。此外，该论文通过将它们建模为Bernoulli实验，明确了两种测试类型的概率基础--在三人玩家版本中是相关的，在两人玩家版本中是不相关的。这种形式化使得可以严格区分通过测试的理论标准，以概率术语定义，并且需要强大的统计方法来正确解释实验数据。通过这样做，这篇论文不仅否定了被批评研究的关键方面，而且为未来关于评估AI行为与人类行为的接近程度或偏离程度的客观措施研究奠定了坚实的基础。

更新时间: 2025-04-08 21:23:00

领域: cs.AI,cs.CY,cs.HC,68T01

下载: http://arxiv.org/abs/2503.06551v3

FedSECA: Sign Election and Coordinate-wise Aggregation of Gradients for Byzantine Tolerant Federated Learning

One of the most common defense strategies against Byzantine clients in federated learning (FL) is to employ a robust aggregator mechanism that makes the training more resilient. While many existing Byzantine robust aggregators provide theoretical convergence guarantees and are empirically effective against certain categories of attacks, we observe that certain high-strength attacks can subvert the robust aggregator and collapse the training. To overcome this limitation, we propose a method called FedSECA for robust Sign Election and Coordinate-wise Aggregation of gradients in FL that is less susceptible to malicious updates by an omniscient attacker. The proposed method has two main components. The Concordance Ratio Induced Sign Election(CRISE) module determines the consensus direction (elected sign) for each individual parameter gradient through a weighted voting strategy. The client weights are assigned based on a novel metric called concordance ratio, which quantifies the degree of sign agreement between the client gradient updates. Based on the elected sign, a Robust Coordinate-wise Aggregation(RoCA) strategy is employed, where variance-reduced sparse gradients are aggregated only if they are in alignment with the corresponding elected sign. We compare our proposed FedSECA method against 10 robust aggregators under 7 Byzantine attacks on 3 datasets and architectures. The results show that existing robust aggregators fail for at least some attacks, while FedSECA exhibits better robustness. Code - https://github.com/JosephGeoBenjamin/FedSECA-ByzantineTolerance

Updated: 2025-04-08 21:19:40

标题: FedSECA：拜占庭容错联邦学习的签名选举和逐坐标梯度聚合

摘要: 对抗拜占庭攻击的最常见防御策略之一是采用强大的聚合器机制，使训练更具韧性。虽然许多现有的拜占庭强健聚合器提供了理论收敛保证，并在某些攻击类别下具有实证有效性，但我们观察到某些高强度攻击可能会颠覆强健聚合器并导致训练崩溃。为了克服这一限制，我们提出了一种名为FedSECA的方法，用于在联邦学习中对参数梯度进行强健的符号选举和逐坐标聚合，该方法对全知攻击者的恶意更新不太敏感。所提出的方法主要有两个组件。一致性比例诱导符号选举（CRISE）模块通过加权投票策略确定每个个体参数梯度的共识方向（选举符号）。客户权重基于一种称为一致性比例的新颖度量分配，该度量量化客户梯度更新之间的符号一致程度。根据选举符号，采用强健的逐坐标聚合（RoCA）策略，仅在它们与相应选举符号一致时聚合方差减少的稀疏梯度。我们将提出的FedSECA方法与3个数据集和架构上的7种拜占庭攻击下的10种强健聚合器进行比较。结果显示，现有的强健聚合器对至少某些攻击失效，而FedSECA表现出更好的强健性。源代码-https://github.com/JosephGeoBenjamin/FedSECA-ByzantineTolerance

更新时间: 2025-04-08 21:19:40

领域: cs.CV,cs.CR

下载: http://arxiv.org/abs/2411.03861v2

Graph Neural Network-Based Distributed Optimal Control for Linear Networked Systems: An Online Distributed Training Approach

In this paper, we consider the distributed optimal control problem for linear networked systems. In particular, we are interested in learning distributed optimal controllers using graph recurrent neural networks (GRNNs). Most of the existing approaches result in centralized optimal controllers with offline training processes. However, as the increasing demand of network resilience, the optimal controllers are further expected to be distributed, and are desirable to be trained in an online distributed fashion, which are also the main contributions of our work. To solve this problem, we first propose a GRNN-based distributed optimal control method, and we cast the problem as a self-supervised learning problem. Then, the distributed online training is achieved via distributed gradient computation, and inspired by the (consensus-based) distributed optimization idea, a distributed online training optimizer is designed. Furthermore, the local closed-loop stability of the linear networked system under our proposed GRNN-based controller is provided by assuming that the nonlinear activation function of the GRNN-based controller is both local sector-bounded and slope-restricted. The effectiveness of our proposed method is illustrated by numerical simulations using a specifically developed simulator.

Updated: 2025-04-08 21:18:43

标题: 基于图神经网络的线性网络系统分布式最优控制：一种在线分布式训练方法

摘要: 在本文中，我们考虑了线性网络系统的分布式最优控制问题。特别地，我们有兴趣使用图循环神经网络（GRNNs）来学习分布式最优控制器。大多数现有方法导致具有线下训练过程的集中式最优控制器。然而，随着网络韧性需求的增加，进一步期望最优控制器是分布式的，并且希望以在线分布式方式进行训练，这也是我们工作的主要贡献。为了解决这个问题，我们首先提出了基于GRNN的分布式最优控制方法，并将问题视为自监督学习问题。然后，通过分布式梯度计算实现了分布式在线训练，并受到（基于共识的）分布式优化思想的启发，设计了一个分布式在线训练优化器。此外，假设GRNN控制器的非线性激活函数既是本地区域有界的又是斜率受限的，我们提供了基于我们提出的GRNN控制器的线性网络系统的本地闭环稳定性。通过使用一个专门开发的模拟器进行数值模拟，我们展示了我们提出的方法的有效性。

更新时间: 2025-04-08 21:18:43

领域: eess.SY,cs.LG,cs.SY

下载: http://arxiv.org/abs/2504.06439v1

Don't Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning

Large language models (LLMs) have shown substantial capacity for generating fluent, contextually appropriate responses. However, they can produce hallucinated outputs, especially when a user query includes one or more false premises-claims that contradict established facts. Such premises can mislead LLMs into offering fabricated or misleading details. Existing approaches include pretraining, fine-tuning, and inference-time techniques that often rely on access to logits or address hallucinations after they occur. These methods tend to be computationally expensive, require extensive training data, or lack proactive mechanisms to prevent hallucination before generation, limiting their efficiency in real-time applications. We propose a retrieval-based framework that identifies and addresses false premises before generation. Our method first transforms a user's query into a logical representation, then applies retrieval-augmented generation (RAG) to assess the validity of each premise using factual sources. Finally, we incorporate the verification results into the LLM's prompt to maintain factual consistency in the final output. Experiments show that this approach effectively reduces hallucinations, improves factual accuracy, and does not require access to model logits or large-scale fine-tuning.

Updated: 2025-04-08 21:14:48

标题: 不要让它产生幻觉：通过检索增强的逻辑推理进行前提验证

摘要: 大型语言模型(LLMs)展现出生成流畅、上下文适当的回应的巨大能力。然而，它们可能会产生幻觉输出，特别是当用户查询包含一个或多个与已知事实相矛盾的错误前提时。这样的前提可能会误导LLMs提供虚构或误导性的细节。现有方法包括预训练、微调和推理时技术，通常依赖于访问对数或在幻觉发生后处理。这些方法往往计算昂贵，需要大量训练数据，或缺乏在生成之前预防幻觉的主动机制，限制了它们在实时应用中的效率。我们提出了一个基于检索的框架，该框架在生成之前识别和处理错误前提。我们的方法首先将用户的查询转换为逻辑表示，然后应用检索增强生成(RAG)来评估每个前提的有效性，使用事实来源。最后，我们将验证结果合并到LLM的提示中，以保持最终输出的事实一致性。实验证明，这种方法有效减少了幻觉，提高了事实准确性，并且不需要访问模型对数或大规模微调。

更新时间: 2025-04-08 21:14:48

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.06438v1

Language-Dependent Political Bias in AI: A Study of ChatGPT and Gemini

As leading examples of large language models, ChatGPT and Gemini claim to provide accurate and unbiased information, emphasizing their commitment to political neutrality and avoidance of personal bias. This research investigates the political tendency of large language models and the existence of differentiation according to the query language. For this purpose, ChatGPT and Gemini were subjected to a political axis test using 14 different languages. The findings of the study suggest that these large language models do exhibit political tendencies, with both models demonstrating liberal and leftist biases. A comparative analysis revealed that Gemini exhibited a more pronounced liberal and left-wing tendency compared to ChatGPT. The study also found that these political biases varied depending on the language used for inquiry. The study delves into the factors that constitute political tendencies and linguistic differentiation, exploring differences in the sources and scope of educational data, structural and grammatical features of languages, cultural and political contexts, and the model's response to linguistic features. From this standpoint, and an ethical perspective, it is proposed that artificial intelligence tools should refrain from asserting a lack of political tendencies and neutrality, instead striving for political neutrality and executing user queries by incorporating these tendencies.

Updated: 2025-04-08 21:13:01

标题: AI 中存在的语言依赖性政治偏见：ChatGPT 和 Gemini 的研究

摘要: 作为大型语言模型的主要示例，ChatGPT和Gemini声称提供准确和公正的信息，强调它们对政治中立性的承诺和避免个人偏见。本研究调查了大型语言模型的政治倾向以及根据查询语言存在的差异化。为此，对ChatGPT和Gemini使用14种不同语言进行了政治轴测试。研究结果表明，这些大型语言模型确实表现出政治倾向，两种模型都展示出自由派和左翼倾向。比较分析显示，Gemini相对于ChatGPT表现出更明显的自由派和左翼倾向。研究还发现，这些政治偏见因查询使用的语言而异。研究深入探讨了构成政治倾向和语言差异化的因素，探讨了教育数据来源和范围的差异、语言的结构和语法特点、文化和政治背景以及模型对语言特征的反应。从这个角度和伦理的角度来看，建议人工智能工具应该避免声称缺乏政治倾向和中立性，而是努力实现政治中立，并通过整合这些倾向执行用户查询。

更新时间: 2025-04-08 21:13:01

领域: cs.CL,cs.AI,cs.ET,stat.AP

下载: http://arxiv.org/abs/2504.06436v1

Distribution Shifts at Scale: Out-of-distribution Detection in Earth Observation

Training robust deep learning models is crucial in Earth Observation, where globally deployed models often face distribution shifts that degrade performance, especially in low-data regions. Out-of-distribution (OOD) detection addresses this by identifying inputs that deviate from in-distribution (ID) data. However, existing methods either assume access to OOD data or compromise primary task performance, limiting real-world use. We introduce TARDIS, a post-hoc OOD detection method designed for scalable geospatial deployment. Our core innovation lies in generating surrogate distribution labels by leveraging ID data within the feature space. TARDIS takes a pre-trained model, ID data, and data from an unknown distribution (WILD), separates WILD into surrogate ID and OOD labels based on internal activations, and trains a binary classifier to detect distribution shifts. We validate on EuroSAT and xBD across 17 setups covering covariate and semantic shifts, showing near-upper-bound surrogate labeling performance in 13 cases and matching the performance of top post-hoc activation- and scoring-based methods. Finally, deploying TARDIS on Fields of the World reveals actionable insights into pre-trained model behavior at scale. The code is available at \href{https://github.com/microsoft/geospatial-ood-detection}{https://github.com/microsoft/geospatial-ood-detection}

Updated: 2025-04-08 21:00:47

标题: 规模上的分布转变：地球观测中的外分布检测

摘要: 在地球观测中，训练稳健的深度学习模型至关重要，全球部署的模型经常面临分布转移，降低性能，特别是在数据稀缺地区。超出分布（OOD）检测通过识别与分布内（ID）数据偏离的输入来解决这个问题。然而，现有方法要么假设可以访问OOD数据，要么牺牲主要任务性能，限制了实际应用。我们介绍了TARDIS，一种专为可伸缩地理部署而设计的后期OOD检测方法。我们的核心创新在于利用特征空间内的ID数据生成代理分布标签。TARDIS使用预训练模型、ID数据和来自未知分布（WILD）的数据，基于内部激活将WILD分离为代理ID和OOD标签，并训练一个二元分类器来检测分布转移。我们在EuroSAT和xBD上验证了17种设置，涵盖协变量和语义转移，显示了在13种情况下接近上限的代理标签性能，并与顶级后期激活和评分方法的性能相匹配。最后，在全球的田野上部署TARDIS揭示了大规模预训练模型行为的可操作见解。代码可在\href{https://github.com/microsoft/geospatial-ood-detection}{https://github.com/microsoft/geospatial-ood-detection}上获得。

更新时间: 2025-04-08 21:00:47

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2412.13394v2

S'MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning

Fine-tuning pre-trained large language models (LLMs) presents a dual challenge of balancing parameter efficiency and model capacity. Existing methods like low-rank adaptations (LoRA) are efficient but lack flexibility, while Mixture-of-Experts (MoE) architectures enhance model capacity at the cost of more & under-utilized parameters. To address these limitations, we propose Structural Mixture of Residual Experts (S'MoRE), a novel framework that seamlessly integrates the efficiency of LoRA with the flexibility of MoE. Specifically, S'MoRE employs hierarchical low-rank decomposition of expert weights, yielding residuals of varying orders interconnected in a multi-layer structure. By routing input tokens through sub-trees of residuals, S'MoRE emulates the capacity of many experts by instantiating and assembling just a few low-rank matrices. We craft the inter-layer propagation of S'MoRE's residuals as a special type of Graph Neural Network (GNN), and prove that under similar parameter budget, S'MoRE improves "structural flexibility" of traditional MoE (or Mixture-of-LoRA) by exponential order. Comprehensive theoretical analysis and empirical results demonstrate that S'MoRE achieves superior fine-tuning performance, offering a transformative approach for efficient LLM adaptation.

Updated: 2025-04-08 20:54:00

标题: S'MoRE：用于LLM微调的结构混合剩余专家

摘要: 对预训练的大型语言模型（LLMs）进行微调面临着平衡参数效率和模型容量的双重挑战。现有的方法如低秩适应（LoRA）具有高效但缺乏灵活性，而专家混合（MoE）架构则通过更多和未充分利用的参数来增强模型容量。为了解决这些限制，我们提出了结构混合残差专家（S'MoRE）的新框架，无缝地将LoRA的效率与MoE的灵活性相结合。具体来说，S'MoRE采用了专家权重的分层低秩分解，产生了多层结构中相互连接的不同阶残差。通过将输入标记路由通过残差的子树，S'MoRE通过实例化和组装少量低秩矩阵来模拟许多专家的容量。我们将S'MoRE的残差的层间传播设计为一种特殊类型的图神经网络（GNN），并证明在相似的参数预算下，S'MoRE通过指数级别提高了传统MoE（或混合LoRA）的“结构灵活性”。全面的理论分析和实证结果表明，S'MoRE实现了优越的微调性能，为高效的LLM适应提供了一种变革性方法。

更新时间: 2025-04-08 20:54:00

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2504.06426v1

Covariant Gradient Descent

We present a manifestly covariant formulation of the gradient descent method, ensuring consistency across arbitrary coordinate systems and general curved trainable spaces. The optimization dynamics is defined using a covariant force vector and a covariant metric tensor, both computed from the first and second statistical moments of the gradients. These moments are estimated through time-averaging with an exponential weight function, which preserves linear computational complexity. We show that commonly used optimization methods such as RMSProp, Adam and AdaBelief correspond to special limits of the covariant gradient descent (CGD) and demonstrate how these methods can be further generalized and improved.

Updated: 2025-04-08 20:44:48

标题: 共变梯度下降

摘要: 我们提出了一种显式协变形式的梯度下降方法，确保在任意坐标系和一般曲线可训练空间中保持一致性。优化动态是使用协变力向量和协变度量张量定义的，这两者都是从梯度的一阶和二阶统计矩计算得到的。这些矩通过带有指数权重函数的时间平均估计，该函数保持线性计算复杂度。我们表明，常用的优化方法如RMSProp、Adam和AdaBelief对应于协变梯度下降（CGD）的特殊极限，并展示了如何进一步泛化和改进这些方法。

更新时间: 2025-04-08 20:44:48

领域: cs.LG

下载: http://arxiv.org/abs/2504.05279v2

SPIRe: Boosting LLM Inference Throughput with Speculative Decoding

Speculative decoding (SD) has been shown to reduce the latency of autoregressive decoding (AD) by 2-3x for small batch sizes. However, increasing throughput and therefore reducing the cost per token requires decoding with large batch sizes. Recent work shows that SD can accelerate decoding with large batch sizes too if the context is sufficiently long and the draft model's KV cache is sparse. We introduce SPIRe, a draft model that combines static sparse attention, pruned initialization, and feedback memory to increase the modeled throughput of speculative decoding by over 100% compared to speculation with a much smaller draft model and by over 35% compared to the strong baseline of sparse self-speculation. Our approach is particularly effective when context lengths vary significantly across requests.

Updated: 2025-04-08 20:39:20

标题: SPIRe：通过投机解码提高LLM推理吞吐量

摘要: 推测解码（SD）已被证明可以将自回归解码（AD）的延迟降低2-3倍，适用于小批量大小。然而，增加吞吐量从而降低每个标记的成本需要使用大批量大小进行解码。最近的研究表明，如果上下文足够长且初稿模型的KV缓存稀疏，SD也可以加速大批量大小的解码。我们引入了SPIRe，一个结合了静态稀疏注意力、修剪初始化和反馈记忆的初稿模型，将推测解码的建模吞吐量相对于使用较小初稿模型的推测增加了100%以上，相对于稀疏自我推测的强基线增加了35%以上。我们的方法在请求的上下文长度差异显著时特别有效。

更新时间: 2025-04-08 20:39:20

领域: cs.LG

下载: http://arxiv.org/abs/2504.06419v1

Releasing Differentially Private Event Logs Using Generative Models

In recent years, the industry has been witnessing an extended usage of process mining and automated event data analysis. Consequently, there is a rising significance in addressing privacy apprehensions related to the inclusion of sensitive and private information within event data utilized by process mining algorithms. State-of-the-art research mainly focuses on providing quantifiable privacy guarantees, e.g., via differential privacy, for trace variants that are used by the main process mining techniques, e.g., process discovery. However, privacy preservation techniques designed for the release of trace variants are still insufficient to meet all the demands of industry-scale utilization. Moreover, ensuring privacy guarantees in situations characterized by a high occurrence of infrequent trace variants remains a challenging endeavor. In this paper, we introduce two novel approaches for releasing differentially private trace variants based on trained generative models. With TraVaG, we leverage \textit{Generative Adversarial Networks} (GANs) to sample from a privatized implicit variant distribution. Our second method employs \textit{Denoising Diffusion Probabilistic Models} that reconstruct artificial trace variants from noise via trained Markov chains. Both methods offer industry-scale benefits and elevate the degree of privacy assurances, particularly in scenarios featuring a substantial prevalence of infrequent variants. Also, they overcome the shortcomings of conventional privacy preservation techniques, such as bounding the length of variants and introducing fake variants. Experimental results on real-life event data demonstrate that our approaches surpass state-of-the-art techniques in terms of privacy guarantees and utility preservation.

Updated: 2025-04-08 20:35:53

标题: 使用生成模型发布差异隐私事件日志

摘要: 近年来，行业一直在见证过程挖掘和自动化事件数据分析的广泛应用。因此，解决与过程挖掘算法使用的事件数据中包含敏感和私人信息有关的隐私担忧愈发重要。最新研究主要着眼于为主要过程挖掘技术（如过程发现）使用的迹变体提供可量化的隐私保证，例如通过差分隐私。然而，针对发布迹变体而设计的隐私保护技术仍不足以满足行业规模利用的所有需求。此外，在频繁发生罕见迹变体的情况下确保隐私保证仍是一项具有挑战性的工作。在本文中，我们介绍了两种基于经过训练的生成模型发布差分隐私迹变体的新方法。通过TraVaG，我们利用生成对抗网络（GANs）从私密的隐式变体分布中进行抽样。我们的第二种方法采用去噪扩散概率模型，通过训练后的马尔可夫链从噪声中重建人工迹变体。这两种方法都提供了行业规模的好处，并提高了隐私保证的程度，特别是在涉及大量罕见变体的情况下。此外，它们克服了传统隐私保护技术的缺点，如限制变体长度和引入虚假变体。对真实事件数据的实验结果表明，我们的方法在隐私保证和效用保留方面超越了最新技术。

更新时间: 2025-04-08 20:35:53

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2504.06418v1

TRIDENT: Tri-modal Real-time Intrusion Detection Engine for New Targets

The increasing availability of drones and their potential for malicious activities pose significant privacy and security risks, necessitating fast and reliable detection in real-world environments. However, existing drone detection systems often struggle in real-world settings due to environmental noise and sensor limitations. This paper introduces TRIDENT, a tri-modal drone detection framework that integrates synchronized audio, visual, and RF data to enhance robustness and reduce dependence on individual sensors. TRIDENT introduces two fusion strategies - Late Fusion and GMU Fusion - to improve multi-modal integration while maintaining efficiency. The framework incorporates domain-specific feature extraction techniques alongside a specialized data augmentation pipeline that simulates real-world sensor degradation to improve generalization capabilities. A diverse multi-sensor dataset is collected in urban and non-urban environments under varying lighting conditions, ensuring comprehensive evaluation. Experimental results show that TRIDENT achieves 98.8 percent accuracy in real-world recordings and 83.26 percent in a more complex setting (augmented data), outperforming unimodal and dual-modal baselines. Moreover, TRIDENT operates in real-time, detecting drones in just 6.09 ms while consuming only 75.27 mJ per detection, making it highly efficient for resource-constrained devices. The dataset and code have been released to ensure reproducibility (https://github.com/TRIDENT-2025/TRIDENT).

Updated: 2025-04-08 20:33:43

标题: TRIDENT：面向新目标的三模式实时入侵检测引擎

摘要: 随着无人机的日益普及和其潜在的恶意活动可能性，隐私和安全风险显著增加，迫使在现实环境中快速准确地进行检测。然而，由于环境噪声和传感器限制，现有的无人机检测系统通常在现实环境中遇到困难。本文介绍了TRIDENT，一个三模态无人机检测框架，集成了同步音频、视觉和射频数据，以增强鲁棒性并减少对单个传感器的依赖。TRIDENT引入了两种融合策略 - 晚融合和GMU融合 - 以改善多模态集成同时保持效率。该框架结合了领域特定的特征提取技术和专门的数据增强管道，模拟真实世界传感器降级以提高泛化能力。在城市和非城市环境中收集了一个多传感器数据集，在不同的光照条件下进行了全面评估。实验结果显示，TRIDENT在真实录音中实现了98.8%的准确率，在更复杂的环境中（增强数据）达到了83.26%，超过了单模态和双模态的基线。此外，TRIDENT实时运行，仅需6.09毫秒便能检测到无人机，每次检测只消耗75.27毫焦耳，使其对资源受限设备非常高效。数据集和代码已发布以确保可重现性（https://github.com/TRIDENT-2025/TRIDENT）。

更新时间: 2025-04-08 20:33:43

领域: cs.CR

下载: http://arxiv.org/abs/2504.06417v1

Unifying Autoregressive and Diffusion-Based Sequence Generation

We present significant extensions to diffusion-based sequence generation models, blurring the line with autoregressive language models. We introduce hyperschedules, which assign distinct noise schedules to individual token positions, generalizing both autoregressive models (e.g., GPT) and conventional diffusion models (e.g., SEDD, MDLM) as special cases. Second, we propose two hybrid token-wise noising processes that interpolate between absorbing and uniform processes, enabling the model to fix past mistakes, and we introduce a novel inference algorithm that leverages this new feature in a simplified context inspired from MDLM. To support efficient training and inference, we design attention masks compatible with KV-caching. Our methods achieve state-of-the-art perplexity and generate diverse, high-quality sequences across standard benchmarks, suggesting a promising path for autoregressive diffusion-based sequence generation.

Updated: 2025-04-08 20:32:10

标题: 统一自回归和基于扩散的序列生成

摘要: 我们提出了对基于扩散的序列生成模型进行重大扩展，模糊了与自回归语言模型之间的界限。我们引入了超调度，为每个标记位置分配不同的噪声调度，将自回归模型（例如GPT）和传统的扩散模型（例如SEDD、MDLM）作为特例进行泛化。其次，我们提出了两种混合标记级别的噪声处理过程，插值了吸收和均匀过程，使模型能够修正过去的错误，并引入了一种利用这一新特性的新推理算法，受到MDLM的启发，在简化的上下文中。为了支持高效的训练和推理，我们设计了与KV缓存兼容的注意力蒙版。我们的方法在标准基准测试中实现了最先进的困惑度，并生成了多样且高质量的序列，为基于自回归扩散的序列生成打开了一条有前途的道路。

更新时间: 2025-04-08 20:32:10

领域: cs.LG

下载: http://arxiv.org/abs/2504.06416v1

JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model

Multimodal large language models (MLLMs) excel in vision-language tasks but also pose significant risks of generating harmful content, particularly through jailbreak attacks. Jailbreak attacks refer to intentional manipulations that bypass safety mechanisms in models, leading to the generation of inappropriate or unsafe content. Detecting such attacks is critical to ensuring the responsible deployment of MLLMs. Existing jailbreak detection methods face three primary challenges: (1) Many rely on model hidden states or gradients, limiting their applicability to white-box models, where the internal workings of the model are accessible; (2) They involve high computational overhead from uncertainty-based analysis, which limits real-time detection, and (3) They require fully labeled harmful datasets, which are often scarce in real-world settings. To address these issues, we introduce a test-time adaptive framework called JAILDAM. Our method leverages a memory-based approach guided by policy-driven unsafe knowledge representations, eliminating the need for explicit exposure to harmful data. By dynamically updating unsafe knowledge during test-time, our framework improves generalization to unseen jailbreak strategies while maintaining efficiency. Experiments on multiple VLM jailbreak benchmarks demonstrate that JAILDAM delivers state-of-the-art performance in harmful content detection, improving both accuracy and speed.

Updated: 2025-04-08 20:25:30

标题: JailDAM：具有适应性记忆的视觉语言模型越狱检测

摘要: 多模态大型语言模型（MLLMs）在视觉语言任务中表现出色，但也存在通过越狱攻击生成有害内容的重大风险。越狱攻击是指绕过模型中的安全机制而进行的有意操作，导致生成不当或不安全的内容。检测这种攻击对确保负责任地部署MLLMs至关重要。现有的越狱检测方法面临三个主要挑战：（1）许多方法依赖于模型的隐藏状态或梯度，限制了它们对白盒模型的适用性，其中模型的内部工作是可访问的；（2）它们涉及基于不确定性分析的高计算开销，限制了实时检测；（3）它们需要完全标记的有害数据集，在现实世界中往往稀缺。为了解决这些问题，我们引入了一个称为JAILDAM的测试时自适应框架。我们的方法利用基于内存的方法，由政策驱动的不安全知识表示引导，消除了对有害数据的显式曝光的需求。通过在测试时动态更新不安全知识，我们的框架提高了对未见过的越狱策略的泛化能力，同时保持了效率。在多个VLM越狱基准测试上的实验表明，JAILDAM在有害内容检测方面提供了最先进的性能，提高了准确性和速度。

更新时间: 2025-04-08 20:25:30

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2504.03770v2

Model Equality Testing: Which Model Is This API Serving?

Users often interact with large language models through black-box inference APIs, both for closed- and open-weight models (e.g., Llama models are popularly accessed via Amazon Bedrock and Azure AI Studio). In order to cut costs or add functionality, API providers may quantize, watermark, or finetune the underlying model, changing the output distribution -- possibly without notifying users. We formalize detecting such distortions as Model Equality Testing, a two-sample testing problem, where the user collects samples from the API and a reference distribution and conducts a statistical test to see if the two distributions are the same. We find that tests based on the Maximum Mean Discrepancy between distributions are powerful for this task: a test built on a simple string kernel achieves a median of 77.4% power against a range of distortions, using an average of just 10 samples per prompt. We then apply this test to commercial inference APIs from Summer 2024 for four Llama models, finding that 11 out of 31 endpoints serve different distributions than reference weights released by Meta.

Updated: 2025-04-08 20:19:26

标题: 模型平等测试：这个API服务的是哪个模型？

摘要: 用户经常通过黑匣子推理API与大型语言模型进行交互，无论是封闭权重模型还是开放权重模型（例如，Llama模型通常通过Amazon Bedrock和Azure AI Studio访问）。为了降低成本或增加功能，API提供者可能会对底层模型进行量化、水印或微调，改变输出分布 - 可能在未通知用户的情况下。我们将检测这种失真形式化为模型平等测试，这是一个两个样本测试问题，用户从API和参考分布中收集样本，并进行统计测试以查看这两个分布是否相同。我们发现，基于分布之间的最大均值差异的测试对于这项任务非常有效：基于简单字符串核的测试针对一系列失真实现了77.4%的中位数功率，每个提示只使用平均10个样本。然后，我们将该测试应用于来自2024年夏季的商业推理API，针对四个Llama模型，发现31个端点中有11个端点提供的分布与Meta发布的参考权重不同。

更新时间: 2025-04-08 20:19:26

领域: cs.LG

下载: http://arxiv.org/abs/2410.20247v2

Literature Review: Cyber Security Monitoring in Maritime

In recent years, many cyber incidents have occurred in the maritime sector, targeting the information technology (IT) and operational technology (OT) infrastructure. Although several literature review papers have been published in the maritime field, none of the previous studies has focused on cyber security monitoring, which aims at timely detection of cyber attacks with automated methods. The current article addresses this research gap and surveys the methods, algorithms, tools and architectures used for cyber security monitoring in the maritime sector. For the survey, a systematic literature review of cyber security monitoring studies is conducted. The first contribution of this article is the bibliometric analysis of related literature and the identification of the main research themes in previous works. For that purpose, our article presents a taxonomy for existing studies which highlights the main properties of maritime cyber security monitoring research. The second contribution of this article is an in-depth analysis of previous works and the identification of research gaps and limitations in existing literature. Based on our findings, we outline future research directions for cyber security monitoring in the maritime field.

Updated: 2025-04-08 20:17:34

标题: 文献标题翻译：文献综述：海事领域的网络安全监测

摘要: 近年来，海事领域发生了许多网络事件，针对信息技术（IT）和运营技术（OT）基础设施。尽管在海事领域已经发表了几篇文献综述论文，但之前的研究没有专注于网络安全监控，其目的是通过自动化方法及时检测网络攻击。本文填补了这一研究空白，调查了在海事领域用于网络安全监控的方法、算法、工具和架构。为了进行调查，进行了一项系统性的网络安全监控研究文献综述。本文的第一个贡献是对相关文献进行文献计量分析，并确定了以前研究中的主要研究主题。为此，我们的文章提出了一个现有研究的分类法，突出了海事网络安全监控研究的主要特性。本文的第二个贡献是对以前的研究进行深入分析，并确定了现有文献中的研究空白和局限性。根据我们的发现，我们概述了海事领域网络安全监控的未来研究方向。

更新时间: 2025-04-08 20:17:34

领域: cs.CR

下载: http://arxiv.org/abs/2503.18173v2

Evaluating Mutation Techniques in Genetic Algorithm-Based Quantum Circuit Synthesis

Quantum computing leverages the unique properties of qubits and quantum parallelism to solve problems intractable for classical systems, offering unparalleled computational potential. However, the optimization of quantum circuits remains critical, especially for noisy intermediate-scale quantum (NISQ) devices with limited qubits and high error rates. Genetic algorithms (GAs) provide a promising approach for efficient quantum circuit synthesis by automating optimization tasks. This work examines the impact of various mutation strategies within a GA framework for quantum circuit synthesis. By analyzing how different mutations transform circuits, it identifies strategies that enhance efficiency and performance. Experiments utilized a fitness function emphasizing fidelity, while accounting for circuit depth and T operations, to optimize circuits with four to six qubits. Comprehensive hyperparameter testing revealed that combining delete and swap strategies outperformed other approaches, demonstrating their effectiveness in developing robust GA-based quantum circuit optimizers.

Updated: 2025-04-08 20:14:35

标题: 评估基因算法在量子电路合成中的突变技术

摘要: 量子计算利用量子位和量子并行性的独特属性来解决经典系统无法解决的问题，提供了无与伦比的计算潜力。然而，量子电路的优化仍然至关重要，特别是对于具有有限量子位和高误差率的噪声中等规模量子(NISQ)设备。遗传算法(GAs)通过自动化优化任务提供了一种高效的量子电路合成方法。本文研究了在量子电路合成的GA框架中各种突变策略的影响。通过分析不同突变如何改变电路，它确定了增强效率和性能的策略。实验利用强调保真度的适应度函数，同时考虑电路深度和T操作，优化了四到六个量子位的电路。全面的超参数测试显示，结合删除和交换策略优于其他方法，证明了它们在开发稳健的基于GA的量子电路优化器中的有效性。

更新时间: 2025-04-08 20:14:35

领域: quant-ph,cs.AI

下载: http://arxiv.org/abs/2504.06413v1

PEEL the Layers and Find Yourself: Revisiting Inference-time Data Leakage for Residual Neural Networks

This paper explores inference-time data leakage risks of deep neural networks (NNs), where a curious and honest model service provider is interested in retrieving users' private data inputs solely based on the model inference results. Particularly, we revisit residual NNs due to their popularity in computer vision and our hypothesis that residual blocks are a primary cause of data leakage owing to the use of skip connections. By formulating inference-time data leakage as a constrained optimization problem, we propose a novel backward feature inversion method, \textbf{PEEL}, which can effectively recover block-wise input features from the intermediate output of residual NNs. The surprising results in high-quality input data recovery can be explained by the intuition that the output from these residual blocks can be considered as a noisy version of the input and thus the output retains sufficient information for input recovery. We demonstrate the effectiveness of our layer-by-layer feature inversion method on facial image datasets and pre-trained classifiers. Our results show that PEEL outperforms the state-of-the-art recovery methods by an order of magnitude when evaluated by mean squared error (MSE). The code is available at \href{https://github.com/Huzaifa-Arif/PEEL}{https://github.com/Huzaifa-Arif/PEEL}

Updated: 2025-04-08 20:11:05

标题: 剥开层层面纱，找到真我：重新审视残差神经网络的推断时数据泄露

摘要: 本文探讨了深度神经网络（NNs）在推断时可能存在的数据泄露风险，即一个好奇而诚实的模型服务提供商有兴趣仅基于模型推断结果检索用户的私人数据输入。特别是，我们重新审视了残差NNs，因为它们在计算机视觉中很受欢迎，我们的假设是残差块是数据泄漏的主要原因，因为它们使用了跳跃连接。通过将推断时数据泄漏定义为一个受限制的优化问题，我们提出了一种新颖的反向特征反演方法\textbf{PEEL}，可以有效地从残差NNs的中间输出中恢复块状输入特征。通过将这些残差块的输出视为输入的嘈杂版本，我们可以解释高质量输入数据恢复的令人惊讶结果，因此输出保留了足够的信息用于输入恢复。我们在面部图像数据集和预训练分类器上展示了我们的逐层特征反演方法的有效性。我们的结果表明，PEEL在均方误差（MSE）评估中胜过了现有的恢复方法一个数量级。代码可在以下链接获得：\href{https://github.com/Huzaifa-Arif/PEEL}{https://github.com/Huzaifa-Arif/PEEL}。

更新时间: 2025-04-08 20:11:05

领域: cs.LG,cs.CR,cs.CV

下载: http://arxiv.org/abs/2504.06410v1

Understanding Machine Unlearning Through the Lens of Mode Connectivity

Machine Unlearning aims to remove undesired information from trained models without requiring full retraining from scratch. Despite recent advancements, their underlying loss landscapes and optimization dynamics received less attention. In this paper, we investigate and analyze machine unlearning through the lens of mode connectivity - the phenomenon where independently trained models can be connected by smooth low-loss paths in the parameter space. We define and study mode connectivity in unlearning across a range of overlooked conditions, including connections between different unlearning methods, models trained with and without curriculum learning, and models optimized with first-order and secondorder techniques. Our findings show distinct patterns of fluctuation of different evaluation metrics along the curve, as well as the mechanistic (dis)similarity between unlearning methods. To the best of our knowledge, this is the first study on mode connectivity in the context of machine unlearning.

Updated: 2025-04-08 20:02:10

标题: 透过模式连接的视角理解机器遗忘

摘要: Machine Unlearning旨在从经过训练的模型中移除不需要的信息，而无需从头重新进行完整的训练。尽管最近取得了进展，但它们的基本损失景观和优化动态受到了较少关注。在本文中，我们通过模态连接的视角对机器取消学习进行了调查和分析 - 这种现象是独立训练的模型可以通过参数空间中的平滑低损失路径相连接。我们定义并研究了在各种被忽视条件下的取消学习中的模态连接，包括不同取消学习方法之间的连接，使用和不使用课程学习进行训练的模型，以及使用一阶和二阶技术进行优化的模型。我们的研究结果显示了不同评估指标沿曲线波动的明显模式，以及取消学习方法之间的机械（不）相似性。据我们所知，这是关于机器取消学习背景下的模态连接的第一项研究。

更新时间: 2025-04-08 20:02:10

领域: cs.LG,cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2504.06407v1

Rotated Bitboards and Reinforcement Learning in Computer Chess and Beyond

There exist several techniques for representing the chess board inside the computer. In the first part of this paper, the concepts of the bitboard-representation and the advantages of (rotated) bitboards in move generation are explained. In order to illustrate those ideas practice, the concrete implementation of the move-generator in FUSc# is discussed and we explain a technique how to verify the move-generator with the "perft"-command. We show that the move-generator of FUSc# works 100% correct. The second part of this paper deals with reinforcement learning in computer chess (and beyond). We exemplify the progress that has been made in this field in the last 15-20 years by comparing the "state of the art" from 2002-2008, when FUSc# was developed, with recent innovations connected to "AlphaZero". We discuss how a "FUSc#-Zero" could be implemented and what would be necessary to reduce the number of training games necessary to achieve a good performance. This can be seen as a test case to the general prblem of improving "sample effciency" in reinforcement learning. In the final part, we move beyond computer chess, as the importance of sample effciency extends far beyond board games into a wide range of applications where data is costly, diffcult to obtain, or time consuming to generate. We review some application of the ideas developed in AlphaZero in other domains, i.e. the "other Alphas" like AlphaFold, AlphaTensor, AlphaGeometry and AlphaProof. We also discuss future research and the potential for such methods for ecological economic planning.

Updated: 2025-04-08 19:57:41

标题: 旋转的位棋盘和强化学习在计算机国际象棋以及其他领域的应用

摘要: 存在几种在计算机内表示国际象棋棋盘的技术。在本文的第一部分中，解释了位板表示的概念以及（旋转）位板在移动生成中的优势。为了说明这些想法的实践，讨论了FUSc#中移动生成器的具体实现，并解释了如何使用“perft”命令验证移动生成器的技术。我们展示了FUSc#的移动生成器工作100％正确。本文的第二部分涉及计算机国际象棋中的强化学习（以及其他领域）。我们通过比较2002年至2008年之间的“现状”与与“AlphaZero”相关的最新创新来举例说明在过去15-20年中在这一领域取得的进展。我们讨论了如何实现“FUSc#-Zero”以及减少训练游戏数量以实现良好性能所需的条件。这可以看作是改进强化学习中“样本效率”的一般问题的一个测试案例。在最后一部分，我们超越了计算机国际象棋，因为样本效率的重要性远远超出了棋盘游戏，涉及到数据昂贵、难以获得或者生成耗时的广泛应用。我们回顾了在其他领域中应用AlphaZero所开发的想法，即“其他Alpha”如AlphaFold、AlphaTensor、AlphaGeometry和AlphaProof。我们还讨论了未来研究以及这些方法在生态经济规划中的潜力。

更新时间: 2025-04-08 19:57:41

领域: cs.AI,cs.CY

下载: http://arxiv.org/abs/2503.10822v2

Physical spline for denoising object trajectory data by combining splines, ML feature regression and model knowledge

This article presents a method for estimating the dynamic driving states (position, velocity, acceleration and heading) from noisy measurement data. The proposed approach is effective with both complete and partial observations, producing refined trajectory signals with kinematic consistency, ensuring that velocity is the integral of acceleration and position is the integral of velocity. Additionally, the method accounts for the constraint that vehicles can only move in the direction of their orientation. The method is implemented as a configurable python library that also enables trajectory estimation solely based on position data. Regularization is applied to prevent extreme state variations. A key application is enhancing recorded trajectory data for use as reference inputs in machine learning models. At the end, the article presents the results of the method along with a comparison to ground truth data.

Updated: 2025-04-08 19:53:57

标题: 将物理样条与样条、ML特征回归和模型知识相结合，用于去噪对象轨迹数据

摘要: 本文提出了一种从嘈杂的测量数据中估计动态驾驶状态（位置、速度、加速度和航向）的方法。所提出的方法既适用于完整观测，也适用于部分观测，能够产生具有运动一致性的精细轨迹信号，确保速度是加速度的积分，位置是速度的积分。此外，该方法考虑了车辆只能朝着其方向移动的约束。该方法实现为一个可配置的Python库，还能够仅基于位置数据进行轨迹估计。正则化被应用以防止极端状态变化。一个关键的应用是增强记录的轨迹数据以用作机器学习模型中的参考输入。最后，文章呈现了该方法的结果，并与地面真实数据进行了比较。

更新时间: 2025-04-08 19:53:57

领域: eess.SY,cs.AI,cs.SY

下载: http://arxiv.org/abs/2504.06404v1

Fusing Global and Local: Transformer-CNN Synergy for Next-Gen Current Estimation

This paper presents a hybrid model combining Transformer and CNN for predicting the current waveform in signal lines. Unlike traditional approaches such as current source models, driver linear representations, waveform functional fitting, or equivalent load capacitance methods, our model does not rely on fixed simplified models of standard-cell drivers or RC loads. Instead, it replaces the complex Newton iteration process used in traditional SPICE simulations, leveraging the powerful sequence modeling capabilities of the Transformer framework to directly predict current responses without iterative solving steps. The hybrid architecture effectively integrates the global feature-capturing ability of Transformers with the local feature extraction advantages of CNNs, significantly improving the accuracy of current waveform predictions. Experimental results demonstrate that, compared to traditional SPICE simulations, the proposed algorithm achieves an error of only 0.0098. These results highlight the algorithm's superior capabilities in predicting signal line current waveforms, timing analysis, and power evaluation, making it suitable for a wide range of technology nodes, from 40nm to 3nm.

Updated: 2025-04-08 19:42:10

标题: 融合全局和局部：Transformer-CNN协同作用用于下一代电流估计

摘要: 这篇论文提出了一种将Transformer和CNN结合的混合模型，用于预测信号线中的电流波形。与传统方法（如电流源模型、驱动器线性表示、波形函数拟合或等效负载电容方法）不同，我们的模型不依赖于标准单元驱动器或RC负载的固定简化模型。相反，它取代了传统SPICE模拟中使用的复杂牛顿迭代过程，利用Transformer框架强大的序列建模能力，直接预测电流响应而无需迭代求解步骤。混合架构有效地整合了Transformer的全局特征捕获能力与CNN的局部特征提取优势，显著提高了电流波形预测的准确性。实验结果表明，与传统的SPICE模拟相比，所提出的算法仅达到0.0098的误差。这些结果突显了该算法在预测信号线电流波形、时序分析和功耗评估方面的优越能力，使其适用于从40纳米到3纳米的广泛技术节点。

更新时间: 2025-04-08 19:42:10

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2504.07996v1

Low Rank Learning for Offline Query Optimization

Recent deployments of learned query optimizers use expensive neural networks and ad-hoc search policies. To address these issues, we introduce \textsc{LimeQO}, a framework for offline query optimization leveraging low-rank learning to efficiently explore alternative query plans with minimal resource usage. By modeling the workload as a partially observed, low-rank matrix, we predict unobserved query plan latencies using purely linear methods, significantly reducing computational overhead compared to neural networks. We formalize offline exploration as an active learning problem, and present simple heuristics that reduces a 3-hour workload to 1.5 hours after just 1.5 hours of exploration. Additionally, we propose a transductive Tree Convolutional Neural Network (TCNN) that, despite higher computational costs, achieves the same workload reduction with only 0.5 hours of exploration. Unlike previous approaches that place expensive neural networks directly in the query processing ``hot'' path, our approach offers a low-overhead solution and a no-regressions guarantee, all without making assumptions about the underlying DBMS. The code is available in \href{https://github.com/zixy17/LimeQO}{https://github.com/zixy17/LimeQO}.

Updated: 2025-04-08 19:41:19

标题: 低秩学习用于离线查询优化

摘要: 最近部署的学习查询优化器使用昂贵的神经网络和临时搜索策略。为了解决这些问题，我们引入了\textsc{LimeQO}，这是一个离线查询优化框架，利用低秩学习来高效地探索备选查询计划，资源利用最小。通过将工作负载建模为部分观察到的低秩矩阵，我们使用纯线性方法预测未观察到的查询计划延迟，与神经网络相比，显著降低了计算开销。我们将离线探索形式化为主动学习问题，并提出简单的启发式方法，在只有1.5小时的探索后，将一个3小时的工作负载减少到1.5小时。此外，我们提出了一种传统树卷积神经网络（TCNN），尽管计算成本更高，但仅需0.5小时的探索即可实现相同的工作负载减少。与将昂贵的神经网络直接放在查询处理的“热”路径中的先前方法不同，我们的方法提供了低开销的解决方案和无回退保证，而不会对底层DBMS做出假设。代码可在\href{https://github.com/zixy17/LimeQO}{https://github.com/zixy17/LimeQO}中找到。

更新时间: 2025-04-08 19:41:19

领域: cs.DB,cs.LG

下载: http://arxiv.org/abs/2504.06399v1

Sharpness-Aware Parameter Selection for Machine Unlearning

It often happens that some sensitive personal information, such as credit card numbers or passwords, are mistakenly incorporated in the training of machine learning models and need to be removed afterwards. The removal of such information from a trained model is a complex task that needs to partially reverse the training process. There have been various machine unlearning techniques proposed in the literature to address this problem. Most of the proposed methods revolve around removing individual data samples from a trained model. Another less explored direction is when features/labels of a group of data samples need to be reverted. While the existing methods for these tasks do the unlearning task by updating the whole set of model parameters or only the last layer of the model, we show that there are a subset of model parameters that have the largest contribution in the unlearning target features. More precisely, the model parameters with the largest corresponding diagonal value in the Hessian matrix (computed at the learned model parameter) have the most contribution in the unlearning task. By selecting these parameters and updating them during the unlearning stage, we can have the most progress in unlearning. We provide theoretical justifications for the proposed strategy by connecting it to sharpness-aware minimization and robust unlearning. We empirically show the effectiveness of the proposed strategy in improving the efficacy of unlearning with a low computational cost.

Updated: 2025-04-08 19:41:07

标题: 机器学习中的锐度感知参数选择

摘要: 通常会发生一些敏感个人信息（如信用卡号码或密码）被误置于机器学习模型的训练中，需要在之后进行删除的情况。从经过训练的模型中删除这种信息是一个复杂的任务，需要部分地逆转训练过程。文献中提出了各种机器学习遗忘技术来解决这个问题。大多数提出的方法围绕着从训练模型中删除个别数据样本展开。另一个较少探讨的方向是当需要恢复一组数据样本的特征/标签。尽管现有的方法通过更新整个模型参数集或仅更新模型的最后一层来执行遗忘任务，我们表明有一部分模型参数对遗忘目标特征有最大的贡献。更准确地说，在Hessian矩阵中具有最大对角值（在学习的模型参数处计算）的模型参数对遗忘任务有最大的贡献。通过选择这些参数并在遗忘阶段更新它们，我们可以在遗忘过程中取得最大的进展。我们通过将其与锐度感知最小化和稳健遗忘相联系，为所提出的策略提供了理论上的证明。我们在实验中证明了所提出的策略在提高遗忘效果方面的有效性，并且具有低计算成本。

更新时间: 2025-04-08 19:41:07

领域: cs.LG

下载: http://arxiv.org/abs/2504.06398v1

Navigating Explanatory Multiverse Through Counterfactual Path Geometry

Counterfactual explanations are the de facto standard when tasked with interpreting decisions of (opaque) predictive models. Their generation is often subject to technical and domain-specific constraints that aim to maximise their real-life utility. In addition to considering desiderata pertaining to the counterfactual instance itself, guaranteeing existence of a viable path connecting it with the factual data point has recently gained relevance. While current explainability approaches ensure that the steps of such a journey as well as its destination adhere to selected constraints, they neglect the multiplicity of these counterfactual paths. To address this shortcoming we introduce the novel concept of explanatory multiverse that encompasses all the possible counterfactual journeys. We define it using vector spaces, showing how to navigate, reason about and compare the geometry of counterfactual trajectories found within it. To this end, we overview their spatial properties -- such as affinity, branching, divergence and possible future convergence -- and propose an all-in-one metric, called opportunity potential, to quantify them. Notably, the explanatory process offered by our method grants explainees more agency by allowing them to select counterfactuals not only based on their absolute differences but also according to the properties of their connecting paths. To demonstrate real-life flexibility, benefit and efficacy of explanatory multiverse we propose its graph-based implementation, which we use for qualitative and quantitative evaluation on six tabular and image data sets.

Updated: 2025-04-08 19:40:51

标题: 穿越反事实路径几何的解释性多元宇宙航行

摘要: 反事实解释在解释（不透明）预测模型的决策时往往是事实上的标准。它们的生成通常受到技术和领域特定约束的限制，旨在最大化它们在现实生活中的效用。除了考虑与反事实实例本身有关的期望条件外，最近确保存在一个连接它与事实数据点的可行路径变得越来越重要。虽然当前的可解释性方法确保这一旅程的步骤以及其目的地符合所选的约束条件，但它们忽略了这些反事实路径的多样性。为了解决这一缺点，我们引入了新颖的解释多元宇宙概念，涵盖了所有可能的反事实旅程。我们使用向量空间对其进行定义，展示了如何在其中导航、推理和比较发现的反事实轨迹的几何形态。为此，我们概述了它们的空间属性，如亲和性、分支、分歧和可能的未来收敛，并提出了一种全面的度量标准，称为机会潜力，来量化它们。值得注意的是，我们的方法提供的解释过程赋予受解释者更多的主动权，使他们不仅可以基于绝对差异选择反事实，还可以根据它们连接路径的特性进行选择。为了展示解释多元宇宙的现实灵活性、益处和效力，我们提出了基于图的实现，我们将其用于对六个表格和图像数据集进行定性和定量评估。

更新时间: 2025-04-08 19:40:51

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2306.02786v4

Randomized Pairwise Learning with Adaptive Sampling: A PAC-Bayes Analysis

We study stochastic optimization with data-adaptive sampling schemes to train pairwise learning models. Pairwise learning is ubiquitous, and it covers several popular learning tasks such as ranking, metric learning and AUC maximization. A notable difference of pairwise learning from pointwise learning is the statistical dependencies among input pairs, for which existing analyses have not been able to handle in the general setting considered in this paper. To this end, we extend recent results that blend together two algorithm-dependent frameworks of analysis -- algorithmic stability and PAC-Bayes -- which allow us to deal with any data-adaptive sampling scheme in the optimizer. We instantiate this framework to analyze (1) pairwise stochastic gradient descent, which is a default workhorse in many machine learning problems, and (2) pairwise stochastic gradient descent ascent, which is a method used in adversarial training. All of these algorithms make use of a stochastic sampling from a discrete distribution (sample indices) before each update. Non-uniform sampling of these indices has been already suggested in the recent literature, to which our work provides generalization guarantees in both smooth and non-smooth convex problems.

Updated: 2025-04-08 19:37:59

标题: 随机配对学习与自适应采样：一项PAC-Bayes分析

摘要: 我们研究了使用数据自适应抽样方案进行随机优化来训练成对学习模型。成对学习是普遍的，涵盖了几种流行的学习任务，如排名、度量学习和AUC最大化。成对学习与点对学习的一个显著区别在于输入对之间的统计依赖性，现有的分析无法处理本文考虑的一般设置。为此，我们扩展了最近的结果，将算法稳定性和PAC-Bayes两个算法相关的分析框架结合在一起，这使我们能够处理优化器中的任何数据自适应抽样方案。我们将这个框架实例化为分析(1)成对随机梯度下降，这是许多机器学习问题中的默认工具，以及(2)成对随机梯度下降上升，这是对抗训练中使用的一种方法。所有这些算法在每次更新之前都利用从离散分布中随机抽样（样本索引）。最近的文献已经提出了对这些索引进行非均匀抽样，我们的工作为这些问题提供了在平滑和非平滑凸问题中的泛化保证。

更新时间: 2025-04-08 19:37:59

领域: cs.LG

下载: http://arxiv.org/abs/2504.02957v2

The Zero Body Problem: Probing LLM Use of Sensory Language

Sensory language expresses embodied experiences ranging from taste and sound to excitement and stomachache. This language is of interest to scholars from a wide range of domains including robotics, narratology, linguistics, and cognitive science. In this work, we explore whether language models, which are not embodied, can approximate human use of embodied language. We extend an existing corpus of parallel human and model responses to short story prompts with an additional 18,000 stories generated by 18 popular models. We find that all models generate stories that differ significantly from human usage of sensory language, but the direction of these differences varies considerably between model families. Namely, Gemini models use significantly more sensory language than humans along most axes whereas most models from the remaining five families use significantly less. Linear probes run on five models suggest that they are capable of identifying sensory language. However, we find preliminary evidence suggesting that instruction tuning may discourage usage of sensory language. Finally, to support further work, we release our expanded story dataset.

Updated: 2025-04-08 19:31:37

标题: 零体问题：探究LLM对感官语言的使用

摘要: 感官语言表达了从味觉和声音到兴奋和胃痛等身体经验。这种语言引起了来自各个领域的学者的兴趣，包括机器人技术、叙事学、语言学和认知科学。在这项工作中，我们探讨了非具体化的语言模型是否能够近似人类对具体化语言的使用。我们通过向现有的平行人类和模型响应短篇故事提示的语料库添加了18,000个由18个流行模型生成的故事。我们发现，所有模型生成的故事在感官语言的使用方面与人类有显著差异，但这些差异的方向在模型家族之间变化很大。具体来说，双子座模型在大多数方面使用的感官语言显著多于人类，而其他五个家族的大多数模型使用的感官语言显著较少。在五个模型上运行的线性探针表明它们能够识别感官语言。然而，我们发现初步证据表明指导调整可能会抑制感官语言的使用。最后，为了支持进一步的研究，我们发布了我们扩展的故事数据集。

更新时间: 2025-04-08 19:31:37

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2504.06393v1

Induced Model Matching: Restricted Models Help Train Full-Featured Models

We consider scenarios where a very accurate (often small) predictive model using restricted features is available when training a full-featured (often larger) model. This restricted model may be thought of as side-information'', and can come either from an auxiliary dataset or from the same dataset by forcing the restriction. How can the restricted model be useful to the full model? To answer this, we introduce a methodology called Induced Model Matching (IMM). IMM aligns the context-restricted, or induced, version of the large model with the restricted model. We relate IMM to approaches such as noising, which is implicit in addressing the problem, and reverse knowledge distillation from weak teachers, which is explicit but does not exploit restriction being the nature of the weakness. We show that these prior methods can be thought of as approximations to IMM and can be problematic in terms of consistency. Experimentally, we first motivate IMM using logistic regression as a toy example. We then explore it in language modeling, the application that initially inspired it, and demonstrate it on both LSTM and transformer full models, using bigrams as restricted models. We lastly give a simple RL example, which shows that POMDP policies can help learn better MDP policies. The IMM principle is thus generally applicable in common scenarios where restricted data is cheaper to collect or restricted models are easier to learn.

Updated: 2025-04-08 19:27:14

标题: 诱导模型匹配：受限模型有助于训练完整特征模型

摘要: 我们考虑这样的情景：在训练一个包含所有特征的完整模型时，有一个非常精确（通常很小）的预测模型可用，该预测模型使用受限特征。这个受限模型可以被视为“侧信息”，可以来自辅助数据集，也可以通过强制限制来自于同一数据集。受限模型如何对完整模型有用呢？为了回答这个问题，我们引入了一种称为诱导模型匹配（IMM）的方法。IMM将大模型的上下文受限的诱导版本与受限模型对齐。我们将IMM与诸如添加噪声和从弱教师进行逆向知识蒸馏等方法联系起来，这些方法在解决问题时隐含了噪声，而后者是显式的，但并未充分利用限制是弱点的本质。我们展示这些先前的方法可以被视为IMM的近似，并且在一致性方面可能存在问题。在实验中，我们首先以逻辑回归为例来激发IMM。然后我们在语言建模中探索IMM，这是最初激发它的应用，并在LSTM和transformer完整模型上演示，使用bigrams作为受限模型。最后，我们给出一个简单的强化学习示例，展示了POMDP策略如何帮助学习更好的MDP策略。因此，IMM原则通常适用于受限数据更容易收集或受限模型更容易学习的常见情景中。

更新时间: 2025-04-08 19:27:14

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2402.12513v2

Scalable mixed-domain Gaussian process modeling and model reduction for longitudinal data

Gaussian process (GP) models that combine both categorical and continuous input variables have found use in analysis of longitudinal data and computer experiments. However, standard inference for these models has the typical cubic scaling, and common scalable approximation schemes for GPs cannot be applied since the covariance function is non-continuous. In this work, we derive a basis function approximation scheme for mixed-domain covariance functions, which scales linearly with respect to the number of observations and total number of basis functions. The proposed approach is naturally applicable to also Bayesian GP regression with discrete observation models. We demonstrate the scalability of the approach and compare model reduction techniques for additive GP models in a longitudinal data context. We confirm that we can approximate the exact GP model accurately in a fraction of the runtime compared to fitting the corresponding exact model. In addition, we demonstrate a scalable model reduction workflow for obtaining smaller and more interpretable models when dealing with a large number of candidate predictors.

Updated: 2025-04-08 19:22:12

标题: 可伸缩的混合领域高斯过程建模和纵向数据模型简化

摘要: 混合类别和连续输入变量的高斯过程（GP）模型已在纵向数据分析和计算实验中找到应用。然而，这些模型的标准推断通常具有典型的立方缩放，常见的可扩展近似方案对GP不适用，因为协方差函数是非连续的。在这项工作中，我们推导了一种基函数逼近方案，用于混合域协方差函数，其与观测数量和基函数总数成线性关系。所提出的方法自然适用于具有离散观测模型的贝叶斯GP回归。我们展示了该方法的可扩展性，并在纵向数据环境中比较了附加GP模型的模型简化技术。我们确认，与拟合相应的精确模型相比，我们可以在运行时间的一小部分内准确逼近精确的GP模型。此外，我们展示了一种可扩展的模型简化工作流程，用于在处理大量候选预测变量时获得更小且更易解释的模型。

更新时间: 2025-04-08 19:22:12

领域: stat.CO,cs.LG

下载: http://arxiv.org/abs/2111.02019v4

Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued and running control rewards are absent but volatilities of the state processes depend on both state and control variables. We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an RL algorithm to learn the optimal policy parameter directly. Our main contributions include the introduction of an exploration schedule and a regret analysis of the proposed algorithm. We provide the convergence rate of the policy parameter to the optimal one, and prove that the algorithm achieves a regret bound of $O(N^{\frac{3}{4}})$ up to a logarithmic factor, where $N$ is the number of learning episodes. We conduct a simulation study to validate the theoretical results and demonstrate the effectiveness and reliability of the proposed algorithm. We also perform numerical comparisons between our method and those of the recent model-based stochastic LQ RL studies adapted to the state- and control-dependent volatility setting, demonstrating a better performance of the former in terms of regret bounds.

Updated: 2025-04-08 19:11:31

标题: 一类连续时间线性二次强化学习问题的次线性后悔

摘要: 我们研究了针对扩散过程的一类连续时间线性二次(LQ)控制问题的强化学习（RL），其中状态为标量值，运行控制奖励不存在，但状态过程的波动性取决于状态和控制变量。我们应用了一种无模型的方法，既不依赖于模型参数的知识，也不依赖于其估计，并设计了一个RL算法来直接学习最优策略参数。我们的主要贡献包括引入一个探索计划和对所提出算法的后悔分析。我们提供了策略参数收敛到最优值的收敛速度，并证明该算法在对数因子下达到了$O(N^{\frac{3}{4}})$的后悔界，其中$N$为学习周期数。我们进行了模拟研究以验证理论结果，并展示了所提出算法的有效性和可靠性。我们还对我们的方法与最近针对状态和控制依赖波动性设置的模型驱动随机LQ RL研究方法进行了数值比较，表明前者在后悔界方面表现更佳。

更新时间: 2025-04-08 19:11:31

领域: cs.LG,cs.AI,cs.SY,eess.SY,math.OC

下载: http://arxiv.org/abs/2407.17226v4

Tabular and Deep Reinforcement Learning for Gittins Index

In the realm of multi-arm bandit problems, the Gittins index policy is known to be optimal in maximizing the expected total discounted reward obtained from pulling the Markovian arms. In most realistic scenarios however, the Markovian state transition probabilities are unknown and therefore the Gittins indices cannot be computed. One can then resort to reinforcement learning (RL) algorithms that explore the state space to learn these indices while exploiting to maximize the reward collected. In this work, we propose tabular (QGI) and Deep RL (DGN) algorithms for learning the Gittins index that are based on the retirement formulation for the multi-arm bandit problem. When compared with existing RL algorithms that learn the Gittins index, our algorithms have a lower run time, require less storage space (small Q-table size in QGI and smaller replay buffer in DGN), and illustrate better empirical convergence to the Gittins index. This makes our algorithm well suited for problems with large state spaces and is a viable alternative to existing methods. As a key application, we demonstrate the use of our algorithms in minimizing the mean flowtime in a job scheduling problem when jobs are available in batches and have an unknown service time distribution.

Updated: 2025-04-08 19:10:59

标题: 表格和深度强化学习在吉廷斯指数中的应用

摘要: 在多臂老虎机问题领域中，吉廷斯指数策略被认为是最优的，可以最大化从拉动马尔可夫臂获取的预期总折扣奖励。然而，在大多数实际情况下，马尔可夫状态转移概率是未知的，因此无法计算吉廷斯指数。可以借助强化学习（RL）算法来探索状态空间以学习这些指数，同时利用以最大化收集的奖励。在这项工作中，我们提出了基于多臂老虎机问题的退休制定的学习吉廷斯指数的表格（QGI）和深度RL（DGN）算法。与现有的学习吉廷斯指数的RL算法相比，我们的算法运行时间更短，需要更少的存储空间（QGI中的小Q表大小和DGN中的更小的重放缓冲区），并且展示出更好的实证收敛性到吉廷斯指数。这使得我们的算法非常适用于具有大状态空间的问题，并且是现有方法的一个可行替代方案。作为一个关键应用，我们展示了我们的算法在最小化作业调度问题中平均流程时间的应用，当作业以批量可用且具有未知服务时间分布时。

更新时间: 2025-04-08 19:10:59

领域: cs.LG,cs.PF,stat.ML

下载: http://arxiv.org/abs/2405.01157v3

SPoRt -- Safe Policy Ratio: Certified Training and Deployment of Task Policies in Model-Free RL

To apply reinforcement learning to safety-critical applications, we ought to provide safety guarantees during both policy training and deployment. In this work we present novel theoretical results that provide a bound on the probability of violating a safety property for a new task-specific policy in a model-free, episodic setup: the bound, based on a `maximum policy ratio' that is computed with respect to a `safe' base policy, can also be more generally applied to temporally-extended properties (beyond safety) and to robust control problems. We thus present SPoRt, which also provides a data-driven approach for obtaining such a bound for the base policy, based on scenario theory, and which includes Projected PPO, a new projection-based approach for training the task-specific policy while maintaining a user-specified bound on property violation. Hence, SPoRt enables the user to trade off safety guarantees in exchange for task-specific performance. Accordingly, we present experimental results demonstrating this trade-off, as well as a comparison of the theoretical bound to posterior bounds based on empirical violation rates.

Updated: 2025-04-08 19:09:07

标题: SPoRt - 安全策略比率：在无模型RL中对任务策略进行认证培训和部署

摘要: 为了将强化学习应用于安全关键应用程序，我们应该在策略训练和部署过程中提供安全保证。在这项工作中，我们提出了一些新颖的理论结果，为模型无关、分集设置中的新任务特定策略提供了一种违反安全属性的概率上限：该上限基于一个“最大策略比率”，该比率是针对一个“安全”基础策略计算的，也可以更普遍地应用于超越安全性的时间扩展属性和鲁棒控制问题。因此，我们提出了SPoRt，它还提供了一种基于情景理论的数据驱动方法，用于获取基础策略的这种上限，并包括Projected PPO，这是一种新的基于投影的方法，用于训练任务特定策略，同时保持用户指定的属性违反上限。因此，SPoRt使用户能够在安全保证和任务特定性能之间进行权衡。因此，我们展示了实验结果，展示了这种权衡，以及基于经验违反率的后验上限的理论上限的比较。

更新时间: 2025-04-08 19:09:07

领域: cs.LG

下载: http://arxiv.org/abs/2504.06386v1

Foundation Model for Composite Microstructures: Reconstruction, Stiffness, and Nonlinear Behavior Prediction

The rapid advancement of machine learning has unlocked numerous opportunities for materials science, particularly in accelerating the design and analysis of materials. However, a significant challenge lies in the scarcity and high cost of obtaining high-quality materials datasets. While foundation models pre-trained on large datasets have excelled in fields like natural language processing by leveraging latent features through transfer learning, their application in materials science remains limited. Here, we present a foundation model specifically designed for composite materials. Pre-trained on a dataset of short-fiber composites to learn robust latent features, the model accurately predicts homogenized stiffness during transfer learning, even with limited training data. Additionally, our model effectively predicts the material's nonlinear behavior by transferring these learned features to an Interaction-based Material Network, which is a constitutive surrogate model. These results demonstrate the potential of our foundation model to capture complex material behaviors. Our findings validate the feasibility and effectiveness of foundation models in composite materials. We anticipate extending this approach to more complex three-dimensional composite materials, polycrystalline materials, and beyond. Moreover, this framework enables high-accuracy predictions even when experimental data are scarce, paving the way for more efficient and cost-effective materials design and analysis.

Updated: 2025-04-08 19:00:34

标题: 复合微结构基础模型：重建、刚度和非线性行为预测

摘要: 机器学习的快速发展为材料科学开辟了许多机会，特别是在加速材料设计和分析方面。然而，一个重要的挑战在于获取高质量材料数据集的稀缺性和高成本。虽然在自然语言处理等领域表现突出的基础模型在大型数据集上预训练，通过迁移学习利用潜在特征，但它们在材料科学中的应用仍然有限。在这里，我们提出了一个专门为复合材料设计的基础模型。在短纤维复合材料数据集上进行预训练以学习稳健的潜在特征，该模型甚至在有限的训练数据情况下也能准确预测均质化刚度。此外，我们的模型通过将这些学到的特征转移到基于相互作用的材料网络中，有效地预测了材料的非线性行为，这是一种构成代理模型。这些结果展示了我们的基础模型捕捉复杂材料行为的潜力。我们的发现验证了基础模型在复合材料中的可行性和有效性。我们期待将这种方法扩展到更复杂的三维复合材料、多晶材料等。此外，该框架在实验数据稀缺时也能实现高精度预测，为更高效和成本效益的材料设计和分析铺平了道路。

更新时间: 2025-04-08 19:00:34

领域: cs.CE,cs.AI

下载: http://arxiv.org/abs/2411.06565v3

ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion

Parameter generation has emerged as a novel paradigm for neural network development, offering an alternative to traditional neural network training by synthesizing high-quality model weights directly. In the context of Low-Rank Adaptation (LoRA) for evolving ($\textit{i.e.}$, constantly updated) large language models (LLMs), this approach promises efficient adaptation without costly retraining. However, existing methods face critical limitations in simultaneously achieving scalability and controllability. In this paper, we introduce $\texttt{ORAL}$, a novel $\textbf{conditional recurrent diffusion}$ framework that addresses these challenges. $\texttt{ORAL}$ incorporates a novel conditioning mechanism that integrates model architecture and textual task specifications, enabling the generation of task-specific LoRA parameters that can seamlessly transfer across evolving foundation models. Our approach successfully scales to billions-of-parameter LLMs and maintains controllability. Through extensive experiments across seven language tasks, four vision tasks, and three multimodal tasks using five pre-trained LLMs, we demonstrate that $\texttt{ORAL}$ generates high-quality LoRA parameters that achieve comparable or superior performance to vanilla trained counterparts.

Updated: 2025-04-08 18:38:56

标题: 口头：通过条件递归扩散促进大规模LoRAs

摘要: 参数生成已经成为神经网络开发的一种新范式，提供了一种替代传统神经网络训练的方法，通过直接合成高质量的模型权重。在为不断更新的大型语言模型（LLMs）进行低秩适应（LoRA）时，这种方法承诺在没有昂贵的重新训练的情况下实现高效的适应。然而，现有方法在同时实现可扩展性和可控性方面面临关键限制。在本文中，我们介绍了$\texttt{ORAL}$，这是一个新颖的$\textbf{条件递归扩散}$框架，可以解决这些挑战。$\texttt{ORAL}$包括一个新颖的调节机制，将模型架构和文本任务规范整合在一起，从而生成可以在不断更新的基础模型之间无缝传输的特定任务LoRA参数。我们的方法成功扩展到了数十亿参数的LLMs，并保持可控性。通过在七个语言任务、四个视觉任务和三个多模态任务上进行广泛实验，使用五个预训练的LLMs，我们证明了$\texttt{ORAL}$生成的高质量LoRA参数实现了与普通训练对照组相当或更好的性能。

更新时间: 2025-04-08 18:38:56

领域: cs.LG,cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2503.24354v2

Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions

We develop fast algorithms and robust software for convex optimization of two-layer neural networks with ReLU activation functions. Our work leverages a convex reformulation of the standard weight-decay penalized training problem as a set of group-$\ell_1$-regularized data-local models, where locality is enforced by polyhedral cone constraints. In the special case of zero-regularization, we show that this problem is exactly equivalent to unconstrained optimization of a convex "gated ReLU" network with non-singular gates. For problems with non-zero regularization, we show that convex gated ReLU models obtain data-dependent approximation bounds for the ReLU training problem. To optimize the convex reformulations, we develop an accelerated proximal gradient method and a practical augmented Lagrangian solver. We show that these approaches are faster than standard training heuristics for the non-convex problem, such as SGD, and outperform commercial interior-point solvers. Experimentally, we verify our theoretical results, explore the group-$\ell_1$ regularization path, and scale convex optimization for neural networks to image classification on MNIST and CIFAR-10.

Updated: 2025-04-08 18:36:13

标题: 快速凸优化用于两层ReLU网络：等价模型类和锥分解

摘要: 我们开发了用于具有ReLU激活函数的两层神经网络的凸优化的快速算法和稳健软件。我们的工作利用了标准权重衰减惩罚训练问题的凸重构，作为一组群- $ \ell_1 $ -正则化的数据局部模型，其中局部性通过多面体锥约束来强制执行。在零正则化的特殊情况下，我们展示了这个问题与具有非奇异门的凸“门控ReLU”网络的无约束优化完全等价。对于具有非零正则化的问题，我们展示了凸门控ReLU模型为ReLU训练问题获得数据相关的近似界限。为了优化凸重构，我们开发了一种加速的近端梯度方法和一个实用的增广拉格朗日求解器。我们展示了这些方法对于非凸问题比标准训练启发式方法（如SGD）更快，并且优于商用内点求解器。在实验中，我们验证了我们的理论结果，探索了群- $ \ell_1 $ 正则化路径，并将凸优化扩展到了MNIST和CIFAR-10上的图像分类。

更新时间: 2025-04-08 18:36:13

领域: cs.LG

下载: http://arxiv.org/abs/2202.01331v4

Center-fixing of tropical cyclones using uncertainty-aware deep learning applied to high-temporal-resolution geostationary satellite imagery

Determining the location of a tropical cyclone's (TC) surface circulation center -- "center-fixing" -- is a critical first step in the TC-forecasting process, affecting current and future estimates of track, intensity, and structure. Despite a recent increase in automated center-fixing methods, only one such method (ARCHER-2) is operational, and its best performance is achieved when using microwave or scatterometer data, which are not available at every forecast cycle. We develop a deep-learning algorithm called GeoCenter; besides a few scalars in the operational ATCF, it relies only on geostationary IR satellite imagery, which is available for all TC basins at high frequency (10 min) and low latency (< 10 min) during both day and night. GeoCenter ingests an animation (time series) of IR images, including 9 channels at lag times up to 4 hours. The animation is centered at a "first guess" location, offset from the true TC-center location by 48 km on average and sometimes > 100 km; GeoCenter is tasked with correcting this offset. On an independent testing dataset, GeoCenter achieves a mean/median/RMS (root mean square) error of 26.6/22.2/32.4 km for all systems, 24.7/20.8/30.0 km for tropical systems, and 14.6/12.5/17.3 km for category-2--5 hurricanes. These values are similar to ARCHER-2 errors with microwave or scatterometer data, and better than ARCHER-2 errors when only IR data are available. GeoCenter also performs skillful uncertainty quantification, producing a well calibrated ensemble of 150 TC-center locations. Furthermore, all predictors used by GeoCenter are available in real time, which would make GeoCenter easy to implement operationally every 10 min.

Updated: 2025-04-08 18:34:36

标题: 使用深度学习技术和高时间分辨率的静止卫星图像来确定热带气旋的中心位置

摘要: 确定热带气旋（TC）地表环流中心的位置——“中心定位”——是TC预测过程中的关键第一步，影响着路径、强度和结构的当前和未来估计。尽管最近自动中心定位方法有所增加，但只有一种方法（ARCHER-2）是运行的，并且其最佳性能是在使用微波或散射计数据时实现的，而这些数据并不在每个预报周期都可用。我们开发了一种名为GeoCenter的深度学习算法；除了操作性ATCF中的一些标量外，它仅依赖于地球静止红外卫星图像，这些图像在所有TC盆地中都以高频率（10分钟）和低延迟（<10分钟）在白天和夜晚都可用。GeoCenter摄入了一个包括最多4小时滞后时间的IR图像动画（时间序列），包括9个通道。这个动画以一个“第一次猜测”的位置为中心，平均偏离真实的TC中心位置48公里，有时甚至大于100公里；GeoCenter的任务是纠正这个偏移。在一个独立的测试数据集上，GeoCenter对所有系统实现了平均/中位/RMS（均方根）误差分别为26.6/22.2/32.4公里，对热带系统为24.7/20.8/30.0公里，对2-5级飓风为14.6/12.5/17.3公里。这些数值与使用微波或散射计数据的ARCHER-2误差相似，并且比仅使用IR数据时的ARCHER-2误差更好。GeoCenter还执行了熟练的不确定性量化，产生了一个150个TC中心位置的良好校准的集合。此外，GeoCenter使用的所有预测因子都是实时可用的，这将使GeoCenter每10分钟在操作上易于实施。

更新时间: 2025-04-08 18:34:36

领域: physics.ao-ph,cs.AI

下载: http://arxiv.org/abs/2409.16507v2

GenCAD: Image-Conditioned Computer-Aided Design Generation with Transformer-Based Contrastive Representation and Diffusion Priors

The creation of manufacturable and editable 3D shapes through Computer-Aided Design (CAD) remains a highly manual and time-consuming task, hampered by the complex topology of boundary representations of 3D solids and unintuitive design tools. While most work in the 3D shape generation literature focuses on representations like meshes, voxels, or point clouds, practical engineering applications demand the modifiability and manufacturability of CAD models and the ability for multi-modal conditional CAD model generation. This paper introduces GenCAD, a generative model that employs autoregressive transformers with a contrastive learning framework and latent diffusion models to transform image inputs into parametric CAD command sequences, resulting in editable 3D shape representations. Extensive evaluations demonstrate that GenCAD significantly outperforms existing state-of-the-art methods in terms of the unconditional and conditional generations of CAD models. Additionally, the contrastive learning framework of GenCAD facilitates the retrieval of CAD models using image queries from large CAD databases, which is a critical challenge within the CAD community. Our results provide a significant step forward in highlighting the potential of generative models to expedite the entire design-to-production pipeline and seamlessly integrate different design modalities.

Updated: 2025-04-08 18:30:54

标题: GenCAD：基于图像条件的计算机辅助设计生成，采用基于Transformer的对比表示和扩散先验

摘要: 通过计算机辅助设计（CAD）创建可制造和可编辑的3D形状仍然是一个高度手动和耗时的任务，受到3D固体边界表示的复杂拓扑和不直观设计工具的阻碍。虽然3D形状生成文献中的大部分工作集中在表示如网格、体素或点云，但实际工程应用需要CAD模型的可修改性和可制造性，以及多模态条件CAD模型生成的能力。本文介绍了GenCAD，一种采用自回归变换器、对比学习框架和潜变扩散模型的生成模型，将图像输入转换为参数化CAD命令序列，从而产生可编辑的3D形状表示。广泛的评估表明，GenCAD在CAD模型的无条件和条件生成方面明显优于现有的最先进方法。此外，GenCAD的对比学习框架有助于从大型CAD数据库中利用图像查询检索CAD模型，这是CAD社区内的一个关键挑战。我们的结果在突出生成模型加速整个设计到生产流程的潜力和无缝整合不同设计模态方面迈出了重要的一步。

更新时间: 2025-04-08 18:30:54

领域: cs.CV,cs.GR,cs.LG

下载: http://arxiv.org/abs/2409.16294v2

Deep spatio-temporal point processes: Advances and new directions

Spatio-temporal point processes (STPPs) model discrete events distributed in time and space, with important applications in areas such as criminology, seismology, epidemiology, and social networks. Traditional models often rely on parametric kernels, limiting their ability to capture heterogeneous, nonstationary dynamics. Recent innovations integrate deep neural architectures -- either by modeling the conditional intensity function directly or by learning flexible, data-driven influence kernels, substantially broadening their expressive power. This article reviews the development of the deep influence kernel approach, which enjoys statistical explainability, since the influence kernel remains in the model to capture the spatiotemporal propagation of event influence and its impact on future events, while also possessing strong expressive power, thereby benefiting from both worlds. We explain the main components in developing deep kernel point processes, leveraging tools such as functional basis decomposition and graph neural networks to encode complex spatial or network structures, as well as estimation using both likelihood-based and likelihood-free methods, and address computational scalability for large-scale data. We also discuss the theoretical foundation of kernel identifiability. Simulated and real-data examples highlight applications to crime analysis, earthquake aftershock prediction, and sepsis prediction modeling, and we conclude by discussing promising directions for the field.

Updated: 2025-04-08 18:28:12

标题: 深度时空点过程：进展与新方向

摘要: 时空点过程（STPPs）模型离散事件在时间和空间中的分布，具有在犯罪学、地震学、流行病学和社交网络等领域的重要应用。传统模型通常依赖于参数核函数，限制了其捕捉异质性、非平稳动态的能力。最近的创新整合了深度神经结构，通过直接建模条件强度函数或学习灵活的、数据驱动的影响核函数，大大拓宽了其表达能力。本文回顾了深度影响核方法的发展，该方法享有统计可解释性，因为影响核仍然存在于模型中，捕捉事件影响的时空传播以及其对未来事件的影响，同时具有强大的表达能力，从而获益于两个世界。我们解释了开发深度核点过程的主要组成部分，利用功能基分解和图神经网络等工具来编码复杂的空间或网络结构，以及使用基于似然和不基于似然的方法进行估计，并解决大规模数据的计算可扩展性。我们还讨论了核可识别性的理论基础。模拟和真实数据示例突显了在犯罪分析、地震余震预测和败血症预测建模等领域的应用，最后我们讨论了该领域的发展方向。

更新时间: 2025-04-08 18:28:12

领域: stat.ML,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2504.06364v1

Towards Federated RLHF with Aggregated Client Preference for LLMs

Reinforcement learning with human feedback (RLHF) fine-tunes a pretrained large language model (LLM) using user preference data, enabling it to generate content aligned with human preferences. However, due to privacy concerns, users may be reluctant to share sensitive preference data. To address this, we propose utilizing Federated Learning (FL) techniques, allowing large-scale preference collection from diverse real-world users without requiring them to transmit data to a central server. Our federated RLHF methods (i.e., FedBis and FedBiscuit) encode each client's preferences into binary selectors and aggregate them to capture common preferences. In particular, FedBiscuit overcomes key challenges, such as preference heterogeneity and reward hacking, through innovative solutions like grouping clients with similar preferences to reduce heterogeneity and using multiple binary selectors to enhance LLM output quality. To evaluate the performance of the proposed methods, we establish the first federated RLHF benchmark with a heterogeneous human preference dataset. Experimental results show that by integrating the LLM with aggregated client preferences, FedBis and FedBiscuit significantly enhance the professionalism and readability of the generated content.

Updated: 2025-04-08 18:13:57

标题: 朝向具有聚合客户偏好的联邦RLHF的LLMs

摘要: 使用人类反馈的强化学习（RLHF）通过使用用户偏好数据微调预训练的大型语言模型（LLM），使其能够生成符合人类偏好的内容。然而，由于隐私问题，用户可能不愿分享敏感的偏好数据。为了解决这个问题，我们提出利用联邦学习（FL）技术，允许从不同现实世界用户进行大规模偏好收集，而无需他们将数据传输到中央服务器。我们的联邦RLHF方法（即FedBis和FedBiscuit）将每个客户端的偏好编码为二进制选择器，并将它们聚合起来以捕捉共同的偏好。特别是，FedBiscuit通过创新性解决方案，如将具有相似偏好的客户端分组以减少异质性，并使用多个二进制选择器来增强LLM输出质量，克服了关键挑战，如偏好异质性和奖励欺诈。为了评估所提方法的性能，我们建立了第一个带有异质人类偏好数据集的联邦RLHF基准。实验结果表明，通过将LLM与聚合客户端偏好整合，FedBis和FedBiscuit显著提高了生成内容的专业性和可读性。

更新时间: 2025-04-08 18:13:57

领域: cs.CL,cs.DC,cs.LG

下载: http://arxiv.org/abs/2407.03038v3

Genetic Programming for Explainable Manifold Learning

Manifold learning techniques play a pivotal role in machine learning by revealing lower-dimensional embeddings within high-dimensional data, thus enhancing both the efficiency and interpretability of data analysis by transforming the data into a lower-dimensional representation. However, a notable challenge with current manifold learning methods is their lack of explicit functional mappings, crucial for explainability in many real-world applications. Genetic programming, known for its interpretable functional tree-based models, has emerged as a promising approach to address this challenge. Previous research leveraged multi-objective GP to balance manifold quality against embedding dimensionality, producing functional mappings across a range of embedding sizes. Yet, these mapping trees often became complex, hindering explainability. In response, in this paper, we introduce Genetic Programming for Explainable Manifold Learning (GP-EMaL), a novel approach that directly penalises tree complexity. Our new method is able to maintain high manifold quality while significantly enhancing explainability and also allows customisation of complexity measures, such as symmetry balancing, scaling, and node complexity, catering to diverse application needs. Our experimental analysis demonstrates that GP-EMaL is able to match the performance of the existing approach in most cases, while using simpler, smaller, and more interpretable tree structures. This advancement marks a significant step towards achieving interpretable manifold learning.

Updated: 2025-04-08 18:10:44

标题: 遗传编程用于可解释流形学习

摘要: 流形学习技术在机器学习中起着关键作用，通过揭示高维数据中的低维嵌入，从而通过将数据转换为较低维度表示，提高数据分析的效率和可解释性。然而，当前流形学习方法的一个显著挑战是它们缺乏明确的功能映射，这对许多现实世界的应用中的可解释性至关重要。以其可解释的基于功能树的模型而闻名，遗传编程已经成为解决这一挑战的一种有前途的方法。先前的研究利用多目标遗传编程来平衡流形质量与嵌入维度，产生跨一系列嵌入大小的功能映射。然而，这些映射树往往变得复杂，阻碍了可解释性。为了应对这一问题，在本文中，我们介绍了一种名为遗传编程可解释流形学习（GP-EMaL）的新方法，该方法直接惩罚树的复杂性。我们的新方法能够保持高流形质量，同时显著增强可解释性，并且还允许定制复杂度度量，如对称平衡、缩放和节点复杂度，以满足各种应用需求。我们的实验分析表明，GP-EMaL能够在大多数情况下与现有方法的性能相匹配，同时使用更简单、更小、更可解释的树结构。这一进步标志着实现可解释流形学习的重要一步。

更新时间: 2025-04-08 18:10:44

领域: cs.NE,cs.LG

下载: http://arxiv.org/abs/2403.14139v2

From Broadcast to Minimap: Achieving State-of-the-Art SoccerNet Game State Reconstruction

Game State Reconstruction (GSR), a critical task in Sports Video Understanding, involves precise tracking and localization of all individuals on the football field-players, goalkeepers, referees, and others - in real-world coordinates. This capability enables coaches and analysts to derive actionable insights into player movements, team formations, and game dynamics, ultimately optimizing training strategies and enhancing competitive advantage. Achieving accurate GSR using a single-camera setup is highly challenging due to frequent camera movements, occlusions, and dynamic scene content. In this work, we present a robust end-to-end pipeline for tracking players across an entire match using a single-camera setup. Our solution integrates a fine-tuned YOLOv5m for object detection, a SegFormer-based camera parameter estimator, and a DeepSORT-based tracking framework enhanced with re-identification, orientation prediction, and jersey number recognition. By ensuring both spatial accuracy and temporal consistency, our method delivers state-of-the-art game state reconstruction, securing first place in the SoccerNet Game State Reconstruction Challenge 2024 and significantly outperforming competing methods.

Updated: 2025-04-08 18:10:44

标题: 从广播到小地图：实现目前最先进的SoccerNet游戏状态重建

摘要: 游戏状态重建（GSR）是体育视频理解中的关键任务，涉及对足球场上的所有个体-球员、守门员、裁判等-在现实世界坐标中的精确跟踪和定位。这种能力使教练和分析师能够获得关于球员移动、团队编队和比赛动态的可操作见解，最终优化训练策略并增强竞争优势。使用单摄像头设置实现准确的GSR非常具有挑战性，原因在于摄像头频繁移动、遮挡和动态场景内容。在这项工作中，我们提出了一个强大的端到端管道，使用单摄像头设置跟踪整个比赛中的球员。我们的解决方案集成了一个经过微调的YOLOv5m用于目标检测，一个基于SegFormer的相机参数估计器，以及一个基于DeepSORT的跟踪框架，增强了重新识别、方向预测和球衣号码识别。通过确保空间准确性和时间一致性，我们的方法提供了最先进的游戏状态重建，获得了2024年SoccerNet游戏状态重建挑战赛的第一名，并明显优于竞争方法。

更新时间: 2025-04-08 18:10:44

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2504.06357v1

An Information-Geometric Approach to Artificial Curiosity

Learning in environments with sparse rewards remains a fundamental challenge in reinforcement learning. Artificial curiosity addresses this limitation through intrinsic rewards to guide exploration, however, the precise formulation of these rewards has remained elusive. Ideally, such rewards should depend on the agent's information about the environment, remaining agnostic to the representation of the information -- an invariance central to information geometry. Leveraging information geometry, we show that invariance under congruent Markov morphisms and the agent-environment interaction, uniquely constrains intrinsic rewards to concave functions of the reciprocal occupancy. Additional geometrically motivated restrictions effectively limits the candidates to those determined by a real parameter that governs the occupancy space geometry. Remarkably, special values of this parameter are found to correspond to count-based and maximum entropy exploration, revealing a geometric exploration-exploitation trade-off. This framework provides important constraints to the engineering of intrinsic reward while integrating foundational exploration methods into a single, cohesive model.

Updated: 2025-04-08 18:04:15

标题: 一种信息几何方法用于人工好奇心

摘要: 在稀疏奖励环境中学习仍然是强化学习中的一个基本挑战。人工好奇心通过内在奖励来指导探索来解决这一限制，然而，这些奖励的精确制定一直是难以捉摸的。理想情况下，这些奖励应该取决于代理对环境的信息，保持对信息表示的不可知性 -- 这是信息几何的核心不变性。利用信息几何，我们展示了在同余马尔可夫形变和代理-环境交互下的不变性，唯一地将内在奖励约束为倒数占据的凹函数。额外的几何动机限制有效地将候选对象限制为由控制占据空间几何的实参数确定的那些。值得注意的是，这个参数的特殊值被发现对应于基于计数和最大熵探索，揭示了一个几何探索-利用权衡。这个框架为内在奖励的工程提供了重要的约束，同时将基础探索方法整合到一个统一的模型中。

更新时间: 2025-04-08 18:04:15

领域: cs.LG

下载: http://arxiv.org/abs/2504.06355v1

Scalable Robust Bayesian Co-Clustering with Compositional ELBOs

Co-clustering exploits the duality of instances and features to simultaneously uncover meaningful groups in both dimensions, often outperforming traditional clustering in high-dimensional or sparse data settings. Although recent deep learning approaches successfully integrate feature learning and cluster assignment, they remain susceptible to noise and can suffer from posterior collapse within standard autoencoders. In this paper, we present the first fully variational Co-clustering framework that directly learns row and column clusters in the latent space, leveraging a doubly reparameterized ELBO to improve gradient signal-to-noise separation. Our unsupervised model integrates a Variational Deep Embedding with a Gaussian Mixture Model (GMM) prior for both instances and features, providing a built-in clustering mechanism that naturally aligns latent modes with row and column clusters. Furthermore, our regularized end-to-end noise learning Compositional ELBO architecture jointly reconstructs the data while regularizing against noise through the KL divergence, thus gracefully handling corrupted or missing inputs in a single training pipeline. To counteract posterior collapse, we introduce a scale modification that increases the encoder's latent means only in the reconstruction pathway, preserving richer latent representations without inflating the KL term. Finally, a mutual information-based cross-loss ensures coherent co-clustering of rows and columns. Empirical results on diverse real-world datasets from multiple modalities, numerical, textual, and image-based, demonstrate that our method not only preserves the advantages of prior Co-clustering approaches but also exceeds them in accuracy and robustness, particularly in high-dimensional or noisy settings.

Updated: 2025-04-08 18:02:36

标题: 可扩展的稳健贝叶斯共聚类与组合ELBOs

摘要: 共聚类利用实例和特征的二元性，同时在两个维度中发现有意义的群组，通常在高维或稀疏数据设置中胜过传统的聚类方法。尽管最近的深度学习方法成功地整合了特征学习和群集分配，但它们仍然容易受到噪声的影响，并且可能在标准自编码器中遭受后验坍缩。在本文中，我们提出了第一个完全变分共聚类框架，直接在潜在空间中学习行和列群集，利用双重参数化ELBO来改进梯度信噪比分离。我们的无监督模型集成了一个变分深度嵌入和一个高斯混合模型（GMM）先验，用于实例和特征，提供了一个内置的聚类机制，自然地将潜在模式与行和列群集对齐。此外，我们的正则化端到端噪声学习组合ELBO架构同时重建数据，通过KL散度对抗噪声进行正则化，因此可以在单个训练流程中优雅地处理损坏或缺失的输入。为了抵消后验坍缩，我们引入了一个增加编码器潜在均值的尺度修改，仅在重建路径中，保留更丰富的潜在表示而不膨胀KL项。最后，基于互信息的交叉损失确保了行和列的一致共聚类。来自多种模态（数值、文本和基于图像的）的多样真实世界数据集上的实证结果表明，我们的方法不仅保留了先前共聚类方法的优势，而且在准确性和稳健性方面超越了它们，特别是在高维或嘈杂的环境中。

更新时间: 2025-04-08 18:02:36

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2504.04079v2

GOLLuM: Gaussian Process Optimized LLMs -- Reframing LLM Finetuning through Bayesian Optimization

Large Language Models (LLMs) can encode complex relationships in their latent spaces, yet harnessing them for optimization under uncertainty remains challenging. We address this gap with a novel architecture that reframes LLM finetuning as Gaussian process (GP) marginal likelihood optimization via deep kernel methods. We introduce LLM-based deep kernels, jointly optimized with GPs to preserve the benefits of both - LLMs to provide a rich and flexible input space for Bayesian optimization and - GPs to model this space with predictive uncertainty for more efficient sampling. Applied to Buchwald-Hartwig reaction optimization, our method nearly doubles the discovery rate of high-performing reactions compared to static LLM embeddings (from 24% to 43% coverage of the top 5% reactions in just 50 optimization iterations). We also observe a 14% improvement over domain-specific representations without requiring specialized features. Extensive empirical evaluation across 19 benchmarks - ranging from general chemistry to reaction and molecular property optimization - demonstrates our method's robustness, generality, and consistent improvements across: (1) tasks, (2) LLM architectures (encoder, decoder, encoder-decoder), (3) pretraining domains (chemistry-related or general-purpose) and (4) hyperparameter settings (tuned once on a single dataset). Finally, we explain these improvements: joint LLM-GP optimization through marginal likelihood implicitly performs contrastive learning, aligning representations to produce (1) better-structured embedding spaces, (2) improved uncertainty calibration, and (3) more efficient sampling - without requiring any external loss. This work provides both practical advances in sample-efficient optimization and insights into what makes effective Bayesian optimization.

Updated: 2025-04-08 17:59:57

标题: GOLLuM：高斯过程优化的LLMs——通过贝叶斯优化重新构建LLM微调

摘要: 大型语言模型（LLMs）可以在它们的潜在空间中编码复杂关系，但是在不确定性下利用它们进行优化仍然具有挑战性。我们通过一种新颖的架构来解决这一差距，将LLM微调重新构建为通过深度核方法优化高斯过程（GP）边际似然。我们引入了基于LLM的深度核，与GP一起进行优化，以保留两者的优点 - LLM提供丰富灵活的输入空间进行贝叶斯优化，GP模拟该空间的预测不确定性以实现更高效的采样。应用于Buchwald-Hartwig反应优化，我们的方法几乎将发现高性能反应的发现率翻倍，与静态LLM嵌入相比（仅需50次优化迭代即可覆盖前5%反应的24%至43%）。我们还观察到与不需要专门特征的领域特定表示相比，有14%的改进。广泛的经验评估涵盖19个基准 - 从通用化学到反应和分子属性优化 - 显示了我们方法的稳健性、普遍性和一致的改进：（1）任务、（2）LLM架构（编码器、解码器、编码器-解码器）、（3）预训练领域（化学相关或通用性）和（4）超参数设置（在单个数据集上调整一次）。最后，我们解释了这些改进：通过边际似然联合LLM-GP优化隐性执行对比学习，使表示对齐以产生（1）更好结构化的嵌入空间、（2）改进的不确定性校准和（3）更高效的采样 - 而无需任何外部损失。这项工作在样本高效优化方面提供了实际进展，并揭示了有效贝叶斯优化的要素。

更新时间: 2025-04-08 17:59:57

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.06265v1

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Large Language Models (LLMs) have demonstrated the ability to tackle increasingly complex tasks through advanced reasoning, long-form content generation, and tool use. Solving these tasks often involves long inference-time computations. In human problem solving, a common strategy to expedite work is collaboration: by dividing the problem into sub-tasks, exploring different strategies concurrently, etc. Recent research has shown that LLMs can also operate in parallel by implementing explicit cooperation frameworks, such as voting mechanisms or the explicit creation of independent sub-tasks that can be executed in parallel. However, each of these frameworks may not be suitable for all types of tasks, which can hinder their applicability. In this work, we propose a different design approach: we run LLM "workers" in parallel , allowing them to synchronize via a concurrently-updated attention cache and prompt these workers to decide how best to collaborate. Our approach allows the instances to come up with their own collaboration strategy for the problem at hand, all the while "seeing" each other's partial progress in the concurrent cache. We implement this approach via Hogwild! Inference: a parallel LLM inference engine where multiple instances of the same LLM run in parallel with the same attention cache, with "instant" access to each other's generated tokens. Hogwild! inference takes advantage of Rotary Position Embeddings (RoPE) to avoid recomputation while improving parallel hardware utilization. We find that modern reasoning-capable LLMs can perform inference with shared Key-Value cache out of the box, without additional fine-tuning.

Updated: 2025-04-08 17:59:41

标题: Hogwild！推理：通过并发关注生成并行LLM

摘要: 大型语言模型（LLMs）已经展示了通过先进的推理、长篇内容生成和工具使用来解决日益复杂的任务的能力。解决这些任务通常涉及长时间推理计算。在人类问题求解中，加快工作的常见策略是协作：将问题分解为子任务，同时探索不同策略等。最近的研究表明，LLMs也可以通过实现显式的合作框架（如投票机制或显式创建可并行执行的独立子任务）来并行操作。然而，这些框架中的每一个可能并不适用于所有类型的任务，这可能会限制它们的适用性。在这项工作中，我们提出了一种不同的设计方法：我们并行运行LLM“工作者”，让它们通过同时更新的注意力缓存进行同步，并促使这些工作者决定如何最好地协作。我们的方法允许实例为手头的问题制定自己的协作策略，同时在并发缓存中“看到”彼此的部分进展。我们通过Hogwild!推理实现了这一方法：一个并行的LLM推理引擎，其中多个相同LLM的实例与相同的注意力缓存并行运行，同时“即时”访问彼此生成的标记。Hogwild!推理利用旋转位置嵌入（RoPE）来避免重新计算，同时提高并行硬件利用率。我们发现，现代具有推理能力的LLMs可以直接使用共享的键-值缓存进行推理，无需额外的微调。

更新时间: 2025-04-08 17:59:41

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2504.06261v1

FEABench: Evaluating Language Models on Multiphysics Reasoning Ability

Building precise simulations of the real world and invoking numerical solvers to answer quantitative problems is an essential requirement in engineering and science. We present FEABench, a benchmark to evaluate the ability of large language models (LLMs) and LLM agents to simulate and solve physics, mathematics and engineering problems using finite element analysis (FEA). We introduce a comprehensive evaluation scheme to investigate the ability of LLMs to solve these problems end-to-end by reasoning over natural language problem descriptions and operating COMSOL Multiphysics$^\circledR$, an FEA software, to compute the answers. We additionally design a language model agent equipped with the ability to interact with the software through its Application Programming Interface (API), examine its outputs and use tools to improve its solutions over multiple iterations. Our best performing strategy generates executable API calls 88% of the time. LLMs that can successfully interact with and operate FEA software to solve problems such as those in our benchmark would push the frontiers of automation in engineering. Acquiring this capability would augment LLMs' reasoning skills with the precision of numerical solvers and advance the development of autonomous systems that can tackle complex problems in the real world. The code is available at https://github.com/google/feabench

Updated: 2025-04-08 17:59:39

标题: FEABench：评估语言模型在多物理推理能力上的表现

摘要: 构建精确模拟现实世界并调用数值求解器来回答定量问题是工程和科学中的基本要求。我们提出FEABench，一个用于评估大型语言模型（LLMs）和LLM代理模拟和解决物理、数学和工程问题的能力的基准。我们引入了一个全面的评估方案，通过对自然语言问题描述进行推理，并使用有限元分析（FEA）软件COMSOL Multiphysics$^\circledR$来计算答案，以研究LLMs解决这些问题的能力。我们另外设计了一个语言模型代理，具备通过其应用程序编程接口（API）与软件进行交互、检查其输出并使用工具改进其解决方案的能力，并在多次迭代中实现最佳表现策略，生成可执行的API调用的概率为88%。能够成功与和操作FEA软件解决类似我们基准中的问题的LLMs将推动工程自动化的前沿。获得这种能力将增强LLMs的推理能力，具备数值求解器的精确性，并推动能够解决现实世界复杂问题的自主系统的发展。代码可在https://github.com/google/feabench获取。

更新时间: 2025-04-08 17:59:39

领域: cs.AI,cs.CL,cs.NA,math.NA

下载: http://arxiv.org/abs/2504.06260v1

Fractal and Regular Geometry of Deep Neural Networks

We study the geometric properties of random neural networks by investigating the boundary volumes of their excursion sets for different activation functions, as the depth increases. More specifically, we show that, for activations which are not very regular (e.g., the Heaviside step function), the boundary volumes exhibit fractal behavior, with their Hausdorff dimension monotonically increasing with the depth. On the other hand, for activations which are more regular (e.g., ReLU, logistic and $\tanh$), as the depth increases, the expected boundary volumes can either converge to zero, remain constant or diverge exponentially, depending on a single spectral parameter which can be easily computed. Our theoretical results are confirmed in some numerical experiments based on Monte Carlo simulations.

Updated: 2025-04-08 17:56:05

标题: 深度神经网络的分形和规则几何特征

摘要: 我们通过研究边界体积来调查不同激活函数的随机神经网络的几何特性，随着深度的增加。更具体地，我们发现，对于不太规则的激活函数（例如，Heaviside阶跃函数），边界体积表现出分形行为，其Hausdorff维数随深度单调增加。另一方面，对于更规则的激活函数（例如，ReLU、logistic和$\tanh$），随着深度的增加，预期的边界体积可能收敛于零，保持恒定或呈指数增长，取决于一个可以轻松计算的单一谱参数。我们的理论结果在基于蒙特卡洛模拟的一些数值实验中得到验证。

更新时间: 2025-04-08 17:56:05

领域: math.PR,cs.LG,stat.ML,60G60, 62B10, 62M45, 68T07

下载: http://arxiv.org/abs/2504.06250v1

Stacking Variational Bayesian Monte Carlo

Variational Bayesian Monte Carlo (VBMC) is a sample-efficient method for approximate Bayesian inference with computationally expensive likelihoods. While VBMC's local surrogate approach provides stable approximations, its conservative exploration strategy and limited evaluation budget can cause it to miss regions of complex posteriors. In this work, we introduce Stacking Variational Bayesian Monte Carlo (S-VBMC), a method that constructs global posterior approximations by merging independent VBMC runs through a principled and inexpensive post-processing step. Our approach leverages VBMC's mixture posterior representation and per-component evidence estimates, requiring no additional likelihood evaluations while being naturally parallelizable. We demonstrate S-VBMC's effectiveness on two synthetic problems designed to challenge VBMC's exploration capabilities and two real-world applications from computational neuroscience, showing substantial improvements in posterior approximation quality across all cases.

Updated: 2025-04-08 17:56:04

标题: 堆叠变分贝叶斯蒙特卡洛

摘要: Variational Bayesian Monte Carlo（VBMC）是一种用于近似贝叶斯推断的高效方法，适用于计算昂贵的似然函数。尽管VBMC的局部替代方法提供了稳定的近似，但其保守的探索策略和有限的评估预算可能导致其错过复杂后验的区域。在本文中，我们介绍了堆叠变分贝叶斯蒙特卡洛（S-VBMC），这是一种通过一个基于原则且廉价的后处理步骤将独立的VBMC运行合并以构建全局后验近似的方法。我们的方法利用VBMC的混合后验表示和每个组件的证据估计，无需额外的似然函数评估，同时天然可并行化。我们在两个旨在挑战VBMC探索能力的合成问题和两个来自计算神经科学的真实应用上展示了S-VBMC的有效性，显示在所有情况下后验近似质量的显著改善。

更新时间: 2025-04-08 17:56:04

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2504.05004v2

Variational Online Mirror Descent for Robust Learning in Schrödinger Bridge

Sch\"odinger bridge (SB) has evolved into a universal class of probabilistic generative models. In practice, however, estimated learning signals are often uncertain, and the reliability promised by existing methods is often based on speculative optimal-case scenarios. Recent studies regarding the Sinkhorn algorithm through mirror descent (MD) have gained attention, revealing geometric insights into solution acquisition of the SB problems. In this paper, we propose a variational online MD (OMD) framework for the SB problems, which provides further stability to SB solvers. We formally prove convergence and a regret bound for the novel OMD formulation of SB acquisition. As a result, we propose a simulation-free SB algorithm called Variational Mirrored Schr\"odinger Bridge (VMSB) by utilizing the Wasserstein-Fisher-Rao geometry of the Gaussian mixture parameterization for Schr\"odinger potentials. Based on the Wasserstein gradient flow theory, the algorithm offers tractable learning dynamics that precisely approximate each OMD step. In experiments, we validate the performance of the proposed VMSB algorithm across an extensive suite of benchmarks. VMSB consistently outperforms contemporary SB solvers on a range of SB problems, demonstrating the robustness predicted by our theory.

Updated: 2025-04-08 17:49:16

标题: 变分在线镜像下降用于Schrödinger桥中的稳健学习

摘要: Schr\"odinger桥（SB）已经发展成为一类通用的概率生成模型。然而，在实践中，估计的学习信号往往是不确定的，现有方法所承诺的可靠性往往基于推测性的最佳情况。最近关于Sinkhorn算法通过镜像下降（MD）的研究引起了关注，揭示了SB问题解决方案获取的几何洞见。本文提出了一种用于SB问题的变分在线MD（OMD）框架，为SB求解器提供进一步的稳定性。我们正式证明了SB获取的新型OMD公式的收敛性和遗憾界。因此，我们提出了一种无模拟的SB算法，称为变分镜像Schr\"odinger桥（VMSB），通过利用Gaussian混合参数化的Wasserstein-Fisher-Rao几何来逼近Schr\"odinger势能。基于Wasserstein梯度流理论，该算法提供了可处理的学习动态，精确逼近每个OMD步骤。在实验中，我们验证了所提出的VMSB算法在广泛的基准测试中的性能。VMSB在一系列SB问题上始终优于当代SB求解器，展示了我们理论所预测的鲁棒性。

更新时间: 2025-04-08 17:49:16

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2504.02618v2

APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

Training effective AI agents for multi-turn interactions requires high-quality data that captures realistic human-agent dynamics, yet such data is scarce and expensive to collect manually. We introduce APIGen-MT, a two-phase framework that generates verifiable and diverse multi-turn agent data. In the first phase, our agentic pipeline produces detailed task blueprints with ground-truth actions, leveraging a committee of LLM reviewers and iterative feedback loops. These blueprints are then transformed into complete interaction trajectories through simulated human-agent interplay. We train a family of models -- the xLAM-2-fc-r series with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on $\tau$-bench and BFCL benchmarks, with the smaller models surpassing their larger counterparts, particularly in multi-turn settings, while maintaining superior consistency across multiple trials. Comprehensive experiments demonstrate that our verified blueprint-to-details approach yields high-quality training data, enabling the development of more reliable, efficient, and capable agents. We open-source both the synthetic data collected and the trained xLAM-2-fc-r models to advance research in AI agents. Models are available on HuggingFace at https://huggingface.co/collections/Salesforce/xlam-2-67ef5be12949d8dcdae354c4 and project website is https://apigen-mt.github.io

Updated: 2025-04-08 17:46:44

标题: APIGen-MT：通过模拟代理-人类互动生成多轮数据的代理管道

摘要: 训练有效的人工智能代理程序以进行多轮交互需要捕捉现实人-代理动态的高质量数据，然而这种数据很少且手动收集成本高昂。我们介绍了APIGen-MT，这是一个生成可验证且多样化的多轮代理数据的两阶段框架。在第一阶段，我们的代理管道通过LLM评审委员会和迭代反馈循环生成详细的任务蓝图，其中包括地面真实的行动。然后，这些蓝图通过模拟的人-代理互动转化为完整的交互轨迹。我们训练了一个系列模型--xLAM-2-fc-r系列，参数范围从10亿到700亿。我们的模型在$\tau$-bench和BFCL基准测试中表现优于前沿模型，如GPT-4o和Claude 3.5，较小模型在特别是多轮设置中超越了较大的模型，同时在多次试验中保持了卓越的一致性。全面的实验表明，我们的验证蓝图到细节方法产生了高质量的训练数据，促进了更可靠、高效和能力更强的代理的发展。我们开源了收集的合成数据和训练的xLAM-2-fc-r模型，以推动人工智能代理研究。模型可在HuggingFace上获得，项目网站为https://apigen-mt.github.io。

更新时间: 2025-04-08 17:46:44

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.03601v2

A Case for Network-wide Orchestration of Host-based Intrusion Detection and Response

Recent cyber incidents and the push for zero trust security underscore the necessity of monitoring host-level events. However, current host-level intrusion detection systems (IDS) lack the ability to correlate alerts and coordinate a network-wide response in real time. Motivated by advances in system-level extensions free of rebooting and network-wide orchestration of host actions, we propose using a central IDS orchestrator to remotely program the logic of each host IDS and collect the alerts generated in real time. In this paper, we make arguments for such a system concept and provide a high level design of the main system components. Furthermore, we have developed a system prototype and evaluated it using two experimental scenarios rooted from real-world attacks. The evaluation results show that the host-based IDS orchestration system is able to defend against the attacks effectively.

Updated: 2025-04-08 17:41:04

标题: 一个关于网络范围内对基于主机的入侵检测和响应进行编排的案例

摘要: 最近的网络事件和对零信任安全的推动强调了监控主机级事件的必要性。然而，当前的主机级入侵检测系统（IDS）缺乏能力在实时协调网络范围内响应的能力。受到系统级扩展无需重新启动和主机动作的网络范围编排的推动，我们提出使用一个中央IDS编排器远程编程每个主机IDS的逻辑，并实时收集生成的警报。在本文中，我们为这样的系统概念提供论据，并提供主要系统组件的高级设计。此外，我们已经开发了一个系统原型，并使用两个源自真实攻击的实验场景对其进行评估。评估结果表明，基于主机的IDS编排系统能够有效地抵御攻击。

更新时间: 2025-04-08 17:41:04

领域: cs.CR,cs.NI

下载: http://arxiv.org/abs/2504.06241v1

Decentralized Federated Domain Generalization with Style Sharing: A Formal Modeling and Convergence Analysis

Much of the federated learning (FL) literature focuses on settings where local dataset statistics remain the same between training and testing time. Recent advances in domain generalization (DG) aim to use data from source (training) domains to train a model that generalizes well to data from unseen target (testing) domains. In this paper, we are motivated by two major gaps in existing work on FL and DG: (1) the lack of formal mathematical analysis of DG objectives and training processes; and (2) DG research in FL being limited to the conventional star-topology architecture. Addressing the second gap, we develop $\textit{Decentralized Federated Domain Generalization with Style Sharing}$ ($\texttt{StyleDDG}$), a fully decentralized DG algorithm designed to allow devices in a peer-to-peer network to achieve DG based on sharing style information inferred from their datasets. Additionally, we fill the first gap by providing the first systematic approach to mathematically analyzing style-based DG training optimization. We cast existing centralized DG algorithms within our framework, and employ their formalisms to model $\texttt{StyleDDG}$. Based on this, we obtain analytical conditions under which a sub-linear convergence rate of $\texttt{StyleDDG}$ can be obtained. Through experiments on two popular DG datasets, we demonstrate that $\texttt{StyleDDG}$ can obtain significant improvements in accuracy across target domains with minimal added communication overhead compared to decentralized gradient methods that do not employ style sharing.

Updated: 2025-04-08 17:32:56

标题: 去中心化联邦域泛化与风格共享：形式建模与收敛分析

摘要: 大部分联邦学习（FL）文献关注的是本地数据集统计在训练和测试时保持不变的情况。最近在领域泛化（DG）方面取得的进展旨在利用来自源（训练）域的数据训练模型，以便很好地泛化到来自未见过的目标（测试）域的数据。本文针对现有FL和DG工作中存在的两个主要差距进行了研究：（1）缺乏对DG目标和训练过程进行形式化数学分析；以及（2）FL中的DG研究局限于传统的星形拓扑结构。针对第二个差距，我们开发了一种完全去中心化的DG算法，称为“具有风格共享的分散式联邦领域泛化”（StyleDDG），旨在允许对等网络中的设备通过共享从其数据集中推断出的风格信息来实现DG。此外，我们通过提供首个系统化方法来对基于风格的DG训练优化进行数学分析，填补了第一个差距。我们将现有的集中式DG算法置于我们的框架中，并利用它们的形式化方法来建模StyleDDG。基于此，我们获得了StyleDDG可以实现次线性收敛速率的分析条件。通过对两个流行的DG数据集进行实验，我们证明StyleDDG可以在目标领域的准确性上取得显著改进，而与不使用风格共享的去中心化梯度方法相比，通信开销最小。

更新时间: 2025-04-08 17:32:56

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.06235v1

Privacy Attacks on Image AutoRegressive Models

Image autoregressive generation has emerged as a powerful new paradigm, with image autoregressive models (IARs) matching state-of-the-art diffusion models (DMs) in image quality (FID: 1.48 vs. 1.58) while allowing for higher generation speed. However, the privacy risks associated with IARs remain unexplored, raising concerns about their responsible deployment. To address this gap, we conduct a comprehensive privacy analysis of IARs, comparing their privacy risks to those of DMs as a reference point. Specifically, we develop a novel membership inference attack (MIA) that achieves a remarkably high success rate in detecting training images, with a True Positive Rate at False Positive Rate = 1% (TPR@FPR=1%) of 86.38%, compared to just 6.38% for DMs using comparable attacks. We leverage our novel MIA to perform dataset inference (DI) for IARs and show that it requires as few as 6 samples to detect dataset membership, compared to 200 samples for DI in DMs. This confirms a higher level of information leakage in IARs. Finally, we are able to extract hundreds of training data points from an IAR (e.g., 698 from VAR-d30). Our results suggest a fundamental privacy-utility trade-off: while IARs excel in image generation quality and speed, they are empirically significantly more vulnerable to privacy attacks compared to DMs that achieve similar performance. This trend suggests that incorporating techniques from DMs into IARs, such as modeling the per-token probability distribution using a diffusion procedure, could help mitigate IARs' vulnerability to privacy attacks. We make our code available at: https://github.com/sprintml/privacy_attacks_against_iars

Updated: 2025-04-08 17:28:09

标题: 基于图像自回归模型的隐私攻击

摘要: 图像自回归生成已经成为一种强大的新范式，图像自回归模型（IARs）与最先进的扩散模型（DMs）在图像质量上匹配（FID：1.48 vs. 1.58），同时允许更高的生成速度。然而，与IARs相关的隐私风险仍未被探索，引发对其负责任部署的担忧。为了填补这一空白，我们对IARs进行了全面的隐私分析，将它们的隐私风险与DMs作为参考点进行比较。具体地，我们开发了一种新颖的成员推理攻击（MIA），在检测训练图像方面取得了非常高的成功率，当False Positive Rate = 1%（TPR@FPR=1%）时，True Positive Rate为86.38%，而使用类似攻击的DMs仅为6.38%。我们利用我们的新颖MIA进行了IARs的数据集推理（DI），并展示出只需要6个样本就能检测到数据集成员身份，而DMs则需要200个样本进行DI。这证实了IARs中信息泄露的水平更高。最后，我们能够从IARs中提取数百个训练数据点（例如，从VAR-d30中提取了698个）。我们的结果表明了一种基本的隐私-效用权衡：虽然IARs在图像生成质量和速度方面表现出色，但在实证上相比实现类似性能的DMs，它们更容易受到隐私攻击的影响。这一趋势表明，将来自DMs的技术纳入IARs中，例如使用扩散程序建模每个标记的概率分布，可能有助于减轻IARs对隐私攻击的脆弱性。我们提供我们的代码链接：https://github.com/sprintml/privacy_attacks_against_iars

更新时间: 2025-04-08 17:28:09

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2502.02514v2

Modeling Challenging Patient Interactions: LLMs for Medical Communication Training

Effective patient communication is pivotal in healthcare, yet traditional medical training often lacks exposure to diverse, challenging interpersonal dynamics. To bridge this gap, this study proposes the use of Large Language Models (LLMs) to simulate authentic patient communication styles, specifically the "accuser" and "rationalizer" personas derived from the Satir model, while also ensuring multilingual applicability to accommodate diverse cultural contexts and enhance accessibility for medical professionals. Leveraging advanced prompt engineering, including behavioral prompts, author's notes, and stubbornness mechanisms, we developed virtual patients (VPs) that embody nuanced emotional and conversational traits. Medical professionals evaluated these VPs, rating their authenticity (accuser: $3.8 \pm 1.0$; rationalizer: $3.7 \pm 0.8$ on a 5-point Likert scale (from one to five)) and correctly identifying their styles. Emotion analysis revealed distinct profiles: the accuser exhibited pain, anger, and distress, while the rationalizer displayed contemplation and calmness, aligning with predefined, detailed patient description including medical history. Sentiment scores (on a scale from zero to nine) further validated these differences in the communication styles, with the accuser adopting negative ($3.1 \pm 0.6$) and the rationalizer more neutral ($4.0 \pm 0.4$) tone. These results underscore LLMs' capability to replicate complex communication styles, offering transformative potential for medical education. This approach equips trainees to navigate challenging clinical scenarios by providing realistic, adaptable patient interactions, enhancing empathy and diagnostic acumen. Our findings advocate for AI-driven tools as scalable, cost-effective solutions to cultivate nuanced communication skills, setting a foundation for future innovations in healthcare training.

Updated: 2025-04-08 17:25:48

标题: 建模具有挑战性的患者互动：LLMs用于医疗沟通培训

摘要: 有效的患者沟通在医疗保健中至关重要，然而传统的医学培训往往缺乏对多样化、具有挑战性的人际动态的接触。为了弥补这一差距，本研究提出利用大型语言模型（LLMs）模拟真实患者沟通风格，特别是源自Satir模型的“控诉者”和“理性化者”人设，同时确保多语言适用性，以适应不同文化背景，并增强医疗专业人士的可访问性。通过利用先进的提示工程，包括行为提示、作者注释和固执机制，我们开发了体现微妙情感和对话特征的虚拟患者（VPs）。医疗专业人士评估了这些VPs，根据五分Likert量表（从一到五）对其真实性进行评分（控诉者：$3.8 \pm 1.0$；理性化者：$3.7 \pm 0.8$），并正确识别了它们的风格。情绪分析显示了明显的特征：控诉者表现出疼痛、愤怒和苦恼，而理性化者展现出思考和冷静，与预定义的详细患者描述（包括病史）一致。情感分数（在零到九的范围内）进一步验证了这些沟通风格的差异，控诉者采用了负面的（$3.1 \pm 0.6$）而理性化者更为中性的（$4.0 \pm 0.4$）语调。这些结果突显了LLMs复制复杂沟通风格的能力，为医学教育提供了转变潜力。这种方法使实习生能够通过提供现实、可适应的患者互动来应对具有挑战性的临床场景，增强同理心和诊断能力。我们的发现倡导利用AI驱动的工具作为成本效益高、可扩展的解决方案，以培养微妙的沟通技能，为未来医疗保健培训的创新奠定基础。

更新时间: 2025-04-08 17:25:48

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2503.22250v2

Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation

While decoder-only large language models (LLMs) have shown impressive results, encoder-decoder models are still widely adopted in real-world applications for their inference efficiency and richer encoder representation. In this paper, we study a novel problem: adapting pretrained decoder-only LLMs to encoder-decoder, with the goal of leveraging the strengths of both approaches to achieve a more favorable quality-efficiency trade-off. We argue that adaptation not only enables inheriting the capability of decoder-only LLMs but also reduces the demand for computation compared to pretraining from scratch. We rigorously explore different pretraining objectives and parameter initialization/optimization techniques. Through extensive experiments based on Gemma 2 (2B and 9B) and a suite of newly pretrained mT5-sized models (up to 1.6B), we demonstrate the effectiveness of adaptation and the advantage of encoder-decoder LLMs. Under similar inference budget, encoder-decoder LLMs achieve comparable (often better) pretraining performance but substantially better finetuning performance than their decoder-only counterpart. For example, Gemma 2B-2B outperforms Gemma 2B by $\sim$7\% after instruction tuning. Encoder-decoder adaptation also allows for flexible combination of different-sized models, where Gemma 9B-2B significantly surpasses Gemma 2B-2B by $>$3\%. The adapted encoder representation also yields better results on SuperGLUE. We will release our checkpoints to facilitate future research.

Updated: 2025-04-08 17:13:41

标题: 编码器-解码器 Gemma：通过适应性改善质量-效率平衡

摘要: 尽管仅解码器的大型语言模型（LLMs）展现了令人印象深刻的结果，编码器-解码器模型仍然被广泛采用于实际应用中，因为其推理效率和更丰富的编码器表示。本文研究了一个新颖的问题：将预训练的仅解码器LLMs调整为编码器-解码器，旨在利用两种方法的优势，实现更有利的质量-效率折衷。我们认为，调整不仅可以继承仅解码器LLMs的能力，还可以减少与从头开始预训练相比的计算需求。我们严谨地探讨了不同的预训练目标和参数初始化/优化技术。通过基于Gemma 2（2B和9B）和一系列新预训练的mT5尺寸模型（高达1.6B）的广泛实验，我们展示了调整的有效性和编码器-解码器LLMs的优势。在类似的推理预算下，编码器-解码器LLMs实现了可比（通常更好）的预训练性能，但比仅解码器LLMs具有显著更好的微调性能。例如，在调整指导后，Gemma 2B-2B的表现优于Gemma 2B约7％。编码器-解码器调整还允许灵活组合不同尺寸的模型，其中Gemma 9B-2B明显优于Gemma 2B-2B超过3％。调整后的编码器表示还在SuperGLUE上取得了更好的结果。我们将发布我们的检查点以促进未来研究。

更新时间: 2025-04-08 17:13:41

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2504.06225v1

Evaluating the Fitness of Ontologies for the Task of Question Generation

Ontology-based question generation is an important application of semantic-aware systems that enables the creation of large question banks for diverse learning environments. The effectiveness of these systems, both in terms of the calibre and cognitive difficulty of the resulting questions, depends heavily on the quality and modelling approach of the underlying ontologies, making it crucial to assess their fitness for this task. To date, there has been no comprehensive investigation into the specific ontology aspects or characteristics that affect the question generation process. Therefore, this paper proposes a set of requirements and task-specific metrics for evaluating the fitness of ontologies for question generation tasks in pedagogical settings. Using the ROMEO methodology, a structured framework for deriving task-specific metrics, an expert-based approach is employed to assess the performance of various ontologies in Automatic Question Generation (AQG) tasks, which is then evaluated over a set of ontologies. Our results demonstrate that ontology characteristics significantly impact the effectiveness of question generation, with different ontologies exhibiting varying performance levels. This highlights the importance of assessing ontology quality with respect to AQG tasks.

Updated: 2025-04-08 17:10:04

标题: 评估本体论对问题生成任务的适应性

摘要: 基于本体的问题生成是语义感知系统的重要应用，它使得可以为不同学习环境创建大量问题库。这些系统的有效性，无论是问题的质量还是认知难度，都严重依赖于基础本体的质量和建模方法，因此评估它们对于这一任务的适应性至关重要。迄今为止，还没有对影响问题生成过程的具体本体方面或特征进行全面调查。因此，本文提出了一组要求和任务特定的指标，用于评估本体在教学环境中问题生成任务的适应性。利用ROMEO方法论，一个用于推导任务特定指标的结构化框架，采用基于专家的方法来评估各种本体在自动问题生成（AQG）任务中的性能，然后在一组本体上进行评估。我们的结果表明，本体特征显著影响问题生成的有效性，不同本体表现出不同的性能水平。这突显了在AQG任务方面评估本体质量的重要性。

更新时间: 2025-04-08 17:10:04

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.07994v1

Retrieval-Based Interleaved Visual Chain-of-Thought in Real-World Driving Scenarios

While chain-of-thought (CoT) prompting improves reasoning in large language models, its effectiveness in vision-language models (VLMs) remains limited due to over-reliance on textual cues and memorized knowledge. To investigate the visual reasoning capabilities of VLMs in complex real-world scenarios, we introduce DrivingVQA, a visual question answering dataset derived from driving theory exams, which contains 3,931 multiple-choice problems with expert-written explanations and grounded entities relevant to the reasoning process. Leveraging this dataset, we propose RIV-CoT, a Retrieval-Based Interleaved Visual Chain-of-Thought method that enables VLMs to reason using visual crops corresponding to these relevant entities. Our experiments demonstrate that RIV-CoT improves answer accuracy by 3.1% and reasoning accuracy by 4.6% over vanilla CoT prompting. Furthermore, we demonstrate that our method effectively scales to the larger A-OKVQA reasoning dataset by leveraging automatically generated pseudo-labels, outperforming CoT prompting.

Updated: 2025-04-08 17:09:59

标题: 现实世界驾驶场景中的检索式交错视觉思维链。

摘要: 尽管思维链（CoT）提示改进了大型语言模型的推理能力，但由于过度依赖文本线索和记忆知识，它在视觉语言模型（VLMs）中的有效性仍然有限。为了研究VLMs在复杂的现实场景中的视觉推理能力，我们引入了DrivingVQA，这是一个源自驾驶理论考试的视觉问答数据集，包含3,931个多项选择问题，配有专家撰写的解释和与推理过程相关的实体。利用这个数据集，我们提出了RIV-CoT，一种基于检索的交错视觉思维方法，使VLMs能够使用与这些相关实体相对应的视觉裁剪进行推理。我们的实验证明，与普通CoT提示相比，RIV-CoT提高了3.1%的答案准确性和4.6%的推理准确性。此外，我们证明了我们的方法通过利用自动生成的伪标签有效地扩展到更大的A-OKVQA推理数据集，胜过了CoT提示。

更新时间: 2025-04-08 17:09:59

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2501.04671v2

GenoTEX: An LLM Agent Benchmark for Automated Gene Expression Data Analysis

Recent advancements in machine learning have significantly improved the identification of disease-associated genes from gene expression datasets. However, these processes often require extensive expertise and manual effort, limiting their scalability. Large Language Model (LLM)-based agents have shown promise in automating these tasks due to their increasing problem-solving abilities. To support the evaluation and development of such methods, we introduce GenoTEX, a benchmark dataset for the automated analysis of gene expression data. GenoTEX provides analysis code and results for solving a wide range of gene-trait association problems, encompassing dataset selection, preprocessing, and statistical analysis, in a pipeline that follows computational genomics standards. The benchmark includes expert-curated annotations from bioinformaticians to ensure accuracy and reliability. To provide baselines for these tasks, we present GenoAgent, a team of LLM-based agents that adopt a multi-step programming workflow with flexible self-correction, to collaboratively analyze gene expression datasets. Our experiments demonstrate the potential of LLM-based methods in analyzing genomic data, while error analysis highlights the challenges and areas for future improvement. We propose GenoTEX as a promising resource for benchmarking and enhancing automated methods for gene expression data analysis. The benchmark is available at https://github.com/Liu-Hy/GenoTEX.

Updated: 2025-04-08 17:09:04

标题: GenoTEX：用于自动基因表达数据分析的LLM代理基准

摘要: 最近机器学习的进展显著提高了从基因表达数据集中识别与疾病相关的基因。然而，这些过程通常需要广泛的专业知识和手动努力，限制了其可扩展性。基于大型语言模型（LLM）的代理已经显示出自动化这些任务的潜力，因为它们具有越来越强的问题解决能力。为了支持这些方法的评估和发展，我们引入了GenoTEX，这是一个用于基因表达数据自动分析的基准数据集。GenoTEX提供了用于解决各种基因-特征关联问题的分析代码和结果，包括数据集选择、预处理和统计分析，这些都遵循计算基因组学的标准流程。该基准数据集包括来自生物信息学家的专家策划注释，以确保准确性和可靠性。为了为这些任务提供基线，我们提出了GenoAgent，这是一个采用多步编程工作流程的LLM代理团队，具有灵活的自我纠正功能，共同分析基因表达数据集。我们的实验展示了LLM方法在分析基因组数据方面的潜力，而错误分析突出了未来改进的挑战和领域。我们提议GenoTEX作为一个有前景的资源，用于基因表达数据分析的基准测试和增强自动化方法。该基准数据集可在https://github.com/Liu-Hy/GenoTEX 上获得。

更新时间: 2025-04-08 17:09:04

领域: cs.LG,cs.AI,q-bio.GN

下载: http://arxiv.org/abs/2406.15341v3

AUTALIC: A Dataset for Anti-AUTistic Ableist Language In Context

As our understanding of autism and ableism continues to increase, so does our understanding of ableist language towards autistic people. Such language poses a significant challenge in NLP research due to its subtle and context-dependent nature. Yet, detecting anti-autistic ableist language remains underexplored, with existing NLP tools often failing to capture its nuanced expressions. We present AUTALIC, the first benchmark dataset dedicated to the detection of anti-autistic ableist language in context, addressing a significant gap in the field. The dataset comprises 2,400 autism-related sentences collected from Reddit, accompanied by surrounding context, and is annotated by trained experts with backgrounds in neurodiversity. Our comprehensive evaluation reveals that current language models, including state-of-the-art LLMs, struggle to reliably identify anti-autistic ableism and align with human judgments, underscoring their limitations in this domain. We publicly release AUTALIC along with the individual annotations which serve as a valuable resource to researchers working on ableism, neurodiversity, and also studying disagreements in annotation tasks. This dataset serves as a crucial step towards developing more inclusive and context-aware NLP systems that better reflect diverse perspectives.

Updated: 2025-04-08 17:08:26

标题: AUTALIC：一份用于反对自闭症歧视语言的上下文数据集

摘要: 随着我们对自闭症和肤色异能主义的理解不断增加，我们对自闭症人群的肤色异能主义语言也有了更深入的认识。这种语言在自然语言处理研究中构成了重要挑战，因为它具有微妙且依赖上下文的特性。然而，检测反自闭症肤色异能主义语言仍未得到充分探讨，现有的自然语言处理工具往往无法捕捉其微妙的表达。我们提出AUTALIC，这是第一个专门用于检测上下文中的反自闭症肤色异能主义语言的基准数据集，填补了领域中的重要空白。该数据集包括从Reddit收集的2,400个与自闭症相关的句子，附带周围的上下文，并由具有神经多样性背景的专业人员进行注释。我们的综合评估显示，当前的语言模型，包括最先进的LLMs，在可靠地识别反自闭症肤色异能主义方面表现不佳，并与人类判断不一致，突显了它们在这一领域的局限性。我们公开发布AUTALIC以及个别注释，这些注释可为研究肤色异能主义、神经多样性以及研究注释任务中的分歧的研究人员提供宝贵资源。该数据集是向开发更具包容性和上下文感知的自然语言处理系统迈出的关键一步，以更好地反映多样化的观点。

更新时间: 2025-04-08 17:08:26

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.16520v3

Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs

The increasing adoption of web crawling opt-outs by copyright holders of online content raises critical questions about the impact of data compliance on large language model (LLM) performance. However, little is known about how these restrictions (and the resultant filtering of pretraining datasets) affect the capabilities of models trained using these corpora. In this work, we conceptualize this effect as the $\textit{data compliance gap}$ (DCG), which quantifies the performance difference between models trained on datasets that comply with web crawling opt-outs, and those that do not. We measure the data compliance gap in two settings: pretraining models from scratch and continual pretraining from existing compliant models (simulating a setting where copyrighted data could be integrated later in pretraining). Our experiments with 1.5B models show that, as of January 2025, compliance with web data opt-outs does not degrade general knowledge acquisition (close to 0\% DCG). However, in specialized domains such as biomedical research, excluding major publishers leads to performance declines. These findings suggest that while general-purpose LLMs can be trained to perform equally well using fully open data, performance in specialized domains may benefit from access to high-quality copyrighted sources later in training. Our study provides empirical insights into the long-debated trade-off between data compliance and downstream model performance, informing future discussions on AI training practices and policy decisions.

Updated: 2025-04-08 17:08:06

标题: “高效的LLMs是否能够符合道德标准？量化网络爬虫选择退出的影响”

摘要: 网络爬虫退出协议越来越被在线内容的版权持有者采纳，这引发了有关数据合规对大型语言模型（LLM）性能影响的关键问题。然而，关于这些限制（以及随之产生的预训练数据集过滤）如何影响使用这些语料库训练的模型的能力，人们知之甚少。在这项研究中，我们将这种影响概念化为“数据合规差距”（DCG），它量化了在遵守网络爬虫退出协议的数据集上训练的模型与未遵守协议的模型之间的性能差异。我们在两种设置中测量数据合规差距：从头开始预训练模型和从现有合规模型进行持续预训练（模拟版权数据稍后可能整合到预训练中的情景）。我们的实验表明，截至2025年1月，遵守网络数据退出协议并不会降低一般知识获取（DCG接近0％）。然而，在生物医学研究等专业领域，排除主要出版商会导致性能下降。这些发现表明，虽然通用性LLM可以通过完全开放的数据进行训练，但在专业领域中，访问高质量的版权来源可能有助于后续训练中的性能。我们的研究为长期争论的数据合规与下游模型性能之间的权衡提供了实证见解，为未来关于AI训练实践和政策决策的讨论提供了信息。

更新时间: 2025-04-08 17:08:06

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2504.06219v1

A score-based particle method for homogeneous Landau equation

We propose a novel score-based particle method for solving the Landau equation in plasmas, that seamlessly integrates learning with structure-preserving particle methods [arXiv:1910.03080]. Building upon the Lagrangian viewpoint of the Landau equation, a central challenge stems from the nonlinear dependence of the velocity field on the density. Our primary innovation lies in recognizing that this nonlinearity is in the form of the score function, which can be approximated dynamically via techniques from score-matching. The resulting method inherits the conservation properties of the deterministic particle method while sidestepping the necessity for kernel density estimation in [arXiv:1910.03080]. This streamlines computation and enhances scalability with dimensionality. Furthermore, we provide a theoretical estimate by demonstrating that the KL divergence between our approximation and the true solution can be effectively controlled by the score-matching loss. Additionally, by adopting the flow map viewpoint, we derive an update formula for exact density computation. Extensive examples have been provided to show the efficiency of the method, including a physically relevant case of Coulomb interaction.

Updated: 2025-04-08 17:00:36

标题: 一种基于得分的粒子方法用于齐次Landau方程

摘要: 我们提出了一种新颖的基于分数的粒子方法，用于解决等离子体中的Landau方程，它无缝地将学习与保持结构的粒子方法相结合[arXiv:1910.03080]。借鉴Landau方程的拉格朗日视角，一个主要挑战源于速度场对密度的非线性依赖。我们的主要创新在于认识到这种非线性是得分函数的形式，可以通过得分匹配技术动态地近似。由此产生的方法继承了确定性粒子方法的保守性质，同时避免了在[arXiv:1910.03080]中核密度估计的必要性。这简化了计算过程，并提高了与维度相关的可扩展性。此外，我们通过展示我们的近似和真实解之间的KL散度可以通过得分匹配损失有效地控制来提供了一个理论估计。此外，通过采用流映射视角，我们推导出了用于精确密度计算的更新公式。我们提供了大量示例来展示该方法的效率，包括库仑相互作用的一个物理相关案例。

更新时间: 2025-04-08 17:00:36

领域: math.NA,cs.LG,cs.NA

下载: http://arxiv.org/abs/2405.05187v2

From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models

Long-context capabilities are essential for a wide range of applications, including document and video understanding, in-context learning, and inference-time scaling, all of which require models to process and reason over long sequences of text and multimodal data. In this work, we introduce a efficient training recipe for building ultra-long context LLMs from aligned instruct model, pushing the boundaries of context lengths from 128K to 1M, 2M, and 4M tokens. Our approach leverages efficient continued pretraining strategies to extend the context window and employs effective instruction tuning to maintain the instruction-following and reasoning abilities. Our UltraLong-8B, built on Llama3.1-Instruct with our recipe, achieves state-of-the-art performance across a diverse set of long-context benchmarks. Importantly, models trained with our approach maintain competitive performance on standard benchmarks, demonstrating balanced improvements for both long and short context tasks. We further provide an in-depth analysis of key design choices, highlighting the impacts of scaling strategies and data composition. Our findings establish a robust framework for efficiently scaling context lengths while preserving general model capabilities. We release all model weights at: https://ultralong.github.io/.

Updated: 2025-04-08 16:58:58

标题: 从128K到4M：超长上下文大语言模型的高效训练

摘要: 长上下文能力对于各种应用至关重要，包括文档和视频理解、上下文学习和推理时的扩展，所有这些都需要模型处理和推理长序列的文本和多模态数据。在这项工作中，我们介绍了一种有效的训练配方，用于从对齐的指导模型构建超长上下文LLMs，将上下文长度从128K扩展到1M、2M和4M个标记。我们的方法利用高效的持续预训练策略来扩展上下文窗口，并采用有效的指导调整来维持遵循指导和推理能力。我们的UltraLong-8B，基于Llama3.1-Instruct与我们的配方构建，实现了在各种长上下文基准测试中的最先进性能。重要的是，使用我们方法训练的模型在标准基准测试中保持竞争性能，展示了长短上下文任务的平衡改进。我们进一步提供了对关键设计选择的深入分析，突出了扩展策略和数据组成的影响。我们的发现建立了一个强大的框架，可以有效地扩展上下文长度，同时保留一般模型能力。我们在https://ultralong.github.io/发布所有模型权重。

更新时间: 2025-04-08 16:58:58

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.06214v1

Federated Automated Feature Engineering

Automated feature engineering (AutoFE) is used to automatically create new features from original features to improve predictive performance without needing significant human intervention and domain expertise. Many algorithms exist for AutoFE, but very few approaches exist for the federated learning (FL) setting where data is gathered across many clients and is not shared between clients or a central server. We introduce AutoFE algorithms for the horizontal, vertical, and hybrid FL settings, which differ in how the data is gathered across clients. To the best of our knowledge, we are the first to develop AutoFE algorithms for the horizontal and hybrid FL cases, and we show that the downstream test scores of our federated AutoFE algorithms is close in performance to the case where data is held centrally and AutoFE is performed centrally.

Updated: 2025-04-08 16:57:48

标题: 联邦化自动特征工程

摘要: 自动特征工程（AutoFE）用于自动从原始特征中创建新特征，以提高预测性能，而无需大量人工干预和领域专业知识。存在许多用于AutoFE的算法，但在数据横跨许多客户端且不在客户端或中央服务器之间共享的联邦学习（FL）设置中，很少有方法存在。我们介绍了用于水平、垂直和混合FL设置的AutoFE算法，这些设置在客户端之间如何收集数据方面有所不同。据我们所知，我们是第一个为水平和混合FL情况开发AutoFE算法的团队，并且我们展示了我们的联邦AutoFE算法的下游测试分数与数据集中保存数据和AutoFE在中心执行时性能接近的情况。

更新时间: 2025-04-08 16:57:48

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2412.04404v2

NNN: Next-Generation Neural Networks for Marketing Mix Modeling

We present NNN, a Transformer-based neural network approach to Marketing Mix Modeling (MMM) designed to address key limitations of traditional methods. Unlike conventional MMMs which rely on scalar inputs and parametric decay functions, NNN uses rich embeddings to capture both quantitative and qualitative aspects of marketing and organic channels (e.g., search queries, ad creatives). This, combined with its attention mechanism, enables NNN to model complex interactions, capture long-term effects, and potentially improve sales attribution accuracy. We show that L1 regularization permits the use of such expressive models in typical data-constrained settings. Evaluating NNN on simulated and real-world data demonstrates its efficacy, particularly through considerable improvement in predictive power. Beyond attribution, NNN provides valuable, complementary insights through model probing, such as evaluating keyword or creative effectiveness, enhancing model interpretability.

Updated: 2025-04-08 16:57:11

标题: NNN: 下一代神经网络用于市场组合建模

摘要: 我们提出了NNN，这是一种基于Transformer的神经网络方法，用于解决传统方法的关键局限性。与依赖标量输入和参数衰减函数的传统MMM不同，NNN使用丰富的嵌入来捕捉营销和有机渠道的定量和定性方面（例如搜索查询、广告创意）。结合其注意机制，NNN能够模拟复杂的交互作用，捕捉长期效应，并可能提高销售归因的准确性。我们表明L1正则化允许在典型的数据受限设置中使用这种表达模型。通过对模拟和真实世界数据进行评估，证明了NNN的功效，特别是通过预测能力的显著提升。除了归因之外，NNN通过模型探测提供有价值的、互补的见解，例如评估关键词或创意的有效性，增强模型的可解释性。

更新时间: 2025-04-08 16:57:11

领域: cs.LG,stat.AP

下载: http://arxiv.org/abs/2504.06212v1

Need for zkSpeed: Accelerating HyperPlonk for Zero-Knowledge Proofs

Zero-Knowledge Proofs (ZKPs) are rapidly gaining importance in privacy-preserving and verifiable computing. ZKPs enable a proving party to prove the truth of a statement to a verifying party without revealing anything else. ZKPs have applications in blockchain technologies, verifiable machine learning, and electronic voting, but have yet to see widespread adoption due to the computational complexity of the proving process. Recent works have accelerated the key primitives of state-of-the-art ZKP protocols on GPU and ASIC. However, the protocols accelerated thus far face one of two challenges: they either require a trusted setup for each application, or they generate larger proof sizes with higher verification costs, limiting their applicability in scenarios with numerous verifiers or strict verification time constraints. This work presents an accelerator, zkSpeed, for HyperPlonk, a state-of-the-art ZKP protocol that supports both one-time, universal setup and small proof sizes for typical ZKP applications in publicly verifiable, consensus-based systems. We accelerate the entire protocol, including two major primitives: SumCheck and Multi-scalar Multiplications (MSMs). We develop a full-chip architecture using 366.46 mm$^2$ and 2 TB/s of bandwidth to accelerate the entire proof generation process, achieving geometric mean speedups of 801$\times$ over CPU baselines.

Updated: 2025-04-08 16:56:10

标题: 需要 zkSpeed：加速超级 Plonk 用于零知识证明

摘要: 零知识证明（ZKPs）在隐私保护和可验证计算中迅速变得重要。ZKPs使证明方能向验证方证明一个陈述的真实性，而无需透露其他任何信息。ZKPs在区块链技术、可验证机器学习和电子投票等领域有应用，但由于证明过程的计算复杂性，尚未被广泛采用。最近的研究加速了最先进的ZKP协议的关键原语在GPU和ASIC上的实现。然而，迄今加速的协议面临两种挑战之一：要么对每个应用都需要一个可信的设置，要么会生成更大的证明大小并带来更高的验证成本，限制了它们在具有众多验证者或严格验证时间约束的场景中的适用性。本文提出了一个名为zkSpeed的加速器，用于HyperPlonk，这是一种最先进的ZKP协议，支持一次性、通用设置和典型ZKP应用中小的证明大小，适用于公开可验证的基于共识的系统。我们加速整个协议，包括两个主要原语：SumCheck和多标量乘法（MSMs）。我们开发了一种全芯片架构，使用366.46 mm$^2和2 TB/s的带宽来加速整个证明生成过程，实现了相对于CPU基线的几何平均加速比为801倍。

更新时间: 2025-04-08 16:56:10

领域: cs.AR,cs.CR

下载: http://arxiv.org/abs/2504.06211v1

The Work Capacity of Channels with Memory: Maximum Extractable Work in Percept-Action Loops

Predicting future observations plays a central role in machine learning, biology, economics, and many other fields. It lies at the heart of organizational principles such as the variational free energy principle and has even been shown -- based on the second law of thermodynamics -- to be necessary for reaching the fundamental energetic limits of sequential information processing. While the usefulness of the predictive paradigm is undisputed, complex adaptive systems that interact with their environment are more than just predictive machines: they have the power to act upon their environment and cause change. In this work, we develop a framework to analyze the thermodynamics of information processing in percept-action loops -- a model of agent-environment interaction -- allowing us to investigate the thermodynamic implications of actions and percepts on equal footing. To this end, we introduce the concept of work capacity -- the maximum rate at which an agent can expect to extract work from its environment. Our results reveal that neither of two previously established design principles for work-efficient agents -- maximizing predictive power and forgetting past actions -- remains optimal in environments where actions have observable consequences. Instead, a trade-off emerges: work-efficient agents must balance prediction and forgetting, as remembering past actions can reduce the available free energy. This highlights a fundamental departure from the thermodynamics of passive observation, suggesting that prediction and energy efficiency may be at odds in active learning systems.

Updated: 2025-04-08 16:54:20

标题: 具有记忆的通道的工作能力：感知-动作循环中可提取的最大工作

摘要: 预测未来观察在机器学习、生物学、经济学和许多其他领域中起着核心作用。它是诸如变分自由能原理等组织原则的核心，并且甚至根据热力学第二定律已被证明是实现顺序信息处理的基本能量极限所必需的。虽然预测范式的有用性是毋庸置疑的，但与其环境互动的复杂自适应系统不仅仅是预测机器：它们有能力对其环境产生影响并引起变化。在这项工作中，我们开发了一个框架来分析感知-行动循环中信息处理的热力学--一种代理-环境交互的模型--使我们能够研究行动和感知对热力学的影响。为此，我们引入了工作能力的概念--代理能够从其环境中期望提取工作的最大速率。我们的结果显示，对于行为具有可观察后果的环境，两个先前建立的工作高效代理设计原则--最大化预测能力和忘记过去行动--都不再是最优的。相反，一个权衡出现了：工作高效代理必须平衡预测和遗忘，因为记住过去的行动可能会降低可用的自由能。这突显了与被动观察的热力学根本分歧，表明在主动学习系统中，预测和能源效率可能存在冲突。

更新时间: 2025-04-08 16:54:20

领域: cs.LG,cond-mat.stat-mech,cs.IT,math.IT,nlin.AO,nlin.CD,quant-ph

下载: http://arxiv.org/abs/2504.06209v1

A new framework for prognostics in decentralized industries: Enhancing fairness, security, and transparency through Blockchain and Federated Learning

As global industries transition towards Industry 5.0 predictive maintenance PM remains crucial for cost effective operations resilience and minimizing downtime in increasingly smart manufacturing environments In this chapter we explore how the integration of Federated Learning FL and blockchain BC technologies enhances the prediction of machinerys Remaining Useful Life RUL within decentralized and human centric industrial ecosystems Traditional centralized data approaches raise concerns over privacy security and scalability especially as Artificial intelligence AI driven smart manufacturing becomes more prevalent This chapter leverages FL to enable localized model training across multiple sites while utilizing BC to ensure trust transparency and data integrity across the network This BC integrated FL framework optimizes RUL predictions enhances data privacy and security establishes transparency and promotes collaboration in decentralized manufacturing It addresses key challenges such as maintaining privacy and security ensuring transparency and fairness and incentivizing participation in decentralized networks Experimental validation using the NASA CMAPSS dataset demonstrates the model effectiveness in real world scenarios and we extend our findings to the broader research community through open source code on GitHub inviting collaborative development to drive innovation in Industry 5.0

Updated: 2025-04-08 16:53:33

标题: 去中心化行业预测的新框架：通过区块链和联邦学习提高公平性、安全性和透明度

摘要: 随着全球产业向工业5.0过渡，预测性维护对于成本有效的运营韧性和在日益智能制造环境中最小化停机时间仍然至关重要。在本章中，我们探讨了如何整合联邦学习（FL）和区块链（BC）技术，以增强机械剩余寿命（RUL）的预测能力，这发生在分散化和以人为中心的工业生态系统中。传统的集中式数据方法引发了对隐私、安全性和可扩展性的担忧，尤其是随着人工智能（AI）驱动的智能制造变得更加普遍。本章利用FL实现跨多个站点的本地化模型训练，同时利用BC确保网络上的信任、透明度和数据完整性。这种BC集成的FL框架优化了RUL预测，增强了数据隐私和安全性，建立了透明度，并促进了分散制造中的协作。它解决了一些关键挑战，如保护隐私和安全性，确保透明度和公平性，以及激励参与分散网络。使用NASA CMAPSS数据集进行实验验证，展示了模型在现实场景中的有效性，并通过GitHub上的开源代码将我们的研究成果扩展到更广泛的研究社区，邀请协作开发，推动工业5.0创新。

更新时间: 2025-04-08 16:53:33

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2503.05725v2

An experimental survey and Perspective View on Meta-Learning for Automated Algorithms Selection and Parametrization

Considerable progress has been made in the recent literature studies to tackle the Algorithms Selection and Parametrization (ASP) problem, which is diversified in multiple meta-learning setups. Yet there is a lack of surveys and comparative evaluations that critically analyze, summarize and assess the performance of existing methods. In this paper, we provide an overview of the state of the art in this continuously evolving field. The survey sheds light on the motivational reasons for pursuing classifiers selection through meta-learning. In this regard, Automated Machine Learning (AutoML) is usually treated as an ASP problem under the umbrella of the democratization of machine learning. Accordingly, AutoML makes machine learning techniques accessible to domain scientists who are interested in applying advanced analytics but lack the required expertise. It can ease the task of manually selecting ML algorithms and tuning related hyperparameters. We comprehensively discuss the different phases of classifiers selection based on a generic framework that is formed as an outcome of reviewing prior works. Subsequently, we propose a benchmark knowledge base of 4 millions previously learned models and present extensive comparative evaluations of the prominent methods for classifiers selection based on 08 classification algorithms and 400 benchmark datasets. The comparative study quantitatively assesses the performance of algorithms selection methods along while emphasizing the strengths and limitations of existing studies.

Updated: 2025-04-08 16:51:22

标题: 一个关于元学习在自动算法选择和参数化中的实验调查和展望视角

摘要: 最近的文献研究在解决算法选择和参数化（ASP）问题方面取得了相当大的进展，这个问题在多种元学习设置中有所不同。然而，缺乏对现有方法进行批判性分析、总结和评估性能的调查和比较评估。在本文中，我们概述了这个不断发展的领域的最新进展。调查揭示了通过元学习选择分类器的动机。在这方面，自动机器学习（AutoML）通常被视为机器学习民主化的ASP问题。因此，AutoML使得对于应用先进分析感兴趣但缺乏必要专业知识的领域科学家更容易获得机器学习技术。它可以简化手动选择ML算法和调整相关超参数的任务。我们全面讨论了基于一个通用框架的分类器选择的不同阶段，该框架是通过审查先前作品形成的结果。随后，我们提出了一个包含400个基准数据集和08种分类算法的显著方法的广泛比较评估的基准知识库，该知识库包含了400万个先前学习过的模型。比较研究在定量评估算法选择方法的性能的同时，强调了现有研究的优势和局限性。

更新时间: 2025-04-08 16:51:22

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.06207v1

Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?

The rapid escalation from elementary school-level to frontier problems of the difficulty for LLM benchmarks in recent years have weaved a miracle for researchers that we are only inches away from surpassing human intelligence. However, is the LLMs' remarkable reasoning ability indeed comes from true intelligence by human standards, or are they simply reciting solutions witnessed during training at an Internet level? To study this problem, we propose RoR-Bench, a novel, multi-modal benchmark for detecting LLM's recitation behavior when asked simple reasoning problems but with conditions subtly shifted, and conduct empirical analysis on our benchmark. Surprisingly, we found existing cutting-edge LLMs unanimously exhibits extremely severe recitation behavior; by changing one phrase in the condition, top models such as OpenAI-o1 and DeepSeek-R1 can suffer $60\%$ performance loss on elementary school-level arithmetic and reasoning problems. Such findings are a wake-up call to the LLM community that compels us to re-evaluate the true intelligence level of cutting-edge LLMs.

Updated: 2025-04-08 16:51:11

标题: 背诵胜于推理：尖端语言模型如何在小学水平的推理问题上失败？

摘要: 最近几年，LLM基准测试从小学水平迅速升级到前沿问题，对研究人员来说，这为我们超越人类智能只差一步创造了奇迹。然而，LLM的出色推理能力确实来自于人类标准下的真正智能，还是它们只是在互联网水平的训练中背诵解决方案？为了研究这一问题，我们提出了RoR-Bench，一个新颖的多模式基准，用于检测LLM在被问及简单推理问题但条件微妙转变时的背诵行为，并对我们的基准进行实证分析。令人惊讶的是，我们发现现有的尖端LLM一致表现出极其严重的背诵行为；通过改变条件中的一个短语，像OpenAI-o1和DeepSeek-R1这样的顶级模型在小学水平的算术和推理问题上可能会遭受60%的性能损失。这样的发现是对LLM社区的一声警钟，迫使我们重新评估尖端LLM的真实智能水平。

更新时间: 2025-04-08 16:51:11

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2504.00509v2

Improving Genetic Programming for Symbolic Regression with Equality Graphs

The search for symbolic regression models with genetic programming (GP) has a tendency of revisiting expressions in their original or equivalent forms. Repeatedly evaluating equivalent expressions is inefficient, as it does not immediately lead to better solutions. However, evolutionary algorithms require diversity and should allow the accumulation of inactive building blocks that can play an important role at a later point. The equality graph is a data structure capable of compactly storing expressions and their equivalent forms allowing an efficient verification of whether an expression has been visited in any of their stored equivalent forms. We exploit the e-graph to adapt the subtree operators to reduce the chances of revisiting expressions. Our adaptation, called eggp, stores every visited expression in the e-graph, allowing us to filter out from the available selection of subtrees all the combinations that would create already visited expressions. Results show that, for small expressions, this approach improves the performance of a simple GP algorithm to compete with PySR and Operon without increasing computational cost. As a highlight, eggp was capable of reliably delivering short and at the same time accurate models for a selected set of benchmarks from SRBench and a set of real-world datasets.

Updated: 2025-04-08 16:48:10

标题: 利用等式图提高符号回归的遗传编程

摘要: 通过遗传编程（GP）寻找具有符号回归模型的趋势是重新访问表达式的原始形式或等效形式。重复评估等效表达式是低效的，因为它并不能立即导致更好的解决方案。然而，进化算法需要多样性，并且应允许无效构建块的积累，这些构建块在以后可能发挥重要作用。等式图是一种数据结构，能够紧凑地存储表达式及其等效形式，从而有效地验证表达式是否已经在任何存储的等效形式中被访问过。我们利用等式图来调整子树操作符，以减少重新访问表达式的可能性。我们的调整称为eggp，将每个访问的表达式存储在等式图中，使我们能够从可用的子树选择中过滤掉所有可能创建已经访问过的表达式的组合。结果显示，对于较小的表达式，这种方法提高了简单GP算法的性能，使其能够与PySR和Operon竞争，而不增加计算成本。值得一提的是，eggp能够可靠地为SRBench和一组真实世界数据集中的一组基准提供短且准确的模型。

更新时间: 2025-04-08 16:48:10

领域: cs.LG

下载: http://arxiv.org/abs/2501.17848v2

Understanding Gradient Orthogonalization for Deep Learning via Non-Euclidean Trust-Region Optimization

Optimization with matrix gradient orthogonalization has recently demonstrated impressive results in the training of deep neural networks (Jordan et al., 2024; Liu et al., 2025). In this paper, we provide a theoretical analysis of this approach. In particular, we show that the orthogonalized gradient method can be seen as a first-order trust-region optimization method, where the trust-region is defined in terms of the matrix spectral norm. Motivated by this observation, we develop the stochastic non-Euclidean trust-region gradient method with momentum, which recovers the Muon optimizer (Jordan et al., 2024) as a special case, along with normalized SGD and signSGD with momentum (Cutkosky and Mehta, 2020; Sun et al., 2023). In addition, we prove state-of-the-art convergence results for the proposed algorithm in a range of scenarios, which involve arbitrary non-Euclidean norms, constrained and composite problems, and non-convex, star-convex, first- and second-order smooth functions. Finally, our theoretical findings provide an explanation for several practical observations, including the practical superiority of Muon compared to the Orthogonal-SGDM algorithm of Tuddenham et al. (2022) and the importance of weight decay in the training of large-scale language models.

Updated: 2025-04-08 16:47:42

标题: 通过非欧几里德信任区域优化理解深度学习中的梯度正交化

摘要: 最近，使用矩阵梯度正交化进行优化在深度神经网络训练中展现出令人印象深刻的结果（Jordan等，2024；Liu等，2025）。本文提供了对这种方法的理论分析。特别地，我们展示了正交化梯度方法可以被视为一种以矩阵谱范数定义信任域的一阶优化方法。受到这一观察的启发，我们开发了带有动量的随机非欧几里德信任域梯度方法，其恢复了Muon优化器（Jordan等，2024）作为一个特例，以及带有动量的归一化SGD和带有动量的signSGD（Cutkosky和Mehta，2020；Sun等，2023）。此外，我们证明了在一系列情景中提出的算法的最先进的收敛性结果，这些情景涉及任意非欧几里德范数、约束和复合问题，以及非凸、星凸、一阶和二阶平滑函数。最后，我们的理论发现解释了几个实际观察结果，包括Muon相对于Tuddenham等人（2022）的正交SGDM算法的实际优越性，以及在训练大规模语言模型中权重衰减的重要性。

更新时间: 2025-04-08 16:47:42

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2503.12645v2

Evaluating the Propensity of Generative AI for Producing Harmful Disinformation During an Election Cycle

Generative Artificial Intelligence offers a powerful tool for adversaries who wish to engage in influence operations, such as the Chinese Spamouflage operation and the Russian Internet Research Agency effort that both sought to interfere with recent US election cycles. Therefore, this study seeks to investigate the propensity of current generative AI models for producing harmful disinformation during an election cycle. The probability that different generative AI models produced disinformation when given adversarial prompts was evaluated, in addition the associated harm. This allows for the expected harm for each model to be computed and it was discovered that Copilot and Gemini tied for the overall safest performance by realizing the lowest expected harm, while GPT-4o produced the greatest rates of harmful disinformation, resulting in much higher expected harm scores. The impact of disinformation category was also investigated and Gemini was safest within the political category of disinformation due to mitigation attempts made by developers during the election, while Copilot was safest for topics related to health. Moreover, characteristics of adversarial roles were discovered that led to greater expected harm across all models. Finally, classification models were developed that predicted disinformation production based on the conditions considered in this study, which offers insight into factors important for predicting disinformation production. Based on all of these insights, recommendations are provided that seek to mitigate factors that lead to harmful disinformation being produced by generative AI models. It is hoped that developers will use these insights to improve future models.

Updated: 2025-04-08 16:46:34

标题: 评估生成式人工智能在选举周期中产生有害虚假信息的倾向性

摘要: 生成人工智能为希望进行影响操作的对手提供了强大的工具，比如中国的垃圾邮件操作和俄罗斯的互联网研究机构努力，它们都试图干预最近的美国选举周期。因此，本研究旨在调查当前生成人工智能模型在选举周期期间产生有害虚假信息的倾向。评估了不同生成人工智能模型在给定对抗性提示时产生虚假信息的概率，以及相关的伤害。这使得可以计算出每个模型的预期伤害，结果发现Copilot和Gemini在整体上表现最安全，因为它们实现了最低的预期伤害，而GPT-4o产生了最高比率的有害虚假信息，导致更高的预期伤害分数。还调查了虚假信息类别的影响，Gemini在政治类虚假信息中最安全，因为开发人员在选举期间采取了缓解措施，而Copilot在与健康相关的话题中最安全。此外，发现了导致所有模型预期伤害增加的对抗角色特征。最后，开发了分类模型，根据本研究考虑的条件预测虚假信息的产生，这为预测虚假信息产生的重要因素提供了洞见。基于所有这些洞见，提供了旨在减轻导致生成人工智能模型产生有害虚假信息的因素的建议。希望开发人员将利用这些洞见来改进未来的模型。

更新时间: 2025-04-08 16:46:34

领域: cs.AI

下载: http://arxiv.org/abs/2411.06120v6

Characteristics of Political Misinformation Over the Past Decade

Although misinformation tends to spread online, it can have serious real-world consequences. In order to develop automated tools to detect and mitigate the impact of misinformation, researchers must leverage algorithms that can adapt to the modality (text, images and video), the source, and the content of the false information. However, these characteristics tend to change dynamically across time, making it challenging to develop robust algorithms to fight misinformation spread. Therefore, this paper uses natural language processing to find common characteristics of political misinformation over a twelve year period. The results show that misinformation has increased dramatically in recent years and that it has increasingly started to be shared from sources with primary information modalities of text and images (e.g., Facebook and Instagram), although video sharing sources containing misinformation are starting to increase (e.g., TikTok). Moreover, it was discovered that statements expressing misinformation contain more negative sentiment than accurate information. However, the sentiment associated with both accurate and inaccurate information has trended downward, indicating a generally more negative tone in political statements across time. Finally, recurring misinformation categories were uncovered that occur over multiple years, which may imply that people tend to share inaccurate statements around information they fear or don't understand (Science and Medicine, Crime, Religion), impacts them directly (Policy, Election Integrity, Economic) or Public Figures who are salient in their daily lives. Together, it is hoped that these insights will assist researchers in developing algorithms that are temporally invariant and capable of detecting and mitigating misinformation across time.

Updated: 2025-04-08 16:41:24

标题: 过去十年政治虚假信息的特征

摘要: 尽管虚假信息在网上传播，但它可能会产生严重的现实世界后果。为了开发自动化工具来检测和减轻虚假信息的影响，研究人员必须利用可以适应虚假信息的形式（文本、图像和视频）、来源和内容的算法。然而，这些特征往往会随着时间动态变化，使得开发强大的算法来对抗虚假信息传播具有挑战性。因此，本文利用自然语言处理来发现政治虚假信息在十二年时间段内的共同特征。结果显示，虚假信息在近年来呈现出急剧增加的趋势，并且越来越多地开始从具有文本和图像等主要信息形式的来源（如Facebook和Instagram）分享，尽管包含虚假信息的视频分享来源（如TikTok）也开始增加。此外，发现表达虚假信息的陈述比准确信息包含更多的负面情绪。然而，随着时间推移，与准确和不准确信息相关的情绪都呈下降趋势，表明政治声明中一般更为负面。最后，发现了多年来反复出现的虚假信息类别，这可能意味着人们倾向于在他们害怕或不理解的信息周围分享不准确的陈述（科学和医学、犯罪、宗教），这些信息直接影响到他们（政策、选举诚信、经济）或者日常生活中突出的公众人物。希望这些见解能帮助研究人员开发出具有时间不变性的算法，能够在不同时间段内检测和减轻虚假信息。

更新时间: 2025-04-08 16:41:24

领域: cs.SI,cs.AI

下载: http://arxiv.org/abs/2411.06122v2

TxGemma: Efficient and Agentic LLMs for Therapeutics

Therapeutic development is a costly and high-risk endeavor that is often plagued by high failure rates. To address this, we introduce TxGemma, a suite of efficient, generalist large language models (LLMs) capable of therapeutic property prediction as well as interactive reasoning and explainability. Unlike task-specific models, TxGemma synthesizes information from diverse sources, enabling broad application across the therapeutic development pipeline. The suite includes 2B, 9B, and 27B parameter models, fine-tuned from Gemma-2 on a comprehensive dataset of small molecules, proteins, nucleic acids, diseases, and cell lines. Across 66 therapeutic development tasks, TxGemma achieved superior or comparable performance to the state-of-the-art generalist model on 64 (superior on 45), and against state-of-the-art specialist models on 50 (superior on 26). Fine-tuning TxGemma models on therapeutic downstream tasks, such as clinical trial adverse event prediction, requires less training data than fine-tuning base LLMs, making TxGemma suitable for data-limited applications. Beyond these predictive capabilities, TxGemma features conversational models that bridge the gap between general LLMs and specialized property predictors. These allow scientists to interact in natural language, provide mechanistic reasoning for predictions based on molecular structure, and engage in scientific discussions. Building on this, we further introduce Agentic-Tx, a generalist therapeutic agentic system powered by Gemini 2.5 that reasons, acts, manages diverse workflows, and acquires external domain knowledge. Agentic-Tx surpasses prior leading models on the Humanity's Last Exam benchmark (Chemistry & Biology) with 52.3% relative improvement over o3-mini (high) and 26.7% over o3-mini (high) on GPQA (Chemistry) and excels with improvements of 6.3% (ChemBench-Preference) and 2.4% (ChemBench-Mini) over o3-mini (high).

Updated: 2025-04-08 16:39:02

标题: TxGemma：用于治疗的高效和自主的LLMs

摘要: 治疗性开发是一项昂贵且高风险的工作，通常受到高失败率的困扰。为了解决这个问题，我们引入了TxGemma，这是一套高效、通用的大型语言模型（LLMs），能够预测治疗性属性，同时具有交互式推理和可解释性。与任务特定模型不同，TxGemma从多种信息源中综合信息，能够广泛应用于治疗性开发流程中。该套件包括2B、9B和27B参数模型，从Gemma-2对小分子、蛋白质、核酸、疾病和细胞系的全面数据集进行了微调。在66个治疗性开发任务中，TxGemma在64个任务上取得了优越或可比的表现，超过了最先进的通用模型45个任务，在50个任务上超过了最先进的专业模型26个任务。在治疗性下游任务上微调TxGemma模型，如临床试验不良事件预测，比微调基础LLMs需要更少的训练数据，使TxGemma适用于数据有限的应用。除了这些预测能力，TxGemma还具有桥接通用LLMs和专门属性预测器之间差距的对话模型。这些模型允许科学家以自然语言交互，根据分子结构提供机理推理的预测，并参与科学讨论。基于此，我们进一步推出了Agentic-Tx，这是一个由Gemini 2.5驱动的通用治疗性主体系统，能够进行推理、行动、管理多样化工作流程，并获取外部领域知识。Agentic-Tx在人类最终考试基准测试（化学和生物学）上超过了之前领先的模型，相对于o3-mini（高）提高了52.3%，在GPQA（化学）上超过了o3-mini（高）26.7%，在ChemBench-Preference上提高了6.3%，在ChemBench-Mini上提高了2.4%，超过了o3-mini（高）。

更新时间: 2025-04-08 16:39:02

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2504.06196v1

Heuristic Methods are Good Teachers to Distill MLPs for Graph Link Prediction

Link prediction is a crucial graph-learning task with applications including citation prediction and product recommendation. Distilling Graph Neural Networks (GNNs) teachers into Multi-Layer Perceptrons (MLPs) students has emerged as an effective approach to achieve strong performance and reducing computational cost by removing graph dependency. However, existing distillation methods only use standard GNNs and overlook alternative teachers such as specialized model for link prediction (GNN4LP) and heuristic methods (e.g., common neighbors). This paper first explores the impact of different teachers in GNN-to-MLP distillation. Surprisingly, we find that stronger teachers do not always produce stronger students: MLPs distilled from GNN4LP can underperform those distilled from simpler GNNs, while weaker heuristic methods can teach MLPs to near-GNN performance with drastically reduced training costs. Building on these insights, we propose Ensemble Heuristic-Distilled MLPs (EHDM), which eliminates graph dependencies while effectively integrating complementary signals via a gating mechanism. Experiments on ten datasets show an average 7.93% improvement over previous GNN-to-MLP approaches with 1.95-3.32 times less training time, indicating EHDM is an efficient and effective link prediction method.

Updated: 2025-04-08 16:35:11

标题: 启发式方法是提取MLP用于图链接预测的良好教师

摘要: 链路预测是一项关键的图学习任务，其应用包括引文预测和产品推荐。将图神经网络（GNNs）教师精炼为多层感知机（MLPs）学生已经成为一种有效的方法，可以实现强大的性能并通过消除图依赖来降低计算成本。然而，现有的精炼方法仅使用标准的GNNs，忽视了其他教师，如专门用于链路预测的模型（GNN4LP）和启发式方法（例如，共同邻居）。本文首次探讨了在GNN到MLP精炼中不同教师的影响。令人惊讶的是，我们发现更强大的教师并不总是会产生更强大的学生：从GNN4LP精炼出的MLPs可能表现不佳，而较弱的启发式方法可以教会MLPs接近GNN性能，并大大降低训练成本。基于这些见解，我们提出了集成启发式精炼MLPs（EHDM），通过门控机制有效地集成互补信号，从而消除图依赖性。在十个数据集上的实验表明，与先前的GNN到MLP方法相比，EHDM平均提高了7.93%，训练时间减少了1.95-3.32倍，表明EHDM是一种高效有效的链路预测方法。

更新时间: 2025-04-08 16:35:11

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.06193v1

rEGGression: an Interactive and Agnostic Tool for the Exploration of Symbolic Regression Models

Regression analysis is used for prediction and to understand the effect of independent variables on dependent variables. Symbolic regression (SR) automates the search for non-linear regression models, delivering a set of hypotheses that balances accuracy with the possibility to understand the phenomena. Many SR implementations return a Pareto front allowing the choice of the best trade-off. However, this hides alternatives that are close to non-domination, limiting these choices. Equality graphs (e-graphs) allow to represent large sets of expressions compactly by efficiently handling duplicated parts occurring in multiple expressions. E-graphs allow to store and query all SR solution candidates visited in one or multiple GP runs efficiently and open the possibility to analyse much larger sets of SR solution candidates. We introduce rEGGression, a tool using e-graphs to enable the exploration of a large set of symbolic expressions which provides querying, filtering, and pattern matching features creating an interactive experience to gain insights about SR models. The main highlight is its focus in the exploration of the building blocks found during the search that can help the experts to find insights about the studied phenomena.This is possible by exploiting the pattern matching capability of the e-graph data structure.

Updated: 2025-04-08 16:34:59

标题: rEGGression: 一种用于探索符号回归模型的交互式和不可知的工具

摘要: 回归分析用于预测和了解自变量对因变量的影响。符号回归（SR）自动化了非线性回归模型的搜索，提供了一组假设，平衡了准确性与理解现象的可能性。许多SR实现返回帕累托前沿，允许选择最佳权衡。然而，这隐藏了接近非支配的替代方案，限制了这些选择。相等图（e-graphs）通过有效处理多个表达式中出现的重复部分，可以紧凑地表示大量表达式集合。E-图允许高效地存储和查询在一个或多个GP运行中访问的所有SR解决方案候选人，并打开了分析更大的SR解决方案候选人集合的可能性。我们介绍了一种名为rEGGression的工具，使用e-graphs来探索大量符号表达式集合，提供查询、过滤和模式匹配功能，从而创造一个交互式体验，以获取有关SR模型的见解。其主要亮点在于专注于探索搜索过程中发现的构建模块，这可以帮助专家们找到有关研究现象的见解。这是通过利用e-graph数据结构的模式匹配能力实现的。

更新时间: 2025-04-08 16:34:59

领域: cs.LG

下载: http://arxiv.org/abs/2501.17859v2

SkillFlow: Efficient Skill and Code Transfer Through Communication in Adapting AI Agents

AI agents are autonomous systems that can execute specific tasks based on predefined programming. Here, we present SkillFlow, a modular, technology-agnostic framework that allows agents to expand their functionality in an ad-hoc fashion by acquiring new skills from their environment or other agents. We present a theoretical model that examines under which conditions this framework would be beneficial, and we then explore SkillFlow's ability to accelerate task completion and lead to lower cumulative costs in a real-world application, namely scheduling agents for calendar events. We demonstrate that within a few iterations, SkillFlow leads to considerable (24.8%, p-value = $6.4\times10^{-3}$) gains in time and cost, especially when the communication cost is high. Finally, we draw analogies from well-studied biological systems and compare this framework to that of lateral gene transfer, a significant process of adaptation and evolution in novel environments.

Updated: 2025-04-08 16:33:24

标题: SkillFlow：通过通信在适应人工智能代理中实现高效技能和代码转移

摘要: 人工智能代理是可以根据预先定义的编程执行特定任务的自主系统。在这里，我们介绍了SkillFlow，这是一个模块化的、技术无关的框架，允许代理通过从环境或其他代理获取新技能的方式以临时方式扩展其功能。我们提出了一个理论模型，考察在哪些条件下这个框架将是有益的，然后我们探讨了SkillFlow在现实世界应用中加速任务完成并导致更低累积成本的能力，即为日历事件安排代理。我们证明，在几次迭代中，SkillFlow会导致显著的时间和成本增益（24.8%，p值= $6.4\times10^{-3}$），特别是当通信成本较高时。最后，我们从研究充分的生物系统中提取类比，并将这个框架与侧向基因转移进行比较，侧向基因转移是在新环境中适应和进化的重要过程。

更新时间: 2025-04-08 16:33:24

领域: cs.AI,cs.CL,cs.MA

下载: http://arxiv.org/abs/2504.06188v1

GRAPPA: Generalizing and Adapting Robot Policies via Online Agentic Guidance

Robot learning approaches such as behavior cloning and reinforcement learning have shown great promise in synthesizing robot skills from human demonstrations in specific environments. However, these approaches often require task-specific demonstrations or designing complex simulation environments, which limits the development of generalizable and robust policies for unseen real-world settings. Recent advances in the use of foundation models for robotics (e.g., LLMs, VLMs) have shown great potential in enabling systems to understand the semantics in the world from large-scale internet data. However, it remains an open challenge to use this knowledge to enable robotic systems to understand the underlying dynamics of the world, to generalize policies across different tasks, and to adapt policies to new environments. To alleviate these limitations, we propose an agentic framework for robot self-guidance and self-improvement, which consists of a set of role-specialized conversational agents, such as a high-level advisor, a grounding agent, a monitoring agent, and a robotic agent. Our framework iteratively grounds a base robot policy to relevant objects in the environment and uses visuomotor cues to shift the action distribution of the policy to more desirable states, online, while remaining agnostic to the subjective configuration of a given robot hardware platform. We demonstrate that our approach can effectively guide manipulation policies to achieve significantly higher success rates, both in simulation and in real-world experiments, without the need for additional human demonstrations or extensive exploration. Code and videos available at: https://agenticrobots.github.io

Updated: 2025-04-08 16:32:04

标题: GRAPPA：通过在线主体引导泛化和调整机器人策略

摘要: 机器人学习方法，如行为克隆和强化学习，已经展示出在特定环境中从人类演示中合成机器人技能的巨大潜力。然而，这些方法通常需要特定任务的演示或设计复杂的模拟环境，这限制了为未知真实世界设置开发可推广和稳健策略的发展。最近在使用基础模型（例如LLMs、VLMs）在机器人领域取得的进展显示出巨大潜力，使系统能够从大规模互联网数据中理解世界中的语义。然而，利用这种知识使机器人系统理解世界的潜在动态、在不同任务之间推广策略以及适应新环境仍然是一个挑战。为了缓解这些限制，我们提出了一个机器人自我引导和自我改进的主体框架，包括一组角色专门化的对话代理，如高级顾问、基础代理、监控代理和机器人代理。我们的框架通过迭代将基础机器人策略与环境中的相关对象联系起来，并使用视觉动作线索将策略的行为分布转移到更理想的状态，同时保持对给定机器人硬件平台的主观配置的不可知性。我们演示了我们的方法可以有效地引导操作策略，在模拟和真实世界实验中实现显著更高的成功率，而无需额外的人类演示或广泛的探索。代码和视频可在以下链接找到：https://agenticrobots.github.io

更新时间: 2025-04-08 16:32:04

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2410.06473v3

WoundAmbit: Bridging State-of-the-Art Semantic Segmentation and Real-World Wound Care

Chronic wounds affect a large population, particularly the elderly and diabetic patients, who often exhibit limited mobility and co-existing health conditions. Automated wound monitoring via mobile image capture can reduce in-person physician visits by enabling remote tracking of wound size. Semantic segmentation is key to this process, yet wound segmentation remains underrepresented in medical imaging research. To address this, we benchmark state-of-the-art deep learning models from general-purpose vision, medical imaging, and top methods from public wound challenges. For fair comparison, we standardize training, data augmentation, and evaluation, conducting cross-validationto minimize partitioning bias. We also assess real-world deployment aspects, including generalization to an out-of-distribution wound dataset, computational efficiency, and interpretability. Additionally, we propose a reference object-based approach to convert AI-generated masks into clinically relevant wound size estimates, and evaluate this, along with mask quality, for the best models based on physician assessments. Overall, the transformer-based TransNeXt showed the highest levels of generalizability. Despite variations in inference times, all models processed at least one image per second on the CPU, which is deemed adequate for the intended application. Interpretability analysis typically revealed prominent activations in wound regions, emphasizing focus on clinically relevant features. Expert evaluation showed high mask approval for all analyzed models, with VWFormer and ConvNeXtS backbone performing the best. Size retrieval accuracy was similar across models, and predictions closely matched expert annotations. Finally, we demonstrate how our AI-driven wound size estimation framework, WoundAmbit, can be integrated into a custom telehealth system. Our code will be made available on GitHub upon publication.

Updated: 2025-04-08 16:25:59

标题: WoundAmbit：连接最先进的语义分割和现实世界的伤口护理

摘要: 慢性伤口影响了大部分人群，特别是老年人和糖尿病患者，他们通常表现出行动受限和合并其他健康状况。通过移动图像捕捉实现自动化伤口监测可以减少亲自就医的次数，从而实现对伤口大小的远程跟踪。语义分割对这一过程至关重要，然而在医学影像研究中，伤口分割仍未得到充分重视。为了解决这一问题，我们对来自通用视觉、医学影像和公共伤口挑战赛中的顶尖方法的最先进的深度学习模型进行基准测试。为了进行公平比较，我们对训练、数据增强和评估进行了标准化，进行交叉验证以减少分区偏差。我们还评估了现实世界部署方面，包括对分布外伤口数据集的泛化能力、计算效率和可解释性。此外，我们提出了一种基于参考对象的方法，将AI生成的蒙版转换为临床相关的伤口大小估计，并基于医生的评估对最佳模型进行了评估。总体而言，基于转换器的TransNeXt显示出最高的泛化能力。尽管推理时间有所变化，但所有模型在CPU上每秒至少处理一张图像，这被认为对于预期的应用是足够的。可解释性分析通常显示出在伤口区域的显著激活，强调对临床相关特征的关注。专家评估显示，所有分析的模型都获得了高度的蒙版批准，VWFormer和ConvNeXtS骨干表现最佳。各模型的尺寸检索精度相似，预测与专家注释相吻合。最后，我们展示了我们的基于AI的伤口大小估计框架WoundAmbit如何集成到定制的远程医疗系统中。我们的代码将在出版后在GitHub上提供。

更新时间: 2025-04-08 16:25:59

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.06185v1

Non-negative Tensor Mixture Learning for Discrete Density Estimation

We present an expectation-maximization (EM) based unified framework for non-negative tensor decomposition that optimizes the Kullback-Leibler divergence. To avoid iterations in each M-step and learning rate tuning, we establish a general relationship between low-rank decompositions and many-body approximations. Using this connection, we exploit that the closed-form solution of the many-body approximation updates all parameters simultaneously in the M-step. Our framework offers not only a unified methodology for a variety of low-rank structures, including CP, Tucker, and Tensor Train decompositions, but also their mixtures. Notably, the weights of each low-rank tensor in the mixture can be learned from the data, which enables us to leverage the advantage of different low-rank structures without careful selection of the structure in advance. We empirically demonstrate that our framework overall provides superior generalization in terms of discrete density estimation and classification when compared to conventional tensor-based approaches.

Updated: 2025-04-08 16:22:02

标题: 非负张量混合学习用于离散密度估计

摘要: 我们提出了一个基于期望最大化（EM）的统一框架，用于非负张量分解，优化Kullback-Leibler散度。为了避免在每个M步骤中进行迭代和学习率调整，我们建立了低秩分解和多体近似之间的一般关系。利用这种联系，我们利用了多体近似的闭式解，同时在M步骤中更新所有参数。我们的框架不仅为各种低秩结构提供了统一的方法论，包括CP、Tucker和Tensor Train分解，还包括它们的混合。值得注意的是，混合中每个低秩张量的权重可以从数据中学习，这使我们能够利用不同低秩结构的优势，而无需事先仔细选择结构。我们在实证中证明，与传统基于张量的方法相比，我们的框架在离散密度估计和分类方面提供了更优越的泛化性能。

更新时间: 2025-04-08 16:22:02

领域: stat.ML,cs.LG,68T01,I.2.6

下载: http://arxiv.org/abs/2405.18220v2

Blockchain Oracles for Real Estate Rental

Blockchain technology has seen adoption across various industries and the real estate sector is no exception. The traditional property leasing process guarantees no trust between parties, uses insecure communication channels, and forces participants who are not familiar with the process to perform contracts. Blockchain technology emerges as a solution to simplify the traditional property leasing process. This work proposes the use of two blockchain oracles to handle, respectively, maintenance issues and automate rent payments in the context of property rental. These two components are introduced in a blockchain-based property rental platform.

Updated: 2025-04-08 16:21:30

标题: 区块链预言机在房地产租赁中的应用

摘要: 区块链技术已被各行业采用，房地产行业也不例外。传统的房地产租赁过程在各方之间没有信任保障，使用不安全的通信渠道，并迫使不熟悉流程的参与者执行合同。区块链技术出现作为简化传统房地产租赁过程的解决方案。本文提出使用两个区块链预言机来处理维护问题和自动化租金支付，在房屋租赁环境中。这两个组件被引入到基于区块链的房屋租赁平台中。

更新时间: 2025-04-08 16:21:30

领域: cs.CR,K.4.3; D.2

下载: http://arxiv.org/abs/2504.06180v1

A Self-Supervised Framework for Space Object Behaviour Characterisation

Foundation Models, pre-trained on large unlabelled datasets before task-specific fine-tuning, are increasingly being applied to specialised domains. Recent examples include ClimaX for climate and Clay for satellite Earth observation, but a Foundation Model for Space Object Behavioural Analysis has not yet been developed. As orbital populations grow, automated methods for characterising space object behaviour are crucial for space safety. We present a Space Safety and Sustainability Foundation Model focusing on space object behavioural analysis using light curves (LCs). We implemented a Perceiver-Variational Autoencoder (VAE) architecture, pre-trained with self-supervised reconstruction and masked reconstruction on 227,000 LCs from the MMT-9 observatory. The VAE enables anomaly detection, motion prediction, and LC generation. We fine-tuned the model for anomaly detection & motion prediction using two independent LC simulators (CASSANDRA and GRIAL respectively), using CAD models of boxwing, Sentinel-3, SMOS, and Starlink platforms. Our pre-trained model achieved a reconstruction error of 0.01%, identifying potentially anomalous light curves through reconstruction difficulty. After fine-tuning, the model scored 88% and 82% accuracy, with 0.90 and 0.95 ROC AUC scores respectively in both anomaly detection and motion mode prediction (sun-pointing, spin, etc.). Analysis of high-confidence anomaly predictions on real data revealed distinct patterns including characteristic object profiles and satellite glinting. Here, we demonstrate how self-supervised learning can simultaneously enable anomaly detection, motion prediction, and synthetic data generation from rich representations learned in pre-training. Our work therefore supports space safety and sustainability through automated monitoring and simulation capabilities.

Updated: 2025-04-08 16:19:19

标题: 一个用于空间物体行为特征描述的自监督框架

摘要: 基金会模型在特定任务微调之前，已在大型未标记数据集上进行了预训练，并越来越多地应用于专门领域。最近的例子包括用于气候的ClimaX和用于卫星地球观测的Clay，但尚未开发出用于空间物体行为分析的基金会模型。随着轨道人口的增长，自动化方法对于表征空间物体行为至关重要以确保空间安全。我们提出了一个专注于利用光曲线（LCs）进行空间物体行为分析的空间安全与可持续性基金会模型。我们使用Perceiver-Variational Autoencoder（VAE）架构，在MMT-9天文台的227,000个LCs上进行了自监督重建和遮罪重建的预训练。VAE能够实现异常检测、运动预测和LC生成。我们使用两个独立的LC模拟器（分别为CASSANDRA和GRIAL），并使用箱式机、Sentinel-3、SMOS和Starlink平台的CAD模型，对模型进行了异常检测和运动预测的微调。我们的预训练模型在重建错误率为0.01%时，通过重建困难识别出潜在异常的光曲线。在微调后，模型在异常检测和运动模式预测（太阳指向、旋转等）方面分别取得了88%和82%的准确率，ROC AUC得分分别为0.90和0.95。对真实数据中高置信度异常预测的分析显示出特征性物体轮廓和卫星闪烁等明显模式。在这里，我们展示了自监督学习如何通过在预训练中学到的丰富表示同时实现异常检测、运动预测和合成数据生成。因此，我们的工作通过自动化监测和模拟能力支持了空间安全与可持续性。

更新时间: 2025-04-08 16:19:19

领域: cs.LG,cs.AI,physics.space-ph

下载: http://arxiv.org/abs/2504.06176v1

Multi-Modality Sensing in mmWave Beamforming for Connected Vehicles Using Deep Learning

Beamforming techniques are considered as essential parts to compensate severe path losses in millimeter-wave (mmWave) communications. In particular, these techniques adopt large antenna arrays and formulate narrow beams to obtain satisfactory received powers. However, performing accurate beam alignment over narrow beams for efficient link configuration by traditional standard defined beam selection approaches, which mainly rely on channel state information and beam sweeping through exhaustive searching, imposes computational and communications overheads. And, such resulting overheads limit their potential use in vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) communications involving highly dynamic scenarios. In comparison, utilizing out-of-band contextual information, such as sensing data obtained from sensor devices, provides a better alternative to reduce overheads. This paper presents a deep learning-based solution for utilizing the multi-modality sensing data for predicting the optimal beams having sufficient mmWave received powers so that the best V2I and V2V line-of-sight links can be ensured proactively. The proposed solution has been tested on real-world measured mmWave sensing and communication data, and the results show that it can achieve up to 98.19% accuracies while predicting top-13 beams. Correspondingly, when compared to existing been sweeping approach, the beam sweeping searching space and time overheads are greatly shortened roughly by 79.67% and 91.89%, respectively which confirm a promising solution for beamforming in mmWave enabled communications.

Updated: 2025-04-08 16:18:00

标题: 基于深度学习的毫米波波束成形中多模态传感技术在连接车辆中的应用

摘要: 波束成形技术被认为是在毫米波通信中补偿严重路径损耗的关键部分。特别是，这些技术采用大型天线阵列并制定窄波束以获得令人满意的接收功率。然而，通过传统标准定义的波束选择方法执行窄波束上的准确波束对准，主要依赖于信道状态信息和通过穷举搜索进行波束扫描，会带来计算和通信开销。这样导致的开销限制了它们在涉及高度动态场景的车辆对基础设施（V2I）和车辆对车辆（V2V）通信中的潜在使用。相比之下，利用来自传感器设备获取的越带外背景信息，如感知数据，提供了减少开销的更好选择。本文提出了一种基于深度学习的解决方案，利用多模式感知数据预测具有足够毫米波接收功率的最佳波束，从而可以主动确保最佳的V2I和V2V直射链路。所提出的解决方案已在真实测量的毫米波感知和通信数据上进行了测试，结果表明，在预测前13个波束时，可以达到高达98.19%的准确率。相应地，与现有的波束扫描方法相比，波束扫描搜索空间和时间开销分别大约缩短了79.67%和91.89%，证实了毫米波通信中波束成形的一个有前途的解决方案。

更新时间: 2025-04-08 16:18:00

领域: cs.NI,cs.AI,cs.ET,cs.LG,eess.SP

下载: http://arxiv.org/abs/2504.06173v1

KnowCoder-X: Boosting Multilingual Information Extraction via Code

Empirical evidence indicates that LLMs exhibit spontaneous cross-lingual alignment. However, although LLMs show promising cross-lingual alignment in IE, a significant imbalance across languages persists, highlighting an underlying deficiency. To address this, we propose KnowCoder-X, a powerful code LLM with advanced cross-lingual and multilingual capabilities for universal information extraction. Firstly, it standardizes the representation of multilingual schemas using Python classes, ensuring a consistent ontology across different languages. Then, IE across languages is formulated as a unified code generation task. Secondly, we enhance the model's cross-lingual transferability through IE cross-lingual alignment instruction tuning on a translated instance prediction task we proposed. During this phase, we also construct a high-quality and diverse bilingual IE parallel dataset with 257k samples, called ParallelNER, synthesized by our proposed robust three-stage pipeline, with manual annotation to ensure quality. Although without training in 29 unseen languages, KnowCoder-X surpasses ChatGPT by $30.17\%$ and SoTA by $20.03\%$, thereby demonstrating superior cross-lingual IE capabilities. Comprehensive evaluations on 64 IE benchmarks in Chinese and English under various settings demonstrate that KnowCoder-X significantly enhances cross-lingual IE transfer through boosting the IE alignment. Our code and dataset are available at: https://github.com/ICT-GoKnow/KnowCoder

Updated: 2025-04-08 16:16:30

标题: KnowCoder-X：通过代码提高多语言信息提取

摘要: 实证证据表明，LLMs表现出自发的跨语言对齐。然而，虽然LLMs在信息抽取方面显示出有希望的跨语言对齐，但跨语言之间存在显著的不平衡，突显了潜在的不足。为了解决这个问题，我们提出了KnowCoder-X，这是一个具有先进跨语言和多语言能力的强大的代码LLM，用于通用信息抽取。首先，它通过使用Python类来标准化多语言模式的表示，确保在不同语言之间保持一致的本体论。然后，跨语言信息抽取被构建为一个统一的代码生成任务。其次，我们通过在我们提出的翻译实例预测任务上调整IE跨语言对齐指令来增强模型的跨语言可传递性。在这个阶段，我们还通过我们提出的鲁棒的三阶段管道构建了一个包含257k个样本的高质量和多样化的双语IE平行数据集，称为ParallelNER，通过手动注释来确保质量。尽管没有在29种新语言上进行训练，KnowCoder-X超过了ChatGPT $30.17\%$ 和SoTA $20.03\%$，从而展示出了优越的跨语言信息抽取能力。在不同设置下对中文和英文的64个信息抽取基准进行全面评估表明，KnowCoder-X通过提升信息抽取对齐显著增强了跨语言信息抽取的传递性。我们的代码和数据集可在以下链接找到：https://github.com/ICT-GoKnow/KnowCoder

更新时间: 2025-04-08 16:16:30

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.04794v2

Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi

The card game Hanabi is considered a strong medium for the testing and development of multi-agent reinforcement learning (MARL) algorithms, due to its cooperative nature, hidden information, limited communication and remarkable complexity. Previous research efforts have explored the capabilities of MARL algorithms within Hanabi, focusing largely on advanced architecture design and algorithmic manipulations to achieve state-of-the-art performance for a various number of cooperators. However, this often leads to complex solution strategies with high computational cost and requiring large amounts of training data. For humans to solve the Hanabi game effectively, they require the use of conventions, which often allows for a means to implicitly convey ideas or knowledge based on a predefined, and mutually agreed upon, set of ``rules''. Multi-agent problems containing partial observability, especially when limited communication is present, can benefit greatly from the use of implicit knowledge sharing. In this paper, we propose a novel approach to augmenting the action space using conventions, which act as special cooperative actions that span over multiple time steps and multiple agents, requiring agents to actively opt in for it to reach fruition. These conventions are based on existing human conventions, and result in a significant improvement on the performance of existing techniques for self-play and cross-play across a various number of cooperators within Hanabi.

Updated: 2025-04-08 16:15:33

标题: 用约定增强行动空间以提高《花火》中多智能体合作

摘要: Hanabi纸牌游戏被认为是测试和开发多智能体强化学习（MARL）算法的强大媒介，因为它具有合作性质、隐藏信息、有限通信和显著的复杂性。先前的研究工作已经探索了MARL算法在Hanabi中的能力，主要关注先进的架构设计和算法操作，以实现各种合作者的最新性能。然而，这通常导致复杂的解决策略，计算成本高，需要大量的训练数据。为了有效解决Hanabi游戏，人类需要使用惯例，这通常允许以预定义和相互同意的“规则”集为基础隐含地传达思想或知识。包含部分可观测性的多智能体问题，尤其在存在有限通信时，可以极大受益于使用隐式知识共享。在本文中，我们提出了一种新颖的方法，通过使用惯例来扩充行动空间，这些惯例作为特殊的合作行动，涵盖多个时间步骤和多个智能体，需要智能体主动选择才能实现。这些惯例基于现有的人类惯例，并显著提高了在Hanabi中自我对弈和跨对弈的现有技术在各种合作者中的表现。

更新时间: 2025-04-08 16:15:33

领域: cs.MA,cs.AI,cs.LG

下载: http://arxiv.org/abs/2412.06333v2

A stochastic first-order method with multi-extrapolated momentum for highly smooth unconstrained optimization

In this paper, we consider an unconstrained stochastic optimization problem where the objective function exhibits high-order smoothness. Specifically, we propose a new stochastic first-order method (SFOM) with multi-extrapolated momentum, in which multiple extrapolations are performed in each iteration, followed by a momentum update based on these extrapolations. We demonstrate that the proposed SFOM can accelerate optimization by exploiting the high-order smoothness of the objective function $f$. Assuming that the $p$th-order derivative of $f$ is Lipschitz continuous for some $p\ge2$, and under additional mild assumptions, we establish that our method achieves a sample complexity of $\widetilde{\mathcal{O}}(\epsilon^{-(3p+1)/p})$ for finding a point $x$ such that $\mathbb{E}[\|\nabla f(x)\|]\le\epsilon$. To the best of our knowledge, this is the first SFOM to leverage arbitrary-order smoothness of the objective function for acceleration, resulting in a sample complexity that improves upon the best-known results without assuming the mean-squared smoothness condition. Preliminary numerical experiments validate the practical performance of our method and support our theoretical findings.

Updated: 2025-04-08 16:04:37

标题: 一种具有多次外推动量的高度光滑无约束优化的随机一阶方法

摘要: 在本文中，我们考虑一个无约束的随机优化问题，其中目标函数具有高阶平滑性。具体地，我们提出了一种新的带有多次外推动量的随机一阶方法（SFOM），在每次迭代中进行多次外推，然后根据这些外推进行动量更新。我们证明了所提出的SFOM可以通过利用目标函数$f$的高阶平滑性来加速优化过程。假设$f$的$p$阶导数在某个$p\ge2$时是Lipschitz连续的，并在额外的温和假设下，我们建立了我们的方法在寻找一个点$x$，使得$\mathbb{E}[\|\nabla f(x)\|]\le\epsilon$时，达到样本复杂度$\widetilde{\mathcal{O}}(\epsilon^{-(3p+1)/p})$。据我们所知，这是第一个利用目标函数任意阶平滑性加速的SFOM，其样本复杂度优于在不假定均方平滑条件的情况下已知的最佳结果。初步数值实验验证了我们方法的实际表现，并支持我们的理论发现。

更新时间: 2025-04-08 16:04:37

领域: math.OC,cs.AI,cs.LG,49M05, 49M37, 90C25, 90C30

下载: http://arxiv.org/abs/2412.14488v4

Real-Time Pitch/F0 Detection Using Spectrogram Images and Convolutional Neural Networks

This paper presents a novel approach to detect F0 through Convolutional Neural Networks and image processing techniques to directly estimate pitch from spectrogram images. Our new approach demonstrates a very good detection accuracy; a total of 92% of predicted pitch contours have strong or moderate correlations to the true pitch contours. Furthermore, the experimental comparison between our new approach and other state-of-the-art CNN methods reveals that our approach can enhance the detection rate by approximately 5% across various Signal-to-Noise Ratio conditions.

Updated: 2025-04-08 16:01:25

标题: 实时音高/F0检测使用谱图像和卷积神经网络

摘要: 本文提出了一种新颖的方法，通过卷积神经网络和图像处理技术来直接从频谱图像中估计音高。我们的新方法展示了非常好的检测准确度；预测的音高轮廓中有92%与真实音高轮廓具有强或中等相关性。此外，我们的新方法与其他最先进的卷积神经网络方法进行的实验比较表明，在各种信噪比条件下，我们的方法可以将检测率提高约5%。

更新时间: 2025-04-08 16:01:25

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2504.06165v1

Topological Approach for Data Assimilation

Many dynamical systems are difficult or impossible to model using high fidelity physics based models. Consequently, researchers are relying more on data driven models to make predictions and forecasts. Based on limited training data, machine learning models often deviate from the true system states over time and need to be continually updated as new measurements are taken using data assimilation. Classical data assimilation algorithms typically require knowledge of the measurement noise statistics which may be unknown. In this paper, we introduce a new data assimilation algorithm with a foundation in topological data analysis. By leveraging the differentiability of functions of persistence, gradient descent optimization is used to minimize topological differences between measurements and forecast predictions by tuning data driven model coefficients without using noise information from the measurements. We describe the method and focus on its capabilities performance using the chaotic Lorenz 63 system as an example and we also show that the method works on a higher dimensional example with the Lorenz 96 system.

Updated: 2025-04-08 15:59:11

标题: 拓扑数据同化的方法

摘要: 许多动力系统很难或不可能使用高保真度的基于物理的模型进行建模。因此，研究人员更多地依赖数据驱动模型进行预测和预测。基于有限的训练数据，机器学习模型往往会随着时间的推移偏离真实系统状态，并且需要在采用数据同化获取新测量值时不断更新。传统的数据同化算法通常需要了解测量噪声统计数据，而这可能是未知的。在本文中，我们介绍了一种基于拓扑数据分析的新型数据同化算法。通过利用持久性函数的可微性，梯度下降优化被用来通过调整数据驱动模型系数来最小化测量和预测之间的拓扑差异，而无需使用来自测量的噪声信息。我们描述了该方法，并重点关注其在混沌的Lorenz 63系统上的性能表现，并且我们还展示了该方法在Lorenz 96系统的高维示例上的工作。

更新时间: 2025-04-08 15:59:11

领域: nlin.CD,cs.LG,math.AT

下载: http://arxiv.org/abs/2411.18627v2

Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups

Large Language Models (LLMs) have been shown to demonstrate imbalanced biases against certain groups. However, the study of unprovoked targeted attacks by LLMs towards at-risk populations remains underexplored. Our paper presents three novel contributions: (1) the explicit evaluation of LLM-generated attacks on highly vulnerable mental health groups; (2) a network-based framework to study the propagation of relative biases; and (3) an assessment of the relative degree of stigmatization that emerges from these attacks. Our analysis of a recently released large-scale bias audit dataset reveals that mental health entities occupy central positions within attack narrative networks, as revealed by a significantly higher mean centrality of closeness (p-value = 4.06e-10) and dense clustering (Gini coefficient = 0.7). Drawing from sociological foundations of stigmatization theory, our stigmatization analysis indicates increased labeling components for mental health disorder-related targets relative to initial targets in generation chains. Taken together, these insights shed light on the structural predilections of large language models to heighten harmful discourse and highlight the need for suitable approaches for mitigation.

Updated: 2025-04-08 15:56:57

标题: 穿越兔子洞：针对精神健康群体的LLM生成攻击叙事中出现的新兴偏见

摘要: 大型语言模型（LLMs）已被证明存在对某些群体的不平衡偏见。然而，对LLMs对处于风险群体的无端针对性攻击的研究仍未被充分探讨。我们的论文提出了三个新的贡献：（1）对高度脆弱的心理健康群体进行LLM生成攻击的明确评估；（2）一个基于网络的框架来研究相对偏见的传播；以及（3）评估这些攻击所产生的相对污名化程度。我们对最近发布的大规模偏见审计数据集进行的分析显示，心理健康实体在攻击叙事网络中占据核心位置，这表现在接近性的平均中心性显著更高（p值=4.06e-10）和密集聚类（基尼系数=0.7）。从污名化理论的社会学基础出发，我们的污名化分析表明，相对于生成链中的初始目标，与心理健康障碍相关的目标具有增加的标签化组件。综合这些见解，我们揭示了大型语言模型增加有害言论的结构偏好，突显了需要适当方法来减轻这种情况的必要性。

更新时间: 2025-04-08 15:56:57

领域: cs.CL,cs.AI,cs.CY,cs.LG,cs.SI,J.4; K.4.1; K.4.2

下载: http://arxiv.org/abs/2504.06160v1

Model Inversion Attack against Federated Unlearning

With the introduction of regulations related to the ``right to be forgotten", federated learning (FL) is facing new privacy compliance challenges. To address these challenges, researchers have proposed federated unlearning (FU). However, existing FU research has primarily focused on improving the efficiency of unlearning, with less attention paid to the potential privacy vulnerabilities inherent in these methods. To address this gap, we draw inspiration from gradient inversion attacks in FL and propose the federated unlearning inversion attack (FUIA). The FUIA is specifically designed for the three types of FU (sample unlearning, client unlearning, and class unlearning), aiming to provide a comprehensive analysis of the privacy leakage risks associated with FU. In FUIA, the server acts as an honest-but-curious attacker, recording and exploiting the model differences before and after unlearning to expose the features and labels of forgotten data. FUIA significantly leaks the privacy of forgotten data and can target all types of FU. This attack contradicts the goal of FU to eliminate specific data influence, instead exploiting its vulnerabilities to recover forgotten data and expose its privacy flaws. Extensive experimental results show that FUIA can effectively reveal the private information of forgotten data. To mitigate this privacy leakage, we also explore two potential defense methods, although these come at the cost of reduced unlearning effectiveness and the usability of the unlearned model.

Updated: 2025-04-08 15:54:55

标题: 模型反转攻击对联邦遗忘的影响

摘要: 随着与“被遗忘权”相关的法规的引入，联邦学习（FL）面临着新的隐私合规挑战。为了解决这些挑战，研究人员提出了联邦去学习（FU）。然而，现有的FU研究主要集中在改善去学习的效率上，对这些方法固有的潜在隐私漏洞关注较少。为了弥补这一差距，我们从FL中的梯度反转攻击中汲取灵感，提出了联邦去学习反转攻击（FUIA）。FUIA专门针对FU中的三种类型（样本去学习，客户去学习和类别去学习），旨在提供与FU相关的隐私泄露风险的全面分析。在FUIA中，服务器充当一个诚实但好奇的攻击者，记录并利用去学习前后的模型差异，以暴露被遗忘数据的特征和标签。FUIA显著泄露了被遗忘数据的隐私，可以针对所有类型的FU。这种攻击违背了FU的目标，即消除特定数据的影响，而是利用其漏洞恢复被遗忘的数据并暴露其隐私缺陷。大量实验结果显示，FUIA可以有效地揭示被遗忘数据的私人信息。为了减少这种隐私泄露，我们还探讨了两种潜在的防御方法，尽管这些方法会降低去学习的效果和未学习模型的可用性。

更新时间: 2025-04-08 15:54:55

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2502.14558v3

Hall Effect Thruster Forecasting using a Topological Approach for Data Assimilation

Hall Effect Thrusters (HETs) are electric thrusters that eject heavy ionized gas particles from the spacecraft to generate thrust. Although traditionally they were used for station keeping, recently They have been used for interplanetary space missions due to their high delta-V potential and their operational longevity in contrast to other thrusters, e.g., chemical. However, the operation of HETs involves complex processes such as ionization of gases, strong magnetic fields, and complicated solar panel power supply interactions. Therefore, their operation is extremely difficult to model thus necessitating Data Assimilation (DA) approaches for estimating and predicting their operational states. Because HET's operating environment is often noisy with non-Gaussian sources, this significantly limits applicable DA tools. We describe a topological approach for data assimilation that bypasses these limitations that does not depend on the noise model, and utilize it to forecast spatiotemporal plume field states of HETs. Our approach is a generalization of the Topological Approach for Data Assimilation (TADA) method that allows including different forecast functions. We show how TADA can be combined with the Long Short-Term Memory network for accurate forecasting. We then apply our approach to high-fidelity Hall Effect Thruster (HET) simulation data from the Air Force Research Laboratory (AFRL) rocket propulsion division where we demonstrate the forecast resiliency of TADA on noise contaminated, high-dimensional data.

Updated: 2025-04-08 15:52:50

标题: 霍尔效应推进器的预测：利用拓扑方法进行数据同化

摘要: 霍尔效应推进器（HETs）是一种通过从宇宙飞船中排出重离子化气体微粒来产生推力的电推进器。尽管传统上它们被用于保持轨道稳定，但最近由于其高ΔV潜力以及相对于其他推进器（如化学推进器）的操作寿命长，它们已被用于星际空间任务。然而，HETs的操作涉及到气体的电离、强磁场和复杂的太阳能电池供电相互作用等复杂过程。因此，它们的操作极为难以建模，因此需要数据同化（DA）方法来估计和预测其操作状态。由于HET的操作环境通常嘈杂且具有非高斯源，这显著限制了适用的DA工具。我们描述了一种拓扑方法的数据同化，绕过了这些限制，不依赖于噪声模型，并利用它来预测HET的空间时间喷流场状态。我们的方法是对数据同化的拓扑方法（TADA）的一种推广，允许包括不同的预测功能。我们展示了TADA如何与长短期记忆网络结合以进行精确的预测。然后，我们将我们的方法应用于空军研究实验室（AFRL）火箭推进部门的高保真度Hall效应推进器（HET）模拟数据，展示了TADA在噪声污染、高维数据上的预测弹性。

更新时间: 2025-04-08 15:52:50

领域: cs.LG

下载: http://arxiv.org/abs/2504.06157v1

CAPM: Fast and Robust Verification on Maxpool-based CNN via Dual Network

This study uses CAPM (Convex Adversarial Polytope for Maxpool-based CNN) to improve the verified bound for general purpose maxpool-based convolutional neural networks (CNNs) under bounded norm adversarial perturbations. The maxpool function is decomposed as a series of ReLU functions to extend the convex relaxation technique to maxpool functions, by which the verified bound can be efficiently computed through a dual network. The experimental results demonstrate that this technique allows the state-of-the-art verification precision for maxpool-based CNNs and involves a much lower computational cost than current verification methods, such as DeepZ, DeepPoly and PRIMA. This method is also applicable to large-scale CNNs, which previous studies show to be often computationally prohibitively expensive. Under certain circumstances, CAPM is 40-times, 20-times or twice as fast and give a significantly higher verification bound (CAPM 98% vs. PRIMA 76%/DeepPoly 73%/DeepZ 8%) as compared to PRIMA/DeepPoly/DeepZ. Furthermore, we additionally present the time complexity of our algorithm as $O(W^2NK)$, where $W$ is the maximum width of the neural network, $N$ is the number of neurons, and $K$ is the size of the maxpool layer's kernel.

Updated: 2025-04-08 15:51:23

标题: CAPM：通过双网络快速和稳健地验证基于Maxpool的CNN

摘要: 这项研究利用CAPM（基于最大池的CNN的凸对抗多面体）来改进受限规范对抗扰动下卷积神经网络（CNN）的验证上界。将最大池函数分解为一系列ReLU函数，以扩展凸松弛技术至最大池函数，通过这种方式可以通过双网络高效计算验证上界。实验结果表明，这种技术允许在最大池基础的CNN中达到最新的验证精度，并且比当前的验证方法（如DeepZ、DeepPoly和PRIMA）具有更低的计算成本。这种方法也适用于大规模CNN，先前的研究表明这种方法通常计算成本过高。在某些情况下，与PRIMA/DeepPoly/DeepZ相比，CAPM的速度快40倍、20倍或2倍，并且提供了显著更高的验证上界（CAPM 98% vs. PRIMA 76%/DeepPoly 73%/DeepZ 8%）。此外，我们还提供了算法的时间复杂度为$O(W^2NK)$，其中$W$是神经网络的最大宽度，$N$是神经元的数量，$K$是最大池层的卷积核大小。

更新时间: 2025-04-08 15:51:23

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.09550v2

Convexity in ReLU Neural Networks: beyond ICNNs?

Convex functions and their gradients play a critical role in mathematical imaging, from proximal optimization to Optimal Transport. The successes of deep learning has led many to use learning-based methods, where fixed functions or operators are replaced by learned neural networks. Regardless of their empirical superiority, establishing rigorous guarantees for these methods often requires to impose structural constraints on neural architectures, in particular convexity. The most popular way to do so is to use so-called Input Convex Neural Networks (ICNNs). In order to explore the expressivity of ICNNs, we provide necessary and sufficient conditions for a ReLU neural network to be convex. Such characterizations are based on product of weights and activations, and write nicely for any architecture in the path-lifting framework. As particular applications, we study our characterizations in depth for 1 and 2-hidden-layer neural networks: we show that every convex function implemented by a 1-hidden-layer ReLU network can be also expressed by an ICNN with the same architecture; however this property no longer holds with more layers. Finally, we provide a numerical procedure that allows an exact check of convexity for ReLU neural networks with a large number of affine regions.

Updated: 2025-04-08 15:49:44

标题: 在ReLU神经网络中的凸性：超越ICNNs？

摘要: 凸函数及其梯度在数学成像中起着关键作用，从近端优化到最优输运。深度学习的成功导致许多人使用基于学习的方法，其中固定函数或操作符被学习的神经网络取代。尽管这些方法在经验上表现优越，但要为这些方法建立严格的保证通常需要对神经网络体系结构施加结构约束，特别是凸性。实现这一目标的最流行方式是使用所谓的输入凸神经网络（ICNNs）。为了探索ICNNs的表达能力，我们提供了ReLU神经网络为凸函数的必要和充分条件。这些表征是基于权重和激活的乘积，并且对路径提升框架中的任何架构都很好地描述。作为特定应用，我们深入研究了1和2隐藏层神经网络的特征：我们展示了一个由1隐藏层ReLU网络实现的每个凸函数也可以由具有相同架构的ICNN表达；然而，这种特性在更多层中不再成立。最后，我们提供了一个数值程序，允许对具有大量仿射区域的ReLU神经网络进行凸性的精确检查。

更新时间: 2025-04-08 15:49:44

领域: cs.LG

下载: http://arxiv.org/abs/2501.03017v2

Expertized Caption Auto-Enhancement for Video-Text Retrieval

Video-text retrieval has been stuck in the information mismatch caused by personalized and inadequate textual descriptions of videos. The substantial information gap between the two modalities hinders an effective cross-modal representation alignment, resulting in ambiguous retrieval results. Although text rewriting methods have been proposed to broaden text expressions, the modality gap remains significant, as the text representation space is hardly expanded with insufficient semantic enrichment.Instead, this paper turns to enhancing visual presentation, bridging video expression closer to textual representation via caption generation and thereby facilitating video-text matching.While multimodal large language models (mLLM) have shown a powerful capability to convert video content into text, carefully crafted prompts are essential to ensure the reasonableness and completeness of the generated captions. Therefore, this paper proposes an automatic caption enhancement method that improves expression quality and mitigates empiricism in augmented captions through self-learning.Additionally, an expertized caption selection mechanism is designed and introduced to customize augmented captions for each video, further exploring the utilization potential of caption augmentation.Our method is entirely data-driven, which not only dispenses with heavy data collection and computation workload but also improves self-adaptability by circumventing lexicon dependence and introducing personalized matching. The superiority of our method is validated by state-of-the-art results on various benchmarks, specifically achieving Top-1 recall accuracy of 68.5% on MSR-VTT, 68.1% on MSVD, and 62.0% on DiDeMo. Our code is publicly available at https://github.com/CaryXiang/ECA4VTR.

Updated: 2025-04-08 15:45:28

标题: 视频文本检索的专家级标题自动增强

摘要: 视频文本检索一直受到视频个性化和不足的文本描述导致的信息不匹配问题的困扰。两种模态之间的重大信息差距阻碍了有效的跨模态表示对齐，导致检索结果模糊不清。尽管已经提出了文本重写方法来拓宽文本表达，但是由于文本表示空间在语义丰富度不足的情况下很难扩展，模态差距仍然显著。相反，本文转向增强视觉表现，通过生成字幕将视频表达与文本表示更接近，从而促进视频文本匹配。虽然多模态大型语言模型（mLLM）已经显示出将视频内容转换为文本的强大能力，但是精心设计的提示对于确保生成的字幕的合理性和完整性至关重要。因此，本文提出了一种自动字幕增强方法，通过自学习改善表达质量，并减轻通过自学习改善表达质量，并减轻通过自学习改善表达质量，并减轻通过自学习改善表达质量，并减轻增强字幕中的经验主义。此外，设计并引入了一种专家化字幕选择机制，为每个视频定制增强字幕，进一步探索字幕增强的利用潜力。我们的方法完全是数据驱动的，不仅省去了繁重的数据采集和计算工作量，而且通过规避词汇依赖性和引入个性化匹配来提高自适应性。我们的方法的优越性通过在各种基准测试中的最新结果得到验证，具体来说，在MSR-VTT上达到了68.5%的Top-1召回准确率，在MSVD上达到了68.1%，在DiDeMo上达到了62.0%。我们的代码可以在https://github.com/CaryXiang/ECA4VTR公开获取。

更新时间: 2025-04-08 15:45:28

领域: cs.CV,cs.AI,cs.LG,H.3.3; I.2.10; I.2.7; H.5.1

下载: http://arxiv.org/abs/2502.02885v3

ARLO: A Tailorable Approach for Transforming Natural Language Software Requirements into Architecture using LLMs

Software requirements expressed in natural language (NL) frequently suffer from verbosity, ambiguity, and inconsistency. This creates a range of challenges, including selecting an appropriate architecture for a system and assessing different architectural alternatives. Relying on human expertise to accomplish the task of mapping NL requirements to architecture is time-consuming and error-prone. This paper proposes ARLO, an approach that automates this task by leveraging (1) a set of NL requirements for a system, (2) an existing standard that specifies architecturally relevant software quality attributes, and (3) a readily available Large Language Model (LLM). Specifically, ARLO determines the subset of NL requirements for a given system that is architecturally relevant and maps that subset to a tailorable matrix of architectural choices. ARLO applies integer linear programming on the architectural-choice matrix to determine the optimal architecture for the current requirements. We demonstrate ARLO's efficacy using a set of real-world examples. We highlight ARLO's ability (1) to trace the selected architectural choices to the requirements and (2) to isolate NL requirements that exert a particular influence on a system's architecture. This allows the identification, comparative assessment, and exploration of alternative architectural choices based on the requirements and constraints expressed therein.

Updated: 2025-04-08 15:38:42

标题: ARLO：使用LLMs将自然语言软件需求转化为架构的可定制方法

摘要: 自然语言表达的软件需求经常受到冗长、模棱两可和不一致的困扰。这造成了一系列挑战，包括选择系统的适当架构和评估不同的架构替代方案。依赖人类专业知识来实现将自然语言需求映射到架构的任务耗时且容易出错。本文提出了ARLO，一种利用（1）系统的一组自然语言需求，（2）指定了架构相关软件质量属性的现有标准，以及（3）现成的大型语言模型（LLM）来自动化这一任务的方法。具体地，ARLO确定了对于给定系统而言具有架构相关性的自然语言需求子集，并将该子集映射到可定制的架构选择矩阵。ARLO在架构选择矩阵上应用整数线性规划，以确定当前需求的最佳架构。我们使用一组真实世界示例展示了ARLO的有效性。我们强调ARLO的能力（1）跟踪所选架构选择到需求，（2）隔离对系统架构产生特定影响的自然语言需求。这允许根据其中表达的需求和约束，识别、比较评估和探索替代架构选择。

更新时间: 2025-04-08 15:38:42

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2504.06143v1

Adversarial Training of Reward Models

Reward modeling has emerged as a promising approach for the scalable alignment of language models. However, contemporary reward models (RMs) often lack robustness, awarding high rewards to low-quality, out-of-distribution (OOD) samples. This can lead to reward hacking, where policies exploit unintended shortcuts to maximize rewards, undermining alignment. To address this challenge, we introduce Adv-RM, a novel adversarial training framework that automatically identifies adversarial examples -- responses that receive high rewards from the target RM but are OOD and of low quality. By leveraging reinforcement learning, Adv-RM trains a policy to generate adversarial examples that reliably expose vulnerabilities in large state-of-the-art reward models such as Nemotron 340B RM. Incorporating these adversarial examples into the reward training process improves the robustness of RMs, mitigating reward hacking and enhancing downstream performance in RLHF. We demonstrate that Adv-RM significantly outperforms conventional RM training, increasing stability and enabling more effective RLHF training in both synthetic and real-data settings.

Updated: 2025-04-08 15:38:25

标题: 对抗训练奖励模型

摘要: 奖励建模已经成为一种可扩展的语言模型对齐的有前途的方法。然而，当代奖励模型(RMs)常常缺乏鲁棒性，对低质量、分布外的样本给予高奖励。这可能导致奖励黑客，即政策利用意外的捷径来最大化奖励，破坏对齐。为了解决这一挑战，我们引入了Adv-RM，这是一个新颖的对抗训练框架，可以自动识别对抗样本--即接收目标RM高奖励但是分布外且低质量的响应。通过利用强化学习，Adv-RM训练一个策略来生成可靠地暴露出大型最先进奖励模型（如Nemotron 340B RM）的漏洞的对抗样本。将这些对抗样本纳入奖励训练过程中可以提高RM的鲁棒性，减轻奖励黑客，并增强RLHF中的下游性能。我们证明Adv-RM明显优于传统RM训练，在合成和实际数据设置中增加了稳定性，并使RLHF训练更有效。

更新时间: 2025-04-08 15:38:25

领域: cs.LG

下载: http://arxiv.org/abs/2504.06141v1

Early Classification of Time Series: Taxonomy and Benchmark

In many situations, the measurements of a studied phenomenon are provided sequentially, and the prediction of its class needs to be made as early as possible so as not to incur too high a time penalty, but not too early and risk paying the cost of misclassification. This problem has been particularly studied in the case of time series, and is known as Early Classification of Time Series (ECTS). Although it has been the subject of a growing body of literature, there is still a lack of a systematic, shared evaluation protocol to compare the relative merits of the various existing methods. This document begins by situating these methods within a principle-based taxonomy. It defines dimensions for organizing their evaluation, and then reports the results of a very extensive set of experiments along these dimensions involving nine state-of-the art ECTS algorithms. In addition, these and other experiments can be carried out using an open-source library in which most of the existing ECTS algorithms have been implemented (see https://github.com/ML-EDM/ml_edm).

Updated: 2025-04-08 15:37:13

标题: 时间序列的早期分类：分类学和基准

摘要: 在许多情况下，研究现象的测量是按顺序提供的，需要尽早进行类别预测，以避免产生过高的时间惩罚，但又不能太早，以免冒误分类的风险。这个问题在时间序列的情况下得到了特别研究，被称为时间序列的早期分类（ECTS）。尽管已经有越来越多的文献对此进行了研究，但仍然缺乏一个系统的、共享的评估协议，以比较各种现有方法的相对优点。本文首先将这些方法置于基于原则的分类中。它定义了组织评估的维度，然后报告了涉及九种最先进的ECTS算法的一系列非常广泛的实验结果。此外，这些和其他实验可以使用一个开源库进行，其中大多数现有的ECTS算法已经实现（请参见https://github.com/ML-EDM/ml_edm）。

更新时间: 2025-04-08 15:37:13

领域: cs.LG

下载: http://arxiv.org/abs/2406.18332v5

Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation

Long-form video processing fundamentally challenges vision-language models (VLMs) due to the high computational costs of handling extended temporal sequences. Existing token pruning and feature merging methods often sacrifice critical temporal dependencies or dilute semantic information. We introduce differential distillation, a principled approach that systematically preserves task-relevant information while suppressing redundancy. Based on this principle, we develop ViLaMP, a hierarchical video-language model that processes hour-long videos at ``mixed precision'' through two key mechanisms: (1) differential keyframe selection that maximizes query relevance while maintaining temporal distinctiveness at the frame level and (2) differential feature merging that preserves query-salient features in non-keyframes at the patch level. Hence, ViLaMP retains full information in keyframes while reducing non-keyframes to their most salient features, resembling mixed-precision training. Extensive experiments demonstrate ViLaMP's superior performance across four video understanding benchmarks, particularly on long-form content. Notably, ViLaMP can process ultra-long videos (up to 10K frames) on a single NVIDIA A100 GPU, achieving substantial computational efficiency while maintaining state-of-the-art performance.

Updated: 2025-04-08 15:36:11

标题: 将视频-语言模型通过分层差分蒸馏扩展到10K帧

摘要: 长篇视频处理从根本上挑战视觉语言模型（VLMs），因为处理长时间序列的高计算成本。现有的令牌修剪和特征合并方法经常牺牲关键的时间依赖性或淡化语义信息。我们引入差异蒸馏，这是一种系统地保留任务相关信息同时抑制冗余的原则性方法。基于这一原则，我们开发了ViLaMP，这是一个层次化的视频语言模型，通过两个关键机制在“混合精度”下处理长达一小时的视频：（1）差异关键帧选择，最大化查询相关性同时在帧级别保持时间上的区别，（2）差异特征合并，在补丁级别保留查询显著特征的非关键帧。因此，ViLaMP保留关键帧的全部信息，同时将非关键帧减少到它们最显著的特征，类似于混合精度训练。大量实验表明ViLaMP在四个视频理解基准测试中表现优越，特别是在长篇内容上。值得注意的是，ViLaMP可以在单个NVIDIA A100 GPU上处理超长视频（高达10K帧），实现了大幅的计算效率，同时保持了最先进的性能水平。

更新时间: 2025-04-08 15:36:11

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.02438v2

A Multimedia Analytics Model for the Foundation Model Era

The rapid advances in Foundation Models and agentic Artificial Intelligence are transforming multimedia analytics by enabling richer, more sophisticated interactions between humans and analytical systems. Existing conceptual models for visual and multimedia analytics, however, do not adequately capture the complexity introduced by these powerful AI paradigms. To bridge this gap, we propose a comprehensive multimedia analytics model specifically designed for the foundation model era. Building upon established frameworks from visual analytics, multimedia analytics, knowledge generation, analytic task definition, mixed-initiative guidance, and human-in-the-loop reinforcement learning, our model emphasizes integrated human-AI teaming based on visual analytics agents from both technical and conceptual perspectives. Central to the model is a seamless, yet explicitly separable, interaction channel between expert users and semi-autonomous analytical processes, ensuring continuous alignment between user intent and AI behavior. The model addresses practical challenges in sensitive domains such as intelligence analysis, investigative journalism, and other fields handling complex, high-stakes data. We illustrate through detailed case studies how our model facilitates deeper understanding and targeted improvement of multimedia analytics solutions. By explicitly capturing how expert users can optimally interact with and guide AI-powered multimedia analytics systems, our conceptual framework sets a clear direction for system design, comparison, and future research.

Updated: 2025-04-08 15:35:59

标题: 一个多媒体分析模型用于基础模型时代

摘要: 基金会模型和主体人工智能的快速进展正在通过实现更丰富、更复杂的人机交互，改变多媒体分析。然而，现有的视觉和多媒体分析概念模型并未充分捕捉这些强大人工智能范式引入的复杂性。为了弥合这一差距，我们提出了一个专门为基金会模型时代设计的全面多媒体分析模型。借鉴了视觉分析、多媒体分析、知识生成、分析任务定义、混合主动引导和人在环中增强学习等已建立的框架，我们的模型强调了基于视觉分析代理的技术和概念视角的人工智能团队集成。模型的核心是专家用户和半自主分析过程之间的无缝、但明确可分离的交互渠道，确保用户意图和人工智能行为之间的持续对齐。该模型解决了情报分析、调查性新闻和处理复杂、高风险数据等敏感领域的实际挑战。我们通过详细案例研究展示了我们的模型如何促进对多媒体分析解决方案的深入了解和有针对性的改进。通过明确捕捉专家用户如何最佳地与和引导基于人工智能的多媒体分析系统交互，我们的概念框架为系统设计、比较和未来研究设定了清晰的方向。

更新时间: 2025-04-08 15:35:59

领域: cs.MM,cs.AI,cs.HC

下载: http://arxiv.org/abs/2504.06138v1

QGen Studio: An Adaptive Question-Answer Generation, Training and Evaluation Platform

We present QGen Studio: an adaptive question-answer generation, training, and evaluation platform. QGen Studio enables users to leverage large language models (LLMs) to create custom question-answer datasets and fine-tune models on this synthetic data. It features a dataset viewer and model explorer to streamline this process. The dataset viewer provides key metrics and visualizes the context from which the QA pairs are generated, offering insights into data quality. The model explorer supports model comparison, allowing users to contrast the performance of their trained LLMs against other models, supporting performance benchmarking and refinement. QGen Studio delivers an interactive, end-to-end solution for generating QA datasets and training scalable, domain-adaptable models. The studio will be open-sourced soon, allowing users to deploy it locally.

Updated: 2025-04-08 15:32:09

标题: QGen Studio：一种自适应问答生成、训练和评估平台

摘要: 我们提出了QGen Studio：一种自适应的问答生成、训练和评估平台。QGen Studio使用户能够利用大型语言模型（LLMs）创建自定义的问答数据集，并在这些合成数据上对模型进行微调。它具有数据集查看器和模型浏览器，以简化这一过程。数据集查看器提供关键指标，并可视化生成QA对的上下文，为数据质量提供见解。模型浏览器支持模型比较，允许用户对比他们训练的LLMs与其他模型的性能，支持性能基准测试和改进。QGen Studio提供了一个交互式的、端到端的解决方案，用于生成QA数据集和训练可扩展的、可适应领域的模型。该工作室将很快开源，允许用户在本地部署。

更新时间: 2025-04-08 15:32:09

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.06136v1

Decentralizing AI Memory: SHIMI, a Semantic Hierarchical Memory Index for Scalable Agent Reasoning

Retrieval-Augmented Generation (RAG) and vector-based search have become foundational tools for memory in AI systems, yet they struggle with abstraction, scalability, and semantic precision - especially in decentralized environments. We present SHIMI (Semantic Hierarchical Memory Index), a unified architecture that models knowledge as a dynamically structured hierarchy of concepts, enabling agents to retrieve information based on meaning rather than surface similarity. SHIMI organizes memory into layered semantic nodes and supports top-down traversal from abstract intent to specific entities, offering more precise and explainable retrieval. Critically, SHIMI is natively designed for decentralized ecosystems, where agents maintain local memory trees and synchronize them asynchronously across networks. We introduce a lightweight sync protocol that leverages Merkle-DAG summaries, Bloom filters, and CRDT-style conflict resolution to enable partial synchronization with minimal overhead. Through benchmark experiments and use cases involving decentralized agent collaboration, we demonstrate SHIMI's advantages in retrieval accuracy, semantic fidelity, and scalability - positioning it as a core infrastructure layer for decentralized cognitive systems.

Updated: 2025-04-08 15:31:00

标题: 分散式人工智能记忆：SHIMI，一种用于可扩展代理推理的语义分层内存索引

摘要: 检索增强生成（RAG）和基于向量的搜索已成为人工智能系统中记忆的基础工具，但它们在抽象性、可扩展性和语义精度方面存在困难，特别是在去中心化环境中。我们提出了SHIMI（语义分层记忆索引），这是一个统一的架构，将知识建模为动态结构化的概念层次，使代理能够基于含义而不是表面相似性检索信息。SHIMI将记忆组织成分层语义节点，并支持从抽象意图到具体实体的自顶向下遍历，提供更精确和可解释的检索。关键是，SHIMI是专为去中心化生态系统而设计的，其中代理维护本地记忆树，并在网络中异步同步它们。我们引入了一种轻量级同步协议，利用Merkle-DAG摘要、Bloom过滤器和CRDT风格的冲突解决，实现了最小开销的部分同步。通过基准实验和涉及去中心化代理协作的用例，我们展示了SHIMI在检索准确性、语义保真度和可扩展性方面的优势 - 将其定位为去中心化认知系统的核心基础设施层。

更新时间: 2025-04-08 15:31:00

领域: cs.AI,cs.MA

下载: http://arxiv.org/abs/2504.06135v1

PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series

Benchmarking anomaly detection approaches for multivariate time series is a challenging task due to a lack of high-quality datasets. Current publicly available datasets are too small, not diverse and feature trivial anomalies, which hinders measurable progress in this research area. We propose a solution: a diverse, extensive, and non-trivial dataset generated via state-of-the-art simulation tools that reflects realistic behaviour of an automotive powertrain, including its multivariate, dynamic and variable-state properties. Additionally, our dataset represents a discrete-sequence problem, which remains unaddressed by previously-proposed solutions in literature. To cater for both unsupervised and semi-supervised anomaly detection settings, as well as time series generation and forecasting, we make different versions of the dataset available, where training and test subsets are offered in contaminated and clean versions, depending on the task. We also provide baseline results from a selection of approaches based on deterministic and variational autoencoders, as well as a non-parametric approach. As expected, the baseline experimentation shows that the approaches trained on the semi-supervised version of the dataset outperform their unsupervised counterparts, highlighting a need for approaches more robust to contaminated training data. Furthermore, results show that the threshold used can have a large influence on detection performance, hence more work needs to be invested in methods to find a suitable threshold without the need for labelled data.

Updated: 2025-04-08 15:26:49

标题: 路径：用于评估多变量时间序列在线无监督异常检测方法的离散序列数据集

摘要: 为多变量时间序列的异常检测方法进行基准测试是一项具有挑战性的任务，因为缺乏高质量的数据集。当前公开可用的数据集太小，不够多样化，特征的异常太简单，这阻碍了这一研究领域的可度量进展。我们提出了一种解决方案：通过最先进的仿真工具生成一个多样化、广泛且非平凡的数据集，反映了汽车动力总成的实际行为，包括其多变量、动态和可变状态属性。此外，我们的数据集代表了一个离散序列问题，在文献中以前未被解决。为了满足无监督和半监督异常检测设置，以及时间序列生成和预测，我们提供了数据集的不同版本，其中训练和测试子集以受污染和干净的版本提供，取决于任务。我们还提供了基于确定性和变分自动编码器以及非参数方法的一些方法的基线结果。正如预期的那样，基线实验表明，在半监督版本的数据集上训练的方法优于无监督对应方法，突出了需要更加能够抵抗受污染训练数据的方法。此外，结果显示使用的阈值可以对检测性能产生很大影响，因此需要更多的工作投入到寻找一个合适阈值的方法中，而不需要标记数据。

更新时间: 2025-04-08 15:26:49

领域: cs.LG,cs.AI,cs.CE,cs.SY,eess.SY

下载: http://arxiv.org/abs/2411.13951v4

ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction

Neural implicit reconstruction via volume rendering has demonstrated its effectiveness in recovering dense 3D surfaces. However, it is non-trivial to simultaneously recover meticulous geometry and preserve smoothness across regions with differing characteristics. To address this issue, previous methods typically employ geometric priors, which are often constrained by the performance of the prior models. In this paper, we propose ND-SDF, which learns a Normal Deflection field to represent the angular deviation between the scene normal and the prior normal. Unlike previous methods that uniformly apply geometric priors on all samples, introducing significant bias in accuracy, our proposed normal deflection field dynamically learns and adapts the utilization of samples based on their specific characteristics, thereby improving both the accuracy and effectiveness of the model. Our method not only obtains smooth weakly textured regions such as walls and floors but also preserves the geometric details of complex structures. In addition, we introduce a novel ray sampling strategy based on the deflection angle to facilitate the unbiased rendering process, which significantly improves the quality and accuracy of intricate surfaces, especially on thin structures. Consistent improvements on various challenging datasets demonstrate the superiority of our method.

Updated: 2025-04-08 15:24:36

标题: ND-SDF：学习用于高保真度室内重建的法向偏移场

摘要: 通过体积渲染进行神经隐式重建已经证明了在恢复密集的3D表面方面的有效性。然而，同时恢复细致的几何形状并在具有不同特征的区域之间保持平滑性并非易事。为了解决这个问题，先前的方法通常采用几何先验，这些几何先验通常受到先验模型性能的限制。在本文中，我们提出了ND-SDF，它学习一个法向偏转场来表示场景法线和先验法线之间的角度偏差。与先前方法不同的是，先前方法通常在所有样本上均匀应用几何先验，这会在准确性上引入显著的偏见，而我们提出的法向偏转场动态学习并根据它们的具体特征调整样本的利用，从而提高模型的准确性和有效性。我们的方法不仅获得了平滑的弱纹理区域，如墙壁和地板，还保留了复杂结构的几何细节。此外，我们引入了一种基于偏转角的射线采样策略，以促进无偏的渲染过程，这显著改善了复杂表面的质量和准确性，特别是在薄结构上。对各种具有挑战性的数据集的一致改进证明了我们方法的优越性。

更新时间: 2025-04-08 15:24:36

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2408.12598v3

Avoiding Pitfalls for Privacy Accounting of Subsampled Mechanisms under Composition

We consider the problem of computing tight privacy guarantees for the composition of subsampled differentially private mechanisms. Recent algorithms can numerically compute the privacy parameters to arbitrary precision but must be carefully applied. Our main contribution is to address two common points of confusion. First, some privacy accountants assume that the privacy guarantees for the composition of a subsampled mechanism are determined by self-composing the worst-case datasets for the uncomposed mechanism. We show that this is not true in general. Second, Poisson subsampling is sometimes assumed to have similar privacy guarantees compared to sampling without replacement. We show that the privacy guarantees may in fact differ significantly between the two sampling schemes. In particular, we give an example of hyperparameters that result in $\varepsilon \approx 1$ for Poisson subsampling and $\varepsilon > 10$ for sampling without replacement. This occurs for some parameters that could realistically be chosen for DP-SGD.

Updated: 2025-04-08 15:21:03

标题: 避免在组合下对子样本机制的隐私核算中遇到的陷阱

摘要: 我们考虑计算对不同ially私有机制进行子抽样组合的严格隐私保证的问题。最近的算法可以以任意精度计算隐私参数，但必须小心应用。我们的主要贡献是解决两个常见的困惑点。首先，一些隐私会计假设对子抽样机制的组合的隐私保证是通过自我组合未组合机制的最坏情况数据集确定的。我们证明这在一般情况下并不成立。其次，泊松子抽样有时被认为与无替换抽样具有类似的隐私保证。我们证明实际上两种抽样方案的隐私保证可能存在显著差异。特别是，我们给出了一组超参数的示例，对于泊松子抽样，$\varepsilon \approx 1$，而对于无替换抽样，$\varepsilon > 10$。这发生在一些实际上可以选择为DP-SGD的参数上。

更新时间: 2025-04-08 15:21:03

领域: cs.CR,cs.DS,cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.20769v2

Accelerating Vehicle Routing via AI-Initialized Genetic Algorithms

Vehicle Routing Problems (VRP) are an extension of the Traveling Salesperson Problem and are a fundamental NP-hard challenge in combinatorial optimization. Solving VRP in real-time at large scale has become critical in numerous applications, from growing markets like last-mile delivery to emerging use-cases like interactive logistics planning. Such applications involve solving similar problem instances repeatedly, yet current state-of-the-art solvers treat each instance on its own without leveraging previous examples. We introduce a novel optimization framework that uses a reinforcement learning agent - trained on prior instances - to quickly generate initial solutions, which are then further optimized by genetic algorithms. Our framework, Evolutionary Algorithm with Reinforcement Learning Initialization (EARLI), consistently outperforms current state-of-the-art solvers across various time scales. For example, EARLI handles vehicle routing with 500 locations within 1s, 10x faster than current solvers for the same solution quality, enabling applications like real-time and interactive routing. EARLI can generalize to new data, as demonstrated on real e-commerce delivery data of a previously unseen city. Our hybrid framework presents a new way to combine reinforcement learning and genetic algorithms, paving the road for closer interdisciplinary collaboration between AI and optimization communities towards real-time optimization in diverse domains.

Updated: 2025-04-08 15:21:01

标题: 通过人工智能初始化的遗传算法加速车辆路径规划

摘要: 车辆路径问题（VRP）是旅行推销员问题的延伸，是组合优化中的一个基本的NP难题。在大规模实时解决VRP已成为许多应用中至关重要的问题，从像最后一英里配送这样的增长市场到像交互式物流规划这样的新兴用例。这些应用涉及反复解决类似问题实例，然而当前最先进的解决方案对每个实例单独处理而不利用先前的示例。我们引入了一个新颖的优化框架，该框架使用强化学习代理-在先前实例上进行训练-快速生成初始解决方案，然后通过遗传算法进一步优化这些解决方案。我们的框架，Evolutionary Algorithm with Reinforcement Learning Initialization（EARLI），在各种时间尺度上一贯优于当前最先进的解决方案。例如，EARLI在1秒内处理具有500个位置的车辆路径，比当前解决方案快10倍，同时保持相同的解决方案质量，实现了实时和交互式路径规划的应用。EARLI能够泛化到新数据，如在以前未见过的城市的真实电子商务交付数据上所示。我们的混合框架提出了一种结合强化学习和遗传算法的新方法，为AI和优化社区之间的更紧密的跨学科合作铺平了道路，实现各个领域的实时优化。

更新时间: 2025-04-08 15:21:01

领域: cs.LG,cs.NE

下载: http://arxiv.org/abs/2504.06126v1

Scalable Dynamic Mixture Model with Full Covariance for Probabilistic Traffic Forecasting

Deep learning-based multivariate and multistep-ahead traffic forecasting models are typically trained with the mean squared error (MSE) or mean absolute error (MAE) as the loss function in a sequence-to-sequence setting, simply assuming that the errors follow an independent and isotropic Gaussian or Laplacian distributions. However, such assumptions are often unrealistic for real-world traffic forecasting tasks, where the probabilistic distribution of spatiotemporal forecasting is very complex with strong concurrent correlations across both sensors and forecasting horizons in a time-varying manner. In this paper, we model the time-varying distribution for the matrix-variate error process as a dynamic mixture of zero-mean Gaussian distributions. To achieve efficiency, flexibility, and scalability, we parameterize each mixture component using a matrix normal distribution and allow the mixture weight to change and be predictable over time. The proposed method can be seamlessly integrated into existing deep-learning frameworks with only a few additional parameters to be learned. We evaluate the performance of the proposed method on a traffic speed forecasting task and find that our method not only improves model performance but also provides interpretable spatiotemporal correlation structures.

Updated: 2025-04-08 15:19:44

标题: 可扩展的具有完整协方差的动态混合模型用于概率交通预测

摘要: 基于深度学习的多变量和多步骤交通预测模型通常在序列到序列设置中使用均方误差（MSE）或平均绝对误差（MAE）作为损失函数进行训练，简单地假设错误遵循独立和各向同性的高斯或拉普拉斯分布。然而，这种假设通常对于真实世界的交通预测任务来说是不现实的，因为时空预测的概率分布非常复杂，具有强烈的交叉相关性，同时在时变情况下跨越传感器和预测视野。在本文中，我们将矩阵变量误差过程的时变分布建模为零均值高斯分布的动态混合。为了实现效率、灵活性和可扩展性，我们使用矩阵正态分布参数化每个混合分量，并允许混合权重随时间变化且可预测。所提出的方法可以无缝集成到现有的深度学习框架中，只需学习少量额外参数。我们在交通速度预测任务上评估了所提出方法的性能，发现我们的方法不仅提高了模型性能，还提供了可解释的时空相关结构。

更新时间: 2025-04-08 15:19:44

领域: cs.LG

下载: http://arxiv.org/abs/2212.06653v4

Robo-taxi Fleet Coordination at Scale via Reinforcement Learning

Fleets of robo-taxis offering on-demand transportation services, commonly known as Autonomous Mobility-on-Demand (AMoD) systems, hold significant promise for societal benefits, such as reducing pollution, energy consumption, and urban congestion. However, orchestrating these systems at scale remains a critical challenge, with existing coordination algorithms often failing to exploit the systems' full potential. This work introduces a novel decision-making framework that unites mathematical modeling with data-driven techniques. In particular, we present the AMoD coordination problem through the lens of reinforcement learning and propose a graph network-based framework that exploits the main strengths of graph representation learning, reinforcement learning, and classical operations research tools. Extensive evaluations across diverse simulation fidelities and scenarios demonstrate the flexibility of our approach, achieving superior system performance, computational efficiency, and generalizability compared to prior methods. Finally, motivated by the need to democratize research efforts in this area, we release publicly available benchmarks, datasets, and simulators for network-level coordination alongside an open-source codebase designed to provide accessible simulation platforms and establish a standardized validation process for comparing methodologies. Code available at: https://github.com/StanfordASL/RL4AMOD

Updated: 2025-04-08 15:19:41

标题: 通过强化学习实现规模化的Robo-taxi车队协调

摘要: 机器人出租车车队提供按需交通服务，通常被称为自主移动出行（AMoD）系统，具有显著的社会效益，如减少污染、能源消耗和城市拥堵。然而，在规模上协调这些系统仍然是一个关键挑战，现有的协调算法经常未能充分发挥系统的潜力。本文介绍了一个将数学建模与数据驱动技术结合的新型决策框架。具体来说，我们通过强化学习的视角提出了AMoD协调问题，并提出了一个基于图网络的框架，利用了图表示学习、强化学习和经典运筹学工具的主要优势。通过对不同模拟忠实度和场景的广泛评估，我们的方法展现了其灵活性，相较于先前的方法，实现了优越的系统性能、计算效率和泛化能力。最后，受到在这一领域民主化研究努力的需要，我们发布了公开可用的网络级协调基准、数据集和模拟器，同时推出了一个旨在提供可访问的模拟平台并为比较方法学建立标准化验证流程的开源代码库。代码可在以下链接找到：https://github.com/StanfordASL/RL4AMOD

更新时间: 2025-04-08 15:19:41

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2504.06125v1

Unraveling Arithmetic in Large Language Models: The Role of Algebraic Structures

The reasoning abilities of large language models (LLMs) have improved with chain-of-thought (CoT) prompting, allowing models to solve complex tasks stepwise. However, training CoT capabilities requires detailed reasoning data, which is often scarce. The self-taught reasoner (STaR) framework addresses this by using reinforcement learning to automatically generate reasoning steps, reducing reliance on human-labeled data. Although STaR and its variants have demonstrated empirical success, a theoretical foundation explaining these improvements is lacking. Large language models (LLMs) have demonstrated remarkable mathematical capabilities, largely driven by chain-of-thought (CoT) prompting, which decomposes complex reasoning into step-by-step solutions. However, the mechanisms underlying LLMs' ability to perform arithmetic in a single step of CoT remain poorly understood. In this work, we propose that LLMs learn arithmetic by capturing algebraic structures, such as commutativity and identity properties. Since these structures are observable through input-output relationships, they can generalize to unseen data. We empirically demonstrate that LLMs can learn algebraic structures using a custom dataset of arithmetic problems, as well as providing theoretical evidence showing that, under specific configurations of weights and biases, the transformer-based LLMs can generate embeddings that remain invariant to both permutations of input tokens and the presence of identity elements. Our findings indicate that leveraging algebraic structures can enhance the LLMs' arithmetic capabilities, offering insights into improving their arithmetic performance.

Updated: 2025-04-08 15:19:23

标题: 揭示大型语言模型中的算术：代数结构的作用

摘要: 大型语言模型（LLMs）的推理能力随着思维链（CoT）提示的改进而提高，使模型能够逐步解决复杂任务。然而，训练CoT能力需要详细的推理数据，这通常是稀缺的。自学习推理器（STaR）框架通过使用强化学习自动生成推理步骤，减少对人工标记数据的依赖来解决这个问题。尽管STaR及其变体已经证明了实证成功，但缺乏解释这些改进的理论基础。大型语言模型（LLMs）已经展示了出色的数学能力，主要是由于思维链（CoT）提示的推动，将复杂的推理分解为逐步解决方案。然而，LLMs能够在CoT的单个步骤中执行算术的机制仍然不够清楚。在这项工作中，我们提出LLMs通过捕捉代数结构（如交换性和恒等性质）来学习算术。由于这些结构可以通过输入输出关系观察到，它们可以推广到未见过的数据。我们在实验证明LLMs可以使用算术问题的自定义数据集学习代数结构，并提供理论证据表明，在特定配置的权重和偏差下，基于变压器的LLMs可以生成保持对输入标记的排列和恒等元素存在不变的嵌入。我们的发现表明，利用代数结构可以增强LLMs的算术能力，为改进它们的算术性能提供了见解。

更新时间: 2025-04-08 15:19:23

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2411.16260v2

Leanabell-Prover: Posttraining Scaling in Formal Reasoning

Recent advances in automated theorem proving (ATP) through LLMs have highlighted the potential of formal reasoning with Lean 4 codes. However, ATP has not yet be revolutionized by the recent posttraining scaling as demonstrated by Open AI O1/O3 and Deepseek R1. In this work, we investigate the entire posttraining of ATP, aiming to align it with breakthroughs in reasoning models in natural languages.To begin, we continual train current ATP models with a hybrid dataset, which consists of numerous statement-proof pairs, and additional data aimed at incorporating cognitive behaviors that emulate human reasoning and hypothesis refinement. Next, we explore reinforcement learning with the use of outcome reward returned by Lean 4 compiler. Through our designed continual training and reinforcement learning processes, we have successfully improved existing formal provers, including both DeepSeek-Prover-v1.5 and Goedel-Prover, achieving state-of-the-art performance in the field of whole-proof generation. For example, we achieve a 59.8% pass rate (pass@32) on MiniF2F. This is an on-going project and we will progressively update our findings, release our data and training details.

Updated: 2025-04-08 15:15:26

标题: Leanabell-Prover：形式推理中的后训练缩放

摘要: 最近自动定理证明（ATP）在LLM的帮助下取得了一些进展，突显了利用Lean 4代码进行形式推理的潜力。然而，ATP尚未像Open AI O1/O3和DeepSeek R1所展示的那样通过最近的后训练扩展而实现革命性的发展。在这项工作中，我们调查了ATP的整个后训练过程，旨在将其与自然语言推理模型的突破相一致。首先，我们使用混合数据集持续训练当前的ATP模型，该数据集包括大量的语句-证明对，以及旨在模拟人类推理和假设完善的认知行为的额外数据。接下来，我们探索强化学习，利用Lean 4编译器返回的结果奖励。通过我们设计的持续训练和强化学习过程，我们成功改进了现有的形式证明器，包括DeepSeek-Prover-v1.5和Goedel-Prover，在整个证明生成领域取得了最先进的性能。例如，在MiniF2F上我们实现了59.8%的通过率（pass@32）。这是一个持续进行的项目，我们将逐步更新我们的发现，发布我们的数据和训练细节。

更新时间: 2025-04-08 15:15:26

领域: cs.AI

下载: http://arxiv.org/abs/2504.06122v1

TabRep: a Simple and Effective Continuous Representation for Training Tabular Diffusion Models

Diffusion models have been the predominant generative model for tabular data generation. However, they face the conundrum of modeling under a separate versus a unified data representation. The former encounters the challenge of jointly modeling all multi-modal distributions of tabular data in one model. While the latter alleviates this by learning a single representation for all features, it currently leverages sparse suboptimal encoding heuristics and necessitates additional computation costs. In this work, we address the latter by presenting TabRep, a tabular diffusion architecture trained with a unified continuous representation. To motivate the design of our representation, we provide geometric insights into how the data manifold affects diffusion models. The key attributes of our representation are composed of its density, flexibility to provide ample separability for nominal features, and ability to preserve intrinsic relationships. Ultimately, TabRep provides a simple yet effective approach for training tabular diffusion models under a continuous data manifold. Our results showcase that TabRep achieves superior performance across a broad suite of evaluations. It is the first to synthesize tabular data that exceeds the downstream quality of the original datasets while preserving privacy and remaining computationally efficient.

Updated: 2025-04-08 15:10:24

标题: TabRep：用于训练表格扩散模型的简单有效连续表示

摘要: 扩散模型一直是表格数据生成的主要生成模型。然而，它们面临着在分开数据表示和统一数据表示下建模的难题。前者面临挑战，即在一个模型中联合建模表格数据的所有多模态分布。而后者通过学习所有特征的单一表示来缓解这一挑战，目前利用稀疏次优编码启发式方法并需要额外的计算成本。在这项工作中，我们通过提出TabRep来解决后者，这是一个使用统一连续表示训练的表格扩散架构。为了激发我们表示设计的动机，我们提供了关于数据流形如何影响扩散模型的几何洞察。我们表示的关键属性包括其密度、为名义特征提供充足可分离性的灵活性以及保持内在关系的能力。最终，TabRep为在连续数据流形下训练表格扩散模型提供了一种简单而有效的方法。我们的结果展示了TabRep在广泛的评估中实现了更优越的性能。它是第一个合成超越原始数据集下游质量的表格数据，同时保护隐私并保持计算效率。

更新时间: 2025-04-08 15:10:24

领域: cs.LG

下载: http://arxiv.org/abs/2504.04798v2

Triple-entry Accounting, Blockchain and Next of Kin: Towards a Standardisation of Ledger Terminology

Triple-entry accounting (TEA) is simultaneously a novel application in the blockchain universe and one of the many concepts applied in blockchain technology. Its Wild Wild West status is accompanied by a lack of consistent and comprehensive set of categories, a state of play that impedes a proper apprehension of the technology, leading to contradictions and oversight of important nuances. To clearly delineate the confines of TEA within the world of blockchain, we provide building blocks to standardise its terminology. Particularly, we distinguish between essential elements such as accounting and bookkeeping, as well as between decentralised systems, distributed ledgers and distributed journals.

Updated: 2025-04-08 15:09:42

标题: 三重记账，区块链和继承人：走向账簿术语标准化

摘要: 三重入账（TEA）同时是区块链领域中的一种新颖应用，也是区块链技术中应用的许多概念之一。其“野蛮西部”状态伴随着缺乏一致和全面的分类体系，这种状况妨碍了对技术的正确理解，导致矛盾和对重要细微差别的忽视。为了清晰地界定TEA在区块链世界中的范围，我们提供了规范其术语的基础。特别是，我们区分了会计和簿记等基本要素，以及分散系统、分布式账本和分布式日记账之间的区别。

更新时间: 2025-04-08 15:09:42

领域: cs.CR

下载: http://arxiv.org/abs/2101.02632v3

When does compositional structure yield compositional generalization? A kernel theory

Compositional generalization (the ability to respond correctly to novel combinations of familiar components) is thought to be a cornerstone of intelligent behavior. Compositionally structured (e.g. disentangled) representations support this ability; however, the conditions under which they are sufficient for the emergence of compositional generalization remain unclear. To address this gap, we present a theory of compositional generalization in kernel models with fixed, compositionally structured representations. This provides a tractable framework for characterizing the impact of training data statistics on generalization. We find that these models are limited to functions that assign values to each combination of components seen during training, and then sum up these values ("conjunction-wise additivity"). This imposes fundamental restrictions on the set of tasks compositionally structured kernel models can learn, in particular preventing them from transitively generalizing equivalence relations. Even for compositional tasks that they can learn in principle, we identify novel failure modes in compositional generalization (memorization leak and shortcut bias) that arise from biases in the training data. Finally, we empirically validate our theory, showing that it captures the behavior of deep neural networks (convolutional networks, residual networks, and Vision Transformers) trained on a set of compositional tasks with similarly structured data. Ultimately, this work examines how statistical structure in the training data can affect compositional generalization, with implications for how to identify and remedy failure modes in deep learning models.

Updated: 2025-04-08 15:07:04

标题: 组成结构何时产生组成概括？一种核心理论

摘要: 组合泛化（对熟悉组件的新组合做出正确响应的能力）被认为是智能行为的基石。组合结构化（例如解耦）表示支持这种能力；然而，它们在何种条件下足以支持组合泛化的出现仍不清楚。为了弥补这一空白，我们提出了一种在具有固定、组合结构化表示的核模型中进行组合泛化的理论。这为表征训练数据统计对泛化的影响提供了一个可处理的框架。我们发现，这些模型仅限于为在训练过程中看到的每个组件组合分配值，然后将这些值相加（“逻辑与可加性”）。这对组合结构化核模型可以学习的任务集施加了基本限制，特别是阻止它们在传递性地泛化等价关系方面。即使对于原则上可以学习的组合任务，我们也确定了组合泛化中的新的失败模式（记忆泄漏和捷径偏差），这些模式源于训练数据中的偏见。最后，我们通过实验证实了我们的理论，展示了它如何捕捉训练在一组具有类似结构化数据的组合任务上的深度神经网络（卷积网络、残差网络和视觉变换器）的行为。最终，这项工作探讨了训练数据中的统计结构如何影响组合泛化，对如何识别和纠正深度学习模型中的失败模式具有重要意义。

更新时间: 2025-04-08 15:07:04

领域: cs.LG,q-bio.NC

下载: http://arxiv.org/abs/2405.16391v3

Leveraging Axis-Aligned Subspaces for High-Dimensional Bayesian Optimization with Group Testing

Bayesian optimization (BO ) is an effective method for optimizing expensive-to-evaluate black-box functions. While high-dimensional problems can be particularly challenging, due to the multitude of parameter choices and the potentially high number of data points required to fit the model, this limitation can be addressed if the problem satisfies simplifying assumptions. Axis-aligned subspace approaches, where few dimensions have a significant impact on the objective, motivated several algorithms for high-dimensional BO . However, the validity of this assumption is rarely verified, and the assumption is rarely exploited to its full extent. We propose a group testing ( GT) approach to identify active variables to facilitate efficient optimization in these domains. The proposed algorithm, Group Testing Bayesian Optimization (GTBO), first runs a testing phase where groups of variables are systematically selected and tested on whether they influence the objective, then terminates once active dimensions are identified. To that end, we extend the well-established GT theory to functions over continuous domains. In the second phase, GTBO guides optimization by placing more importance on the active dimensions. By leveraging the axis-aligned subspace assumption, GTBO outperforms state-of-the-art methods on benchmarks satisfying the assumption of axis-aligned subspaces, while offering improved interpretability.

Updated: 2025-04-08 15:00:15

标题: 利用轴对齐子空间进行高维贝叶斯优化与分组测试

摘要: 贝叶斯优化（BO）是优化昂贵评估的黑盒函数的有效方法。虽然高维问题可能特别具有挑战性，由于参数选择的多样性和拟合模型所需的数据点数量可能很高，但如果问题满足简化假设，这种限制可以得到解决。轴对齐子空间方法，其中少数维度对目标具有显著影响，激发了几种高维BO算法。然而，这一假设的有效性很少得到验证，并且很少被充分利用。我们提出了一种群体测试（GT）方法，以识别活跃变量，以便在这些领域中进行高效优化。所提出的算法，群体测试贝叶斯优化（GTBO），首先运行一个测试阶段，其中系统地选择并测试变量组，以确定它们是否影响目标，然后在确定活跃维度后终止。为此，我们将已建立的GT理论扩展到连续域上的函数。在第二阶段，GTBO通过更加重视活跃维度来引导优化。通过利用轴对齐子空间假设，GTBO在满足轴对齐子空间假设的基准上表现优于最先进的方法，同时提供了更好的可解释性。

更新时间: 2025-04-08 15:00:15

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2504.06111v1

The Art of Beating the Odds with Predictor-Guided Random Design Space Exploration

This work introduces an innovative method for improving combinational digital circuits through random exploration in MIG-based synthesis. High-quality circuits are crucial for performance, power, and cost, making this a critical area of active research. Our approach incorporates next-state prediction and iterative selection, significantly accelerating the synthesis process. This novel method achieves up to 14x synthesis speedup and up to 20.94% better MIG minimization on the EPFL Combinational Benchmark Suite compared to state-of-the-art techniques. We further explore various predictor models and show that increased prediction accuracy does not guarantee an equivalent increase in synthesis quality of results or speedup, observing that randomness remains a desirable factor.

Updated: 2025-04-08 14:52:06

标题: 用预测引导的随机设计空间探索技术战胜几率的艺术

摘要: 这项工作介绍了一种通过在基于MIG的综合中进行随机探索来改进组合数字电路的创新方法。高质量的电路对性能、功耗和成本至关重要，使这成为一个活跃研究领域。我们的方法结合了下一状态预测和迭代选择，显著加速了综合过程。与最先进技术相比，这种新颖方法在EPFL组合基准套件上实现了高达14倍的综合加速和高达20.94%的更好MIG最小化。我们进一步探讨了各种预测模型，并表明增加预测准确性并不保证结果的综合质量或加速度等价增加，观察到随机性仍然是一个令人渴望的因素。

更新时间: 2025-04-08 14:52:06

领域: cs.LG,cs.AR

下载: http://arxiv.org/abs/2502.17936v3

Uncertainty-Aware Hybrid Machine Learning in Virtual Sensors for Vehicle Sideslip Angle Estimation

Precise vehicle state estimation is crucial for safe and reliable autonomous driving. The number of measurable states and their precision offered by the onboard vehicle sensor system are often constrained by cost. For instance, measuring critical quantities such as the Vehicle Sideslip Angle (VSA) poses significant commercial challenges using current optical sensors. This paper addresses these limitations by focusing on the development of high-performance virtual sensors to enhance vehicle state estimation for active safety. The proposed Uncertainty-Aware Hybrid Learning (UAHL) architecture integrates a machine learning model with vehicle motion models to estimate VSA directly from onboard sensor data. A key aspect of the UAHL architecture is its focus on uncertainty quantification for individual model estimates and hybrid fusion. These mechanisms enable the dynamic weighting of uncertainty-aware predictions from machine learning and vehicle motion models to produce accurate and reliable hybrid VSA estimates. This work also presents a novel dataset named Real-world Vehicle State Estimation Dataset (ReV-StED), comprising synchronized measurements from advanced vehicle dynamic sensors. The experimental results demonstrate the superior performance of the proposed method for VSA estimation, highlighting UAHL as a promising architecture for advancing virtual sensors and enhancing active safety in autonomous vehicles.

Updated: 2025-04-08 14:49:58

标题: 不确定性感知的虚拟传感器中用于车辆侧滑角估计的混合机器学习

摘要: 精确的车辆状态估计对于安全可靠的自动驾驶至关重要。车载车辆传感器系统提供的可测状态数量及其精度通常受成本限制。例如，使用当前光学传感器测量关键数量，如车辆侧滑角（VSA），面临着重大商业挑战。本文通过专注于开发高性能虚拟传感器来解决这些限制，以增强用于主动安全的车辆状态估计。提出的不确定性感知混合学习（UAHL）架构将机器学习模型与车辆运动模型集成在一起，直接从车载传感器数据中估计VSA。UAHL架构的一个关键方面是其专注于对个体模型估计和混合融合的不确定性量化。这些机制使得能够动态加权来自机器学习和车辆运动模型的不确定性感知预测，以生成准确可靠的混合VSA估计。本文还介绍了一个名为实际车辆状态估计数据集（ReV-StED）的新型数据集，其中包含来自先进车辆动态传感器的同步测量。实验结果展示了所提出方法在VSA估计方面的卓越性能，突出了UAHL作为推进虚拟传感器并增强自动驾驶车辆主动安全性的有前途的架构。

更新时间: 2025-04-08 14:49:58

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.06105v1

Large Language Model Enhanced Knowledge Representation Learning: A Survey

Knowledge Representation Learning (KRL) is crucial for enabling applications of symbolic knowledge from Knowledge Graphs (KGs) to downstream tasks by projecting knowledge facts into vector spaces. Despite their effectiveness in modeling KG structural information, KRL methods are suffering from the sparseness of KGs. The rise of Large Language Models (LLMs) built on the Transformer architecture presents promising opportunities for enhancing KRL by incorporating textual information to address information sparsity in KGs. LLM-enhanced KRL methods, including three key approaches, encoder-based methods that leverage detailed contextual information, encoder-decoder-based methods that utilize a unified Seq2Seq model for comprehensive encoding and decoding, and decoder-based methods that utilize extensive knowledge from large corpora, have significantly advanced the effectiveness and generalization of KRL in addressing a wide range of downstream tasks. This work provides a broad overview of downstream tasks while simultaneously identifying emerging research directions in these evolving domains.

Updated: 2025-04-08 14:47:07

标题: 大型语言模型增强知识表示学习：一项调查

摘要: 知识表示学习（KRL）对于通过将知识事实投影到向量空间实现从知识图（KGs）到下游任务的应用至关重要。尽管KRL方法在建模KG结构信息方面非常有效，但它们受到KGs稀疏性的困扰。基于Transformer架构的大型语言模型（LLMs）的崛起为通过整合文本信息来解决KGs信息稀疏性提供了有望的机会。LLM增强的KRL方法包括三种关键方法：利用详细上下文信息的基于编码器的方法，利用统一的Seq2Seq模型进行全面编码和解码的基于编码器-解码器的方法，以及利用大型语料库的广泛知识的基于解码器的方法，这些方法显著提高了KRL在解决各种下游任务中的有效性和泛化能力。本文概述了下游任务的广泛概况，同时确定了这些不断发展领域中新兴研究方向。

更新时间: 2025-04-08 14:47:07

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.00936v5

Sherlock: A Dataset for Process-aware Intrusion Detection Research on Power Grid Networks

Physically distributed components and legacy protocols make the protection of power grids against increasing cyberattack threats challenging. Infamously, the 2015 and 2016 blackouts in Ukraine were caused by cyberattacks, and the German Federal Office for Information Security (BSI) recorded over 200 cyber incidents against the German energy sector between 2023 and 2024. Intrusion detection promises to quickly detect such attacks and mitigate the worst consequences. However, public datasets of realistic scenarios are vital to evaluate these systems. This paper introduces Sherlock, a dataset generated with the co-simulator Wattson. In total, Sherlock covers three scenarios with various attacks manipulating the process state by injecting malicious commands or manipulating measurement values. We additionally test five recently-published intrusion detection systems on Sherlock, highlighting specific challenges for intrusion detection in power grids. Dataset and documentation are available at https://sherlock.wattson.it/.

Updated: 2025-04-08 14:46:35

标题: 福尔摩斯：用于电力网络过程感知入侵检测研究的数据集

摘要: 分布式的物理组件和传统协议使得保护电网免受不断增加的网络攻击威胁具有挑战性。臭名昭著的是，2015年和2016年乌克兰的停电是由网络攻击引起的，德国联邦信息安全局（BSI）记录显示，在2023年至2024年间对德国能源部门发生了200多起网络事件。入侵检测承诺能够快速检测此类攻击并减轻最严重的后果。然而，真实场景的公共数据集对于评估这些系统至关重要。本文介绍了使用协同模拟器Wattson生成的数据集Sherlock。总共，Sherlock涵盖了三种不同的场景，其中攻击通过注入恶意命令或操纵测量值来操纵过程状态。我们还在Sherlock上测试了五种最近发布的入侵检测系统，突出了电网入侵检测面临的特定挑战。数据集和文档可在https://sherlock.wattson.it/ 上获得。

更新时间: 2025-04-08 14:46:35

领域: cs.CR,cs.NI

下载: http://arxiv.org/abs/2504.06102v1

Towards Varroa destructor mite detection using a narrow spectra illumination

This paper focuses on the development and modification of a beehive monitoring device and Varroa destructor detection on the bees with the help of hyperspectral imagery while utilizing a U-net, semantic segmentation architecture, and conventional computer vision methods. The main objectives were to collect a dataset of bees and mites, and propose the computer vision model which can achieve the detection between bees and mites.

Updated: 2025-04-08 14:41:42

标题: 朝向使用窄光谱照明进行检测Varroa destructor螨类

摘要: 本文关注蜂箱监测设备的开发和修改，以及利用高光谱图像技术对蜜蜂上的Varroa destructor进行检测，同时利用U-net、语义分割架构和传统计算机视觉方法。主要目标是收集蜜蜂和螨虫的数据集，并提出可以实现蜜蜂和螨虫之间检测的计算机视觉模型。

更新时间: 2025-04-08 14:41:42

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.06099v1

CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization

Bayesian optimization is a powerful method for automating tuning of compilers. The complex landscape of autotuning provides a myriad of rarely considered structural challenges for black-box optimizers, and the lack of standardized benchmarks has limited the study of Bayesian optimization within the domain. To address this, we present CATBench, a comprehensive benchmarking suite that captures the complexities of compiler autotuning, ranging from discrete, conditional, and permutation parameter types to known and unknown binary constraints, as well as both multi-fidelity and multi-objective evaluations. The benchmarks in CATBench span a range of machine learning-oriented computations, from tensor algebra to image processing and clustering, and uses state-of-the-art compilers, such as TACO and RISE/ELEVATE. CATBench offers a unified interface for evaluating Bayesian optimization algorithms, promoting reproducibility and innovation through an easy-to-use, fully containerized setup of both surrogate and real-world compiler optimization tasks. We validate CATBench on several state-of-the-art algorithms, revealing their strengths and weaknesses and demonstrating the suite's potential for advancing both Bayesian optimization and compiler autotuning research.

Updated: 2025-04-08 14:37:00

标题: CATBench：用于黑盒优化的编译器自动调优基准套件

摘要: 贝叶斯优化是自动调整编译器的强大方法。自动调整的复杂性景观为黑盒优化器提供了众多很少考虑的结构挑战，而缺乏标准化基准限制了贝叶斯优化在该领域内的研究。为了解决这个问题，我们提出了CATBench，一个全面的基准套件，捕捉了编译器自动调整的复杂性，从离散、条件和排列参数类型到已知和未知的二进制约束，以及多种保真度和多目标评估。CATBench中的基准涵盖了一系列面向机器学习的计算，从张量代数到图像处理和聚类，并使用了最先进的编译器，如TACO和RISE/ELEVATE。CATBench提供了一个统一的界面，用于评估贝叶斯优化算法，通过易于使用的全容器化设置，促进了复现性和创新，涵盖了代理和真实世界的编译器优化任务。我们在几种最先进的算法上验证了CATBench，揭示了它们的优势和劣势，并展示了该套件在推动贝叶斯优化和编译器自动调整研究方面的潜力。

更新时间: 2025-04-08 14:37:00

领域: cs.LG,cs.AI,cs.NE

下载: http://arxiv.org/abs/2406.17811v2

Nonuniform-Tensor-Parallelism: Mitigating GPU failure impact for Scaled-up LLM Training

LLM training is scaled up to 10Ks of GPUs by a mix of data-(DP) and model-parallel (MP) execution. Critical to achieving efficiency is tensor-parallel (TP; a form of MP) execution within tightly-coupled subsets of GPUs, referred to as a scale-up domain, and the larger the scale-up domain the better the performance. New datacenter architectures are emerging with more GPUs able to be tightly-coupled in a scale-up domain, such as moving from 8 GPUs to 72 GPUs connected via NVLink. Unfortunately, larger scale-up domains increase the blast-radius of failures, with a failure of single GPU potentially impacting TP execution on the full scale-up domain, which can degrade overall LLM training throughput dramatically. With as few as 0.1% of GPUs being in a failed state, a high TP-degree job can experience nearly 10% reduction in LLM training throughput. We propose nonuniform-tensor-parallelism (NTP) to mitigate this amplified impact of GPU failures. In NTP, a DP replica that experiences GPU failures operates at a reduced TP degree, contributing throughput equal to the percentage of still-functional GPUs. We also propose a rack-design with improved electrical and thermal capabilities in order to sustain power-boosting of scale-up domains that have experienced failures; combined with NTP, this can allow the DP replica with the reduced TP degree (i.e., with failed GPUs) to keep up with the others, thereby achieving near-zero throughput loss for large-scale LLM training.

Updated: 2025-04-08 14:35:40

标题: 非均匀张量并行性：减轻GPU故障对扩展LLM训练的影响

摘要: LLM培训通过数据-（DP）和模型并行（MP）执行，扩展到数千个GPU。实现效率的关键是在GPU的紧密耦合子集内执行张量并行（TP；一种MP形式），称为扩展域，扩展域越大，性能越好。新的数据中心架构正在出现，可以在扩展域内连接更多GPU，例如从8个GPU移动到通过NVLink连接的72个GPU。不幸的是，更大的扩展域会增加故障的爆炸半径，单个GPU的故障可能影响整个扩展域上的TP执行，从而极大地降低整体LLM培训吞吐量。即使只有0.1%的GPU处于故障状态，高TP度作业的LLM培训吞吐量也可能减少近10%。我们提出了非均匀张量并行（NTP）来减轻GPU故障的放大影响。在NTP中，经历GPU故障的DP副本以降低的TP度运行，贡献吞吐量等于仍然正常运行的GPU的百分比。我们还提出了一种具有改进的电气和热能力的机架设计，以便维持经历故障的扩展域的功率增强；结合NTP，这可以使具有降低的TP度（即带有故障GPU的）的DP副本跟上其他副本，从而实现大规模LLM培训的近零吞吐量损失。

更新时间: 2025-04-08 14:35:40

领域: cs.DC,cs.LG

下载: http://arxiv.org/abs/2504.06095v1

Real-Time LaCAM

The vast majority of Multi-Agent Path Finding (MAPF) methods with completeness guarantees require planning full horizon paths. However, planning full horizon paths can take too long and be impractical in real-world applications. Instead, real-time planning and execution, which only allows the planner a finite amount of time before executing and replanning, is more practical for real world multi-agent systems. Several methods utilize real-time planning schemes but none are provably complete, which leads to livelock or deadlock. Our main contribution is to show the first Real-Time MAPF method with provable completeness guarantees. We do this by leveraging LaCAM (Okumura 2023) in an incremental fashion. Our results show how we can iteratively plan for congested environments with a cutoff time of milliseconds while still maintaining the same success rate as full horizon LaCAM. We also show how it can be used with a single-step learned MAPF policy. The proposed Real-Time LaCAM also provides us with a general mechanism for using iterative constraints for completeness in future real-time MAPF algorithms.

Updated: 2025-04-08 14:31:05

标题: 实时LaCAM

摘要: 大多数具有完整性保证的多智能体路径规划（MAPF）方法需要规划完整的视野路径。然而，规划完整的视野路径可能需要太长时间，在现实世界的应用中并不切实际。相反，实时规划和执行只允许规划者在执行和重新规划之前有限的时间，对于现实世界的多智能体系统更为实用。一些方法利用实时规划方案，但没有一个可以被证明是完整的，这导致了活锁或死锁。我们的主要贡献是展示了第一个具有可证实完整性保证的实时MAPF方法。我们通过以增量方式利用LaCAM（Okumura 2023）来实现这一点。我们的结果展示了我们如何可以在毫秒级的截止时间内迭代规划拥挤环境，同时仍保持与完整视野LaCAM相同的成功率。我们还展示了如何与单步学习的MAPF策略一起使用它。提出的实时LaCAM还为我们提供了一种在未来实时MAPF算法中使用迭代约束来实现完整性的一般机制。

更新时间: 2025-04-08 14:31:05

领域: cs.MA,cs.AI,cs.RO

下载: http://arxiv.org/abs/2504.06091v1

Frequency maps reveal the correlation between Adversarial Attacks and Implicit Bias

Despite their impressive performance in classification tasks, neural networks are known to be vulnerable to adversarial attacks, subtle perturbations of the input data designed to deceive the model. In this work, we investigate the correlation between these perturbations and the implicit bias of neural networks trained with gradient-based algorithms. To this end, we analyse a representation of the network's implicit bias through the lens of the Fourier transform. Specifically, we identify unique fingerprints of implicit bias and adversarial attacks by calculating the minimal, essential frequencies needed for accurate classification of each image, as well as the frequencies that drive misclassification in its adversarially perturbed counterpart. This approach enables us to uncover and analyse the correlation between these essential frequencies, providing a precise map of how the network's biases align or contrast with the frequency components exploited by adversarial attacks. To this end, among other methods, we use a newly introduced technique capable of detecting nonlinear correlations between high-dimensional datasets. Our results provide empirical evidence that the network bias in Fourier space and the target frequencies of adversarial attacks are highly correlated and suggest new potential strategies for adversarial defence.

Updated: 2025-04-08 14:29:39

标题: 频率图表揭示对抗性攻击与隐性偏见之间的相关性

摘要: 尽管神经网络在分类任务中表现出色，但已知它们容易受到对抗性攻击的影响，即对输入数据进行微小扰动以欺骗模型。在这项工作中，我们调查了这些扰动与使用基于梯度的算法训练的神经网络的内在偏差之间的相关性。为此，我们通过傅立叶变换的视角分析了网络内在偏差的表示。具体来说，我们通过计算每幅图像所需的最小、必要频率以及驱动对其进行对抗性扰动后错误分类的频率，识别了内在偏差和对抗性攻击的独特指纹。这种方法使我们能够揭示和分析这些必要频率之间的相关性，提供网络偏差在傅立叶空间中与对抗性攻击利用的频率成分之间如何一致或相反的精确地图。为此，除其他方法外，我们使用了一种新引入的能够检测高维数据集之间非线性相关性的技术。我们的结果提供了实证证据，即傅立叶空间中的网络偏差和对抗性攻击的目标频率高度相关，并提出了新的对抗性防御潜在策略。

更新时间: 2025-04-08 14:29:39

领域: cs.LG,cs.AI,cs.CR,stat.ML

下载: http://arxiv.org/abs/2305.15203v3

MCAT: Visual Query-Based Localization of Standard Anatomical Clips in Fetal Ultrasound Videos Using Multi-Tier Class-Aware Token Transformer

Accurate standard plane acquisition in fetal ultrasound (US) videos is crucial for fetal growth assessment, anomaly detection, and adherence to clinical guidelines. However, manually selecting standard frames is time-consuming and prone to intra- and inter-sonographer variability. Existing methods primarily rely on image-based approaches that capture standard frames and then classify the input frames across different anatomies. This ignores the dynamic nature of video acquisition and its interpretation. To address these challenges, we introduce Multi-Tier Class-Aware Token Transformer (MCAT), a visual query-based video clip localization (VQ-VCL) method, to assist sonographers by enabling them to capture a quick US sweep. By then providing a visual query of the anatomy they wish to analyze, MCAT returns the video clip containing the standard frames for that anatomy, facilitating thorough screening for potential anomalies. We evaluate MCAT on two ultrasound video datasets and a natural image VQ-VCL dataset based on Ego4D. Our model outperforms state-of-the-art methods by 10% and 13% mIoU on the ultrasound datasets and by 5.35% mIoU on the Ego4D dataset, using 96% fewer tokens. MCAT's efficiency and accuracy have significant potential implications for public health, especially in low- and middle-income countries (LMICs), where it may enhance prenatal care by streamlining standard plane acquisition, simplifying US-based screening, diagnosis and allowing sonographers to examine more patients.

Updated: 2025-04-08 14:29:15

标题: MCAT：使用多层级类别感知标记转换器在胎儿超声视频中基于视觉查询定位标准解剖剪辑

摘要: 胎儿超声（US）视频中准确获取标准平面是进行胎儿生长评估、异常检测和遵守临床指南的关键。然而，手动选择标准帧耗时且容易出现超声医生内部和互相之间的变异性。现有方法主要依赖于基于图像的方法，捕捉标准帧，然后对不同解剖结构的输入帧进行分类。这忽略了视频采集的动态特性及其解释。为了解决这些挑战，我们引入了多层次类别感知记号变换器（MCAT），这是一种基于视觉查询的视频剪辑定位（VQ-VCL）方法，可帮助超声医生通过启用他们快速进行US扫描。然后提供他们希望分析的解剖结构的视觉查询，MCAT返回包含该解剖结构的标准帧的视频剪辑，便于对潜在异常进行全面筛查。我们在两个超声视频数据集和一个基于Ego4D的自然图像VQ-VCL数据集上评估了MCAT。我们的模型在超声数据集上的mIoU上优于现有方法10％和13％，在Ego4D数据集上的mIoU上优于5.35％，使用的记号数量减少了96％。MCAT的效率和准确性对公共卫生，特别是在低收入和中等收入国家（LMICs）具有重要的潜在影响，在这些国家可以通过简化标准平面采集、简化基于US的筛查、诊断，并允许超声医生检查更多患者来增强产前护理。

更新时间: 2025-04-08 14:29:15

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.06088v1

Accurate Ab-initio Neural-network Solutions to Large-Scale Electronic Structure Problems

We present finite-range embeddings (FiRE), a novel wave function ansatz for accurate large-scale ab-initio electronic structure calculations. Compared to contemporary neural-network wave functions, FiRE reduces the asymptotic complexity of neural-network variational Monte Carlo (NN-VMC) by $\sim n_\text{el}$, the number of electrons. By restricting electron-electron interactions within the neural network, FiRE accelerates all key operations -- sampling, pseudopotentials, and Laplacian computations -- resulting in a real-world $10\times$ acceleration in now-feasible 180-electron calculations. We validate our method's accuracy on various challenging systems, including biochemical compounds, conjugated hydrocarbons, and organometallic compounds. On these systems, FiRE's energies are consistently within chemical accuracy of the most reliable data, including experiments, even in cases where high-accuracy methods such as CCSD(T), AFQMC, or contemporary NN-VMC fall short. With these improvements in both runtime and accuracy, FiRE represents a new `gold-standard' method for fast and accurate large-scale ab-initio calculations, potentially enabling new computational studies in fields like quantum chemistry, solid-state physics, and material design.

Updated: 2025-04-08 14:28:54

标题: 准确的从头算神经网络解决大规模电子结构问题

摘要: 我们提出了有限范围嵌入（FiRE），这是一种新颖的波函数假设，用于准确的大规模从头计算电子结构。与当代神经网络波函数相比，FiRE将神经网络变分蒙特卡洛（NN-VMC）的渐进复杂度降低了约$n_\text{el}$（电子数）。通过在神经网络内限制电子-电子相互作用，FiRE加速了所有关键操作--抽样、赝势和拉普拉斯计算--导致在现实世界中可行的180个电子计算中实现了$10\times$的加速。我们在各种具有挑战性的系统上验证了我们方法的准确性，包括生物化学化合物、共轭碳氢化合物和有机金属化合物。在这些系统中，FiRE的能量始终保持在化学精度范围内，与最可靠的数据（包括实验数据）保持一致，即使在高精度方法如CCSD(T)、AFQMC或当代NN-VMC无法做到的情况下也是如此。凭借这些在运行时间和准确性方面的改进，FiRE代表了一种新的“黄金标准”方法，可用于快速准确的大规模从头计算，可能为量子化学、固体物理和材料设计等领域的新的计算研究提供可能。

更新时间: 2025-04-08 14:28:54

领域: physics.comp-ph,cs.LG,physics.chem-ph

下载: http://arxiv.org/abs/2504.06087v1

Probabilistic Traffic Forecasting with Dynamic Regression

This paper proposes a dynamic regression (DR) framework that enhances existing deep spatiotemporal models by incorporating structured learning for the error process in traffic forecasting. The framework relaxes the assumption of time independence by modeling the error series of the base model (i.e., a well-established traffic forecasting model) using a matrix-variate autoregressive (AR) model. The AR model is integrated into training by redesigning the loss function. The newly designed loss function is based on the likelihood of a non-isotropic error term, enabling the model to generate probabilistic forecasts while preserving the original outputs of the base model. Importantly, the additional parameters introduced by the DR framework can be jointly optimized alongside the base model. Evaluation on state-of-the-art (SOTA) traffic forecasting models using speed and flow datasets demonstrates improved performance, with interpretable AR coefficients and spatiotemporal covariance matrices enhancing the understanding of the model.

Updated: 2025-04-08 14:26:10

标题: 使用动态回归进行概率交通预测

摘要: 本文提出了一种动态回归（DR）框架，通过在交通预测中结构化学习误差过程，增强了现有的深度时空模型。该框架通过使用矩阵变量自回归（AR）模型对基础模型的误差序列进行建模，放松了时间独立的假设。AR模型被整合到训练中，通过重新设计损失函数。新设计的损失函数基于非各向同性误差项的似然性，使模型能够生成概率预测，同时保留基础模型的原始输出。重要的是，DR框架引入的额外参数可以与基础模型一起进行联合优化。使用速度和流量数据集对最先进的交通预测模型进行评估，结果表明性能有所提高，可解释的AR系数和时空协方差矩阵增强了对模型的理解。

更新时间: 2025-04-08 14:26:10

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2301.06650v3

Security Analysis of Thumbnail-Preserving Image Encryption and a New Framework

As a primary encryption primitive balancing the privacy and searchability of cloud storage images, thumbnail preserving encryption (TPE) enables users to quickly identify the privacy personal image on the cloud and request this image from the owner through a secure channel. In this paper, we have found that two different plaintext images may produce the same thumbnail. It results in the failure of search strategy because the collision of thumbnail occurs. To address this serious security issues, we conduct an in-depth analysis on the collision probabilities of thumbnails, and then propose a new TPE framework, called multi-factor thumbnail preserving encryption (MFTPE). It starts from the collision probability of two blocks, extend to the probabilities of two images and ultimately to N images. Then, we in detail describe three specific MFTPE constructions preserving different combinations of factors, i.e., the sum and the geometric mean, the sum and the range, and the sum and the weighted mean. The theoretical and experimental results demonstrate that the proposed MFTPE reduces the probability of thumbnails, exhibits strong robustness, and also effectively resists face detection and noise attacks.

Updated: 2025-04-08 14:24:43

标题: 缩略图保留图像加密的安全分析和新框架

摘要: 作为平衡云存储图片隐私性和可搜索性的主要加密原语，缩略图保留加密（TPE）使用户能够快速识别云端上的隐私个人图像，并通过安全渠道向所有者请求该图像。本文发现两个不同的明文图像可能产生相同的缩略图。这导致搜索策略失败，因为缩略图发生碰撞。为了解决这个严重的安全问题，我们对缩略图碰撞概率进行了深入分析，然后提出了一个新的TPE框架，称为多因素缩略图保留加密（MFTPE）。它从两个块的碰撞概率开始，延伸到两个图像的概率，最终延伸到N个图像。然后，我们详细描述了保留不同因素组合的三种具体MFTPE构造，即总和和几何平均、总和和范围以及总和和加权平均。理论和实验结果表明，所提出的MFTPE减少了缩略图的概率，表现出强大的韧性，并且有效抵抗人脸检测和噪声攻击。

更新时间: 2025-04-08 14:24:43

领域: cs.CR,cs.MM

下载: http://arxiv.org/abs/2504.06083v1

A Survey on Design-space Dimensionality Reduction Methods for Shape Optimization

The rapidly evolving field of engineering design of functional surfaces necessitates sophisticated tools to manage the inherent complexity of high-dimensional design spaces. This survey paper offers a scoping review, i.e., a literature mapping synthesis borrowed from clinical medicine, delving into the field of design-space dimensionality reduction techniques tailored for shape optimization, bridging traditional methods and cutting-edge technologies. Dissecting the spectrum of these techniques, from classical linear approaches like principal component analysis to more nuanced nonlinear methods such as autoencoders, the discussion extends to innovative physics-informed methods that integrate physical data into the dimensionality reduction process, enhancing the physical relevance and effectiveness of reduced design spaces. By integrating these methods into optimization frameworks, it is shown how they significantly mitigate the curse of dimensionality, streamline computational processes, and refine the design exploration and optimization of complex functional surfaces. The survey provides a classification of methods and highlights the transformative impact of these techniques in simplifying design challenges, thereby fostering more efficient and effective engineering solutions.

Updated: 2025-04-08 14:23:57

标题: 一个关于形状优化设计空间维度缩减方法的调查

摘要: 功能表面工程设计领域发展迅速，需要复杂的工具来管理高维设计空间固有的复杂性。本综述论文提供了一个范围性回顾，即从临床医学中借鉴的文献综合，深入探讨为形状优化量身定制的设计空间维度缩减技术领域，桥接传统方法和前沿技术。通过解剖这些技术的范围，从像主成分分析这样的经典线性方法到更微妙的非线性方法，如自动编码器，讨论延伸到将物理数据整合到维度缩减过程中的创新物理知识方法，增强了缩减设计空间的物理相关性和有效性。通过将这些方法整合到优化框架中，展示了它们如何显著减轻维度的困扰，简化计算过程，优化复杂功能表面的设计探索和优化。该综述提供了方法分类，并突出了这些技术在简化设计挑战方面所产生的转变影响，从而促进更高效和有效的工程解决方案。

更新时间: 2025-04-08 14:23:57

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2405.13944v2

Collaborative Prediction: Tractable Information Aggregation via Agreement

We give efficient "collaboration protocols" through which two parties, who observe different features about the same instances, can interact to arrive at predictions that are more accurate than either could have obtained on their own. The parties only need to iteratively share and update their own label predictions-without either party ever having to share the actual features that they observe. Our protocols are efficient reductions to the problem of learning on each party's feature space alone, and so can be used even in settings in which each party's feature space is illegible to the other-which arises in models of human/AI interaction and in multi-modal learning. The communication requirements of our protocols are independent of the dimensionality of the data. In an online adversarial setting we show how to give regret bounds on the predictions that the parties arrive at with respect to a class of benchmark policies defined on the joint feature space of the two parties, despite the fact that neither party has access to this joint feature space. We also give simpler algorithms for the same task in the batch setting in which we assume that there is a fixed but unknown data distribution. We generalize our protocols to a decision theoretic setting with high dimensional outcome spaces, where parties communicate only "best response actions." Our theorems give a computationally and statistically tractable generalization of past work on information aggregation amongst Bayesians who share a common and correct prior, as part of a literature studying "agreement" in the style of Aumann's agreement theorem. Our results require no knowledge of (or even the existence of) a prior distribution and are computationally efficient. Nevertheless we show how to lift our theorems back to this classical Bayesian setting, and in doing so, give new information aggregation theorems for Bayesian agreement.

Updated: 2025-04-08 14:12:42

标题: 合作预测：通过协议实现可行的信息聚合

摘要: 我们提供了有效的“协作协议”，通过这些协议，观察同一实例的两方可以相互交互，得出比任何一方单独得出的预测更准确的结果。双方只需要迭代地分享和更新自己的标签预测-而无需任何一方分享他们观察到的实际特征。我们的协议是对每一方特征空间学习问题的高效简化，因此即使在每一方的特征空间对另一方是不可读的情况下-在人工智能交互模型和多模态学习中出现，也可以使用。我们的协议的通信需求与数据的维度无关。在在线对抗设置中，我们展示了如何对双方根据两方联合特征空间上定义的一类基准策略到达的预测给出遗憾上界，尽管双方都无法访问这个联合特征空间。我们还为批处理设置中相同任务提供了更简单的算法，假设存在一个固定但未知的数据分布。我们将我们的协议推广到一个决策理论设置，其中各方仅通信“最佳响应行动”。我们的定理为贝叶斯人之间关于信息聚合的过去工作提供了可计算和统计可追踪的泛化，这些贝叶斯人共享一个共同且正确的先验，作为研究“协议”的文献的一部分，研究了奥曼协议定理风格的一致性。我们的结果不需要对先验分布的知识（甚至存在性），并且计算效率高。然而，我们展示了如何将我们的定理提升回到这个经典的贝叶斯设置，并在此过程中，为贝叶斯一致性提供了新的信息聚合定理。

更新时间: 2025-04-08 14:12:42

领域: cs.LG,cs.DS,cs.GT

下载: http://arxiv.org/abs/2504.06075v1

PINP: Physics-Informed Neural Predictor with latent estimation of fluid flows

Accurately predicting fluid dynamics and evolution has been a long-standing challenge in physical sciences. Conventional deep learning methods often rely on the nonlinear modeling capabilities of neural networks to establish mappings between past and future states, overlooking the fluid dynamics, or only modeling the velocity field, neglecting the coupling of multiple physical quantities. In this paper, we propose a new physics-informed learning approach that incorporates coupled physical quantities into the prediction process to assist with forecasting. Central to our method lies in the discretization of physical equations, which are directly integrated into the model architecture and loss function. This integration enables the model to provide robust, long-term future predictions. By incorporating physical equations, our model demonstrates temporal extrapolation and spatial generalization capabilities. Experimental results show that our approach achieves the state-of-the-art performance in spatiotemporal prediction across both numerical simulations and real-world extreme-precipitation nowcasting benchmarks.

Updated: 2025-04-08 14:11:01

标题: PINP：具有流体流动潜在估计的物理信息神经预测器

摘要: 准确预测流体动力学和演化一直是物理科学中长期存在的挑战。传统的深度学习方法通常依赖于神经网络的非线性建模能力，以建立过去和未来状态之间的映射关系，忽视了流体动力学，或者只建模速度场，忽略了多个物理量之间的耦合。在本文中，我们提出了一种新的物理信息学习方法，将耦合的物理量纳入到预测过程中，以帮助进行预测。我们方法的核心在于将物理方程的离散化直接集成到模型架构和损失函数中。这种集成使模型能够提供稳健、长期的未来预测。通过整合物理方程，我们的模型展示了时间外推和空间泛化的能力。实验结果表明，我们的方法在数值模拟和现实世界极端降水的即时预报基准测试中实现了最先进的性能。

更新时间: 2025-04-08 14:11:01

领域: cs.LG

下载: http://arxiv.org/abs/2504.06070v1

Analyzing the Impact of Low-Rank Adaptation for Cross-Domain Few-Shot Object Detection in Aerial Images

This paper investigates the application of Low-Rank Adaptation (LoRA) to small models for cross-domain few-shot object detection in aerial images. Originally designed for large-scale models, LoRA helps mitigate overfitting, making it a promising approach for resource-constrained settings. We integrate LoRA into DiffusionDet, and evaluate its performance on the DOTA and DIOR datasets. Our results show that LoRA applied after an initial fine-tuning slightly improves performance in low-shot settings (e.g., 1-shot and 5-shot), while full fine-tuning remains more effective in higher-shot configurations. These findings highlight LoRA's potential for efficient adaptation in aerial object detection, encouraging further research into parameter-efficient fine-tuning strategies for few-shot learning. Our code is available here: https://github.com/HichTala/LoRA-DiffusionDet.

Updated: 2025-04-08 14:10:39

标题: 分析低秩适应对航空图像跨领域少样本目标检测的影响

摘要: 这篇论文研究了低秩适应（LoRA）在航空图像跨域少样本目标检测中的应用。LoRA最初设计用于大规模模型，有助于减少过拟合，使其成为资源受限情况下的一种有前途的方法。我们将LoRA集成到DiffusionDet中，并在DOTA和DIOR数据集上评估其性能。我们的结果表明，在低样本情况下（例如，1-shot和5-shot），在初始微调后应用LoRA略微提高了性能，而在更高样本配置中，完全微调仍然更有效。这些发现突出了LoRA在航空目标检测中的适应效能，鼓励进一步研究参数高效微调策略用于少样本学习。我们的代码在此处可用：https://github.com/HichTala/LoRA-DiffusionDet。

更新时间: 2025-04-08 14:10:39

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.06330v1

Unifying KV Cache Compression for Large Language Models with LeanKV

Large language models (LLMs) exhibit exceptional performance but incur significant serving costs due to their substantial memory requirements, with the key-value (KV) cache being a primary bottleneck. Existing KV cache compression techniques, such as quantization and pruning, apply uniform treatment to both keys and values, and discard unimportant tokens entirely, overlooking the fine-grained differences in significance of various components within the KV cache. To address these limitations, we introduce LeanKV, a framework that advances KV cache compression by exploiting three levels of differentiation in the KV cache: (1) the differing impact of keys and values on attention computation, (2) the varying importance of tokens, and (3) the diverse dynamic sparsity patterns across attention heads. At the core of LeanKV is an on-GPU memory manager that compacts fragmented free memory list into contiguous regions in parallel, effectively translating sparsity in the KV cache into performance gains. We evaluate LeanKV on several mainstream models, including the recent "thinking model". LeanKV is able to compress the KV cache by $2.7\times$ to $5.7\times$ with near-lossless accuracy on complex workloads requiring sophisticated reasoning and long-generation capabilities, and enhances throughput by $1.9\times$ to $5.4\times$.

Updated: 2025-04-08 14:05:12

标题: 使用LeanKV统一大型语言模型的KV缓存压缩

摘要: 大型语言模型（LLMs）表现出卓越的性能，但由于其巨大的内存需求而产生显著的服务成本，其中键值（KV）缓存是主要瓶颈。现有的KV缓存压缩技术，如量化和修剪，对键和值都施加统一处理，并完全丢弃无关紧要的令牌，忽视了KV缓存中各组件的显著差异。为了解决这些限制，我们引入了LeanKV，这是一个通过利用KV缓存中三个层次的差异性来推进KV缓存压缩的框架：（1）键和值对注意力计算的不同影响，（2）令牌的不同重要性，和（3）注意力头之间的多样动态稀疏模式。LeanKV的核心是一个在GPU上的内存管理器，将碎片化的空闲内存列表并行压缩为连续区域，有效地将KV缓存中的稀疏转化为性能增益。我们在几个主流模型上评估了LeanKV，包括最近的“思考模型”。LeanKV能够以接近无损精度将KV缓存压缩2.7倍至5.7倍，适用于需要复杂推理和长期生成能力的工作负载，并提高吞吐量1.9倍至5.4倍。

更新时间: 2025-04-08 14:05:12

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2412.03131v2

Self-Supervised Siamese Autoencoders

In contrast to fully-supervised models, self-supervised representation learning only needs a fraction of data to be labeled and often achieves the same or even higher downstream performance. The goal is to pre-train deep neural networks on a self-supervised task, making them able to extract meaningful features from raw input data afterwards. Previously, autoencoders and Siamese networks have been successfully employed as feature extractors for tasks such as image classification. However, both have their individual shortcomings and benefits. In this paper, we combine their complementary strengths by proposing a new method called SidAE (Siamese denoising autoencoder). Using an image classification downstream task, we show that our model outperforms two self-supervised baselines across multiple data sets and scenarios. Crucially, this includes conditions in which only a small amount of labeled data is available. Empirically, the Siamese component has more impact, but the denoising autoencoder is nevertheless necessary to improve performance.

Updated: 2025-04-08 14:03:38

标题: 自监督孪生自编码器

摘要: 与完全监督模型相比，自监督表示学习只需要对数据进行少量标记，通常能够达到同样甚至更高的下游性能。其目标是在自监督任务上预训练深度神经网络，使其能够从原始输入数据中提取有意义的特征。以前，自动编码器和连体网络已成功地被用作图像分类等任务的特征提取器。然而，它们都有各自的缺点和优势。在本文中，我们通过提出一种新方法SidAE（连体去噪自动编码器）来结合它们互补的优势。通过使用图像分类下游任务，我们展示了我们的模型在多个数据集和场景中优于两个自监督基线的表现。至关重要的是，这包括仅有少量标记数据可用的情况。在实证上，连体组件具有更大的影响，但去噪自动编码器仍然是必要的以提高性能。

更新时间: 2025-04-08 14:03:38

领域: cs.LG,cs.CV,stat.ML

下载: http://arxiv.org/abs/2304.02549v2

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs). Prompting advanced LLMs with reasoning capabilities to use search engines during inference is often suboptimal, as the LLM might not fully possess the capability on how to interact optimally with the search engine. This paper introduces Search-R1, an extension of reinforcement learning (RL) for reasoning frameworks where the LLM learns to autonomously generate (multiple) search queries during step-by-step reasoning with real-time retrieval. Search-R1 optimizes LLM reasoning trajectories with multi-turn search interactions, leveraging retrieved token masking for stable RL training and a simple outcome-based reward function. Experiments on seven question-answering datasets show that Search-R1 improves performance by 41% (Qwen2.5-7B) and 20% (Qwen2.5-3B) over various RAG baselines under the same setting. This paper further provides empirical insights into RL optimization methods, LLM choices, and response length dynamics in retrieval-augmented reasoning. The code and model checkpoints are available at https://github.com/PeterGriffinJin/Search-R1.

Updated: 2025-04-08 14:03:26

标题: Search-R1：使用强化学习训练LLMs进行推理和利用搜索引擎

摘要: 高效获取外部知识和最新信息对于大型语言模型（LLMs）中的有效推理和文本生成至关重要。在推理过程中，促使具有推理能力的高级LLMs使用搜索引擎往往并不是最佳选择，因为LLM可能并不完全具备如何与搜索引擎进行最佳互动的能力。本文介绍了Search-R1，这是一种基于强化学习（RL）的推理框架扩展，其中LLM学习在实时检索中逐步生成（多个）搜索查询。Search-R1通过多轮搜索互动优化LLM推理轨迹，利用检索到的令牌掩码进行稳定的RL训练，并使用简单的基于结果的奖励函数。在七个问答数据集上的实验表明，Search-R1在相同设置下比各种RAG基准线提高了41%（Qwen2.5-7B）和20%（Qwen2.5-3B）的性能。本文进一步提供了关于RL优化方法、LLM选择和检索增强推理中响应长度动态的经验见解。代码和模型检查点可在https://github.com/PeterGriffinJin/Search-R1上找到。

更新时间: 2025-04-08 14:03:26

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2503.09516v3

Principled Interpolation in Normalizing Flows

Generative models based on normalizing flows are very successful in modeling complex data distributions using simpler ones. However, straightforward linear interpolations show unexpected side effects, as interpolation paths lie outside the area where samples are observed. This is caused by the standard choice of Gaussian base distributions and can be seen in the norms of the interpolated samples as they are outside the data manifold. This observation suggests that changing the way of interpolating should generally result in better interpolations, but it is not clear how to do that in an unambiguous way. In this paper, we solve this issue by enforcing a specific manifold and, hence, change the base distribution, to allow for a principled way of interpolation. Specifically, we use the Dirichlet and von Mises-Fisher base distributions on the probability simplex and the hypersphere, respectively. Our experimental results show superior performance in terms of bits per dimension, Fr\'echet Inception Distance (FID), and Kernel Inception Distance (KID) scores for interpolation, while maintaining the generative performance.

Updated: 2025-04-08 14:02:40

标题: 正规流中的原则插值

摘要: 基于归一化流的生成模型在利用简单数据来建模复杂数据分布方面取得了很大成功。然而，直接的线性插值表现出意想不到的副作用，因为插值路径位于样本被观察到的区域之外。这是由于标准选择的高斯基本分布所导致的，在插值样本的范数中可以看到它们超出了数据流形。这一观察表明，改变插值方式通常会导致更好的插值结果，但如何以明确的方式做到这一点尚不清楚。在本文中，我们通过强制施加特定的流形并改变基本分布来解决这个问题，以便采用一种有原则的插值方式。具体地，我们分别在概率单纯形和超球面上使用了狄利克雷和von Mises-Fisher基本分布。我们的实验结果显示，在插值方面，我们表现出卓越的性能，包括每维比特数、Fr\'echet Inception Distance（FID）和Kernel Inception Distance（KID）分数，同时保持生成性能。

更新时间: 2025-04-08 14:02:40

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2010.12059v2

Explainable AI for building energy retrofitting under data scarcity

Enhancing energy efficiency in residential buildings is a crucial step toward mitigating climate change and reducing greenhouse gas emissions. Retrofitting existing buildings, which account for a significant portion of energy consumption, is critical particularly in regions with outdated and inefficient building stocks. This study presents an Artificial Intelligence (AI) and Machine Learning (ML)-based framework to recommend energy efficiency measures for residential buildings, leveraging accessible building characteristics to achieve energy class targets. Using Latvia as a case study, the methodology addresses challenges associated with limited datasets, class imbalance and data scarcity. The proposed approach integrates Conditional Tabular Generative Adversarial Networks (CTGAN) to generate synthetic data, enriching and balancing the dataset. A Multi-Layer Perceptron (MLP) model serves as the predictive model performing multi-label classification to predict appropriate retrofit strategies. Explainable Artificial Intelligence (XAI), specifically SHapley Additive exPlanations (SHAP), ensures transparency and trust by identifying key features that influence recommendations and guiding feature engineering choices for improved reliability and performance. The evaluation of the approach shows that it notably overcomes data limitations, achieving improvements up to 54% in precision, recall and F1 score. Although this study focuses on Latvia, the methodology is adaptable to other regions, underscoring the potential of AI in reducing the complexity and cost of building energy retrofitting overcoming data limitations. By facilitating decision-making processes and promoting stakeholders engagement, this work supports the global transition toward sustainable energy use in the residential building sector.

Updated: 2025-04-08 14:00:08

标题: 数据稀缺条件下建筑能源改造的可解释人工智能

摘要: 提高住宅建筑能效是减缓气候变化和减少温室气体排放的关键步骤。对现有建筑进行翻新是至关重要的，尤其是在具有陈旧和低效建筑库存的地区。本研究提出了一种基于人工智能（AI）和机器学习（ML）的框架，以推荐住宅建筑的能效措施，利用可访问的建筑特征来实现能源等级目标。以拉脱维亚为案例研究，该方法解决了与有限数据集、类别不平衡和数据稀缺性相关的挑战。所提出的方法整合了条件表生成对抗网络（CTGAN）来生成合成数据，丰富和平衡数据集。多层感知器（MLP）模型作为预测模型进行多标签分类，以预测合适的翻新策略。可解释人工智能（XAI），特别是SHapley添加解释（SHAP），通过识别影响推荐的关键特征，并指导特征工程选择以提高可靠性和性能，确保透明度和信任。该方法的评估显示，它显著克服了数据限制，精度、召回率和F1得分的提高达到了54%。尽管本研究重点放在拉脱维亚，但该方法适用于其他地区，强调了AI在减少建筑能源改造复杂性和成本，克服数据限制方面的潜力。通过促进决策过程和推动利益相关者参与，这项工作支持住宅建筑行业向可持续能源使用的全球转变。

更新时间: 2025-04-08 14:00:08

领域: cs.LG

下载: http://arxiv.org/abs/2504.06055v1

CORTEX-AVD: CORner Case Testing & EXploration for Autonomous Vehicles Development

Autonomous Vehicles (AVs) aim to improve traffic safety and efficiency by reducing human error. However, ensuring AVs reliability and safety is a challenging task when rare, high-risk traffic scenarios are considered. These 'Corner Cases' (CC) scenarios, such as unexpected vehicle maneuvers or sudden pedestrian crossings, must be safely and reliable dealt by AVs during their operations. But they arehard to be efficiently generated. Traditional CC generation relies on costly and risky real-world data acquisition, limiting scalability, and slowing research and development progress. Simulation-based techniques also face challenges, as modeling diverse scenarios and capturing all possible CCs is complex and time-consuming. To address these limitations in CC generation, this research introduces CORTEX-AVD, CORner Case Testing & EXploration for Autonomous Vehicles Development, an open-source framework that integrates the CARLA Simulator and Scenic to automatically generate CC from textual descriptions, increasing the diversity and automation of scenario modeling. Genetic Algorithms (GA) are used to optimize the scenario parameters in six case study scenarios, increasing the occurrence of high-risk events. Unlike previous methods, CORTEX-AVD incorporates a multi-factor fitness function that considers variables such as distance, time, speed, and collision likelihood. Additionally, the study provides a benchmark for comparing GA-based CC generation methods, contributing to a more standardized evaluation of synthetic data generation and scenario assessment. Experimental results demonstrate that the CORTEX-AVD framework significantly increases CC incidence while reducing the proportion of wasted simulations.

Updated: 2025-04-08 13:52:29

标题: CORTEX-AVD: 自动驾驶车辆开发的角落案例测试与探索

摘要: Autonomous Vehicles (AVs)旨在通过减少人为错误来提高交通安全和效率。然而，在考虑到罕见的高风险交通情景时，确保AVs的可靠性和安全性是一项具有挑战性的任务。这些“角落情况”(CC)情景，比如意外的车辆机动或突然出现的行人穿越，必须在AVs运营过程中被安全可靠地处理。但是它们很难被高效地生成。传统的CC生成依赖于昂贵且有风险的真实世界数据获取，限制了可扩展性，并减缓了研究和开发进展。基于模拟的技术也面临挑战，因为建模各种情景并捕捉所有可能的CC是复杂且耗时的。为了解决CC生成中的这些限制，这项研究引入了CORTEX-AVD，即CORner Case Testing & EXploration for Autonomous Vehicles Development，这是一个集成了CARLA Simulator和Scenic的开源框架，可以从文本描述中自动生成CC，增加情景建模的多样性和自动化。遗传算法(GA)被用于优化六个案例研究情景中的参数，增加高风险事件的发生率。与以往的方法不同，CORTEX-AVD融合了一个考虑距离、时间、速度和碰撞概率等变量的多因素适应度函数。此外，该研究提供了一个用于比较基于GA的CC生成方法的基准，有助于更标准化地评估合成数据生成和情景评估。实验结果表明，CORTEX-AVD框架显著增加了CC的发生率，同时减少了浪费模拟的比例。

更新时间: 2025-04-08 13:52:29

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2504.03989v2

Improving Privacy Benefits of Redaction

We propose a novel redaction methodology that can be used to sanitize natural text data. Our new technique provides better privacy benefits than other state of the art techniques while maintaining lower redaction levels.

Updated: 2025-04-08 13:47:26

标题: 改善消除隐私风险的效果

摘要: 我们提出了一种新颖的消除方法，可用于对自然文本数据进行消毒。我们的新技术在保持更低的消除水平的同时，提供比其他最先进技术更好的隐私保护益处。

更新时间: 2025-04-08 13:47:26

领域: cs.CR,cs.CL,cs.LG

下载: http://arxiv.org/abs/2501.17762v3

Trust-Region Twisted Policy Improvement

Monte-Carlo tree search (MCTS) has driven many recent breakthroughs in deep reinforcement learning (RL). However, scaling MCTS to parallel compute has proven challenging in practice which has motivated alternative planners like sequential Monte-Carlo (SMC). Many of these SMC methods adopt particle filters for smoothing through a reformulation of RL as a policy inference problem. Yet, persisting design choices of these particle filters often conflict with the aim of online planning in RL, which is to obtain a policy improvement at the start of planning. Drawing inspiration from MCTS, we tailor SMC planners specifically for RL by improving data generation within the planner through constrained action sampling and explicit terminal state handling, as well as improving policy and value target estimation. This leads to our Trust-Region Twisted SMC (TRT-SMC), which shows improved runtime and sample-efficiency over baseline MCTS and SMC methods in both discrete and continuous domains.

Updated: 2025-04-08 13:47:07

标题: 信任区域扭曲策略改进

摘要: 蒙特卡洛树搜索（MCTS）推动了深度强化学习（RL）近期的许多突破。然而，在实践中将MCTS扩展到并行计算已被证明是具有挑战性的，这促使了类似顺序蒙特卡洛（SMC）的替代规划者的出现。许多这些SMC方法通过将RL重新表述为策略推断问题，采用粒子滤波器进行平滑处理。然而，这些粒子滤波器的持久设计选择常常与RL中在线规划的目标相冲突，即在规划开始时获得策略改进。受到MCTS的启发，我们通过在规划器内改进数据生成，通过受限制的动作采样和明确的终止状态处理，以及改进策略和价值目标估计，专门为RL定制了SMC规划者。这导致了我们的Trust-Region Twisted SMC（TRT-SMC），它在离散和连续领域中显示出比基线MCTS和SMC方法更好的运行时间和样本效率。

更新时间: 2025-04-08 13:47:07

领域: cs.LG

下载: http://arxiv.org/abs/2504.06048v1

Ising on the Graph: Task-specific Graph Subsampling via the Ising Model

Reducing a graph while preserving its overall properties is an important problem with many applications. Typically, reduction approaches either remove edges (sparsification) or merge nodes (coarsening) in an unsupervised way with no specific downstream task in mind. In this paper, we present an approach for subsampling graph structures using an Ising model defined on either the nodes or edges and learning the external magnetic field of the Ising model using a graph neural network. Our approach is task-specific as it can learn how to reduce a graph for a specific downstream task in an end-to-end fashion without requiring a differentiable loss function for the task. We showcase the versatility of our approach on four distinct applications: image segmentation, explainability for graph classification, 3D shape sparsification, and sparse approximate matrix inverse determination.

Updated: 2025-04-08 13:40:08

标题: 在图上的Ising：通过Ising模型进行特定任务的图子采样

摘要: 减少一个图的同时保留其整体特性是一个具有许多应用的重要问题。通常，减少方法要么删除边（稀疏化），要么合并节点（粗化），以无监督的方式进行，没有特定的下游任务。在本文中，我们提出了一种使用定义在节点或边上的伊辛模型对图结构进行子采样的方法，并使用图神经网络学习伊辛模型的外部磁场。我们的方法是特定于任务的，因为它可以学习如何为特定的下游任务以端到端的方式减少图，而不需要为任务需要可微损失函数。我们展示了我们的方法在四个不同应用中的多功能性：图像分割，图分类的可解释性，3D形状稀疏化和稀疏近似矩阵逆运算的确定。

更新时间: 2025-04-08 13:40:08

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.10206v3

Multi-Task Faces (MTF) Data Set: A Legally and Ethically Compliant Collection of Face Images for Various Classification Tasks

Human facial data offers valuable potential for tackling classification problems, including face recognition, age estimation, gender identification, emotion analysis, and race classification. However, recent privacy regulations, particularly the EU General Data Protection Regulation, have restricted the collection and usage of human images in research. As a result, several previously published face data sets have been removed from the internet due to inadequate data collection methods and privacy concerns. While synthetic data sets have been suggested as an alternative, they fall short of accurately representing the real data distribution. Additionally, most existing data sets are labeled for just a single task, which limits their versatility. To address these limitations, we introduce the Multi-Task Face (MTF) data set, designed for various tasks, including face recognition and classification by race, gender, and age, as well as for aiding in training generative networks. The MTF data set comes in two versions: a non-curated set containing 132,816 images of 640 individuals and a manually curated set with 5,246 images of 240 individuals, meticulously selected to maximize their classification quality. Both data sets were ethically sourced, using publicly available celebrity images in full compliance with copyright regulations. Along with providing detailed descriptions of data collection and processing, we evaluated the effectiveness of the MTF data set in training five deep learning models across the aforementioned classification tasks, achieving up to 98.88\% accuracy for gender classification, 95.77\% for race classification, 97.60\% for age classification, and 79.87\% for face recognition with the ConvNeXT model. Both MTF data sets can be accessed through the following link. https://github.com/RamiHaf/MTF_data_set

Updated: 2025-04-08 13:38:03

标题: 多任务人脸（MTF）数据集：一个合法和道德合规的人脸图像收集，用于各种分类任务。

摘要: 人脸数据具有巨大的潜力，可以用于解决分类问题，包括人脸识别、年龄估计、性别识别、情绪分析和种族分类等。然而，近年来的隐私法规，特别是欧盟的《通用数据保护条例》，限制了在研究中收集和使用人类图像的范围。因此，一些先前发布的人脸数据集已经从互联网上移除，原因是数据收集方法不足以及隐私问题。虽然合成数据集被认为是一种替代方案，但它们无法准确地代表真实数据分布。此外，大多数现有数据集仅用于单一任务的标记，这限制了它们的多功能性。为了解决这些限制，我们引入了多任务人脸（MTF）数据集，旨在用于各种任务，包括人脸识别和种族、性别和年龄分类，以及为生成网络的训练提供帮助。MTF数据集有两个版本：一个非策划集，包含640个个体的132,816张图像，一个手动策划集，包含240个个体的5,246张图像，精心选择以最大化其分类质量。两个数据集都是在道德上获得的，使用公开可用的名人图像，并严格遵守版权法规。除了提供数据收集和处理的详细描述外，我们还评估了MTF数据集在训练五个深度学习模型中在上述分类任务中的有效性，实现了98.88\%的性别分类准确率，95.77\%的种族分类准确率，97.60\%的年龄分类准确率以及79.87\%的ConvNeXT模型的人脸识别准确率。MTF数据集可以通过以下链接访问：https://github.com/RamiHaf/MTF_data_set。

更新时间: 2025-04-08 13:38:03

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2311.11882v2

Confidence Regularized Masked Language Modeling using Text Length

Masked language modeling, which is a task to predict a randomly masked word in the input text, is an efficient language representation learning method. Masked language modeling ignores various words which people can think of for filling in the masked position and calculates the loss with a single word. Especially when the input text is short, the entropy of the word distribution that can fill in the masked position can be high. This may cause the model to be overconfident in the single answer. To address this issue, we propose a novel confidence regularizer that controls regularizing strength dynamically by the input text length. Experiments with GLUE and SQuAD datasets showed that our method achieves better accuracy and lower expected calibration error.

Updated: 2025-04-08 13:37:08

标题: 信心规范化的掩蔽语言建模使用文本长度

摘要: 掩盖语言建模是一种任务，即预测输入文本中随机掩盖的单词，是一种有效的语言表示学习方法。掩盖语言建模忽略了人们可以考虑填补掩盖位置的各种单词，并使用单个单词计算损失。特别是当输入文本较短时，可以填补掩盖位置的单词分布的熵可能较高。这可能导致模型对单一答案过于自信。为了解决这个问题，我们提出了一种新颖的置信度正则化器，通过输入文本长度动态控制正则化强度。使用GLUE和SQuAD数据集进行的实验表明，我们的方法实现了更高的准确性和更低的预期校准误差。

更新时间: 2025-04-08 13:37:08

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.06037v1

Bridging the Theoretical Gap in Randomized Smoothing

Randomized smoothing has become a leading approach for certifying adversarial robustness in machine learning models. However, a persistent gap remains between theoretical certified robustness and empirical robustness accuracy. This paper introduces a new framework that bridges this gap by leveraging Lipschitz continuity for certification and proposing a novel, less conservative method for computing confidence intervals in randomized smoothing. Our approach tightens the bounds of certified robustness, offering a more accurate reflection of model robustness in practice. Through rigorous experimentation we show that our method improves the robust accuracy, compressing the gap between empirical findings and previous theoretical results. We argue that investigating local Lipschitz constants and designing ad-hoc confidence intervals can further enhance the performance of randomized smoothing. These results pave the way for a deeper understanding of the relationship between Lipschitz continuity and certified robustness.

Updated: 2025-04-08 13:31:11

标题: 填补随机平滑中的理论空白

摘要: 随机平滑已成为证明机器学习模型对抗性鲁棒性的主要方法。然而，理论上的认证鲁棒性与实际鲁棒性准确性之间仍存在持久差距。本文介绍了一个新的框架，通过利用Lipschitz连续性进行认证，并提出了一种更不保守的方法来计算随机平滑中的置信区间，从而弥合了这一差距。我们的方法缩小了认证鲁棒性的范围，更准确地反映了模型在实践中的鲁棒性。通过严格的实验，我们展示了我们的方法提高了鲁棒性准确性，缩小了实证研究和以往理论结果之间的差距。我们认为，研究局部Lipschitz常数并设计特定的置信区间可以进一步提升随机平滑的性能。这些结果为更深入地了解Lipschitz连续性与认证鲁棒性之间的关系铺平了道路。

更新时间: 2025-04-08 13:31:11

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2504.02412v2

Attacking at non-harmonic frequencies in screaming-channel attacks

Screaming-channel attacks enable Electromagnetic (EM) Side-Channel Attacks (SCAs) at larger distances due to higher EM leakage energies than traditional SCAs, relaxing the requirement of close access to the victim. This attack can be mounted on devices integrating Radio Frequency (RF) modules on the same die as digital circuits, where the RF can unintentionally capture, modulate, amplify, and transmit the leakage along with legitimate signals. Leakage results from digital switching activity, so the hypothesis of previous works was that this leakage would appear at multiples of the digital clock frequency, i.e., harmonics. This work demonstrates that compromising signals appear not only at the harmonics and that leakage at non-harmonics can be exploited for successful attacks. Indeed, the transformations undergone by the leaked signal are complex due to propagation effects through the substrate and power and ground planes, so the leakage also appears at other frequencies. We first propose two methodologies to locate frequencies that contain leakage and demonstrate that it appears at non-harmonic frequencies. Then, our experimental results show that screaming-channel attacks at non-harmonic frequencies can be as successful as at harmonics when retrieving a 16-byte AES key. As the RF spectrum is polluted by interfering signals, we run experiments and show successful attacks in a more realistic, noisy environment where harmonic frequencies are contaminated by multi-path fading and interference. These attacks at non-harmonic frequencies increase the attack surface by providing attackers with an increased number of potential frequencies where attacks can succeed.

Updated: 2025-04-08 13:29:36

标题: 在尖叫通道攻击中攻击非谐波频率

摘要: 尖叫信道攻击使电磁（EM）侧信道攻击（SCAs）能够在较大距离上进行，因为与传统SCAs相比，其泄漏能量更高，从而放宽了对受害者近距离接触的要求。这种攻击可以针对将射频（RF）模块与数字电路集成在同一芯片上的设备进行，其中射频可能无意中捕获、调制、放大并传输泄漏信号以及合法信号。泄漏是由数字切换活动引起的，因此以前的工作假设这种泄漏将以数字时钟频率的倍数即谐波的形式出现。这项工作证明了受损信号不仅会出现在谐波上，而且可以利用非谐波上的泄漏进行成功攻击。实际上，由于信号通过基板和电源地面层传播效应的转换是复杂的，因此泄漏也会出现在其他频率上。我们首先提出了两种定位包含泄漏的频率的方法，并证明了它出现在非谐波频率上。然后，我们的实验结果显示，当检索一个16字节AES密钥时，非谐波频率上的尖叫信道攻击可以与谐波频率上的攻击一样成功。由于射频频谱被干扰信号污染，我们进行实验，并展示在更现实且嘈杂的环境中成功攻击，其中谐波频率受到多径衰减和干扰的影响。在非谐波频率上的这些攻击增加了攻击表面，为攻击者提供了更多潜在频率，攻击可以成功。

更新时间: 2025-04-08 13:29:36

领域: cs.CR

下载: http://arxiv.org/abs/2311.15832v2

Generalizability of experimental studies

Experimental studies are a cornerstone of machine learning (ML) research. A common, but often implicit, assumption is that the results of a study will generalize beyond the study itself, e.g. to new data. That is, there is a high probability that repeating the study under different conditions will yield similar results. Despite the importance of the concept, the problem of measuring generalizability remains open. This is probably due to the lack of a mathematical formalization of experimental studies. In this paper, we propose such a formalization and develop a quantifiable notion of generalizability. This notion allows to explore the generalizability of existing studies and to estimate the number of experiments needed to achieve the generalizability of new studies. To demonstrate its usefulness, we apply it to two recently published benchmarks to discern generalizable and non-generalizable results. We also publish a Python module that allows our analysis to be repeated for other experimental studies.

Updated: 2025-04-08 13:26:24

标题: 实验研究的泛化性

摘要: 实验研究是机器学习（ML）研究的基石。一个常见但常常是隐含的假设是，研究的结果将会超出研究本身，例如对新数据的泛化。也就是说，以不同条件重复这项研究将产生类似的结果的概率很高。尽管这个概念的重要性，但测量泛化能力的问题仍然没有解决。这可能是因为缺乏对实验研究的数学形式化。在本文中，我们提出这样一种形式化，并发展了一个可量化的泛化概念。这个概念允许探索现有研究的泛化能力，并估计需要进行多少实验才能实现新研究的泛化能力。为了证明其实用性，我们将其应用于两个最近发表的基准，以区分可泛化和不可泛化的结果。我们还发布了一个Python模块，可以让我们的分析重复应用于其他实验研究。

更新时间: 2025-04-08 13:26:24

领域: cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2406.17374v2

Information-Theoretic Reward Decomposition for Generalizable RLHF

A generalizable reward model is crucial in Reinforcement Learning from Human Feedback (RLHF) as it enables correctly evaluating unseen prompt-response pairs. However, existing reward models lack this ability, as they are typically trained by increasing the reward gap between chosen and rejected responses, while overlooking the prompts that the responses are conditioned on. Consequently, when the trained reward model is evaluated on prompt-response pairs that lie outside the data distribution, neglecting the effect of prompts may result in poor generalization of the reward model. To address this issue, we decompose the reward value into two independent components: prompt-free reward and prompt-related reward. Prompt-free reward represents the evaluation that is determined only by responses, while the prompt-related reward reflects the reward that derives from both the prompt and the response. We extract these two components from an information-theoretic perspective, which requires no extra models. Subsequently, we propose a new reward learning algorithm by prioritizing data samples based on their prompt-free reward values. Through toy examples, we demonstrate that the extracted prompt-free and prompt-related rewards effectively characterize two parts of the reward model. Further, standard evaluations show that our method improves both the alignment performance and the generalization capability of the reward model.

Updated: 2025-04-08 13:26:07

标题: 信息论奖励分解用于可泛化的强化学习HF

摘要: 从人类反馈中学习强化学习（RLHF）中，一个可泛化的奖励模型是至关重要的，因为它能够正确评估未见过的提示-响应对。然而，现有的奖励模型缺乏这种能力，因为它们通常是通过增加选择和拒绝响应之间的奖励差距来训练的，同时忽视了响应所依赖的提示。因此，当训练好的奖励模型在数据分布之外的提示-响应对上进行评估时，忽视提示的影响可能会导致奖励模型的泛化能力较差。为了解决这个问题，我们将奖励值分解为两个独立的部分：无提示奖励和与提示相关的奖励。无提示奖励代表仅由响应决定的评估，而与提示相关的奖励反映了由提示和响应共同产生的奖励。我们从信息论的角度提取了这两个部分，不需要额外的模型。随后，我们提出了一种新的奖励学习算法，通过基于无提示奖励值对数据样本进行优先级排序。通过玩具示例，我们证明了提取的无提示和与提示相关的奖励有效地表征了奖励模型的两个部分。进一步的标准评估表明，我们的方法改善了奖励模型的对齐性能和泛化能力。

更新时间: 2025-04-08 13:26:07

领域: cs.AI

下载: http://arxiv.org/abs/2504.06020v1

Generating Usage-related Questions for Preference Elicitation in Conversational Recommender Systems

A key distinguishing feature of conversational recommender systems over traditional recommender systems is their ability to elicit user preferences using natural language. Currently, the predominant approach to preference elicitation is to ask questions directly about items or item attributes. Users searching for recommendations may not have deep knowledge of the available options in a given domain. As such, they might not be aware of key attributes or desirable values for them. However, in many settings, talking about the planned use of items does not present any difficulties, even for those that are new to a domain. In this paper, we propose a novel approach to preference elicitation by asking implicit questions based on item usage. As one of the main contributions of this work, we develop a multi-stage data annotation protocol using crowdsourcing, to create a high-quality labeled training dataset. Another main contribution is the development of four models for the question generation task: two template-based baseline models and two neural text-to-text models. The template-based models use heuristically extracted common patterns found in the training data, while the neural models use the training data to learn to generate questions automatically. Using common metrics from machine translation for automatic evaluation, we show that our approaches are effective in generating elicitation questions, even with limited training data. We further employ human evaluation for comparing the generated questions using both pointwise and pairwise evaluation designs. We find that the human evaluation results are consistent with the automatic ones, allowing us to draw conclusions about the quality of the generated questions with certainty. Finally, we provide a detailed analysis of cases where the models show their limitations.

Updated: 2025-04-08 13:25:51

标题: 在会话式推荐系统中生成与使用相关的问题以进行偏好引导

摘要: 对话式推荐系统与传统推荐系统的一个关键特征是其利用自然语言引导用户偏好的能力。目前，偏好引导的主要方法是直接询问有关物品或物品属性的问题。寻找推荐的用户可能对特定领域的可用选项缺乏深入了解，因此可能不了解关键属性或它们的理想值。然而，在许多情况下，谈论物品的计划使用并不会带来任何困难，即使对于对某一领域尚不熟悉的人也是如此。在本文中，我们提出了一种基于物品使用的隐式问题引导的新方法。作为本研究的一个主要贡献，我们开发了一个多阶段的数据标注协议，利用众包创建了一个高质量的标记训练数据集。另一个主要贡献是开发了四个用于问题生成任务的模型：两个基于模板的基准模型和两个神经文本到文本模型。基于模板的模型使用训练数据中发现的常见模式进行启发式提取，而神经模型使用训练数据学习自动生成问题。利用机器翻译的常用指标进行自动评估，我们展示了我们的方法在生成引导问题方面的有效性，即使只有有限的训练数据。我们进一步使用人工评估来比较使用点对点和成对评估设计生成的问题。我们发现，人工评估结果与自动评估结果一致，使我们能够确定地得出关于生成问题质量的结论。最后，我们对模型显示出限制的情况进行了详细分析。

更新时间: 2025-04-08 13:25:51

领域: cs.IR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2111.13463v2

A Geometric-Aware Perspective and Beyond: Hybrid Quantum-Classical Machine Learning Methods

Geometric Machine Learning (GML) has shown that respecting non-Euclidean geometry in data spaces can significantly improve performance over naive Euclidean assumptions. In parallel, Quantum Machine Learning (QML) has emerged as a promising paradigm that leverages superposition, entanglement, and interference within quantum state manifolds for learning tasks. This paper offers a unifying perspective by casting QML as a specialized yet more expressive branch of GML. We argue that quantum states, whether pure or mixed, reside on curved manifolds (e.g., projective Hilbert spaces or density-operator manifolds), mirroring how covariance matrices inhabit the manifold of symmetric positive definite (SPD) matrices or how image sets occupy Grassmann manifolds. However, QML also benefits from purely quantum properties, such as entanglement-induced curvature, that can yield richer kernel structures and more nuanced data embeddings. We illustrate these ideas with published and newly discussed results, including hybrid classical -quantum pipelines for diabetic foot ulcer classification and structural health monitoring. Despite near-term hardware limitations that constrain purely quantum solutions, hybrid architectures already demonstrate tangible benefits by combining classical manifold-based feature extraction with quantum embeddings. We present a detailed mathematical treatment of the geometrical underpinnings of quantum states, emphasizing parallels to classical Riemannian geometry and manifold-based optimization. Finally, we outline open research challenges and future directions, including Quantum Large Language Models (LLMs), quantum reinforcement learning, and emerging hardware approaches, demonstrating how synergizing GML and QML principles can unlock the next generation of machine intelligence.

Updated: 2025-04-08 13:24:55

标题: 一种具有几何意识的视角及其进展：混合量子-经典机器学习方法

摘要: 几何机器学习（GML）已经表明，在数据空间中尊重非欧几里得几何可以显著提高性能，超过天真的欧几里得假设。与此同时，量子机器学习（QML）作为一种有前途的范式出现，利用量子态流形中的叠加、纠缠和干涉进行学习任务。本文提供了一个统一的视角，将QML视为GML的一个特殊但更具表现力的分支。我们认为，无论是纯态还是混合态，量子态都存在于曲线流形上（例如，投影希尔伯特空间或密度算子流形），这反映了协方差矩阵存在于对称正定（SPD）矩阵流形上或图像集占据Grassmann流形的方式。然而，QML还受益于纯粹的量子性质，如纠缠引起的曲率，可以产生更丰富的核结构和更微妙的数据嵌入。我们通过已发表和新讨论的结果来说明这些想法，包括用于糖尿病足溃疡分类和结构健康监测的混合经典-量子管道。尽管近期硬件限制限制了纯量子解决方案，但混合架构已经通过将基于经典流形的特征提取与量子嵌入相结合，展示了明显的好处。我们对量子态的几何基础进行了详细的数学处理，强调与经典黎曼几何和基于流形的优化的相似之处。最后，我们概述了开放的研究挑战和未来方向，包括量子大语言模型（LLMs）、量子强化学习和新兴的硬件方法，展示了如何通过协同作用GML和QML原则可以解锁下一代机器智能。

更新时间: 2025-04-08 13:24:55

领域: quant-ph,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.06328v1

Improving Mixed-Criticality Scheduling with Reinforcement Learning

This paper introduces a novel reinforcement learning (RL) approach to scheduling mixed-criticality (MC) systems on processors with varying speeds. Building upon the foundation laid by [1], we extend their work to address the non-preemptive scheduling problem, which is known to be NP-hard. By modeling this scheduling challenge as a Markov Decision Process (MDP), we develop an RL agent capable of generating near-optimal schedules for real-time MC systems. Our RL-based scheduler prioritizes high-critical tasks while maintaining overall system performance. Through extensive experiments, we demonstrate the scalability and effectiveness of our approach. The RL scheduler significantly improves task completion rates, achieving around 80% overall and 85% for high-criticality tasks across 100,000 instances of synthetic data and real data under varying system conditions. Moreover, under stable conditions without degradation, the scheduler achieves 94% overall task completion and 93% for high-criticality tasks. These results highlight the potential of RL-based schedulers in real-time and safety-critical applications, offering substantial improvements in handling complex and dynamic scheduling scenarios.

Updated: 2025-04-08 13:22:59

标题: 用强化学习改进混合关键性调度

摘要: 本文介绍了一种新颖的强化学习（RL）方法，用于在速度不同的处理器上调度混合重要性（MC）系统。在[1]奠定的基础上，我们扩展了他们的工作以解决非抢占式调度问题，这被认为是NP难题。通过将这一调度挑战建模为马尔可夫决策过程（MDP），我们开发了一个RL代理程序，能够为实时MC系统生成接近最优的调度。我们基于RL的调度程序优先考虑高关键性任务，同时保持整体系统性能。通过大量实验，我们展示了我们方法的可扩展性和有效性。RL调度程序显著提高了任务完成率，在100,000个合成数据和真实数据实例的不同系统条件下，整体完成率达到约80%，高关键性任务达到85%。此外，在没有降级的稳定条件下，调度程序实现了94%的整体任务完成率和93%的高关键性任务完成率。这些结果突显了RL调度程序在实时和安全关键应用中的潜力，为处理复杂和动态调度场景提供了实质性改进。

更新时间: 2025-04-08 13:22:59

领域: cs.LG,cs.AI,cs.MA,cs.SY,eess.SY

下载: http://arxiv.org/abs/2504.03994v2

Bayesian Off-Policy Evaluation and Learning for Large Action Spaces

In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations through structured and informative priors. In this framework, we propose sDM, a generic Bayesian approach for OPE and OPL, grounded in both algorithmic and theoretical foundations. Notably, sDM leverages action correlations without compromising computational efficiency. Moreover, inspired by online Bayesian bandits, we introduce Bayesian metrics that assess the average performance of algorithms across multiple problem instances, deviating from the conventional worst-case assessments. We analyze sDM in OPE and OPL, highlighting the benefits of leveraging action correlations. Empirical evidence showcases the strong performance of sDM.

Updated: 2025-04-08 13:22:27

标题: 贝叶斯离线策略评估和学习针对大动作空间

摘要: 在交互式系统中，行为通常是相关的，这为在大规模动作空间中更有效地进行离线策略评估（OPE）和学习（OPL）提供了机会。我们引入了一个统一的贝叶斯框架，通过结构化和信息丰富的先验来捕捉这些相关性。在这个框架中，我们提出了sDM，一个通用的贝叶斯方法，用于OPE和OPL，基于算法和理论基础。值得注意的是，sDM利用动作相关性而不影响计算效率。此外，受在线贝叶斯赌博机的启发，我们引入了贝叶斯度量，评估算法在多个问题实例中的平均性能，与传统的最坏情况评估有所不同。我们分析了sDM在OPE和OPL中的应用，突出了利用动作相关性的好处。经验证据展示了sDM的强大性能。

更新时间: 2025-04-08 13:22:27

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2402.14664v2

CAI: An Open, Bug Bounty-Ready Cybersecurity AI

By 2028 most cybersecurity actions will be autonomous, with humans teleoperating. We present the first classification of autonomy levels in cybersecurity and introduce Cybersecurity AI (CAI), an open-source framework that democratizes advanced security testing through specialized AI agents. Through rigorous empirical evaluation, we demonstrate that CAI consistently outperforms state-of-the-art results in CTF benchmarks, solving challenges across diverse categories with significantly greater efficiency -up to 3,600x faster than humans in specific tasks and averaging 11x faster overall. CAI achieved first place among AI teams and secured a top-20 position worldwide in the "AI vs Human" CTF live Challenge, earning a monetary reward of $750. Based on our results, we argue against LLM-vendor claims about limited security capabilities. Beyond cybersecurity competitions, CAI demonstrates real-world effectiveness, reaching top-30 in Spain and top-500 worldwide on Hack The Box within a week, while dramatically reducing security testing costs by an average of 156x. Our framework transcends theoretical benchmarks by enabling non-professionals to discover significant security bugs (CVSS 4.3-7.5) at rates comparable to experts during bug bounty exercises. By combining modular agent design with seamless tool integration and human oversight (HITL), CAI addresses critical market gaps, offering organizations of all sizes access to AI-powered bug bounty security testing previously available only to well-resourced firms -thereby challenging the oligopolistic ecosystem currently dominated by major bug bounty platforms.

Updated: 2025-04-08 13:22:09

标题: CAI：一个开放的，适合漏洞赏金的网络安全人工智能

摘要: 到2028年，大多数网络安全行动将是自主的，由人类进行远程操作。我们提出了网络安全自主级别的第一个分类，并介绍了网络安全AI（CAI），这是一个开源框架，通过专门的AI代理实现了高级安全测试的民主化。通过严格的实证评估，我们证明CAI在CTF基准测试中始终优于最先进的结果，以极大的效率解决了各种类别的挑战-在特定任务中比人类快3600倍，平均快11倍。CAI在“AI vs Human” CTF live Challenge中获得了AI团队的第一名，并在全球排名前20，赢得了750美元的奖励。根据我们的结果，我们反对LLM供应商关于安全能力有限的说法。除了网络安全比赛，CAI还展示了实际有效性，在一周内在西班牙排名前30，在Hack The Box全球排名前500，同时将安全测试成本平均降低了156倍。我们的框架通过结合模块化代理设计、无缝工具集成和人类监督（HITL），弥补了市场的关键差距，为各种规模的组织提供了AI支持的漏洞赏金安全测试，这在以前只有资源充裕的公司才能获得-从而挑战了目前由主要漏洞赏金平台主导的寡头生态系统。

更新时间: 2025-04-08 13:22:09

领域: cs.CR

下载: http://arxiv.org/abs/2504.06017v1

The Hall of AI Fears and Hopes: Comparing the Views of AI Influencers and those of Members of the U.S. Public Through an Interactive Platform

AI development is shaped by academics and industry leaders - let us call them ``influencers'' - but it is unclear how their views align with those of the public. To address this gap, we developed an interactive platform that served as a data collection tool for exploring public views on AI, including their fears, hopes, and overall sense of hopefulness. We made the platform available to 330 participants representative of the U.S. population in terms of age, sex, ethnicity, and political leaning, and compared their views with those of 100 AI influencers identified by Time magazine. The public fears AI getting out of control, while influencers emphasize regulation, seemingly to deflect attention from their alleged focus on monetizing AI's potential. Interestingly, the views of AI influencers from underrepresented groups such as women and people of color often differ from the views of underrepresented groups in the public.

Updated: 2025-04-08 13:21:31

标题: AI恐惧与希望之殿：通过互动平台比较AI影响者与美国公众的观点

摘要: AI发展受学术界和行业领袖——让我们称之为“影响者”——的影响，但他们的观点如何与公众观点一致尚不清楚。为了弥补这一差距，我们开发了一个互动平台，作为一个数据收集工具，用于探索公众对人工智能的看法，包括他们的恐惧、希望和整体的愉快感。我们向代表美国人口的330名参与者提供了这个平台，涵盖了年龄、性别、种族和政治倾向，并将他们的观点与《时代》杂志识别出的100名AI影响者进行了比较。公众担心人工智能失控，而影响者则强调监管，似乎是为了转移对他们据称将焦点放在AI潜力获利上的关注。有趣的是，像女性和有色人种等少数群体的AI影响者的观点往往与公众中的少数群体的观点不同。

更新时间: 2025-04-08 13:21:31

领域: cs.HC,cs.AI,I.2; K.4.1; K.4.2; K.4.3

下载: http://arxiv.org/abs/2504.06016v1

Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?

Optimal hyperparameter selection is critical for maximizing neural network performance, especially as models grow in complexity. This work investigates the viability of using large language models (LLMs) for hyperparameter optimization by employing a fine-tuned version of Code Llama. Through parameter-efficient fine-tuning using LoRA, we adapt the LLM to generate accurate and efficient hyperparameter recommendations tailored to diverse neural network architectures. Unlike traditional methods such as Optuna, which rely on exhaustive trials, the proposed approach achieves competitive or superior results in terms of Root Mean Square Error (RMSE) while significantly reducing computational overhead. Our approach highlights that LLM-based optimization not only matches state-of-the-art methods like Tree-structured Parzen Estimators but also accelerates the tuning process. This positions LLMs as a promising alternative to conventional optimization techniques, particularly for rapid experimentation. Furthermore, the ability to generate hyperparameters in a single inference step makes this method particularly well-suited for resource-constrained environments such as edge devices and mobile applications, where computational efficiency is paramount. The results confirm that LLMs, beyond their efficiency, offer substantial time savings and comparable stability, underscoring their value in advancing machine learning workflows. All generated hyperparameters are included in the LEMUR Neural Network (NN) Dataset, which is publicly available and serves as an open-source benchmark for hyperparameter optimization research.

Updated: 2025-04-08 13:15:47

标题: Optuna vs Code Llama：LLMs是否是超参数调整的新范式？

摘要: 最佳超参数选择对于最大化神经网络性能至关重要，特别是在模型复杂度增加的情况下。本研究探讨了使用大型语言模型（LLMs）进行超参数优化的可行性，通过使用Code Llama的经过微调的版本。通过使用LoRA进行参数高效微调，我们使LLM能够生成针对不同神经网络架构量身定制的准确高效的超参数推荐。与依赖于详尽试验的传统方法（如Optuna）不同，所提出的方法在均方根误差（RMSE）方面取得了具有竞争力或更优的结果，同时显著减少了计算开销。我们的方法强调，基于LLM的优化不仅与Tree-structured Parzen Estimators等最先进方法相匹敌，而且加快了调优过程。这将LLMs定位为传统优化技术的有希望替代品，特别适用于快速实验。此外，能够在单个推断步骤中生成超参数使得这种方法特别适用于资源受限环境，如边缘设备和移动应用，其中计算效率至关重要。结果证实，LLMs除了高效外，还提供了大量时间节约和可比性稳定性，突显了它们在推动机器学习工作流程方面的价值。所有生成的超参数都包含在LEMUR神经网络（NN）数据集中，该数据集是公开可用的，并作为超参数优化研究的开源基准。

更新时间: 2025-04-08 13:15:47

领域: cs.LG,cs.AI,cs.NE

下载: http://arxiv.org/abs/2504.06006v1

An adaptively inexact first-order method for bilevel optimization with application to hyperparameter learning

Various tasks in data science are modeled utilizing the variational regularization approach, where manually selecting regularization parameters presents a challenge. The difficulty gets exacerbated when employing regularizers involving a large number of hyperparameters. To overcome this challenge, bilevel learning can be employed to learn such parameters from data. However, neither exact function values nor exact gradients with respect to the hyperparameters are attainable, necessitating methods that only rely on inexact evaluation of such quantities. State-of-the-art inexact gradient-based methods a priori select a sequence of the required accuracies and cannot identify an appropriate step size since the Lipschitz constant of the hypergradient is unknown. In this work, we propose an algorithm with backtracking line search that only relies on inexact function evaluations and hypergradients and show convergence to a stationary point. Furthermore, the proposed algorithm determines the required accuracy dynamically rather than manually selected before running it. Our numerical experiments demonstrate the efficiency and feasibility of our approach for hyperparameter estimation on a range of relevant problems in imaging and data science such as total variation and field of experts denoising and multinomial logistic regression. Particularly, the results show that the algorithm is robust to its own hyperparameters such as the initial accuracies and step size.

Updated: 2025-04-08 13:10:47

标题: 一个自适应不精确的一阶方法用于双层优化，并应用于超参数学习

摘要: 数据科学中的各种任务都是利用变分正则化方法建模的，其中手动选择正则化参数是一项挑战。当使用涉及大量超参数的正则化器时，困难会加剧。为了克服这一挑战，可以利用双层学习从数据中学习这些参数。然而，既不可能得到确切的函数值，也不可能得到关于超参数的确切梯度，这需要仅依赖于这些量的不精确评估的方法。最先进的不精确梯度基础方法预先选择所需精度的序列，并且无法确定适当的步长，因为超梯度的Lipschitz常数是未知的。在这项工作中，我们提出了一种算法，使用回溯线搜索，仅依赖于不精确的函数评估和超梯度，并展示了收敛到一个稳定点。此外，所提出的算法动态确定所需的精度，而不是在运行之前手动选择。我们的数值实验证明了我们的方法在图像和数据科学中的超参数估计问题上的有效性和可行性，例如总变差和专家域去噪和多项式逻辑回归。特别是，结果表明该算法对其自身的超参数（如初始精度和步长）具有鲁棒性。

更新时间: 2025-04-08 13:10:47

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2308.10098v4

Continuous Diffusion for Mixed-Type Tabular Data

Score-based generative models, commonly referred to as diffusion models, have proven to be successful at generating text and image data. However, their adaptation to mixed-type tabular data remains underexplored. In this work, we propose CDTD, a Continuous Diffusion model for mixed-type Tabular Data. CDTD is based on a novel combination of score matching and score interpolation to enforce a unified continuous noise distribution for both continuous and categorical features. We explicitly acknowledge the necessity of homogenizing distinct data types by relying on model-specific loss calibration and initialization schemes.To further address the high heterogeneity in mixed-type tabular data, we introduce adaptive feature- or type-specific noise schedules. These ensure balanced generative performance across features and optimize the allocation of model capacity across features and diffusion time. Our experimental results show that CDTD consistently outperforms state-of-the-art benchmark models, captures feature correlations exceptionally well, and that heterogeneity in the noise schedule design boosts sample quality. Replication code is available at https://github.com/muellermarkus/cdtd.

Updated: 2025-04-08 13:02:16

标题: 混合类型表格数据的连续扩散

摘要: 评分型生成模型，通常称为扩散模型，在生成文本和图像数据方面已被证明取得成功。然而，它们在混合类型表格数据上的应用仍未得到充分探讨。在这项工作中，我们提出了一种名为CDTD的连续扩散模型，用于混合类型表格数据。CDTD基于得分匹配和得分插值的新颖组合，以强制为连续和分类特征统一连续噪声分布。我们明确承认通过依赖于模型特定的损失校准和初始化方案来使不同数据类型同质化的必要性。为了进一步解决混合类型表格数据中的高异质性，我们引入了自适应特征或类型特定的噪声时间表。这些确保了在特征之间实现平衡的生成性能，并优化了模型容量在特征和扩散时间之间的分配。我们的实验结果显示，CDTD始终优于最先进的基准模型，异常良好地捕捉特征相关性，并且在噪声时间表设计中的异质性提高了样本质量。复制代码可在https://github.com/muellermarkus/cdtd找到。

更新时间: 2025-04-08 13:02:16

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2312.10431v5

NativQA Framework: Enabling LLMs with Native, Local, and Everyday Knowledge

The rapid advancement of large language models (LLMs) has raised concerns about cultural bias, fairness, and their applicability in diverse linguistic and underrepresented regional contexts. To enhance and benchmark the capabilities of LLMs, there is a need to develop large-scale resources focused on multilingual, local, and cultural contexts. In this study, we propose a framework, NativQA, that can seamlessly construct large-scale, culturally and regionally aligned QA datasets in native languages. The framework utilizes user-defined seed queries and leverages search engines to collect location-specific, everyday information. It has been evaluated across 39 locations in 24 countries and in 7 languages, ranging from extremely low-resource to high-resource languages, which resulted over 300K Question Answer (QA) pairs. The developed resources can be used for LLM benchmarking and further fine-tuning. The framework has been made publicly available for the community (https://gitlab.com/nativqa/nativqa-framework).

Updated: 2025-04-08 13:01:51

标题: NativQA框架：利用本地、当地和日常知识使LLMs能够实现

摘要: 大型语言模型（LLMs）的快速发展引起了人们对文化偏见、公平性以及它们在多样化语言和被低估的地区环境中的适用性的担忧。为了增强和评估LLMs的能力，需要开发专注于多语言、本地化和文化环境的大规模资源。在这项研究中，我们提出了一个名为NativQA的框架，可以无缝地构建大规模的、文化和地区对齐的母语问答数据集。该框架利用用户定义的种子查询，并利用搜索引擎收集特定于地点的日常信息。它已在24个国家的39个地点以及7种语言中进行评估，这些语言从极低资源到高资源的语言都有，结果产生了超过30万个问题答案（QA）对。开发的资源可以用于LLMs的基准测试和进一步的微调。该框架已公开提供给社区使用（https://gitlab.com/nativqa/nativqa-framework）。

更新时间: 2025-04-08 13:01:51

领域: cs.CL,cs.AI,68T50,F.2.2; I.2.7

下载: http://arxiv.org/abs/2504.05995v1

Graph Federated Learning Based Proactive Content Caching in Edge Computing

With the rapid growth of mobile data traffic and the increasing prevalence of video streaming, proactive content caching in edge computing has become crucial for reducing latency and alleviating network congestion. However, traditional caching strategies such as FIFO, LRU, and LFU fail to effectively predict future content popularity, while existing proactive caching approaches often require users to upload data to a central server, raising concerns regarding privacy and scalability. To address these challenges, this paper proposes a Graph Federated Learning-based Proactive Content Caching (GFPCC) scheme that enhances caching efficiency while preserving user privacy. The proposed approach integrates federated learning and graph neural networks, enabling users to locally train Light Graph Convolutional Networks (LightGCN) to capture user-item relationships and predict content popularity. Instead of sharing raw data, only the trained model parameters are transmitted to the central server, where a federated averaging algorithm aggregates updates, refines the global model, and selects the most popular files for proactive caching. Experimental evaluations on real-world datasets, such as MovieLens, demonstrate that GFPCC outperforms baseline caching algorithms by achieving higher cache efficiency through more accurate content popularity predictions. Moreover, the federated learning framework strengthens privacy protection while maintaining efficient model training; however, scalability remains a challenge in large-scale networks with dynamic user preferences.

Updated: 2025-04-08 12:46:45

标题: 基于图联邦学习的边缘计算中的主动内容缓存

摘要: 随着移动数据流量的快速增长和视频流媒体的普及，边缘计算中的主动内容缓存变得至关重要，以降低延迟和缓解网络拥塞。然而，传统的缓存策略如FIFO、LRU和LFU未能有效预测未来内容的流行度，而现有的主动缓存方法通常需要用户上传数据到中央服务器，引发隐私和可扩展性方面的担忧。为了解决这些挑战，本文提出了一种基于图联邦学习的主动内容缓存方案（GFPCC），旨在提高缓存效率同时保护用户隐私。提出的方法集成了联邦学习和图神经网络，使用户能够本地训练轻量级图卷积网络（LightGCN）来捕捉用户-物品关系并预测内容的流行度。而不是共享原始数据，只有训练过的模型参数被传输到中央服务器，其中联邦平均算法聚合更新，完善全局模型，并选择最受欢迎的文件进行主动缓存。对于真实世界的数据集，如MovieLens的实验评估表明，GFPCC通过更准确的内容流行度预测，优于基线缓存算法，实现更高的缓存效率。此外，联邦学习框架加强了隐私保护，同时保持了高效的模型训练；然而，在具有动态用户偏好的大规模网络中，可扩展性仍然是一个挑战。

更新时间: 2025-04-08 12:46:45

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.04760v2

Towards Simple Machine Learning Baselines for GNSS RFI Detection

Machine learning research in GNSS radio frequency interference (RFI) detection often lacks a proper justification for the decisions made in deep learning-based model architectures. Our paper challenges the status quo in machine learning approaches for GNSS RFI detection, revealing the potentially misleading track of current research and highlighting alternative directions. Our position advocates for a shift in focus from solely pursuing novel model designs to critically evaluating the utility of complex black box deep learning methods against simpler and more interpretable machine learning baselines. Our findings demonstrate the need for the creation of simple baselines and suggest the need for more exploration and development of simple and interpretable machine learning methods for the detection of GNSS RFIs. The increment of model complexity in the state-of-the-art deep learning-based models often provides very little improvement. Thanks to a unique dataset from Swiss Air Force and Swiss Air-Rescue (Rega), preprocessed by Swiss Air Navigation Services Ltd. (Skyguide), we demonstrate the effectiveness of a simple machine learning baseline for GNSS RFI detection on real-world large-scale aircraft data containing flight recordings impacted by real jamming. The experimental results indicate that our solution successfully detects potential GNSS RFI with 91% accuracy outperforming state-of-the-art deep learning architectures. We believe that our work offers insights and suggestions for the field to move forward.

Updated: 2025-04-08 12:45:01

标题: 朝向简单的机器学习基线模型用于GNSS RFI检测

摘要: 全球导航卫星系统射频干扰（RFI）检测中的机器学习研究通常缺乏对基于深度学习模型架构中所做决策的适当理由。我们的论文挑战了全球导航卫星系统射频干扰检测的机器学习方法的现状，揭示了当前研究可能误导的轨迹，并突显了替代方向。我们的立场主张将重点从仅仅追求新颖模型设计转变为对比较简单且可解释的机器学习基准线进行批判性评估。我们的发现表明，需要建立简单基准线，并建议更多地探索和发展用于检测全球导航卫星系统射频干扰的简单且可解释的机器学习方法。当前最先进的基于深度学习的模型复杂度增加往往提供很少的改进。由瑞士空军和瑞士空中救援（Rega）提供的独特数据集，由瑞士空中导航服务有限公司（Skyguide）预处理，我们展示了一个简单的机器学习基准线在包含受到实际干扰影响的飞行记录的真实世界大规模飞机数据上对全球导航卫星系统射频干扰检测的有效性。实验结果表明，我们的解决方案成功地以91%的准确率检测到潜在的全球导航卫星系统射频干扰，胜过了最先进的深度学习架构。我们相信我们的工作为该领域提供了见解和建议，推动其向前发展。

更新时间: 2025-04-08 12:45:01

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2504.07993v1

On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) with large language models (LLMs) has demonstrated strong performance in multilingual question-answering (QA) tasks by leveraging relevant passages retrieved from corpora. In multilingual RAG (mRAG), the retrieved passages can be written in languages other than that of the query entered by the user, making it challenging for LLMs to effectively utilize the provided information. Recent research suggests that retrieving passages from multilingual corpora can improve RAG performance, particularly for low-resource languages. However, the extent to which LLMs can leverage different kinds of multilingual contexts to generate accurate answers, *independently from retrieval quality*, remains understudied. In this paper, we conduct an extensive assessment of LLMs' ability to (i) make consistent use of a relevant passage regardless of its language, (ii) respond in the expected language, and (iii) focus on the relevant passage even when multiple `distracting' passages in different languages are provided in the context. Our experiments with four LLMs across three QA datasets covering a total of 48 languages reveal a surprising ability of LLMs to extract the relevant information from out-language passages, but a much weaker ability to formulate a full answer in the correct language. Our analysis, based on both accuracy and feature attribution techniques, further shows that distracting passages negatively impact answer quality regardless of their language. However, distractors in the query language exert a slightly stronger influence. Taken together, our findings deepen the understanding of how LLMs utilize context in mRAG systems, providing directions for future improvements.

Updated: 2025-04-08 12:40:23

标题: 关于检索增强生成中多语言上下文利用的一致性

摘要: 检索增强生成（RAG）与大型语言模型（LLMs）在多语言问答（QA）任务中表现出强大的性能，通过利用从语料库中检索到的相关段落。在多语言RAG（mRAG）中，检索到的段落可以用用户输入的查询语言以外的语言编写，这使得LLMs有效利用提供的信息变得具有挑战性。最近的研究表明，从多语言语料库中检索段落可以提高RAG的性能，特别是对于资源匮乏的语言。然而，LLMs能够利用不同类型的多语言背景生成准确答案的程度，*独立于检索质量*，仍未得到充分研究。本文对LLMs的能力进行了广泛评估，评估LLMs能够（i）在不考虑语言的情况下一致使用相关段落，（ii）以预期的语言作出响应，以及（iii）在提供多个不同语言的“干扰”段落的情况下，专注于相关段落。我们在覆盖48种语言的三个QA数据集上使用四种LLMs进行实验，结果显示LLMs能够从跨语言段落中提取相关信息的惊人能力，但在以正确语言形式提出完整答案方面能力较弱。我们的分析基于准确性和特征归因技术，进一步显示干扰段落会对答案质量产生负面影响，无论其语言如何。然而，在查询语言中的干扰因素会产生稍微更强烈的影响。综合起来，我们的研究结果深化了关于LLMs如何在mRAG系统中利用上下文的理解，为未来改进提供了方向。

更新时间: 2025-04-08 12:40:23

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.00597v2

Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute

Recent advancements in software engineering agents have demonstrated promising capabilities in automating program improvements. However, their reliance on closed-source or resource-intensive models introduces significant deployment challenges in private environments, prompting a critical question: \textit{How can personally deployable open-source LLMs achieve comparable code reasoning performance?} To this end, we propose a unified Test-Time Compute scaling framework that leverages increased inference-time computation instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. Internally, we introduce a \textit{development-contextualized trajectory synthesis} method leveraging real-world software repositories to bootstrap multi-stage reasoning processes, such as fault localization and patch generation. We further enhance trajectory quality through rejection sampling, rigorously evaluating trajectories along accuracy and complexity. Externally, we propose a novel \textit{development-process-based search} strategy guided by reward models and execution verification. This approach enables targeted computational allocation at critical development decision points, overcoming limitations of existing "end-point only" verification methods. Evaluations on SWE-bench Verified demonstrate our \textbf{32B model achieves a 46\% issue resolution rate}, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1. Additionally, we provide the empirical validation of the test-time scaling phenomenon within SWE agents, revealing that \textbf{models dynamically allocate more tokens to increasingly challenging problems}, effectively enhancing reasoning capabilities. We publicly release all training data, models, and code to facilitate future research. https://github.com/yingweima2022/SWE-Reasoner

Updated: 2025-04-08 12:36:08

标题: 思考更长时间，而不是更大：通过扩展测试时计算来增强软件工程代理

摘要: 最近在软件工程代理方面取得的进展显示出自动化程序改进的潜力。然而，它们对闭源或资源密集型模型的依赖在私人环境中引入了重大的部署挑战，引发了一个关键问题：\textit{如何可以通过可个人部署的开源LLMs实现可比的代码推理性能？} 为此，我们提出了一个统一的测试时间计算缩放框架，利用增加的推理时间计算而不是更大的模型。我们的框架包括两种互补策略：内部TTC和外部TTC。在内部，我们引入了一种\textit{开发上下文化轨迹合成}方法，利用真实世界的软件存储库启动多阶段推理过程，如故障定位和补丁生成。我们通过拒绝抽样进一步提高轨迹质量，严格评估准确性和复杂性。在外部，我们提出了一种新颖的\textit{基于开发过程的搜索}策略，由奖励模型和执行验证指导。这种方法使得在关键的开发决策点上实现了有针对性的计算分配，克服了现有的“终点验证”方法的局限性。在SWE-bench Verified上的评估表明，我们的\textbf{32B模型实现了46\%的问题解决率}，超过了DeepSeek R1 671B和OpenAI o1等显著更大的模型。此外，我们提供了关于SWE代理中测试时间缩放现象的经验证实，揭示了\textbf{模型动态地向越来越具有挑战性的问题分配更多的令牌}，有效地增强了推理能力。我们公开发布所有的训练数据、模型和代码，以促进未来的研究。https://github.com/yingweima2022/SWE-Reasoner

更新时间: 2025-04-08 12:36:08

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2503.23803v2

Smart Exploration in Reinforcement Learning using Bounded Uncertainty Models

Reinforcement learning (RL) is a powerful tool for decision-making in uncertain environments, but it often requires large amounts of data to learn an optimal policy. We propose using prior model knowledge to guide the exploration process to speed up this learning process. This model knowledge comes in the form of a model set to which the true transition kernel and reward function belong. We optimize over this model set to obtain upper and lower bounds on the Q-function, which are then used to guide the exploration of the agent. We provide theoretical guarantees on the convergence of the Q-function to the optimal Q-function under the proposed class of exploring policies. Furthermore, we also introduce a data-driven regularized version of the model set optimization problem that ensures the convergence of the class of exploring policies to the optimal policy. Lastly, we show that when the model set has a specific structure, namely the bounded-parameter MDP (BMDP) framework, the regularized model set optimization problem becomes convex and simple to implement. In this setting, we also show that we obtain finite-time convergence to the optimal policy under additional assumptions. We demonstrate the effectiveness of the proposed exploration strategy in a simulation study. The results indicate that the proposed method can significantly speed up the learning process in reinforcement learning.

Updated: 2025-04-08 12:33:38

标题: 使用有界不确定性模型的强化学习中的智能探索

摘要: 强化学习（RL）是在不确定环境中进行决策的强大工具，但通常需要大量数据来学习最优策略。我们提出使用先前的模型知识来指导探索过程，加快学习过程。该模型知识以模型集的形式出现，其中真实的转移核和奖励函数属于其中。我们对该模型集进行优化，以获得Q函数的上限和下限，然后用于指导代理的探索。我们提供了关于Q函数收敛到最优Q函数的理论保证，基于提出的探索策略类。此外，我们还引入了数据驱动的正则化版本的模型集优化问题，确保探索策略类收敛到最优策略。最后，我们展示了当模型集具有特定结构时，即有界参数MDP（BMDP）框架时，正则化模型集优化问题变得凸且易于实现。在这种情况下，我们还展示在额外假设下，我们获得有限时间的收敛到最优策略。我们在模拟研究中展示了所提出的探索策略的有效性。结果表明，所提出的方法可以显著加快强化学习中的学习过程。

更新时间: 2025-04-08 12:33:38

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2504.05978v1

Physics-informed KAN PointNet: Deep learning for simultaneous solutions to inverse problems in incompressible flow on numerous irregular geometries

Kolmogorov-Arnold Networks (KANs) have gained attention as a promising alternative to traditional Multilayer Perceptrons (MLPs) for deep learning applications in computational physics, especially within the framework of physics-informed neural networks (PINNs). Physics-informed Kolmogorov-Arnold Networks (PIKANs) and their variants have been introduced and evaluated to solve inverse problems. However, similar to PINNs, current versions of PIKANs are limited to obtaining solutions for a single computational domain per training run; consequently, a new geometry requires retraining the model from scratch. Physics-informed PointNet (PIPN) was introduced to address this limitation for PINNs. In this work, we introduce physics-informed Kolmogorov-Arnold PointNet (PI-KAN-PointNet) to extend this capability to PIKANs. PI-KAN-PointNet enables the simultaneous solution of an inverse problem over multiple irregular geometries within a single training run, reducing computational costs. We construct KANs using Jacobi polynomials and investigate their performance by considering Jacobi polynomials of different degrees and types in terms of both computational cost and prediction accuracy. As a benchmark test case, we consider natural convection in a square enclosure with a cylinder, where the cylinder's shape varies across a dataset of 135 geometries. We compare the performance of PI-KAN-PointNet with that of PIPN (i.e., physics-informed PointNet with MLPs) and observe that, with approximately an equal number of trainable parameters and similar computational cost, PI-KAN-PointNet provides more accurate predictions. Finally, we explore the combination of KAN and MLP in constructing a physics-informed PointNet. Our findings indicate that a physics-informed PointNet model employing MLP layers as the encoder and KAN layers as the decoder represents the optimal configuration among all models investigated.

Updated: 2025-04-08 12:31:57

标题: 物理信息KAN PointNet：深度学习用于在多种不规则几何形状上同时解决不可压缩流的反问题

摘要: Kolmogorov-Arnold Networks (KANs)作为传统多层感知器（MLPs）在计算物理学中深度学习应用的一种有前途的替代方案已经引起了关注，特别是在物理信息神经网络（PINNs）框架内。已经引入并评估了物理信息科尔莫戈洛夫-阿诺德网络（PIKANs）及其变体来解决反问题。然而，类似于PINNs，当前版本的PIKANs仅限于在单个计算域中获得解决方案，因此，新的几何形状需要从头开始重新训练模型。引入了物理信息PointNet（PIPN）以解决PINNs的这一限制。在这项工作中，我们引入了物理信息科尔莫戈洛夫-阿诺德PointNet（PI-KAN-PointNet）来将这种能力扩展到PIKANs。PI-KAN-PointNet使得能够在单次训练运行中同时解决多个不规则几何形状上的反问题，从而降低了计算成本。我们使用雅各比多项式构建KANs，并通过考虑不同次数和类型的雅各比多项式来研究它们的性能，包括计算成本和预测准确性。作为基准测试案例，我们考虑了带有圆柱体的正方形围场中的自然对流，其中圆柱体的形状在一个包含135个几何形状的数据集中变化。我们将PI-KAN-PointNet的性能与PIPN（即，具有MLPs的物理信息PointNet）进行比较，并观察到，在可训练参数数量大致相等且计算成本相似的情况下，PI-KAN-PointNet提供了更准确的预测。最后，我们探讨了在构建物理信息PointNet时将KAN和MLP组合起来的可能性。我们的研究结果表明，采用MLP层作为编码器和KAN层作为解码器的物理信息PointNet模型代表了在所有研究的模型中的最佳配置。

更新时间: 2025-04-08 12:31:57

领域: cs.LG,physics.flu-dyn

下载: http://arxiv.org/abs/2504.06327v1

MLPROP -- an open interactive web interface for thermophysical property prediction with machine learning

Machine learning (ML) enables the development of powerful methods for predicting thermophysical properties with unprecedented scope and accuracy. However, technical barriers like cumbersome implementation in established workflows hinder their application in practice. With MLPROP, we provide an interactive web interface for directly applying advanced ML methods to predict thermophysical properties without requiring ML expertise, thereby substantially increasing the accessibility of novel models. MLPROP currently includes models for predicting the vapor pressure of pure components (GRAPPA), activity coefficients and vapor-liquid equilibria in binary mixtures (UNIFAC 2.0, mod. UNIFAC 2.0, and HANNA), and a routine to fit NRTL parameters to the model predictions. MLPROP will be continuously updated and extended and is accessible free of charge via https://ml-prop.mv.rptu.de/. MLPROP removes the barrier to learning and experimenting with new ML-based methods for predicting thermophysical properties. The source code of all models is available as open source, which allows integration into existing workflows.

Updated: 2025-04-08 12:28:18

标题: MLPROP-一个用于机器学习热物性属性预测的开放交互式网络界面

摘要: 机器学习（ML）使得能够开发出强大的方法，以前所未有的范围和准确性预测热物性性质。然而，技术障碍，如在已建立的工作流程中繁琐的实施，阻碍了它们在实践中的应用。通过MLPROP，我们提供了一个交互式网络界面，直接应用先进的机器学习方法来预测热物性性质，而无需机器学习专业知识，从而大大提高了新型模型的可访问性。MLPROP目前包括用于预测纯组分的蒸气压（GRAPPA）、二元混合物中的活度系数和蒸汽-液体平衡（UNIFAC 2.0，mod. UNIFAC 2.0和HANNA）的模型，以及一个用于将NRTL参数拟合到模型预测中的例行程序。MLPROP将不断更新和扩展，可免费通过https://ml-prop.mv.rptu.de/访问。MLPROP消除了学习和尝试新的基于机器学习方法来预测热物性性质的障碍。所有模型的源代码都作为开源提供，可用于集成到现有工作流程中。

更新时间: 2025-04-08 12:28:18

领域: cs.CE,cs.LG

下载: http://arxiv.org/abs/2504.05970v1

Security Vulnerabilities in Ethereum Smart Contracts: A Systematic Analysis

Smart contracts are a secure and trustworthy application that plays a vital role in decentralized applications in various fields such as insurance,the internet, and gaming. However, in recent years, smart contract security breaches have occurred frequently, and due to their financial properties, they have caused huge economic losses, such as the most famous security incident "The DAO" which caused a loss of over \$60 million in Ethereum. This has drawn a lot of attention from all sides. Writing a secure smart contract is now a critical issue.This paper focuses on Ether smart contracts and explains the main components of Ether, smart contract architecture and mechanism.The environment used in this paper is the Ethernet environment, using remix online compilation platform and Solidity language, according to the four security events of American Chain, The DAO, Parity and KotET, the principles of integer overflow attack, reentrant attack, access control attack and denial of service attack are studied and analyzed accordingly, and the scenarios of these vulnerabilities are reproduced, and the measures to prevent them are given. Finally, preventive measures are given. In addition, the principles of short address attack, early transaction attack and privileged function exposure attack are also introduced in detail, and security measures are proposed.As vulnerabilities continue to emerge, their classification will also evolve. The analysis and research of the current vulnerabilities are also to lay a solid foundation for avoiding more vulnerabilities.

Updated: 2025-04-08 12:25:34

标题: 以太坊智能合约中的安全漏洞：系统性分析

摘要: 智能合约是一种安全可信的应用程序，在保险、互联网和游戏等各个领域的去中心化应用中发挥着至关重要的作用。然而，近年来智能合约安全漏洞频繁发生，由于其财务属性，已经造成了巨大的经济损失，比如最著名的安全事件"The DAO"导致以太坊损失超过6千万美元。这引起了各方的关注。编写安全的智能合约现在是一个关键问题。本文重点关注Ether智能合约，解释了Ether的主要组成部分、智能合约架构和机制。本文使用以太网环境，使用remix在线编译平台和Solidity语言，根据美国链、The DAO、Parity和KotET的四个安全事件，研究和分析了整数溢出攻击、递归攻击、访问控制攻击和拒绝服务攻击的原则，并重现了这些漏洞的场景，并提出了预防措施。最后，提出了预防措施。此外，还详细介绍了短地址攻击、早期交易攻击和特权函数暴露攻击的原则，并提出了安全措施。随着漏洞不断出现，它们的分类也将不断演变。对当前漏洞的分析和研究也是为了避免更多漏洞打下坚实基础。

更新时间: 2025-04-08 12:25:34

领域: cs.CR,D.2.4

下载: http://arxiv.org/abs/2504.05968v1

Autoencoder-Based Detection of Anomalous Stokes V Spectra in the Flare-Producing Active Region 13663 Using Hinode/SP Observations

Detecting unusual signals in observational solar spectra is crucial for understanding the features associated with impactful solar events, such as solar flares. However, existing spectral analysis techniques face challenges, particularly when relying on pre-defined, physics-based calculations to process large volumes of noisy and complex observational data. To address these limitations, we applied deep learning to detect anomalies in the Stokes V spectra from the Hinode/SP instrument. Specifically, we developed an autoencoder model for spectral compression, which serves as an anomaly detection method. Our model effectively identifies anomalous spectra within spectro-polarimetric maps captured prior to the onset of the X1.3 flare on May 5, 2024, in NOAA AR 13663. These atypical spectral points exhibit highly complex profiles and spatially align with polarity inversion lines in magnetogram images, indicating their potential as sites of magnetic energy storage and possible triggers for flares. Notably, the detected anomalies are highly localized, making them particularly challenging to identify in magnetogram images using current manual methods.

Updated: 2025-04-08 12:20:47

标题: 基于自动编码器的Hinode/SP观测中13663活跃区产生耀斑的异常Stokes V光谱检测

摘要: 在观测太阳光谱中检测异常信号对于理解与影响太阳事件（如太阳耀斑）相关的特征至关重要。然而，现有的光谱分析技术面临挑战，特别是在依赖预定义的基于物理的计算来处理大量嘈杂和复杂的观测数据时。为了解决这些限制，我们应用深度学习来检测日出/SP仪器的Stokes V光谱中的异常。具体而言，我们开发了一个用于光谱压缩的自动编码器模型，它作为一种异常检测方法。我们的模型有效地识别了在2024年5月5日X1.3耀斑爆发前捕获的NOAA AR 13663中的光谱偏离。这些非典型的光谱点展现出高度复杂的轮廓，并在磁场图像中与极性反转线空间对齐，表明它们有可能是磁能储存的场所，可能是耀斑的触发器。值得注意的是，检测到的异常点高度局部化，使它们在使用当前手动方法在磁场图像中识别时尤其具有挑战性。

更新时间: 2025-04-08 12:20:47

领域: cs.LG,astro-ph.IM,astro-ph.SR

下载: http://arxiv.org/abs/2504.05962v1

Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks

We present and analyze a novel regularized form of the gradient clipping algorithm, proving that it converges to global minima of the loss surface of deep neural networks under the squared loss, provided that the layers are of sufficient width. The algorithm presented here, dubbed $\delta-$GClip, introduces a modification to gradient clipping that leads to a first-of-its-kind example of a step size scheduling for gradient descent that provably minimizes training losses of deep neural nets. We also present empirical evidence that our theoretically founded $\delta-$GClip algorithm is competitive with the state-of-the-art deep learning heuristics on various neural architectures including modern transformer based architectures. The modification we do to standard gradient clipping is designed to leverage the PL* condition, a variant of the Polyak-Lojasiewicz inequality which was recently proven to be true for sufficiently wide neural networks at any depth within a neighbourhood of the initialization.

Updated: 2025-04-08 12:19:22

标题: 正则化梯度裁剪可证明训练宽而深的神经网络

摘要: 我们提出并分析了一种新颖的梯度剪裁算法的正则化形式，证明了在深度神经网络的损失曲面下，只要层的宽度足够，该算法收敛到全局最小值。这里介绍的算法被称为$\delta-$GClip，它对梯度剪裁进行了修改，导致了一个首次的梯度下降步长调度的例子，可以证明最小化深度神经网络的训练损失。我们还提出了实证证据表明，我们在理论上基础的$\delta-$GClip算法在各种神经网络架构上与最先进的深度学习启发式算法竞争力相当，包括现代基于Transformer的架构。我们对标准梯度剪裁进行的修改旨在利用PL*条件，这是Polyak-Lojasiewicz不等式的一个变体，最近已被证明对于足够宽的神经网络在初始化的邻域内的任何深度都成立。

更新时间: 2025-04-08 12:19:22

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2404.08624v2

ProtoGS: Efficient and High-Quality Rendering with 3D Gaussian Prototypes

3D Gaussian Splatting (3DGS) has made significant strides in novel view synthesis but is limited by the substantial number of Gaussian primitives required, posing challenges for deployment on lightweight devices. Recent methods address this issue by compressing the storage size of densified Gaussians, yet fail to preserve rendering quality and efficiency. To overcome these limitations, we propose ProtoGS to learn Gaussian prototypes to represent Gaussian primitives, significantly reducing the total Gaussian amount without sacrificing visual quality. Our method directly uses Gaussian prototypes to enable efficient rendering and leverage the resulting reconstruction loss to guide prototype learning. To further optimize memory efficiency during training, we incorporate structure-from-motion (SfM) points as anchor points to group Gaussian primitives. Gaussian prototypes are derived within each group by clustering of K-means, and both the anchor points and the prototypes are optimized jointly. Our experiments on real-world and synthetic datasets prove that we outperform existing methods, achieving a substantial reduction in the number of Gaussians, and enabling high rendering speed while maintaining or even enhancing rendering fidelity.

Updated: 2025-04-08 12:19:01

标题: ProtoGS：使用3D高斯原型进行高效且高质量的渲染

摘要: 3D高斯点云（3DGS）在新视角合成方面取得了重大进展，但由于需要大量高斯基元，因此在轻量设备上部署时存在挑战。最近的方法通过压缩高斯密度存储大小来解决这个问题，但未能保持渲染质量和效率。为了克服这些限制，我们提出ProtoGS来学习高斯原型来表示高斯基元，显著减少总高斯数量而不牺牲视觉质量。我们的方法直接使用高斯原型实现高效渲染，并利用结果重构损失来指导原型学习。为了在训练过程中进一步优化内存效率，我们将运动结构（SfM）点作为锚点将高斯基元分组。通过K均值聚类在每个组中派生高斯原型，同时联合优化锚点和原型。我们在现实世界和合成数据集上的实验证明，我们优于现有方法，显著减少高斯数量，同时保持或甚至增强了渲染保真度。

更新时间: 2025-04-08 12:19:01

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.17486v3

Drought forecasting using a hybrid neural architecture for integrating time series and static data

Reliable forecasting is critical for early warning systems and adaptive drought management. Most previous deep learning approaches focus solely on homogeneous regions and rely on single-structured data. This paper presents a hybrid neural architecture that integrates time series and static data, achieving state-of-the-art performance on the DroughtED dataset. Our results illustrate the potential of designing neural models for the treatment of heterogeneous data in climate related tasks and present reliable prediction of USDM categories, an expert-informed drought metric. Furthermore, this work validates the potential of DroughtED for enabling location-agnostic training of deep learning models.

Updated: 2025-04-08 12:11:34

标题: 使用混合神经结构集成时间序列和静态数据进行干旱预测

摘要: 可靠的预测对于早期预警系统和适应性干旱管理至关重要。大多数先前的深度学习方法仅关注同质地区，并依赖于单一结构化数据。本文提出了一种混合神经架构，将时间序列和静态数据整合在一起，在DroughtED数据集上实现了最先进的性能。我们的结果展示了为气候相关任务设计神经模型处理异质数据的潜力，并可可靠地预测USDM类别，这是一个专家设计的干旱度量标准。此外，这项工作验证了DroughtED具有使深度学习模型进行位置不敏感训练的潜力。

更新时间: 2025-04-08 12:11:34

领域: cs.LG

下载: http://arxiv.org/abs/2504.05957v1

Temporal Alignment-Free Video Matching for Few-shot Action Recognition

Few-Shot Action Recognition (FSAR) aims to train a model with only a few labeled video instances. A key challenge in FSAR is handling divergent narrative trajectories for precise video matching. While the frame- and tuple-level alignment approaches have been promising, their methods heavily rely on pre-defined and length-dependent alignment units (e.g., frames or tuples), which limits flexibility for actions of varying lengths and speeds. In this work, we introduce a novel TEmporal Alignment-free Matching (TEAM) approach, which eliminates the need for temporal units in action representation and brute-force alignment during matching. Specifically, TEAM represents each video with a fixed set of pattern tokens that capture globally discriminative clues within the video instance regardless of action length or speed, ensuring its flexibility. Furthermore, TEAM is inherently efficient, using token-wise comparisons to measure similarity between videos, unlike existing methods that rely on pairwise comparisons for temporal alignment. Additionally, we propose an adaptation process that identifies and removes common information across classes, establishing clear boundaries even between novel categories. Extensive experiments demonstrate the effectiveness of TEAM. Codes are available at github.com/leesb7426/TEAM.

Updated: 2025-04-08 12:11:11

标题: 无时间对齐的视频匹配方法用于少样本动作识别

摘要: Few-Shot Action Recognition (FSAR)旨在仅使用少量标记的视频实例训练模型。在FSAR中的一个关键挑战是处理不同的叙事轨迹，以实现精确的视频匹配。虽然基于帧和元组级别的对齐方法表现出前景，但它们的方法严重依赖于预定义和长度相关的对齐单元（例如帧或元组），这限制了对不同长度和速度的动作的灵活性。在这项工作中，我们介绍了一种新颖的TEmporal Alignment-free Matching（TEAM）方法，该方法消除了在动作表示和匹配过程中对时间单位的需求和蛮力对齐。具体地，TEAM使用一组固定的模式标记来表示每个视频，这些标记捕捉视频实例中的全局区分性线索，无论动作长度或速度如何，都可以确保其灵活性。此外，TEAM本质上高效，使用标记级别的比较来衡量视频之间的相似性，不像现有方法依赖于用于时间对齐的成对比较。此外，我们提出了一个适应过程，识别和移除跨类别的共同信息，建立甚至在新颖类别之间也能清晰界定的界限。大量实验证明了TEAM的有效性。代码可在github.com/leesb7426/TEAM找到。

更新时间: 2025-04-08 12:11:11

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.05956v1

Toward Cost-efficient Adaptive Clinical Trials in Knee Osteoarthritis with Reinforcement Learning

Osteoarthritis (OA) is the most common musculoskeletal disease, with knee OA (KOA) being one of the leading causes of disability and a significant economic burden. Predicting KOA progression is crucial for improving patient outcomes, optimizing healthcare resources, studying the disease, and developing new treatments. The latter application particularly requires one to understand the disease progression in order to collect the most informative data at the right time. Existing methods, however, are limited by their static nature and their focus on individual joints, leading to suboptimal predictive performance and downstream utility. Our study proposes a new method that allows to dynamically monitor patients rather than individual joints with KOA using a novel Active Sensing (AS) approach powered by Reinforcement Learning (RL). Our key idea is to directly optimize for the downstream task by training an agent that maximizes informative data collection while minimizing overall costs. Our RL-based method leverages a specially designed reward function to monitor disease progression across multiple body parts, employs multimodal deep learning, and requires no human input during testing. Extensive numerical experiments demonstrate that our approach outperforms current state-of-the-art models, paving the way for the next generation of KOA trials.

Updated: 2025-04-08 12:10:27

标题: 朝向成本效益高的膝骨关节炎自适应临床试验的强化学习

摘要: 骨关节炎（OA）是最常见的肌肉骨骼疾病，其中膝关节骨关节炎（KOA）是致残的主要原因之一，也是一个重要的经济负担。预测KOA的进展对于改善患者预后、优化医疗资源、研究疾病以及开发新的治疗方法至关重要。后者特别需要理解疾病的进展，以便在正确的时间收集最具信息量的数据。然而，现有方法受到其静态性质和对个体关节的关注的限制，导致预测性能和下游效用不佳。我们的研究提出了一种新方法，利用强化学习（RL）驱动的新型主动感知（AS）方法，允许动态监测患者而不是单个关节的KOA。我们的关键思想是通过训练一个最大化信息数据收集并最小化总体成本的代理来直接优化下游任务。我们基于RL的方法利用专门设计的奖励函数跨多个身体部位监测疾病进展，采用多模态深度学习，在测试期间不需要人为输入。广泛的数值实验表明，我们的方法优于当前最先进的模型，为下一代KOA试验铺平了道路。

更新时间: 2025-04-08 12:10:27

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2408.02349v4

Unsupervised Location Mapping for Narrative Corpora

This work presents the task of unsupervised location mapping, which seeks to map the trajectory of an individual narrative on a spatial map of locations in which a large set of narratives take place. Despite the fundamentality and generality of the task, very little work addressed the spatial mapping of narrative texts. The task consists of two parts: (1) inducing a ``map'' with the locations mentioned in a set of texts, and (2) extracting a trajectory from a single narrative and positioning it on the map. Following recent advances in increasing the context length of large language models, we propose a pipeline for this task in a completely unsupervised manner without predefining the set of labels. We test our method on two different domains: (1) Holocaust testimonies and (2) Lake District writing, namely multi-century literature on travels in the English Lake District. We perform both intrinsic and extrinsic evaluations for the task, with encouraging results, thereby setting a benchmark and evaluation practices for the task, as well as highlighting challenges.

Updated: 2025-04-08 12:06:47

标题: 无监督位置映射用于叙事语料库

摘要: 这项工作介绍了无监督位置映射的任务，旨在将个体叙事的轨迹映射到大量叙事发生的空间位置地图上。尽管这一任务的基础性和普适性，很少有工作涉及叙事文本的空间映射。该任务包括两个部分：（1）诱导一组文本中提及的“地图”位置，以及（2）从单个叙事中提取轨迹并将其定位在地图上。借鉴了大型语言模型增加上下文长度的最新进展，我们提出了一个完全无监督的管道，用于执行此任务，而不预先定义标签集。我们在两个不同领域上测试了我们的方法：（1）大屠杀证词和（2）湖区写作，即关于英国湖区旅行的跨世纪文学。我们为该任务执行了内在和外在评估，并取得了令人鼓舞的结果，从而为该任务设定了基准和评估实践，同时突出了挑战。

更新时间: 2025-04-08 12:06:47

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2504.05954v1

Representing Normative Regulations in OWL DL for Automated Compliance Checking Supported by Text Annotation

Compliance checking is the process of determining whether a regulated entity adheres to these regulations. Currently, compliance checking is predominantly manual, requiring significant time and highly skilled experts, while still being prone to errors caused by the human factor. Various approaches have been explored to automate compliance checking, however, representing regulations in OWL DL language which enables compliance checking through OWL reasoning has not been adopted. In this work, we propose an annotation schema and an algorithm that transforms text annotations into machine-interpretable OWL DL code. The proposed approach is validated through a proof-of-concept implementation applied to examples from the building construction domain.

Updated: 2025-04-08 12:05:21

标题: 用OWL DL表示规范性法规以支持文本注释的自动合规检查

摘要: 合规检查是确定受监管实体是否遵守这些规定的过程。目前，合规检查主要是手动进行的，需要大量时间和高技能的专家，同时仍然容易出现由人为因素引起的错误。已经探索了各种方法来自动化合规检查，然而，尚未采用通过OWL推理实现合规检查的OWL DL语言表示法。在这项工作中，我们提出了一个注解模式和一个将文本注释转换为机器可解释的OWL DL代码的算法。所提出的方法通过一个概念验证实现应用于建筑施工领域的示例进行了验证。

更新时间: 2025-04-08 12:05:21

领域: cs.AI

下载: http://arxiv.org/abs/2504.05951v1

AEGIS: Human Attention-based Explainable Guidance for Intelligent Vehicle Systems

Improving decision-making capabilities in Autonomous Intelligent Vehicles (AIVs) has been a heated topic in recent years. Despite advancements, training machines to capture regions of interest for comprehensive scene understanding, like human perception and reasoning, remains a significant challenge. This study introduces a novel framework, Human Attention-based Explainable Guidance for Intelligent Vehicle Systems (AEGIS). AEGIS utilizes human attention, converted from eye-tracking, to guide reinforcement learning (RL) models to identify critical regions of interest for decision-making. AEGIS uses a pre-trained human attention model to guide RL models to identify critical regions of interest for decision-making. By collecting 1.2 million frames from 20 participants across six scenarios, AEGIS pre-trains a model to predict human attention patterns.

Updated: 2025-04-08 12:04:52

标题: AEGIS：基于人类注意力的可解释引导智能车辆系统

摘要: 在最近几年，提升自主智能车辆（AIVs）的决策能力一直是一个热门话题。尽管取得了进展，训练机器捕捉感兴趣区域以实现全面场景理解，如人类感知和推理，仍然是一个重大挑战。本研究介绍了一个新颖的框架，人类关注点解释引导智能车辆系统（AEGIS）。AEGIS利用从眼动追踪转换的人类关注点，引导强化学习（RL）模型识别决策制定的关键区域。AEGIS使用一个预训练的人类关注模型来引导RL模型识别决策制定的关键区域。通过在六种情境下收集了来自20名参与者的120万帧，AEGIS预训练一个模型来预测人类关注模式。

更新时间: 2025-04-08 12:04:52

领域: cs.AI

下载: http://arxiv.org/abs/2504.05950v1

MM-STFlowNet: A Transportation Hub-Oriented Multi-Mode Passenger Flow Prediction Method via Spatial-Temporal Dynamic Graph Modeling

Accurate and refined passenger flow prediction is essential for optimizing the collaborative management of multiple collection and distribution modes in large-scale transportation hubs. Traditional methods often focus only on the overall passenger volume, neglecting the interdependence between different modes within the hub. To address this limitation, we propose MM-STFlowNet, a comprehensive multi-mode prediction framework grounded in dynamic spatial-temporal graph modeling. Initially, an integrated temporal feature processing strategy is implemented using signal decomposition and convolution techniques to address data spikes and high volatility. Subsequently, we introduce the Spatial-Temporal Dynamic Graph Convolutional Recurrent Network (STDGCRN) to capture detailed spatial-temporal dependencies across multiple traffic modes, enhanced by an adaptive channel attention mechanism. Finally, the self-attention mechanism is applied to incorporate various external factors, further enhancing prediction accuracy. Experiments on a real-world dataset from Guangzhounan Railway Station in China demonstrate that MM-STFlowNet achieves state-of-the-art performance, particularly during peak periods, providing valuable insight for transportation hub management.

Updated: 2025-04-08 12:00:06

标题: MM-STFlowNet：基于时空动态图建模的交通枢纽导向的多模式乘客流预测方法

摘要: 准确和精细的乘客流量预测对于优化大型交通枢纽中多种集运和配送模式的协同管理至关重要。传统方法往往只关注整体乘客量，忽视了枢纽内不同模式之间的相互依赖关系。为了解决这一局限性，我们提出了MM-STFlowNet，这是一个基于动态时空图建模的全面多模式预测框架。首先，利用信号分解和卷积技术实施集成的时间特征处理策略，以解决数据波动和高波动性。随后，我们引入了空间-时间动态图卷积循环网络(STDGCRN)来捕捉多种交通模式之间的详细空间-时间依赖关系，通过自适应通道注意机制增强。最后，应用自我注意机制来整合各种外部因素，进一步提高预测准确性。在中国广州南站的真实数据集上进行的实验表明，MM-STFlowNet在高峰期间取得了最先进的性能，为交通枢纽管理提供了宝贵的见解。

更新时间: 2025-04-08 12:00:06

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.06325v1

CKGAN: Training Generative Adversarial Networks Using Characteristic Kernel Integral Probability Metrics

In this paper, we propose CKGAN, a novel generative adversarial network (GAN) variant based on an integral probability metrics framework with characteristic kernel (CKIPM). CKIPM, as a distance between two probability distributions, is designed to optimize the lowerbound of the maximum mean discrepancy (MMD) in a reproducing kernel Hilbert space, and thus can be used to train GANs. CKGAN mitigates the notorious problem of mode collapse by mapping the generated images back to random noise. To save the effort of selecting the kernel function manually, we propose a soft selection method to automatically learn a characteristic kernel function. The experimental evaluation conducted on a set of synthetic and real image benchmarks (MNIST, CelebA, etc.) demonstrates that CKGAN generally outperforms other MMD-based GANs. The results also show that at the cost of moderately more training time, the automatically selected kernel function delivers very close performance to the best of manually fine-tuned one on real image benchmarks and is able to improve the performances of other MMD-based GANs.

Updated: 2025-04-08 11:58:56

标题: CKGAN：使用特征核积分概率度量训练生成对抗网络

摘要: 在本文中，我们提出了CKGAN，这是一种基于特征核（CKIPM）的积分概率度量框架的新型生成对抗网络（GAN）变体。CKIPM作为两个概率分布之间的距离，旨在优化再生核希尔伯特空间中的最大均值差异（MMD）的下界，因此可以用来训练GAN。CKGAN通过将生成的图像映射回随机噪声来缓解模式崩溃的问题。为了节省手动选择核函数的工作量，我们提出了一种软选择方法，可以自动学习一个特征核函数。对一组合成和真实图像基准（MNIST，CelebA等）进行的实验评估表明，CKGAN通常优于其他基于MMD的GAN。结果还表明，以适度更多的训练时间为代价，自动选择的核函数在真实图像基准上可以提供非常接近手动微调的最佳性能，并且能够改进其他基于MMD的GAN的性能。

更新时间: 2025-04-08 11:58:56

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2504.05945v1

From Stability to Inconsistency: A Study of Moral Preferences in LLMs

As large language models (LLMs) increasingly integrate into our daily lives, it becomes crucial to understand their implicit biases and moral tendencies. To address this, we introduce a Moral Foundations LLM dataset (MFD-LLM) grounded in Moral Foundations Theory, which conceptualizes human morality through six core foundations. We propose a novel evaluation method that captures the full spectrum of LLMs' revealed moral preferences by answering a range of real-world moral dilemmas. Our findings reveal that state-of-the-art models have remarkably homogeneous value preferences, yet demonstrate a lack of consistency.

Updated: 2025-04-08 11:52:50

标题: 从稳定到不一致：LLMs中道德偏好的研究

摘要: 随着大型语言模型（LLMs）越来越多地融入我们的日常生活，了解它们的潜在偏见和道德倾向变得至关重要。为了解决这个问题，我们介绍了一个基于道德基础理论的Moral Foundations LLM数据集（MFD-LLM），该理论通过六个核心基础概念化人类道德。我们提出了一种新颖的评估方法，通过回答一系列现实世界的道德困境，捕捉了LLMs揭示的道德偏好的完整谱。我们的研究发现，最先进的模型具有非常同质的价值偏好，但缺乏一致性。

更新时间: 2025-04-08 11:52:50

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2504.06324v1

Mosaic: Composite Projection Pruning for Resource-efficient LLMs

Extensive compute and memory requirements limit the deployment of large language models (LLMs) on any hardware. Compression methods, such as pruning, can reduce model size, which in turn reduces resource requirements. State-of-the-art pruning is based on coarse-grained methods. They are time-consuming and inherently remove critical model parameters, adversely impacting the quality of the pruned model. This paper introduces projection pruning, a novel fine-grained method for pruning LLMs. In addition, LLM projection pruning is enhanced by a new approach we refer to as composite projection pruning - the synergistic combination of unstructured pruning that retains accuracy and structured pruning that reduces model size. We develop Mosaic, a novel system to create and deploy pruned LLMs using composite projection pruning. Mosaic is evaluated using a range of performance and quality metrics on multiple hardware platforms, LLMs, and datasets. Mosaic is 7.19x faster in producing models than existing approaches. Mosaic models achieve up to 84.2% lower perplexity and 31.4% higher accuracy than models obtained from coarse-grained pruning. Up to 67% faster inference and 68% lower GPU memory use is noted for Mosaic models.

Updated: 2025-04-08 11:51:35

标题: 镶嵌：用于资源高效的LLMs的复合投影修剪

摘要: 大型语言模型（LLMs）的广泛计算和内存需求限制了在任何硬件上部署。压缩方法，如修剪，可以减小模型大小，从而降低资源需求。最先进的修剪方法基于粗粒度方法。它们耗时且会导致关键模型参数的移除，从而对修剪后模型的质量产生不利影响。本文介绍了投影修剪，一种用于修剪LLMs的新颖细粒度方法。此外，LLM投影修剪通过我们称之为复合投影修剪的新方法得到增强 - 这是保留准确性的非结构化修剪和减小模型大小的结构化修剪的协同组合。我们开发了一种新颖的系统Mosaic，用于利用复合投影修剪创建和部署修剪后的LLMs。Mosaic在多个硬件平台、LLMs和数据集上使用一系列性能和质量指标进行评估。Mosaic在生成模型方面比现有方法快7.19倍。Mosaic模型的困惑度最高降低84.2%，准确度比粗粒度修剪获得的模型高31.4%。Mosaic模型的推理速度最高提高67%，GPU内存使用减少了68%。

更新时间: 2025-04-08 11:51:35

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.06323v1

Recursive PAC-Bayes: A Frequentist Approach to Sequential Prior Updates with No Information Loss

PAC-Bayesian analysis is a frequentist framework for incorporating prior knowledge into learning. It was inspired by Bayesian learning, which allows sequential data processing and naturally turns posteriors from one processing step into priors for the next. However, despite two and a half decades of research, the ability to update priors sequentially without losing confidence information along the way remained elusive for PAC-Bayes. While PAC-Bayes allows construction of data-informed priors, the final confidence intervals depend only on the number of points that were not used for the construction of the prior, whereas confidence information in the prior, which is related to the number of points used to construct the prior, is lost. This limits the possibility and benefit of sequential prior updates, because the final bounds depend only on the size of the final batch. We present a novel and, in retrospect, surprisingly simple and powerful PAC-Bayesian procedure that allows sequential prior updates with no information loss. The procedure is based on a novel decomposition of the expected loss of randomized classifiers. The decomposition rewrites the loss of the posterior as an excess loss relative to a downscaled loss of the prior plus the downscaled loss of the prior, which is bounded recursively. As a side result, we also present a generalization of the split-kl and PAC-Bayes-split-kl inequalities to discrete random variables, which we use for bounding the excess losses, and which can be of independent interest. In empirical evaluation the new procedure significantly outperforms state-of-the-art.

Updated: 2025-04-08 11:45:31

标题: 递归PAC-Bayes：一种频率主义方法，用于顺序先验更新且不丢失信息

摘要: PAC贝叶斯分析是一个频率论框架，用于将先验知识纳入学习中。它受到贝叶斯学习的启发，允许顺序数据处理，并自然地将一个处理步骤的后验转化为下一个处理步骤的先验。然而，尽管经过了二十五年的研究，能够在不丢失置信信息的情况下逐步更新先验的能力对于PAC-Bayes来说仍然是难以实现的。尽管PAC-Bayes允许构建数据驱动的先验，但最终的置信区间仅取决于未用于构造先验的数据点的数量，而先验中的置信信息，与用于构造先验的数据点数量相关，却丢失了。这限制了顺序先验更新的可能性和好处，因为最终的边界仅取决于最终批次的大小。我们提出了一种新颖的，回顾起来令人惊讶地简单而强大的PAC-Bayesian程序，允许无信息损失地进行顺序先验更新。该程序基于对随机分类器的期望损失的新颖分解。该分解将后验的损失重新表述为相对于先验的缩小损失的过量损失，再加上先验的缩小损失，这些损失被递归地界定。作为一个副产品，我们还提出了分割kl和PAC-Bayes分割kl不等式到离散随机变量的泛化，我们用于界定过量损失，并且可能具有独立的兴趣。在实证评估中，新程序明显优于现有技术水平。

更新时间: 2025-04-08 11:45:31

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.14681v3

TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices

Developing deep learning models on tiny devices (e.g. Microcontroller units, MCUs) has attracted much attention in various embedded IoT applications. However, it is challenging to efficiently design and deploy recent advanced models (e.g. transformers) on tiny devices due to their severe hardware resource constraints. In this work, we propose TinyFormer, a framework specifically designed to develop and deploy resource-efficient transformers on MCUs. TinyFormer mainly consists of SuperNAS, SparseNAS and SparseEngine. Separately, SuperNAS aims to search for an appropriate supernet from a vast search space. SparseNAS evaluates the best sparse single-path model including transformer architecture from the identified supernet. Finally, SparseEngine efficiently deploys the searched sparse models onto MCUs. To the best of our knowledge, SparseEngine is the first deployment framework capable of performing inference of sparse models with transformer on MCUs. Evaluation results on the CIFAR-10 dataset demonstrate that TinyFormer can develop efficient transformers with an accuracy of 96.1% while adhering to hardware constraints of 1MB storage and $320$KB memory. Additionally, TinyFormer achieves significant speedups in sparse inference, up to 12.2x, when compared to the CMSIS-NN library. TinyFormer is believed to bring powerful transformers into TinyML scenarios and greatly expand the scope of deep learning applications.

Updated: 2025-04-08 11:42:15

标题: 微小变压器：微小设备上的高效变压器设计和部署

摘要: 在各种嵌入式物联网应用中，将深度学习模型（如微控制器单元MCUs）部署到小型设备上已经引起了广泛关注。然而，由于硬件资源受限，高效设计和部署最新的先进模型（如变压器）在小型设备上是具有挑战性的。在这项工作中，我们提出了TinyFormer，这是一个专门设计用于在MCUs上开发和部署资源高效的变压器的框架。TinyFormer主要由SuperNAS、SparseNAS和SparseEngine组成。其中，SuperNAS旨在从庞大的搜索空间中搜索适当的超网络。SparseNAS评估了从确定的超网络中包括变压器架构的最佳稀疏单路径模型。最后，SparseEngine有效地将搜索到的稀疏模型部署到MCUs上。据我们所知，SparseEngine是第一个能够在MCUs上执行具有变压器的稀疏模型推断的部署框架。在CIFAR-10数据集上的评估结果表明，TinyFormer可以开发出准确率为96.1%的高效变压器，并且符合1MB存储和320KB内存的硬件限制。此外，与CMSIS-NN库相比，TinyFormer在稀疏推断方面实现了显著的加速，最高可达12.2倍。TinyFormer被认为可以将功能强大的变压器引入TinyML场景，并大大扩展深度学习应用的范围。

更新时间: 2025-04-08 11:42:15

领域: cs.LG,cs.AR

下载: http://arxiv.org/abs/2311.01759v2

Evaluation of the impact of expert knowledge: How decision support scores impact the effectiveness of automatic knowledge-driven feature engineering (aKDFE)

Adverse Drug Events (ADEs), harmful medication effects, pose significant healthcare challenges, impacting patient safety and costs. This study evaluates automatic Knowledge-Driven Feature Engineering (aKDFE) for improved ADE prediction from Electronic Health Record (EHR) data, comparing it with automated event-based Knowledge Discovery in Databases (KDD). We investigated how incorporating domain-specific ADE risk scores for prolonged heart QT interval, extracted from the Janusmed Riskprofile (Janusmed) Clinical Decision Support System (CDSS), affects prediction performance using EHR data and medication handling events. Results indicate that, while aKDFE step 1 (event-based feature generation) alone did not significantly improve ADE prediction performance, aKDFE step 2 (patient-centric transformation) enhances the prediction performance. High Area Under the Receiver Operating Characteristic curve (AUROC) values suggest strong feature correlations to the outcome, aligning with the predictive power of patients' prior healthcare history for ADEs. Statistical analysis did not confirm that incorporating the Janusmed information (i) risk scores and (ii) medication route of administration into the model's feature set enhanced predictive performance. However, the patient-centric transformation applied by aKDFE proved to be a highly effective feature engineering approach. Limitations include a single-project focus, potential bias from machine learning pipeline methods, and reliance on AUROC. In conclusion, aKDFE, particularly with patient-centric transformation, improves ADE prediction from EHR data. Future work will explore attention-based models, event feature sequences, and automatic methods for incorporating domain knowledge into the aKDFE framework.

Updated: 2025-04-08 11:34:38

标题: 专家知识影响评估：决策支持评分如何影响自动知识驱动的特征工程（aKDFE）的有效性

摘要: Adverse Drug Events (ADEs), 有害药物事件，构成了重大的医疗挑战，影响患者安全和成本。本研究评估了自动知识驱动特征工程（aKDFE）对从电子健康记录（EHR）数据中改进ADE预测的影响，并将其与自动事件驱动的数据库知识发现（KDD）进行比较。我们研究了如何将从Janusmed风险概况（Janusmed）临床决策支持系统（CDSS）中提取的特定领域ADE风险评分，如延长心脏QT间期，纳入EHR数据和药物处理事件中，影响预测性能。结果表明，尽管aKDFE的第一步（基于事件的特征生成）单独并没有显著提高ADE预测性能，但aKDFE的第二步（以患者为中心的转换）增强了预测性能。高面积在接收操作特征曲线下的ROC（AUROC）值表明特征与结果之间有很强的相关性，与患者先前健康历史对ADE的预测能力相一致。统计分析未证实将Janusmed信息（i）风险评分和（ii）药物给药途径纳入模型特征集会增强预测性能。然而，aKDFE应用的以患者为中心的转换被证明是一种高效的特征工程方法。局限性包括单一项目焦点、机器学习管道方法可能存在的偏见以及对AUROC的依赖。总之，aKDFE，特别是以患者为中心的转换，可以改进从EHR数据中的ADE预测。未来的工作将探索基于注意力的模型、事件特征序列和自动方法如何将领域知识纳入aKDFE框架中。

更新时间: 2025-04-08 11:34:38

领域: cs.LG,62R01, 68T05,I.2.6

下载: http://arxiv.org/abs/2504.05928v1

Uncovering Fairness through Data Complexity as an Early Indicator

Fairness constitutes a concern within machine learning (ML) applications. Currently, there is no study on how disparities in classification complexity between privileged and unprivileged groups could influence the fairness of solutions, which serves as a preliminary indicator of potential unfairness. In this work, we investigate this gap, specifically, we focus on synthetic datasets designed to capture a variety of biases ranging from historical bias to measurement and representational bias to evaluate how various complexity metrics differences correlate with group fairness metrics. We then apply association rule mining to identify patterns that link disproportionate complexity differences between groups with fairness-related outcomes, offering data-centric indicators to guide bias mitigation. Our findings are also validated by their application in real-world problems, providing evidence that quantifying group-wise classification complexity can uncover early indicators of potential fairness challenges. This investigation helps practitioners to proactively address bias in classification tasks.

Updated: 2025-04-08 11:28:40

标题: 揭示数据复杂性作为早期指标的公平性

摘要: 公平性构成了机器学习（ML）应用中的一个关注点。目前，关于特权和非特权群体之间分类复杂性差异如何影响解决方案公平性的研究尚未进行，这作为潜在不公平的初步指标。在这项工作中，我们调查了这一差距，具体地，我们关注设计用于捕捉从历史偏见到测量和表现偏见的各种偏见的合成数据集，以评估不同复杂度度量之间的差异与群体公平度量之间的相关性。然后，我们应用关联规则挖掘来识别将群体之间的不成比例的复杂度差异与与公平相关的结果联系起来的模式，提供数据中心的指标来指导偏见缓解。我们的发现也通过在真实问题中的应用进行了验证，证明了量化群体分类复杂性可以揭示潜在公平挑战的早期指标。这项调查帮助从业人员主动解决分类任务中的偏见。

更新时间: 2025-04-08 11:28:40

领域: cs.LG,cs.AI,cs.DS

下载: http://arxiv.org/abs/2504.05923v1

A Materials Map Integrating Experimental and Computational Data via Graph-Based Machine Learning for Enhanced Materials Discovery

Materials informatics (MI), emerging from the integration of materials science and data science, is expected to significantly accelerate material development and discovery. The data used in MI are derived from both computational and experimental studies; however, their integration remains challenging. In our previous study, we reported the integration of these datasets by applying a machine learning model that is trained on the experimental dataset to the compositional data stored in the computational database. In this study, we use the obtained datasets to construct materials maps, which visualize the relationships between material properties and structural features, aiming to support experimental researchers. The materials map is constructed using the MatDeepLearn (MDL) framework, which implements materials property prediction using graph-based representations of material structure and deep learning modeling. Through statistical analysis, we find that the MDL framework using the message passing neural network (MPNN) architecture efficiently extracts features reflecting the structural complexity of materials. Moreover, we find that this advantage does not necessarily translate into improved accuracy in the prediction of material properties. We attribute this unexpected outcome to the high learning performance inherent in MPNN, which can contribute to the structuring of data points within the materials map.

Updated: 2025-04-08 11:19:16

标题: 一个通过基于图的机器学习整合实验和计算数据的材料地图，用于增强材料发现

摘要: 材料信息学（MI）源自材料科学和数据科学的整合，预计将显著加速材料的开发和发现。MI中使用的数据来自计算和实验研究，但它们的整合仍然具有挑战性。在我们先前的研究中，我们报告了通过应用一个在实验数据集上训练的机器学习模型来整合这些数据集，将其与计算数据库中存储的组成数据相结合。在这项研究中，我们使用获取的数据集构建材料图，可视化材料属性和结构特征之间的关系，旨在支持实验研究人员。材料图是使用MatDeepLearn（MDL）框架构建的，该框架实现了基于材料结构的图形表示和深度学习建模的材料属性预测。通过统计分析，我们发现使用消息传递神经网络（MPNN）架构的MDL框架有效提取反映材料结构复杂性的特征。此外，我们发现这一优势并不一定会转化为材料属性预测的准确性提高。我们将这一意外结果归因于MPNN中固有的高学习性能，这可以有助于在材料图中构建数据点的结构化。

更新时间: 2025-04-08 11:19:16

领域: cond-mat.mtrl-sci,cs.LG

下载: http://arxiv.org/abs/2503.07378v5

Deep RL-based Autonomous Navigation of Micro Aerial Vehicles (MAVs) in a complex GPS-denied Indoor Environment

The Autonomy of Unmanned Aerial Vehicles (UAVs) in indoor environments poses significant challenges due to the lack of reliable GPS signals in enclosed spaces such as warehouses, factories, and indoor facilities. Micro Aerial Vehicles (MAVs) are preferred for navigating in these complex, GPS-denied scenarios because of their agility, low power consumption, and limited computational capabilities. In this paper, we propose a Reinforcement Learning based Deep-Proximal Policy Optimization (D-PPO) algorithm to enhance realtime navigation through improving the computation efficiency. The end-to-end network is trained in 3D realistic meta-environments created using the Unreal Engine. With these trained meta-weights, the MAV system underwent extensive experimental trials in real-world indoor environments. The results indicate that the proposed method reduces computational latency by 91\% during training period without significant degradation in performance. The algorithm was tested on a DJI Tello drone, yielding similar results.

Updated: 2025-04-08 11:14:37

标题: 基于深度强化学习的微型无人机在复杂无GPS室内环境中的自主导航

摘要: 在室内环境中，无人机（UAVs）的自主性面临着重大挑战，因为封闭空间中缺乏可靠的GPS信号，如仓库、工厂和室内设施。由于其灵活性、低功耗和有限的计算能力，微型空中车辆（MAVs）被认为是在这些复杂的、无GPS信号的场景中导航的首选。在本文中，我们提出了一种基于强化学习的深度-邻近策略优化（D-PPO）算法，以提高实时导航的计算效率。端到端网络在使用虚幻引擎创建的3D逼真元环境中进行训练。通过这些训练好的元重量，MAV系统在真实世界的室内环境中经历了广泛的实验试验。结果表明，所提出的方法在训练期间将计算延迟降低了91%，而性能并没有明显降低。该算法还在DJI Tello无人机上进行了测试，得到了类似的结果。

更新时间: 2025-04-08 11:14:37

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2504.05918v1

PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario

Driving scene understanding is a critical real-world problem that involves interpreting and associating various elements of a driving environment, such as vehicles, pedestrians, and traffic signals. Despite advancements in autonomous driving, traditional pipelines rely on deterministic models that fail to capture the probabilistic nature and inherent uncertainty of real-world driving. To address this, we propose PRIMEDrive-CoT, a novel uncertainty-aware model for object interaction and Chain-of-Thought (CoT) reasoning in driving scenarios. In particular, our approach combines LiDAR-based 3D object detection with multi-view RGB references to ensure interpretable and reliable scene understanding. Uncertainty and risk assessment, along with object interactions, are modelled using Bayesian Graph Neural Networks (BGNNs) for probabilistic reasoning under ambiguous conditions. Interpretable decisions are facilitated through CoT reasoning, leveraging object dynamics and contextual cues, while Grad-CAM visualizations highlight attention regions. Extensive evaluations on the DriveCoT dataset demonstrate that PRIMEDrive-CoT outperforms state-of-the-art CoT and risk-aware models.

Updated: 2025-04-08 11:06:02

标题: PRIMEDrive-CoT：一种基于先知链式思维框架的面向不确定性的驾驶场景中物体交互。

摘要: 驾驶场景理解是一个关键的现实世界问题，涉及解释和关联驾驶环境中的各种元素，如车辆、行人和交通信号等。尽管自动驾驶技术取得了进展，但传统的流程依赖于确定性模型，无法捕捉真实世界驾驶的概率特性和固有不确定性。为了解决这个问题，我们提出了PRIMEDrive-CoT，这是一个新颖的不确定性感知模型，用于驾驶场景中的对象交互和Chain-of-Thought（CoT）推理。具体来说，我们的方法将基于LiDAR的3D对象检测与多视角RGB参考相结合，以确保可解释和可靠的场景理解。不确定性和风险评估，以及对象交互，使用贝叶斯图神经网络（BGNNs）建模，以在模糊条件下进行概率推理。通过CoT推理促进可解释的决策，利用对象动态和上下文线索，而Grad-CAM可视化突出显示关注区域。对DriveCoT数据集进行的广泛评估表明，PRIMEDrive-CoT优于最先进的CoT和风险感知模型。

更新时间: 2025-04-08 11:06:02

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.05908v1

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

Fine-tuning large language models (LLMs) on human preferences, typically through reinforcement learning from human feedback (RLHF), has proven successful in enhancing their capabilities. However, ensuring the safety of LLMs during fine-tuning remains a critical concern, and mitigating the potential conflicts in safety and helpfulness is costly in RLHF. To address this issue, we propose a supervised learning framework called Bi-Factorial Preference Optimization (BFPO), which re-parameterizes a joint RLHF objective of both safety and helpfulness into a single supervised learning objective. In supervised optimization, a labeling function is used to capture the global preferences ranking to balance both safety and helpfulness. To evaluate BFPO, we develop a benchmark that includes comprehensive discriminative and generative tasks for helpfulness and harmlessness. The results indicate that our method significantly outperforms existing approaches in both safety and helpfulness. Moreover, BFPO achieves the same level of safety as methods that heavily rely on human labor with less than 10\% of the computational resources and human prompting and annotation process. The training recipes can be found here: https://github.com/wx-zhang/bfpo.

Updated: 2025-04-08 11:04:33

标题: 双因素偏好优化：在语言模型中平衡安全性和帮助性

摘要: 将大型语言模型（LLMs）微调到人类偏好上，通常通过从人类反馈中的强化学习（RLHF）已被证明可以成功增强其能力。然而，确保在微调过程中LLMs的安全性仍然是一个关键关注点，而在RLHF中缓解安全性和有益性之间的潜在冲突是昂贵的。为了解决这个问题，我们提出了一个名为Bi-Factorial Preference Optimization（BFPO）的监督学习框架，它将同时考虑安全性和有益性的联合RLHF目标重新参数化为单一的监督学习目标。在监督优化中，使用一个标注函数来捕捉全局偏好排序，以平衡安全性和有益性。为了评估BFPO，我们开发了一个包括全面的辨别和生成任务的基准，用于评估有益性和无害性。结果表明，我们的方法在安全性和有益性方面明显优于现有方法。此外，BFPO在不到10％的计算资源和人类提示和注释过程的情况下实现了与严重依赖人力的方法相同水平的安全性。训练程序可以在这里找到：https://github.com/wx-zhang/bfpo。

更新时间: 2025-04-08 11:04:33

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2408.15313v2

Intrinsic Saliency Guided Trunk-Collateral Network for Unsupervised Video Object Segmentation

Recent unsupervised video object segmentation (UVOS) methods predominantly adopt the motion-appearance paradigm. Mainstream motion-appearance approaches use either the two-encoder structure to separately encode motion and appearance features, or the single-encoder structure for joint encoding. However, these methods fail to properly balance the motion-appearance relationship. Consequently, even with complex fusion modules for motion-appearance integration, the extracted suboptimal features degrade the models' overall performance. Moreover, the quality of optical flow varies across scenarios, making it insufficient to rely solely on optical flow to achieve high-quality segmentation results. To address these challenges, we propose the Intrinsic Saliency guided Trunk-Collateral Net}work (ISTC-Net), which better balances the motion-appearance relationship and incorporates model's intrinsic saliency information to enhance segmentation performance. Specifically, considering that optical flow maps are derived from RGB images, they share both commonalities and differences. We propose a novel Trunk-Collateral structure. The shared trunk backbone captures the motion-appearance commonality, while the collateral branch learns the uniqueness of motion features. Furthermore, an Intrinsic Saliency guided Refinement Module (ISRM) is devised to efficiently leverage the model's intrinsic saliency information to refine high-level features, and provide pixel-level guidance for motion-appearance fusion, thereby enhancing performance without additional input. Experimental results show that ISTC-Net achieved state-of-the-art performance on three UVOS datasets (89.2% J&F on DAVIS-16, 76% J on YouTube-Objects, 86.4% J on FBMS) and four standard video salient object detection (VSOD) benchmarks with the notable increase, demonstrating its effectiveness and superiority over previous methods.

Updated: 2025-04-08 11:02:14

标题: 基于内在显著性引导的树干-支路网络用于无监督视频目标分割

摘要: 最近的无监督视频目标分割（UVOS）方法主要采用运动-外观范式。主流的运动-外观方法使用两个编码器结构分别编码运动和外观特征，或者使用单个编码器结构进行联合编码。然而，这些方法未能很好地平衡运动-外观关系。因此，即使使用了复杂的融合模块进行运动-外观整合，提取出的次优特征会降低模型的整体性能。此外，光流的质量在不同场景下变化，仅依靠光流无法实现高质量的分割结果。为了解决这些挑战，我们提出了Intrinsic Saliency guided Trunk-Collateral Network（ISTC-Net），它更好地平衡了运动-外观关系，并利用模型的内在显著信息来增强分割性能。具体来说，考虑到光流图是从RGB图像中派生出来的，它们既有共同点又有差异。我们提出了一种新颖的Trunk-Collateral结构。共享的主干骨干捕捉了运动-外观的共性，而支路学习了运动特征的独特性。此外，设计了一个Intrinsic Saliency guided Refinement Module（ISRM），有效利用模型的内在显著信息来优化高层特征，并为运动-外观融合提供像素级指导，从而提高性能而无需额外输入。实验结果表明，ISTC-Net在三个UVOS数据集（DAVIS-16上的89.2％J＆F，YouTube-Objects上的76％J，FBMS上的86.4％J）和四个标准视频显著目标检测（VSOD）基准上取得了最先进的性能，显著提高了，证明其有效性和优越性超过了先前的方法。

更新时间: 2025-04-08 11:02:14

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2504.05904v1

Defending Deep Neural Networks against Backdoor Attacks via Module Switching

The exponential increase in the parameters of Deep Neural Networks (DNNs) has significantly raised the cost of independent training, particularly for resource-constrained entities. As a result, there is a growing reliance on open-source models. However, the opacity of training processes exacerbates security risks, making these models more vulnerable to malicious threats, such as backdoor attacks, while simultaneously complicating defense mechanisms. Merging homogeneous models has gained attention as a cost-effective post-training defense. However, we notice that existing strategies, such as weight averaging, only partially mitigate the influence of poisoned parameters and remain ineffective in disrupting the pervasive spurious correlations embedded across model parameters. We propose a novel module-switching strategy to break such spurious correlations within the model's propagation path. By leveraging evolutionary algorithms to optimize fusion strategies, we validate our approach against backdoor attacks targeting text and vision domains. Our method achieves effective backdoor mitigation even when incorporating a couple of compromised models, e.g., reducing the average attack success rate (ASR) to 22% compared to 31.9% with the best-performing baseline on SST-2.

Updated: 2025-04-08 11:01:07

标题: 通过模块切换对抗后门攻击的深度神经网络防御

摘要: 深度神经网络（DNNs）参数的指数增长显著提高了独立训练的成本，特别是对于资源受限的实体。因此，人们越来越依赖开源模型。然而，训练过程的不透明性加剧了安全风险，使这些模型更容易受到恶意威胁的攻击，如后门攻击，同时也使防御机制变得更加复杂。合并同质模型作为一种经济有效的后训练防御方式受到关注。然而，我们注意到现有的策略，如权重平均化，只能部分减轻被毒害参数的影响，并且在破坏模型参数之间嵌入的普遍虚假相关性方面仍然无效。我们提出了一种新颖的模块切换策略来破坏模型传播路径中的这种虚假相关性。通过利用进化算法优化融合策略，我们验证了我们的方法针对针对文本和视觉领域的后门攻击的有效性。即使在整合了一些受损模型的情况下，我们的方法也能有效地减轻后门攻击，例如将攻击成功率（ASR）的平均值从SST-2上表现最好的基线的31.9%降至22%。

更新时间: 2025-04-08 11:01:07

领域: cs.CR,cs.CL,I.2.7; I.2.10

下载: http://arxiv.org/abs/2504.05902v1

On the Hölder Stability of Multiset and Graph Neural Networks

Extensive research efforts have been put into characterizing and constructing maximally separating multiset and graph neural networks. However, recent empirical evidence suggests the notion of separation itself doesn't capture several interesting phenomena. On the one hand, the quality of this separation may be very weak, to the extent that the embeddings of "separable" objects might even be considered identical when using fixed finite precision. On the other hand, architectures which aren't capable of separation in theory, somehow achieve separation when taking the network to be wide enough. In this work, we address both of these issues, by proposing a novel pair-wise separation quality analysis framework which is based on an adaptation of Lipschitz and \Holder{} stability to parametric functions. The proposed framework, which we name \emph{\Holder{} in expectation}, allows for separation quality analysis, without restricting the analysis to embeddings that can separate all the input space simultaneously. We prove that common sum-based models are lower-\Holder{} in expectation, with an exponent that decays rapidly with the network's depth . Our analysis leads to adversarial examples of graphs which can be separated by three 1-WL iterations, but cannot be separated in practice by standard maximally powerful Message Passing Neural Networks (MPNNs). To remedy this, we propose two novel MPNNs with improved separation quality, one of which is lower Lipschitz in expectation. We show these MPNNs can easily classify our adversarial examples, and compare favorably with standard MPNNs on standard graph learning tasks.

Updated: 2025-04-08 10:58:36

标题: 关于多集和图神经网络的Hölder稳定性

摘要: 大量的研究工作已经投入到表征和构建最大分离的多集和图神经网络中。然而，最近的经验证据表明，分离本身并不能捕捉几个有趣的现象。一方面，这种分离的质量可能非常弱，以至于在使用固定有限精度时，“可分”对象的嵌入甚至可能被认为是相同的。另一方面，在理论上无法实现分离的结构，在网络足够宽时，却能实现分离。在这项工作中，我们通过提出一种新颖的基于参数函数的Lipschitz和Holder稳定性的改编的成对分离质量分析框架来解决这两个问题。我们命名为“Holder in expectation”的提出的框架，允许进行分离质量分析，而不限制分析到能够同时分离所有输入空间的嵌入。我们证明常见的基于和模型在期望下是较低的Holder，其指数随网络深度迅速衰减。我们的分析导致图的对抗性例子，可以通过三次1-WL迭代进行分离，但无法通过标准的最大功率消息传递神经网络（MPNNs）在实践中分离。为了解决这个问题，我们提出了两种新颖的具有改进分离质量的MPNNs，其中一种在期望下是较低的Lipschitz。我们表明这些MPNNs可以轻松分类我们的对抗性例子，并在标准图学习任务上与标准MPNNs进行了有利的比较。

更新时间: 2025-04-08 10:58:36

领域: cs.LG

下载: http://arxiv.org/abs/2406.06984v3

Green Prompting

Large Language Models (LLMs) have become widely used across various domains spanning search engines, code generation, and text creation. However, a major concern associated with their adoption is the high cost of inference, impacting both their sustainability and financial feasibility. In this study, we empirically study how different prompt and response characteristics directly impact LLM inference energy cost. We conduct experiments leveraging three open-source transformer-based LLMs across three task types$-$question answering, sentiment analysis, and text generation. For each inference, we analyzed prompt and response characteristics (length, semantic meaning, time taken, energy consumption). Our results demonstrate that even when presented with identical tasks, models generate responses with varying characteristics and subsequently exhibit distinct energy consumption patterns. We found that prompt length is less significant than the semantic meaning of the task itself. In addition, we identified specific keywords associated with higher or lower energy usage that vary between associated tasks. These findings highlight the importance of prompt design in optimizing inference efficiency. We conclude that the semantic meaning of prompts and certain task-related keywords significantly impact inference costs, leading the way for deeper exploration towards creating energy-adaptive LLMs.

Updated: 2025-04-08 10:56:07

标题: 绿色提示

摘要: 大型语言模型（LLMs）已经在涵盖搜索引擎、代码生成和文本创作等各个领域广泛应用。然而，与其采用相关的主要关注点是推理成本高昂，影响了它们的可持续性和财务可行性。在这项研究中，我们经验性地研究了不同提示和响应特征如何直接影响LLM推理能耗。我们进行了实验，利用三种开源基于transformer的LLM在三种任务类型$-$问题回答、情感分析和文本生成中。对于每次推理，我们分析了提示和响应特征（长度、语义意义、所需时间、能源消耗）。我们的结果表明，即使面临相同的任务，模型生成的响应具有不同的特征，随后展现出不同的能源消耗模式。我们发现，提示长度不如任务本身的语义意义重要。此外，我们确定了与更高或更低能源使用相关的特定关键词，这些关键词在相关任务之间变化。这些发现凸显了提示设计在优化推理效率方面的重要性。我们得出结论，提示的语义意义和某些与任务相关的关键词显著影响推理成本，为深入探索创造能源适应型LLMs铺平了道路。

更新时间: 2025-04-08 10:56:07

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2503.10666v2

Drawing a Map of Elections

Our main contribution is the introduction of the map of elections framework. A map of elections consists of three main elements: (1) a dataset of elections (i.e., collections of ordinal votes over given sets of candidates), (2) a way of measuring similarities between these elections, and (3) a representation of the elections in the 2D Euclidean space as points, so that the more similar two elections are, the closer are their points. In our maps, we mostly focus on datasets of synthetic elections, but we also show an example of a map over real-life ones. To measure similarities, we would have preferred to use, e.g., the isomorphic swap distance, but this is infeasible due to its high computational complexity. Hence, we propose polynomial-time computable positionwise distance and use it instead. Regarding the representations in 2D Euclidean space, we mostly use the Kamada-Kawai algorithm, but we also show two alternatives. We develop the necessary theoretical results to form our maps and argue experimentally that they are accurate and credible. Further, we show how coloring the elections in a map according to various criteria helps in analyzing results of a number of experiments. In particular, we show colorings according to the scores of winning candidates or committees, running times of ILP-based winner determination algorithms, and approximation ratios achieved by particular algorithms.

Updated: 2025-04-08 10:52:54

标题: 绘制选举地图

摘要: 我们的主要贡献是引入选举地图框架。选举地图由三个主要元素组成：(1) 选举数据集（即，在给定候选人集合上的序数投票集合），(2) 衡量这些选举之间相似性的方法，以及(3) 将选举在二维欧几里德空间中表示为点，使得两个选举越相似，它们的点越接近。在我们的地图中，我们主要关注合成选举数据集，但我们也展示了一个基于现实选举的地图示例。为了衡量相似性，我们本来更倾向于使用同构交换距离，但由于其高计算复杂性，这是不可行的。因此，我们提出了可多项式时间计算的位置距离，并使用它代替。关于在二维欧几里德空间中的表示，我们主要使用Kamada-Kawai算法，但我们也展示了两个替代方案。我们发展了形成我们的地图所需的理论结果，并通过实验证明它们准确可靠。此外，我们展示了如何根据各种标准在地图中对选举进行着色有助于分析多项实验的结果。具体而言，我们展示了根据获胜候选人或委员会的得分、基于ILP的获胜者确定算法的运行时间，以及特定算法实现的近似比率的着色。

更新时间: 2025-04-08 10:52:54

领域: cs.MA,cs.AI,cs.GT

下载: http://arxiv.org/abs/2504.03809v2

Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task

Human intention-based systems enable robots to perceive and interpret user actions to interact with humans and adapt to their behavior proactively. Therefore, intention prediction is pivotal in creating a natural interaction with social robots in human-designed environments. In this paper, we examine using Large Language Models (LLMs) to infer human intention in a collaborative object categorization task with a physical robot. We propose a novel multimodal approach that integrates user non-verbal cues, like hand gestures, body poses, and facial expressions, with environment states and user verbal cues to predict user intentions in a hierarchical architecture. Our evaluation of five LLMs shows the potential for reasoning about verbal and non-verbal user cues, leveraging their context-understanding and real-world knowledge to support intention prediction while collaborating on a task with a social robot. Video: https://youtu.be/tBJHfAuzohI

Updated: 2025-04-08 10:48:19

标题: 比较苹果和橙子：LLM 动力的多模态意图预测在对象分类任务中

摘要: 基于人类意图的系统使机器人能够感知和解释用户的行动，与人类互动并主动适应他们的行为。因此，在人类设计的环境中，意图预测对于创建自然互动与社交机器人至关重要。本文中，我们研究了使用大型语言模型（LLMs）在与物理机器人进行协作对象分类任务中推断人类意图。我们提出了一种新颖的多模态方法，将用户的非语言线索（如手势、身体姿势和面部表情）与环境状态和用户的语言线索相结合，以在分层架构中预测用户意图。我们评估了五种LLMs，显示了通过推理关于语言和非语言用户线索，利用它们的上下文理解和现实世界知识来支持与社交机器人协作任务中的意图预测的潜力。视频链接：https://youtu.be/tBJHfAuzohI

更新时间: 2025-04-08 10:48:19

领域: cs.RO,cs.AI,cs.HC,I.2.9; I.2.7; I.2.8

下载: http://arxiv.org/abs/2404.08424v3

HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference

The Mixture of Experts (MoE) architecture has demonstrated significant advantages as it enables to increase the model capacity without a proportional increase in computation. However, the large MoE model size still introduces substantial memory demands, which usually requires expert offloading on resource-constrained platforms and incurs significant overhead. Hybrid CPU-GPU inference has been proposed to leverage CPU computation to reduce expert loading overhead but faces major challenges: on one hand, the expert activation patterns of MoE models are highly unstable, rendering the fixed mapping strategies in existing works inefficient; on the other hand, the hybrid CPU-GPU schedule for MoE is inherently complex due to the diverse expert sizes, structures, uneven workload distribution, etc. To address these challenges, in this paper, we propose HybriMoE, a hybrid CPU-GPU inference framework that improves resource utilization through a novel CPU-GPU scheduling and cache management system. HybriMoE introduces (i) a dynamic intra-layer scheduling strategy to balance workloads across CPU and GPU, (ii) an impact-driven inter-layer prefetching algorithm, and (iii) a score-based caching algorithm to mitigate expert activation instability. We implement HybriMoE on top of the kTransformers framework and evaluate it on three widely used MoE-based LLMs. Experimental results demonstrate that HybriMoE achieves an average speedup of 1.33$\times$ in the prefill stage and 1.70$\times$ in the decode stage compared to state-of-the-art hybrid MoE inference framework. Our code is available at: https://github.com/PKU-SEC-Lab/HybriMoE.

Updated: 2025-04-08 10:47:37

标题: HybriMoE: 用于高效MoE推断的混合CPU-GPU调度和缓存管理

摘要: 专家混合（MoE）架构已经证明具有显著优势，因为它能够增加模型容量而不需要成比例增加计算量。然而，大型MoE模型仍然引入了相当大的内存需求，通常需要在资源受限的平台上卸载专家，并产生显著的开销。已经提出了混合CPU-GPU推断来利用CPU计算以减少专家加载开销，但面临着重大挑战：一方面，MoE模型的专家激活模式非常不稳定，使得现有作品中固定映射策略效率低下；另一方面，由于专家大小、结构、工作负载分布不均等原因，MoE的混合CPU-GPU调度本质上是复杂的。为了解决这些挑战，在本文中，我们提出了HybriMoE，这是一个通过创新的CPU-GPU调度和缓存管理系统改进资源利用率的混合CPU-GPU推断框架。HybriMoE引入了（i）动态的层内调度策略来平衡CPU和GPU之间的工作负载，（ii）基于影响的层间预取算法，以及（iii）基于分数的缓存算法来减轻专家激活不稳定性。我们在kTransformers框架上实现了HybriMoE，并在三个广泛使用的基于MoE的LLM上进行了评估。实验结果表明，与最先进的混合MoE推断框架相比，HybriMoE在预填充阶段的平均加速比为1.33倍，在解码阶段为1.70倍。我们的代码可在以下链接找到：https://github.com/PKU-SEC-Lab/HybriMoE。

更新时间: 2025-04-08 10:47:37

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2504.05897v1

Why do zeroes happen? A model-based approach for demand classification

Effective demand forecasting is critical for inventory management, production planning, and decision making across industries. Selecting the appropriate model and suitable features to efficiently capture patterns in the data is one of the main challenges in demand forecasting. In reality, this becomes even more complicated when the recorded sales have zeroes, which can happen naturally or due to some anomalies, such as stockouts and recording errors. Mistreating the zeroes can lead to the application of inappropriate forecasting methods, and thus leading to poor decision making. Furthermore, the demand itself can have different fundamental characteristics, and being able to distinguish one type from another might bring substantial benefits in terms of accuracy and thus decision making. We propose a two-stage model-based classification framework that in the first step, identifies artificially occurring zeroes, and then classifies demand to one of the possible types: regular/intermittent, intermittent smooth/lumpy, fractional/count. The framework utilises statistical modelling and information criteria to detect anomalous zeroes and then classify demand into those categories. We then argue that different types of demand need different features, and show empirically that they tend to increase the accuracy of the forecasting methods compared to those applied directly to the dataset without the generated features and the two-stage framework. Our general practical recommendation based on that is to use the mixture approach for intermittent demand, capturing the demand sizes and demand probability separately, as it seems to improve the accuracy of different forecasting approaches.

Updated: 2025-04-08 10:45:30

标题: 为什么会出现零值？一种基于模型的需求分类方法

摘要: 有效的需求预测对于库存管理、生产规划和各行业的决策制定至关重要。选择适当的模型和合适的特征以有效捕捉数据中的模式是需求预测中的主要挑战之一。实际上，当记录的销售量为零时，情况变得更加复杂，这可能是自然发生的，也可能是由于一些异常情况，如断货和记录错误。错误处理零值可能会导致应用不当的预测方法，从而导致决策不佳。此外，需求本身可能具有不同的基本特征，能够区分一个类型与另一个类型可能会在准确性和决策方面带来重大好处。我们提出了一个基于模型的两阶段分类框架，第一步是识别人为出现的零值，然后将需求分类为可能的类型之一：常规/间歇性、间歇性平滑/不平滑、分数/计数。该框架利用统计建模和信息准则来检测异常的零值，然后将需求分类到这些类别中。我们认为不同类型的需求需要不同的特征，并在实证上展示，与直接应用于数据集的预测方法相比，它们往往能提高预测方法的准确性。基于这一点，我们的一般实际建议是对间歇性需求采用混合方法，分别捕捉需求量和需求概率，因为这似乎能提高不同预测方法的准确性。

更新时间: 2025-04-08 10:45:30

领域: cs.LG,stat.ME

下载: http://arxiv.org/abs/2504.05894v1

DiVA-DocRE: A Discriminative and Voice-Aware Paradigm for Document-Level Relation Extraction

The remarkable capabilities of Large Language Models (LLMs) in text comprehension and generation have revolutionized Information Extraction (IE). One such advancement is in Document-level Relation Triplet Extraction (DocRTE), a critical task in information systems that aims to extract entities and their semantic relationships from documents. However, existing methods are primarily designed for Sentence level Relation Triplet Extraction (SentRTE), which typically handles a limited set of relations and triplet facts within a single sentence. Additionally, some approaches treat relations as candidate choices integrated into prompt templates, resulting in inefficient processing and suboptimal performance when determining the relation elements in triplets. To address these limitations, we introduce a Discriminative and Voice Aware Paradigm DiVA. DiVA involves only two steps: performing document-level relation extraction (DocRE) and then identifying the subject object entities based on the relation. No additional processing is required simply input the document to directly obtain the triplets. This streamlined process more accurately reflects real-world scenarios for triplet extraction. Our innovation lies in transforming DocRE into a discriminative task, where the model pays attention to each relation and to the often overlooked issue of active vs. passive voice within the triplet. Our experiments on the Re-DocRED and DocRED datasets demonstrate state-of-the-art results for the DocRTE task.

Updated: 2025-04-08 10:43:00

标题: DiVA-DocRE：一种用于文档级关系提取的区分性和声音感知范式

摘要: 大型语言模型（LLM）在文本理解和生成方面的显著能力已经彻底改变了信息抽取（IE）的方式。其中一个进步是文档级关系三元组提取（DocRTE），这是信息系统中一个关键任务，旨在从文档中提取实体及其语义关系。然而，现有方法主要设计用于句子级关系三元组提取（SentRTE），通常处理限定的关系集合和单个句子内的三元组事实。此外，一些方法将关系视为候选选择集成到提示模板中，导致处理效率低下，当确定三元组中的关系元素时表现不佳。为了解决这些限制，我们引入了一种判别和语态感知的范式DiVA。DiVA仅涉及两个步骤：执行文档级关系提取（DocRE），然后基于关系识别主客体实体。不需要额外处理，只需输入文档即可直接获取三元组。这种简化流程更准确地反映了三元组提取的实际场景。我们的创新在于将DocRE转化为一个判别任务，模型关注每个关系以及通常被忽视的主动与被动语态问题。我们在Re-DocRED和DocRED数据集上的实验展示了DocRTE任务的最新成果。

更新时间: 2025-04-08 10:43:00

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2409.13717v2

To Give or Not to Give? The Impacts of Strategically Withheld Recourse

Individuals often aim to reverse undesired outcomes in interactions with automated systems, like loan denials, by either implementing system-recommended actions (recourse), or manipulating their features. While providing recourse benefits users and enhances system utility, it also provides information about the decision process that can be used for more effective strategic manipulation, especially when the individuals collectively share such information with each other. We show that this tension leads rational utility-maximizing systems to frequently withhold recourse, resulting in decreased population utility, particularly impacting sensitive groups. To mitigate these effects, we explore the role of recourse subsidies, finding them effective in increasing the provision of recourse actions by rational systems, as well as lowering the potential social cost and mitigating unfairness caused by recourse withholding.

Updated: 2025-04-08 10:36:16

标题: 是否给予？战略性保留救济的影响

摘要: 个体经常试图通过实施系统推荐的行动（救济）或操纵其特征来扭转与自动化系统的交互中不良结果，如贷款被拒。虽然提供救济有利于用户并增强系统效用，但它也提供了关于决策过程的信息，这些信息可以用于更有效的战略操纵，特别是当个体共享这些信息时。我们表明，这种紧张状态导致理性的效用最大化系统经常保留救济，导致人口效用降低，尤其影响敏感群体。为了减轻这些影响，我们探讨了救济补贴的作用，发现它们在增加理性系统提供救济行动方面是有效的，同时降低了潜在的社会成本，并缓解了由于保留救济而引起的不公平。

更新时间: 2025-04-08 10:36:16

领域: cs.GT,cs.LG

下载: http://arxiv.org/abs/2504.05891v1

A Taxonomy of Self-Handover

Self-handover, transferring an object between one's own hands, is a common but understudied bimanual action. While it facilitates seamless transitions in complex tasks, the strategies underlying its execution remain largely unexplored. Here, we introduce the first systematic taxonomy of self-handover, derived from manual annotation of over 12 hours of cooking activity performed by 21 participants. Our analysis reveals that self-handover is not merely a passive transition, but a highly coordinated action involving anticipatory adjustments by both hands. As a step toward automated analysis of human manipulation, we further demonstrate the feasibility of classifying self-handover types using a state-of-the-art vision-language model. These findings offer fresh insights into bimanual coordination, underscoring the role of self-handover in enabling smooth task transitions-an ability essential for adaptive dual-arm robotics.

Updated: 2025-04-08 10:18:43

标题: 自主交接的分类学

摘要: 自我交接，即将一个物体从一只手传递到另一只手，是一种常见但鲜为人知的双手动作。虽然它有助于在复杂任务中实现无缝的过渡，但其执行背后的策略仍然未被深入研究。在这里，我们介绍了第一个系统的自我交接分类法，该分类法来源于对21名参与者进行的12小时烹饪活动的手动注释。我们的分析显示，自我交接不仅仅是一种被动的过渡，而是涉及到双手之间的预期调整的高度协调动作。作为朝向自动化分析人类操纵的一步，我们进一步展示了使用最先进的视觉语言模型对自我交接类型进行分类的可行性。这些发现为双手协调提供了新的见解，强调了自我交接在实现平稳任务过渡中的作用-这是自适应双臂机器人所必需的能力。

更新时间: 2025-04-08 10:18:43

领域: cs.RO,cs.AI,cs.CV

下载: http://arxiv.org/abs/2504.04939v2

Turin3D: Evaluating Adaptation Strategies under Label Scarcity in Urban LiDAR Segmentation with Semi-Supervised Techniques

3D semantic segmentation plays a critical role in urban modelling, enabling detailed understanding and mapping of city environments. In this paper, we introduce Turin3D: a new aerial LiDAR dataset for point cloud semantic segmentation covering an area of around 1.43 km2 in the city centre of Turin with almost 70M points. We describe the data collection process and compare Turin3D with others previously proposed in the literature. We did not fully annotate the dataset due to the complexity and time-consuming nature of the process; however, a manual annotation process was performed on the validation and test sets, to enable a reliable evaluation of the proposed techniques. We first benchmark the performances of several point cloud semantic segmentation models, trained on the existing datasets, when tested on Turin3D, and then improve their performances by applying a semi-supervised learning technique leveraging the unlabelled training set. The dataset will be publicly available to support research in outdoor point cloud segmentation, with particular relevance for self-supervised and semi-supervised learning approaches given the absence of ground truth annotations for the training set.

Updated: 2025-04-08 10:17:14

标题: Turin3D：在城市LiDAR分割中使用半监督技术评估标签稀缺情况下的适应策略

摘要: 3D语义分割在城市建模中发挥着关键作用，可以实现对城市环境的详细理解和制图。本文介绍了Turin3D：一个新的航空LiDAR数据集，用于覆盖都灵市中心约1.43平方公里区域的点云语义分割，包含近7000万个点。我们描述了数据收集过程，并将Turin3D与文献中先前提出的其他数据集进行了比较。由于数据集的复杂性和耗时性，我们未对其进行完全注释；然而，在验证和测试集上进行了手动注释过程，以便对所提出的技术进行可靠评估。我们首先对在现有数据集上训练的几种点云语义分割模型在Turin3D上进行测试的性能进行基准测试，然后通过应用半监督学习技术利用未标记的训练集来提高它们的性能。该数据集将公开提供，以支持室外点云分割的研究，尤其对于自监督和半监督学习方法具有特殊的重要性，因为训练集缺乏地面真实注释。

更新时间: 2025-04-08 10:17:14

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.05882v1

Assessing employment and labour issues implicated by using AI

This chapter critiques the dominant reductionist approach in AI and work studies, which isolates tasks and skills as replaceable components. Instead, it advocates for a systemic perspective that emphasizes the interdependence of tasks, roles, and workplace contexts. Two complementary approaches are proposed: an ethnographic, context-rich method that highlights how AI reconfigures work environments and expertise; and a relational task-based analysis that bridges micro-level work descriptions with macro-level labor trends. The authors argue that effective AI impact assessments must go beyond predicting automation rates to include ethical, well-being, and expertise-related questions. Drawing on empirical case studies, they demonstrate how AI reshapes human-technology relations, professional roles, and tacit knowledge practices. The chapter concludes by calling for a human-centric, holistic framework that guides organizational and policy decisions, balancing technological possibilities with social desirability and sustainability of work.

Updated: 2025-04-08 10:14:19

标题: 评估使用人工智能涉及的就业和劳动问题

摘要: 这一章对人工智能和工作研究中占主导地位的还原主义方法进行了批判，该方法将任务和技能孤立为可替代的组成部分。相反，它主张一个强调任务、角色和工作场景相互依存的系统性视角。提出了两种互补的方法：一个以民族志和丰富背景为特点的方法，突出人工智能如何重新配置工作环境和专业知识；以及一个关系性任务分析，将微观工作描述与宏观劳动趋势联系起来。作者认为，有效的人工智能影响评估必须超越预测自动化率，包括道德、福祉和专业知识相关问题。通过实证案例研究，他们展示了人工智能如何重塑人类与技术的关系、专业角色和默许知识实践。本章最后呼吁建立一个以人为中心、全面的框架，引导组织和政策决策，在技术可能性与社会期望和工作的可持续性之间取得平衡。

更新时间: 2025-04-08 10:14:19

领域: cs.CY,cs.AI,cs.HC

下载: http://arxiv.org/abs/2504.06322v1

Actuarial Learning for Pension Fund Mortality Forecasting

For the assessment of the financial soundness of a pension fund, it is necessary to take into account mortality forecasting so that longevity risk is consistently incorporated into future cash flows. In this article, we employ machine learning models applied to actuarial science ({\it actuarial learning}) to make mortality predictions for a relevant sample of pension funds' participants. Actuarial learning represents an emerging field that involves the application of machine learning (ML) and artificial intelligence (AI) techniques in actuarial science. This encompasses the use of algorithms and computational models to analyze large sets of actuarial data, such as regression trees, random forest, boosting, XGBoost, CatBoost, and neural networks (eg. FNN, LSTM, and MHA). Our results indicate that some ML/AI algorithms present competitive out-of-sample performance when compared to the classical Lee-Carter model. This may indicate interesting alternatives for consistent liability evaluation and effective pension fund risk management.

Updated: 2025-04-08 10:09:41

标题: 养老基金死亡率预测的精算学习

摘要: 为了评估养老基金的财务稳健性，有必要考虑人口死亡率预测，以便将长寿风险一致地纳入未来现金流。在本文中，我们应用机器学习模型应用于精算科学（精算学习），为一组相关的养老基金参与者进行死亡率预测。精算学习代表了一个新兴领域，涉及在精算科学中应用机器学习（ML）和人工智能（AI）技术。这包括使用算法和计算模型来分析大量的精算数据，例如回归树、随机森林、提升、XGBoost、CatBoost和神经网络（例如FNN、LSTM和MHA）。我们的结果表明，与传统的Lee-Carter模型相比，一些ML/AI算法在样本外表现出竞争力。这可能表明了一些有趣的替代方案，用于一致的负债评估和有效的养老基金风险管理。

更新时间: 2025-04-08 10:09:41

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2504.05881v1

Systematic Parameter Decision in Approximate Model Counting

This paper proposes a novel approach to determining the internal parameters of the hashing-based approximate model counting algorithm $\mathsf{ApproxMC}$. In this problem, the chosen parameter values must ensure that $\mathsf{ApproxMC}$ is Probably Approximately Correct (PAC), while also making it as efficient as possible. The existing approach to this problem relies on heuristics; in this paper, we solve this problem by formulating it as an optimization problem that arises from generalizing $\mathsf{ApproxMC}$'s correctness proof to arbitrary parameter values. Our approach separates the concerns of algorithm soundness and optimality, allowing us to address the former without the need for repetitive case-by-case argumentation, while establishing a clear framework for the latter. Furthermore, after reduction, the resulting optimization problem takes on an exceptionally simple form, enabling the use of a basic search algorithm and providing insight into how parameter values affect algorithm performance. Experimental results demonstrate that our optimized parameters improve the runtime performance of the latest $\mathsf{ApproxMC}$ by a factor of 1.6 to 2.4, depending on the error tolerance.

Updated: 2025-04-08 09:58:41

标题: 近似模型计数中的系统参数决策

摘要: 本文提出了一种新颖的方法来确定基于哈希的近似模型计数算法$\mathsf{ApproxMC}$的内部参数。在这个问题中，选择的参数值必须确保$\mathsf{ApproxMC}$是可能近似正确的（PAC），同时使其尽可能高效。现有的方法依赖于启发式方法；在本文中，我们通过将问题表述为从概括$\mathsf{ApproxMC}$的正确性证明到任意参数值而产生的优化问题来解决这个问题。我们的方法将算法的稳健性和优化性分开处理，使我们能够解决前者而无需重复的逐案论证，同时为后者建立一个清晰的框架。此外，在简化后，得到的优化问题具有异常简单的形式，使得可以使用基本搜索算法，并揭示参数值如何影响算法性能。实验结果表明，我们优化的参数可以使最新的$\mathsf{ApproxMC}$的运行时间性能提高1.6到2.4倍，具体取决于误差容忍度。

更新时间: 2025-04-08 09:58:41

领域: cs.AI

下载: http://arxiv.org/abs/2504.05874v1

Exploiting Features and Logits in Heterogeneous Federated Learning

Due to the rapid growth of IoT and artificial intelligence, deploying neural networks on IoT devices is becoming increasingly crucial for edge intelligence. Federated learning (FL) facilitates the management of edge devices to collaboratively train a shared model while maintaining training data local and private. However, a general assumption in FL is that all edge devices are trained on the same machine learning model, which may be impractical considering diverse device capabilities. For instance, less capable devices may slow down the updating process because they struggle to handle large models appropriate for ordinary devices. In this paper, we propose a novel data-free FL method that supports heterogeneous client models by managing features and logits, called Felo; and its extension with a conditional VAE deployed in the server, called Velo. Felo averages the mid-level features and logits from the clients at the server based on their class labels to provide the average features and logits, which are utilized for further training the client models. Unlike Felo, the server has a conditional VAE in Velo, which is used for training mid-level features and generating synthetic features according to the labels. The clients optimize their models based on the synthetic features and the average logits. We conduct experiments on two datasets and show satisfactory performances of our methods compared with the state-of-the-art methods.

Updated: 2025-04-08 09:54:58

标题: 利用异构联邦学习中的特征和logits

摘要: 由于物联网和人工智能的快速发展，将神经网络部署在物联网设备上对边缘智能变得日益关键。联邦学习（FL）促进了边缘设备的管理，以便在保持训练数据本地和私密的同时，协作训练一个共享模型。然而，在FL中的一个普遍假设是，所有边缘设备都在相同的机器学习模型上进行训练，这可能在考虑到各种设备能力时是不切实际的。例如，性能较差的设备可能会因为难以处理适用于普通设备的大型模型而减慢更新过程。在本文中，我们提出了一种支持异构客户端模型的新型无数据FL方法，通过管理特征和对数，称为Felo；以及通过在服务器上部署条件VAE进行扩展的Velo。Felo根据客户端的类别标签在服务器上平均中间层特征和对数，以提供平均特征和对数，这些特征和对数用于进一步训练客户端模型。与Felo不同，Velo中的服务器使用条件VAE，用于训练中间层特征并根据标签生成合成特征。客户端根据合成特征和平均对数优化其模型。我们在两个数据集上进行实验，并展示了与最先进方法相比，我们的方法表现令人满意。

更新时间: 2025-04-08 09:54:58

领域: cs.LG

下载: http://arxiv.org/abs/2210.15527v2

Agent Guide: A Simple Agent Behavioral Watermarking Framework

The increasing deployment of intelligent agents in digital ecosystems, such as social media platforms, has raised significant concerns about traceability and accountability, particularly in cybersecurity and digital content protection. Traditional large language model (LLM) watermarking techniques, which rely on token-level manipulations, are ill-suited for agents due to the challenges of behavior tokenization and information loss during behavior-to-action translation. To address these issues, we propose Agent Guide, a novel behavioral watermarking framework that embeds watermarks by guiding the agent's high-level decisions (behavior) through probability biases, while preserving the naturalness of specific executions (action). Our approach decouples agent behavior into two levels, behavior (e.g., choosing to bookmark) and action (e.g., bookmarking with specific tags), and applies watermark-guided biases to the behavior probability distribution. We employ a z-statistic-based statistical analysis to detect the watermark, ensuring reliable extraction over multiple rounds. Experiments in a social media scenario with diverse agent profiles demonstrate that Agent Guide achieves effective watermark detection with a low false positive rate. Our framework provides a practical and robust solution for agent watermarking, with applications in identifying malicious agents and protecting proprietary agent systems.

Updated: 2025-04-08 09:54:49

标题: 代理指南：一个简单的代理行为水印框架

摘要: 在数字生态系统中智能代理的部署不断增加，例如社交媒体平台，引发了关于可追踪性和问责制的重大关注，特别是在网络安全和数字内容保护方面。传统的大型语言模型(LLM)水印技术依赖于标记级别的操纵，对于代理来说并不合适，因为在行为到行动转换过程中存在行为标记化和信息丢失的挑战。为了解决这些问题，我们提出了Agent Guide，这是一个新颖的行为水印框架，通过引导代理的高级决策（行为）通过概率偏差嵌入水印，同时保留特定执行（行动）的自然性。我们的方法将代理行为解耦为两个层次，行为（例如，选择添加书签）和行动（例如，使用特定标签添加书签），并将水印引导偏差应用于行为概率分布。我们使用基于z统计的统计分析来检测水印，确保在多轮中可靠提取。在具有不同代理配置文件的社交媒体场景中的实验表明，Agent Guide实现了有效的水印检测，且误报率低。我们的框架为代理水印提供了实用且强大的解决方案，在识别恶意代理和保护专有代理系统方面具有应用。

更新时间: 2025-04-08 09:54:49

领域: cs.AI,K.6.5

下载: http://arxiv.org/abs/2504.05871v1

Energy-Conserving Neural Network Closure Model for Long-Time Accurate and Stable LES

Machine learning-based closure models for LES have shown promise in capturing complex turbulence dynamics but often suffer from instabilities and physical inconsistencies. In this work, we develop a novel skew-symmetric neural architecture as closure model that enforces stability while preserving key physical conservation laws. Our approach leverages a discretization that ensures mass, momentum, and energy conservation, along with a face-averaging filter to maintain mass conservation in coarse-grained velocity fields. We compare our model against several conventional data-driven closures (including unconstrained convolutional neural networks), and the physics-based Smagorinsky model. Performance is evaluated on decaying turbulence and Kolmogorov flow for multiple coarse-graining factors. In these test cases we observe that unconstrained machine learning models suffer from numerical instabilities. In contrast, our skew-symmetric model remains stable across all tests, though at the cost of increased dissipation. Despite this trade-off, we demonstrate that our model still outperforms the Smagorinsky model in unseen scenarios. These findings highlight the potential of structure-preserving machine learning closures for reliable long-time LES.

Updated: 2025-04-08 09:49:18

标题: 能量保存的神经网络封闭模型用于长时间精确稳定的LES

摘要: 机器学习基于LES的封闭模型显示出捕获复杂湍流动力学的潜力，但往往受到不稳定性和物理不一致性的困扰。在这项工作中，我们开发了一种新颖的斜对称神经网络架构作为封闭模型，强制执行稳定性同时保留关键的物理守恒定律。我们的方法利用一种确保质量、动量和能量守恒的离散化，以及一种面平均滤波器，以在粗粒度速度场中保持质量守恒。我们将我们的模型与几种传统的数据驱动封闭模型（包括无约束的卷积神经网络）和基于物理的Smagorinsky模型进行比较。性能在衰减湍流和科尔莫哥洛夫流上进行评估，用于多个粗粒化因子。在这些测试案例中，我们观察到无约束的机器学习模型受到数值不稳定性的影响。相比之下，我们的斜对称模型在所有测试中仍然保持稳定，尽管以增加耗散为代价。尽管存在这种权衡，我们展示了我们的模型仍然在未知情况下优于Smagorinsky模型。这些发现突显了结构保持机器学习封闭模型对可靠的长时间LES的潜力。

更新时间: 2025-04-08 09:49:18

领域: cs.LG,cs.NA,math.NA,65Mxx

下载: http://arxiv.org/abs/2504.05868v1

CTI-HAL: A Human-Annotated Dataset for Cyber Threat Intelligence Analysis

Organizations are increasingly targeted by Advanced Persistent Threats (APTs), which involve complex, multi-stage tactics and diverse techniques. Cyber Threat Intelligence (CTI) sources, such as incident reports and security blogs, provide valuable insights, but are often unstructured and in natural language, making it difficult to automatically extract information. Recent studies have explored the use of AI to perform automatic extraction from CTI data, leveraging existing CTI datasets for performance evaluation and fine-tuning. However, they present challenges and limitations that impact their effectiveness. To overcome these issues, we introduce a novel dataset manually constructed from CTI reports and structured according to the MITRE ATT&CK framework. To assess its quality, we conducted an inter-annotator agreement study using Krippendorff alpha, confirming its reliability. Furthermore, the dataset was used to evaluate a Large Language Model (LLM) in a real-world business context, showing promising generalizability.

Updated: 2025-04-08 09:47:15

标题: CTI-HAL：一份用于网络威胁情报分析的人工标注数据集

摘要: 组织越来越受到高级持久威胁（APTs）的攻击，这涉及复杂的、多阶段的策略和多样化的技术。网络威胁情报（CTI）来源，如事件报告和安全博客，提供了有价值的见解，但通常是非结构化的自然语言，使得难以自动提取信息。最近的研究探讨了利用人工智能从CTI数据中进行自动提取，利用现有的CTI数据集进行性能评估和微调。然而，它们提出了影响其有效性的挑战和限制。为了克服这些问题，我们引入了一个新颖的数据集，该数据集由CTI报告手动构建，并根据MITRE ATT&CK框架进行结构化。为了评估其质量，我们使用Krippendorff alpha进行了一个注释者间一致性研究，确认了其可靠性。此外，该数据集被用于在真实商业环境中评估一个大型语言模型（LLM），显示出有希望的泛化能力。

更新时间: 2025-04-08 09:47:15

领域: cs.CR

下载: http://arxiv.org/abs/2504.05866v1

Understanding Layer Significance in LLM Alignment

Aligning large language models (LLMs) through supervised fine-tuning is essential for tailoring them to specific applications. Recent studies suggest that alignment primarily adjusts a model's presentation style rather than its foundational knowledge, indicating that only certain components of the model are significantly impacted. To uncover how alignment affects model behavior at a granular level, we propose identifying which layers within LLMs are most critical to the alignment process. Our approach, named ILA, involves learning a binary mask for the parameter changes in each layer during alignment, as an indicator of layer significance. Experimental results reveal that, despite substantial differences in alignment datasets, the important layers of a model identified by ILA exhibit nearly 90\% overlap, highlighting fundamental patterns in LLM alignment. The results also indicate that freezing non-essential layers improves overall model performance, while selectively tuning the most critical layers significantly enhances fine-tuning efficiency with minimal performance loss. Finally, we discuss how these findings extend from LLM alignment to reasoning.

Updated: 2025-04-08 09:44:28

标题: 理解LLM对齐中的层次重要性

摘要: 通过监督微调对齐大型语言模型（LLMs）对于将它们定制到特定应用程序至关重要。最近的研究表明，对齐主要调整模型的表现风格而不是其基础知识，表明仅有模型的某些组件受到显著影响。为了揭示对齐如何在细粒度水平上影响模型行为，我们提出了识别LLMs中哪些层对对齐过程最关键的方法。我们的方法被称为ILA，涉及在对齐期间学习每个层中参数变化的二进制掩码，作为层重要性的指示。实验结果表明，尽管对齐数据集存在明显差异，但ILA识别的模型重要层之间几乎有90%的重叠，突显了LLM对齐中的基本模式。结果还表明，冻结非关键层会改善整体模型性能，而有选择地微调最关键的层会显著提高微调效率，同时最小程度地降低性能。最后，我们讨论了这些发现如何从LLM对齐扩展到推理。

更新时间: 2025-04-08 09:44:28

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.17875v3

Are Generative AI Agents Effective Personalized Financial Advisors?

Large language model-based agents are becoming increasingly popular as a low-cost mechanism to provide personalized, conversational advice, and have demonstrated impressive capabilities in relatively simple scenarios, such as movie recommendations. But how do these agents perform in complex high-stakes domains, where domain expertise is essential and mistakes carry substantial risk? This paper investigates the effectiveness of LLM-advisors in the finance domain, focusing on three distinct challenges: (1) eliciting user preferences when users themselves may be unsure of their needs, (2) providing personalized guidance for diverse investment preferences, and (3) leveraging advisor personality to build relationships and foster trust. Via a lab-based user study with 64 participants, we show that LLM-advisors often match human advisor performance when eliciting preferences, although they can struggle to resolve conflicting user needs. When providing personalized advice, the LLM was able to positively influence user behavior, but demonstrated clear failure modes. Our results show that accurate preference elicitation is key, otherwise, the LLM-advisor has little impact, or can even direct the investor toward unsuitable assets. More worryingly, users appear insensitive to the quality of advice being given, or worse these can have an inverse relationship. Indeed, users reported a preference for and increased satisfaction as well as emotional trust with LLMs adopting an extroverted persona, even though those agents provided worse advice.

Updated: 2025-04-08 09:41:03

标题: 生成式AI代理人是否是有效的个性化财务顾问？

摘要: 基于大型语言模型的代理在提供个性化、对话式建议方面越来越受欢迎，并且在相对简单的场景中表现出令人印象深刻的能力，比如电影推荐。但是，这些代理在复杂高风险领域中的表现如何，这些领域需要领域专业知识，错误可能带来重大风险？本文研究了金融领域中LLM顾问的有效性，重点关注三个不同的挑战：(1)在用户自己可能不确定自己需求的情况下引出用户偏好，(2)为多样化的投资偏好提供个性化指导，(3)利用顾问个性建立关系和促进信任。通过一项64名参与者的实验室用户研究，我们发现LLM顾问在引出偏好时通常能够与人类顾问表现匹配，尽管他们可能在解决用户需求冲突时遇到困难。在提供个性化建议时，LLM能够积极影响用户行为，但表现出明显的失败模式。我们的结果显示，准确的偏好引出是关键，否则，LLM顾问几乎没有影响，甚至可能导向投资者走向不合适的资产。更令人担忧的是，用户似乎对所给建议的质量不敏感，甚至更糟的是，这可能存在一个反向关系。事实上，用户报告了对采用外向人格的LLM更偏好、更满意以及更情感上信任，即使这些代理提供的建议更糟糕。

更新时间: 2025-04-08 09:41:03

领域: cs.AI,cs.CL,cs.HC,cs.IR,q-fin.CP

下载: http://arxiv.org/abs/2504.05862v1

CodeEditorBench: Evaluating Code Editing Capability of Large Language Models

Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability. We introduce CodeEditorBench, an evaluation framework designed to rigorously assess the performance of LLMs in code editing tasks, including debugging, translating, polishing, and requirement switching. Unlike existing benchmarks focusing solely on code generation, CodeEditorBench emphasizes real-world scenarios and practical aspects of software development. We curate diverse coding challenges and scenarios from five sources, covering various programming languages, complexity levels, and editing tasks. Evaluation of 19 LLMs reveals that closed-source models (particularly Gemini-Ultra and GPT-4), outperform open-source models in CodeEditorBench, highlighting differences in model performance based on problem types and prompt sensitivities. CodeEditorBench aims to catalyze advancements in LLMs by providing a robust platform for assessing code editing capabilities. We will release all prompts and datasets to enable the community to expand the dataset and benchmark emerging LLMs. By introducing CodeEditorBench, we contribute to the advancement of LLMs in code editing and provide a valuable resource for researchers and practitioners.

Updated: 2025-04-08 09:39:25

标题: CodeEditorBench：评估大型语言模型的代码编辑能力

摘要: 大型语言模型(LLMs)用于代码的应用正在迅速发展，代码编辑作为一项关键能力正在崛起。我们引入了CodeEditorBench，这是一个旨在严格评估LLMs在代码编辑任务中表现的评估框架，包括调试、翻译、润色和需求切换。与现有的专注于代码生成的基准不同，CodeEditorBench强调真实世界场景和软件开发的实际方面。我们从五个来源中精选了各种编程语言、复杂级别和编辑任务的多样化编码挑战和场景。对19个LLMs的评估显示，封闭源模型(特别是Gemini-Ultra和GPT-4)在CodeEditorBench中表现优于开源模型，突显了基于问题类型和提示敏感性的模型性能差异。CodeEditorBench旨在通过提供一个强大的平台来评估代码编辑能力，促进LLMs的进步。我们将发布所有提示和数据集，以便社区扩展数据集并评估新兴的LLMs。通过引入CodeEditorBench，我们为LLMs在代码编辑方面的进步做出贡献，并为研究人员和从业者提供了宝贵的资源。

更新时间: 2025-04-08 09:39:25

领域: cs.SE,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2404.03543v3

Towards an AI-Driven Video-Based American Sign Language Dictionary: Exploring Design and Usage Experience with Learners

Searching for unfamiliar American Sign Language (ASL) signs is challenging for learners because, unlike spoken languages, they cannot type a text-based query to look up an unfamiliar sign. Advances in isolated sign recognition have enabled the creation of video-based dictionaries, allowing users to submit a video and receive a list of the closest matching signs. Previous HCI research using Wizard-of-Oz prototypes has explored interface designs for ASL dictionaries. Building on these studies, we incorporate their design recommendations and leverage state-of-the-art sign-recognition technology to develop an automated video-based dictionary. We also present findings from an observational study with twelve novice ASL learners who used this dictionary during video-comprehension and question-answering tasks. Our results address human-AI interaction challenges not covered in previous WoZ research, including recording and resubmitting signs, unpredictable outputs, system latency, and privacy concerns. These insights offer guidance for designing and deploying video-based ASL dictionary systems.

Updated: 2025-04-08 09:35:46

标题: 走向基于人工智能的视频美国手语词典：与学习者探讨设计和使用体验

摘要: 寻找不熟悉的美国手语（ASL）手势对学习者来说是具有挑战性的，因为与口语语言不同，他们无法输入基于文本的查询来查找不熟悉的手势。孤立手势识别技术的进步已经实现了基于视频的词典的创建，允许用户提交视频并接收最匹配手势的列表。先前的人机交互研究使用“巫师”原型探索了ASL词典的界面设计。基于这些研究，我们结合他们的设计建议并利用最先进的手势识别技术开发了一个自动化的基于视频的词典。我们还提出了一个有关十二名新手ASL学习者在视频理解和答题任务中使用该词典的观察研究结果。我们的结果涉及到以往WoZ研究中未涉及的人工智能交互挑战，包括记录和重新提交手势、不可预测的输出、系统延迟和隐私问题。这些见解为设计和部署基于视频的ASL词典系统提供了指导。

更新时间: 2025-04-08 09:35:46

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2504.05857v1

Enhancing Coreference Resolution with Pretrained Language Models: Bridging the Gap Between Syntax and Semantics

Large language models have made significant advancements in various natural language processing tasks, including coreference resolution. However, traditional methods often fall short in effectively distinguishing referential relationships due to a lack of integration between syntactic and semantic information. This study introduces an innovative framework aimed at enhancing coreference resolution by utilizing pretrained language models. Our approach combines syntax parsing with semantic role labeling to accurately capture finer distinctions in referential relationships. By employing state-of-the-art pretrained models to gather contextual embeddings and applying an attention mechanism for fine-tuning, we improve the performance of coreference tasks. Experimental results across diverse datasets show that our method surpasses conventional coreference resolution systems, achieving notable accuracy in disambiguating references. This development not only improves coreference resolution outcomes but also positively impacts other natural language processing tasks that depend on precise referential understanding.

Updated: 2025-04-08 09:33:09

标题: 使用预训练语言模型增强共指消解：弥合句法与语义之间的差距

摘要: 大型语言模型在各种自然语言处理任务中取得了重大进展，包括共指消解。然而，传统方法往往在有效区分指代关系方面表现不佳，因为缺乏句法和语义信息之间的整合。本研究引入了一种创新框架，旨在通过利用预训练语言模型来增强共指消解。我们的方法将句法分析与语义角色标注结合，以准确捕捉指代关系中的细微区别。通过利用最先进的预训练模型来收集上下文嵌入，并应用注意机制进行微调，我们提高了共指任务的性能。跨多个数据集的实验结果显示，我们的方法超越了传统的共指消解系统，在消除引用歧义方面取得了显着的准确性。这一发展不仅改善了共指消解结果，还积极影响其他依赖于精确指代理解的自然语言处理任务。

更新时间: 2025-04-08 09:33:09

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.05855v1

A spectral mixture representation of isotropic kernels to generalize random Fourier features

Rahimi and Recht (2007) introduced the idea of decomposing positive definite shift-invariant kernels by randomly sampling from their spectral distribution. This famous technique, known as Random Fourier Features (RFF), is in principle applicable to any such kernel whose spectral distribution can be identified and simulated. In practice, however, it is usually applied to the Gaussian kernel because of its simplicity, since its spectral distribution is also Gaussian. Clearly, simple spectral sampling formulas would be desirable for broader classes of kernels. In this paper, we show that the spectral distribution of positive definite isotropic kernels in $\mathbb{R}^{d}$ for all $d\geq1$ can be decomposed as a scale mixture of $\alpha$-stable random vectors, and we identify the mixing distribution as a function of the kernel. This constructive decomposition provides a simple and ready-to-use spectral sampling formula for many multivariate positive definite shift-invariant kernels, including exponential power kernels, generalized Mat\'ern kernels, generalized Cauchy kernels, as well as newly introduced kernels such as the Beta, Kummer, and Tricomi kernels. In particular, we retrieve the fact that the spectral distributions of these kernels are scale mixtures of the multivariate Gaussian distribution, along with an explicit mixing distribution formula. This result has broad applications for support vector machines, kernel ridge regression, Gaussian processes, and other kernel-based machine learning techniques for which the random Fourier features technique is applicable.

Updated: 2025-04-08 09:32:39

标题: 一个谱混合表示的各向同性核以推广随机傅立叶特征

摘要: Rahimi和Recht（2007）提出了通过从谱分布中随机抽样来分解正定平移不变核的思想。这种著名的技术被称为随机傅立叶特征（RFF），原则上适用于任何可以识别和模拟其谱分布的这种核。然而，在实践中，通常将其应用于高斯核，因为其简单性，其谱分布也是高斯的。显然，对于更广泛的核类别，简单的谱抽样公式是可取的。在本文中，我们展示了在$\mathbb{R}^{d}$中对所有$d\geq1$的正定各向同性核的谱分布可以分解为$\alpha$-稳定随机向量的尺度混合，并将混合分布识别为核的函数。这种构造性分解为许多多元正定平移不变核提供了一个简单和即用的谱抽样公式，包括指数幂核，广义Matérn核，广义柯西核，以及新引入的Beta、Kummer和Tricomi核。特别地，我们得到了这些核的谱分布是多元高斯分布的尺度混合，以及显式的混合分布公式。这个结果对支持向量机、核岭回归、高斯过程以及其他适用于随机傅立叶特征技术的基于核的机器学习技术具有广泛的应用。

更新时间: 2025-04-08 09:32:39

领域: cs.LG,math.PR,stat.CO,stat.ML,42B10, 62H05, 65C05, 65D12, 60E10,G.3; I.6.1; I.1.0

下载: http://arxiv.org/abs/2411.02770v3

Probabilistic Uncertain Reward Model: A Natural Generalization of Bradley-Terry Reward Model

Reinforcement Learning from Human Feedback (RLHF) has emerged as a critical technique for training large language models. However, reward hacking-a phenomenon where models exploit flaws in the reward model-remains a significant barrier to achieving robust and scalable intelligence through long-term training. Existing studies have proposed uncertain reward model to address reward hacking, however, they often lack systematic or theoretical foundations, failing to model the uncertainty intrinsically emerging from preference data, thus cannot sufficiently mitigate reward hacking to sustain prolonged RLHF training and exploration. In this paper, we propose the Probabilistic Uncertain Reward Model (PURM), a natural generalization of the classical Bradley-Terry reward model, that directly model the reward distribution emerged from the preference data. We theoretically derived PURM's loss function and the reward distribution uncertainty calculation based on Bhattacharyya Coefficient. To mitigate reward hacking with PURM, we further introduce an uncertainty-aware penalty into Proximal Policy Optimization (PPO), which leverages the learned uncertainty to dynamically balance reward optimization and exploration. We propose a lightweight and easy-to-use implementation of PURM. Experiments demonstrate that PURM significantly delays the onset of reward hacking while improving final reward performance, outperforming baseline methods in both stability and effectiveness.

Updated: 2025-04-08 09:32:13

标题: 概率不确定奖励模型：Bradley-Terry奖励模型的自然泛化

摘要: 人类反馈强化学习（RLHF）已经成为训练大型语言模型的关键技术。然而，奖励破解-模型利用奖励模型中的缺陷的现象-仍然是通过长期训练实现稳健和可扩展智能的重要障碍。现有研究提出了不确定性奖励模型来解决奖励破解问题，然而，它们通常缺乏系统性或理论基础，未能对偏好数据中固有产生的不确定性进行建模，因此无法充分减轻奖励破解以支持长期的RLHF训练和探索。在本文中，我们提出了概率不确定奖励模型（PURM），这是经典Bradley-Terry奖励模型的自然泛化，直接对偏好数据中产生的奖励分布进行建模。我们从理论上推导了PURM的损失函数和基于Bhattacharyya系数的奖励分布不确定性计算。为了减轻PURM的奖励破解问题，我们进一步将一种基于不确定性的惩罚引入到Proximal Policy Optimization（PPO）中，利用学习到的不确定性动态平衡奖励优化和探索。我们提出了PURM的轻量且易于使用的实现。实验证明，PURM显著延迟了奖励破解的发生时间，同时改善了最终奖励性能，在稳定性和有效性方面优于基线方法。

更新时间: 2025-04-08 09:32:13

领域: cs.LG

下载: http://arxiv.org/abs/2503.22480v3

Physics-aware generative models for turbulent fluid flows through energy-consistent stochastic interpolants

Generative models have demonstrated remarkable success in domains such as text, image, and video synthesis. In this work, we explore the application of generative models to fluid dynamics, specifically for turbulence simulation, where classical numerical solvers are computationally expensive. We propose a novel stochastic generative model based on stochastic interpolants, which enables probabilistic forecasting while incorporating physical constraints such as energy stability and divergence-freeness. Unlike conventional stochastic generative models, which are often agnostic to underlying physical laws, our approach embeds energy consistency by making the parameters of the stochastic interpolant learnable coefficients. We evaluate our method on a benchmark turbulence problem - Kolmogorov flow - demonstrating superior accuracy and stability over state-of-the-art alternatives such as autoregressive conditional diffusion models (ACDMs) and PDE-Refiner. Furthermore, we achieve stable results for significantly longer roll-outs than standard stochastic interpolants. Our results highlight the potential of physics-aware generative models in accelerating and enhancing turbulence simulations while preserving fundamental conservation properties.

Updated: 2025-04-08 09:29:01

标题: 物理学感知的生成模型用于通过能量一致的随机插值描绘湍流流体流动

摘要: 生成模型在文本、图像和视频合成等领域展示了显著的成功。在这项工作中，我们探索了将生成模型应用于流体动力学，特别是用于湍流模拟，传统的数值求解器在计算上非常昂贵。我们提出了一种基于随机插值的新型随机生成模型，它可以进行概率预测，同时融入诸如能量稳定性和无散性等物理约束。与通常对基础物理定律不关心的传统随机生成模型不同，我们的方法通过使随机插值的参数成为可学习系数来嵌入能量一致性。我们在一个基准湍流问题 - 科尔莫哥罗夫流上评估了我们的方法，展示了超越最先进的替代方案，如自回归条件扩散模型（ACDMs）和PDE-Refiner的优越准确性和稳定性。此外，我们实现了比标准随机插值更长时间的稳定结果。我们的结果突出了物理感知生成模型在加速和增强湍流模拟方面的潜力，同时保留了基本的守恒性质。

更新时间: 2025-04-08 09:29:01

领域: cs.CE,cs.AI,cs.NA,math.NA

下载: http://arxiv.org/abs/2504.05852v1

Referential communication in heterogeneous communities of pre-trained visual deep networks

As large pre-trained image-processing neural networks are being embedded in autonomous agents such as self-driving cars or robots, the question arises of how such systems can communicate with each other about the surrounding world, despite their different architectures and training regimes. As a first step in this direction, we systematically explore the task of referential communication in a community of heterogeneous state-of-the-art pre-trained visual networks, showing that they can develop, in a self-supervised way, a shared protocol to refer to a target object among a set of candidates. This shared protocol can also be used, to some extent, to communicate about previously unseen object categories of different granularity. Moreover, a visual network that was not initially part of an existing community can learn the community's protocol with remarkable ease. Finally, we study, both qualitatively and quantitatively, the properties of the emergent protocol, providing some evidence that it is capturing high-level semantic features of objects.

Updated: 2025-04-08 09:28:02

标题: 预先训练的视觉深度网络异质社区中的指代性沟通

摘要: 随着大型预训练图像处理神经网络被嵌入到自主代理中，如自动驾驶汽车或机器人，一个问题出现了，即这些系统如何能够相互之间就周围世界进行通信，尽管它们具有不同的架构和训练规则。作为朝着这个方向迈出的第一步，我们系统地探索了在一个异构先进预训练视觉网络社区中的指代交流任务，展示它们可以自我监督地发展出一个共享的协议，用于在一组候选对象中引用目标对象。这个共享的协议也可以在一定程度上用于沟通关于以前未见过的不同粒度的对象类别。此外，一个最初不属于现有社区的视觉网络可以非常容易地学习社区的协议。最后，我们 qualitatively 和 quantitatively 研究了新兴协议的性质，提供了一些证据表明它正在捕捉对象的高级语义特征。

更新时间: 2025-04-08 09:28:02

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2302.08913v7

PathGPT: Leveraging Large Language Models for Personalized Route Generation

The proliferation of GPS enabled devices has led to the accumulation of a substantial corpus of historical trajectory data. By leveraging these data for training machine learning models,researchers have devised novel data-driven methodologies that address the personalized route recommendation (PRR) problem. In contrast to conventional algorithms such as Dijkstra shortest path algorithm,these novel algorithms possess the capacity to discern and learn patterns within the data,thereby facilitating the generation of more personalized paths. However,once these models have been trained,their application is constrained to the generation of routes that align with their training patterns. This limitation renders them less adaptable to novel scenarios and the deployment of multiple machine learning models might be necessary to address new possible scenarios,which can be costly as each model must be trained separately. Inspired by recent advances in the field of Large Language Models (LLMs),we leveraged their natural language understanding capabilities to develop a unified model to solve the PRR problem while being seamlessly adaptable to new scenarios without additional training. To accomplish this,we combined the extensive knowledge LLMs acquired during training with further access to external hand-crafted context information,similar to RAG (Retrieved Augmented Generation) systems,to enhance their ability to generate paths according to user-defined requirements. Extensive experiments on different datasets show a considerable uplift in LLM performance on the PRR problem.

Updated: 2025-04-08 09:25:21

标题: PathGPT：利用大型语言模型进行个性化路径生成

摘要: GPS定位设备的普及导致了大量的历史轨迹数据的积累。通过利用这些数据来训练机器学习模型，研究人员设计了新颖的数据驱动方法来解决个性化路径推荐（PRR）问题。与传统算法如Dijkstra最短路径算法相比，这些新颖算法具有识别和学习数据中模式的能力，从而促进生成更个性化路径。然而，一旦这些模型被训练，它们的应用受限于生成符合其训练模式的路径。这种限制使它们对新场景的适应性较弱，可能需要部署多个机器学习模型来处理新的可能场景，这可能成本高昂，因为每个模型必须单独训练。受到大型语言模型(LLMs)领域最新进展的启发，我们利用它们的自然语言理解能力开发了一个统一模型来解决PRR问题，同时能够无需额外训练就能轻松适应新场景。为了实现这一目标，我们结合了LLMs在训练过程中获得的广泛知识，进一步访问外部手工制作的上下文信息，类似于RAG（检索增强生成）系统，以增强它们根据用户定义要求生成路径的能力。对不同数据集的广泛实验显示LLM在PRR问题上的性能有显著提升。

更新时间: 2025-04-08 09:25:21

领域: cs.IR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.05846v1

Adaptive Substructure-Aware Expert Model for Molecular Property Prediction

Molecular property prediction is essential for applications such as drug discovery and toxicity assessment. While Graph Neural Networks (GNNs) have shown promising results by modeling molecules as molecular graphs, their reliance on data-driven learning limits their ability to generalize, particularly in the presence of data imbalance and diverse molecular substructures. Existing methods often overlook the varying contributions of different substructures to molecular properties, treating them uniformly. To address these challenges, we propose ASE-Mol, a novel GNN-based framework that leverages a Mixture-of-Experts (MoE) approach for molecular property prediction. ASE-Mol incorporates BRICS decomposition and significant substructure awareness to dynamically identify positive and negative substructures. By integrating a MoE architecture, it reduces the adverse impact of negative motifs while improving adaptability to positive motifs. Experimental results on eight benchmark datasets demonstrate that ASE-Mol achieves state-of-the-art performance, with significant improvements in both accuracy and interpretability.

Updated: 2025-04-08 09:25:03

标题: 自适应次结构感知专家模型用于分子性质预测

摘要: 分子性质预测对于药物发现和毒性评估等应用至关重要。虽然图神经网络（GNNs）通过将分子建模为分子图表现出有希望的结果，但它们对数据驱动学习的依赖限制了它们在数据不平衡和多样的分子亚结构存在时的泛化能力。现有方法往往忽视不同亚结构对分子性质的不同贡献，将它们一概而论。为了解决这些挑战，我们提出了ASE-Mol，这是一个基于GNN的新颖框架，利用了混合专家（MoE）方法进行分子性质预测。ASE-Mol将BRICS分解和显著亚结构意识相结合，动态识别正负亚结构。通过整合MoE架构，它减少了负面基序的不利影响，同时提高了对正面基序的适应性。对八个基准数据集的实验结果表明，ASE-Mol实现了最先进的性能，在准确性和可解释性方面都取得了显著改进。

更新时间: 2025-04-08 09:25:03

领域: cs.LG

下载: http://arxiv.org/abs/2504.05844v1

Hybrid Temporal Differential Consistency Autoencoder for Efficient and Sustainable Anomaly Detection in Cyber-Physical Systems

Cyberattacks on critical infrastructure, particularly water distribution systems, have increased due to rapid digitalization and the integration of IoT devices and industrial control systems (ICS). These cyber-physical systems (CPS) introduce new vulnerabilities, requiring robust and automated intrusion detection systems (IDS) to mitigate potential threats. This study addresses key challenges in anomaly detection by leveraging time correlations in sensor data, integrating physical principles into machine learning models, and optimizing computational efficiency for edge applications. We build upon the concept of temporal differential consistency (TDC) loss to capture the dynamics of the system, ensuring meaningful relationships between dynamic states. Expanding on this foundation, we propose a hybrid autoencoder-based approach, referred to as hybrid TDC-AE, which extends TDC by incorporating both deterministic nodes and conventional statistical nodes. This hybrid structure enables the model to account for non-deterministic processes. Our approach achieves state-of-the-art classification performance while improving time to detect anomalies by 3%, outperforming the BATADAL challenge leader without requiring domain-specific knowledge, making it broadly applicable. Additionally, it maintains the computational efficiency of conventional autoencoders while reducing the number of fully connected layers, resulting in a more sustainable and efficient solution. The method demonstrates how leveraging physics-inspired consistency principles enhances anomaly detection and strengthens the resilience of cyber-physical systems.

Updated: 2025-04-08 09:22:44

标题: 混合时间差分一致性自编码器用于高效和可持续的网络物理系统异常检测

摘要: 关键基础设施，特别是水务分配系统遭受网络攻击的情况正在增加，这主要是由于数字化进程的快速推进以及物联网设备和工业控制系统（ICS）的整合。这些网络物理系统（CPS）引入了新的漏洞，需要强大且自动化的入侵检测系统（IDS）来减轻潜在威胁。本研究通过利用传感器数据中的时间相关性、将物理原理整合到机器学习模型中，并为边缘应用优化计算效率，解决了异常检测中的关键挑战。我们建立在时间差分一致性（TDC）损失的概念基础上，以捕捉系统的动态，确保动态状态之间具有意义的关系。在此基础上，我们提出了一种基于混合自编码器的方法，称为混合TDC-AE，该方法通过结合确定性节点和传统统计节点来扩展TDC。这种混合结构使模型能够考虑非确定性过程。我们的方法在不需要领域特定知识的情况下，实现了最先进的分类性能，同时将异常检测的时间缩短了3%，胜过了BATADAL挑战的领先者。此外，它保持了传统自编码器的计算效率，同时减少了完全连接层的数量，从而提供了更可持续和高效的解决方案。该方法展示了如何利用受物理启发的一致性原则增强异常检测，并加强网络物理系统的韧性。

更新时间: 2025-04-08 09:22:44

领域: cs.CR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.06320v1

Momentum Boosted Episodic Memory for Improving Learning in Long-Tailed RL Environments

Traditional Reinforcement Learning (RL) algorithms assume the distribution of the data to be uniform or mostly uniform. However, this is not the case with most real-world applications like autonomous driving or in nature where animals roam. Some experiences are encountered frequently, and most of the remaining experiences occur rarely; the resulting distribution is called Zipfian. Taking inspiration from the theory of complementary learning systems, an architecture for learning from Zipfian distributions is proposed where important long tail trajectories are discovered in an unsupervised manner. The proposal comprises an episodic memory buffer containing a prioritised memory module to ensure important rare trajectories are kept longer to address the Zipfian problem, which needs credit assignment to happen in a sample efficient manner. The experiences are then reinstated from episodic memory and given weighted importance forming the trajectory to be executed. Notably, the proposed architecture is modular, can be incorporated in any RL architecture and yields improved performance in multiple Zipfian tasks over traditional architectures. Our method outperforms IMPALA by a significant margin on all three tasks and all three evaluation metrics (Zipfian, Uniform, and Rare Accuracy) and also gives improvements on most Atari environments that are considered challenging

Updated: 2025-04-08 09:21:39

标题: 增强动量的叙事记忆用于改善长尾RL环境中的学习

摘要: 传统的强化学习（RL）算法假设数据分布是均匀的或大多是均匀的。然而，在大多数现实世界的应用程序中，如自动驾驶或动物漫游的自然环境中，并非如此。一些经验经常遇到，其余的经验很少发生；由此产生的分布被称为Zipfian。受互补学习系统理论的启发，提出了一种用于从Zipfian分布中学习的架构，在其中以无监督的方式发现重要的长尾轨迹。该提议包括一个包含优先内存模块的情景记忆缓冲区，以确保重要的罕见轨迹保持更长时间以解决Zipfian问题，这需要以样本有效的方式进行信用分配。然后，经验从情景记忆中恢复并赋予加权重要性，形成要执行的轨迹。值得注意的是，所提出的架构是模块化的，可以并入任何RL架构，并在多个Zipfian任务中的性能优于传统架构。我们的方法在所有三个任务和所有三个评估指标（Zipfian，均匀和罕见准确性）上显着优于IMPALA，并且在大多数被认为具有挑战性的Atari环境中也有改进。

更新时间: 2025-04-08 09:21:39

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.05840v1

Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking

Recently, the Image Prompt Adapter (IP-Adapter) has been increasingly integrated into text-to-image diffusion models (T2I-DMs) to improve controllability. However, in this paper, we reveal that T2I-DMs equipped with the IP-Adapter (T2I-IP-DMs) enable a new jailbreak attack named the hijacking attack. We demonstrate that, by uploading imperceptible image-space adversarial examples (AEs), the adversary can hijack massive benign users to jailbreak an Image Generation Service (IGS) driven by T2I-IP-DMs and mislead the public to discredit the service provider. Worse still, the IP-Adapter's dependency on open-source image encoders reduces the knowledge required to craft AEs. Extensive experiments verify the technical feasibility of the hijacking attack. In light of the revealed threat, we investigate several existing defenses and explore combining the IP-Adapter with adversarially trained models to overcome existing defenses' limitations. Our code is available at https://github.com/fhdnskfbeuv/attackIPA.

Updated: 2025-04-08 09:20:29

标题: 注意特洛伊木马：图像提示适配器实现可伸缩且具有欺骗性的越狱

摘要: 最近，图像提示适配器（IP-Adapter）越来越多地被整合到文本到图像扩散模型（T2I-DMs）中，以提高可控性。然而，在本文中，我们揭示了配备IP-Adapter（T2I-IP-DMs）的T2I-DMs使得一种名为劫持攻击的新越狱攻击成为可能。我们证明，通过上传不可察觉的图像空间对抗示例（AEs），对手可以劫持大量良性用户来越狱由T2I-IP-DMs驱动的图像生成服务（IGS）并误导公众贬低服务提供商。更糟糕的是，IP-Adapter对开源图像编码器的依赖降低了制作AEs所需的知识。大量实验证实了劫持攻击的技术可行性。鉴于揭示的威胁，我们调查了几种现有的防御措施，并探讨了将IP-Adapter与对抗训练模型结合以克服现有防御措施的限制。我们的代码可以在https://github.com/fhdnskfbeuv/attackIPA找到。

更新时间: 2025-04-08 09:20:29

领域: cs.CV,cs.AI,cs.CR

下载: http://arxiv.org/abs/2504.05838v1

Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching

Large Language Models (LLMs) exhibit pronounced memory-bound characteristics during inference due to High Bandwidth Memory (HBM) bandwidth constraints. In this paper, we propose an L2 Cache-oriented asynchronous KV Cache prefetching method to break through the memory bandwidth bottleneck in LLM inference through computation-load overlap. By strategically scheduling idle memory bandwidth during active computation windows, our method proactively prefetches required KV Cache into GPU L2 cache, enabling high-speed L2 cache hits for subsequent accesses and effectively hiding HBM access latency within computational cycles. Extensive experiments on NVIDIA H20 GPUs demonstrate that the proposed method achieves 2.15x improvement in attention kernel efficiency and up to 1.97x end-to-end throughput enhancement, surpassing state-of-the-art baseline FlashAttention-3. Notably, our solution maintains orthogonality to existing optimization techniques and can be integrated with current inference frameworks, providing a scalable latency-hiding solution for next-generation LLM inference engines.

Updated: 2025-04-08 09:17:35

标题: 通过异步KV缓存预取加速LLM推断吞吐量

摘要: 大型语言模型（LLMs）在推断过程中表现出明显的受记忆限制的特征，这是由于高带宽内存（HBM）带宽约束所致。本文提出了一种以L2缓存为导向的异步KV缓存预取方法，通过计算负载重叠来突破LLM推断中的内存带宽瓶颈。通过在活动计算窗口中策略性地调度空闲内存带宽，我们的方法主动地预取所需的KV缓存到GPU L2缓存中，实现高速L2缓存命中以用于后续访问，并有效地隐藏HBM访问延迟在计算周期内。在NVIDIA H20 GPU上进行的大量实验表明，所提出的方法实现了注意力核心效率的2.15倍改进，并最多提高了1.97倍的端到端吞吐量，超过了最先进的基准FlashAttention-3。值得注意的是，我们的解决方案与现有优化技术保持正交性，并可以集成到当前推断框架中，为下一代LLM推断引擎提供可伸缩的隐藏延迟解决方案。

更新时间: 2025-04-08 09:17:35

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.06319v1

Channel State Information Analysis for Jamming Attack Detection in Static and Dynamic UAV Networks -- An Experimental Study

Networks built on the IEEE 802.11 standard have experienced rapid growth in the last decade. Their field of application is vast, including smart home applications, Internet of Things (IoT), and short-range high throughput static and dynamic inter-vehicular communication networks. Within such networks, Channel State Information (CSI) provides a detailed view of the state of the communication channel and represents the combined effects of multipath propagation, scattering, phase shift, fading, and power decay. In this work, we investigate the problem of jamming attack detection in static and dynamic vehicular networks. We utilize ESP32-S3 modules to set up a communication network between an Unmanned Aerial Vehicle (UAV) and a Ground Control Station (GCS), to experimentally test the combined effects of a constant jammer on recorded CSI parameters, and the feasibility of jamming detection through CSI analysis in static and dynamic communication scenarios.

Updated: 2025-04-08 09:15:53

标题: 静态和动态无人机网络中的干扰攻击检测渠道状态信息分析--实验研究

摘要: 建立在IEEE 802.11标准上的网络在过去十年中经历了快速增长。它们的应用领域广泛，包括智能家居应用、物联网(IoT)和短距离高吞吐量静态和动态车辆间通信网络。在这些网络中，信道状态信息(Channel State Information，CSI)提供了通信信道状态的详细视图，并代表了多径传播、散射、相移、衰落和功率衰减的综合效应。本文研究了静态和动态车辆网络中干扰攻击检测的问题。我们利用ESP32-S3模块在无人机(UAV)和地面控制站(GCS)之间建立通信网络，通过实验测试恒定干扰器对记录的CSI参数的综合影响，以及在静态和动态通信场景中通过CSI分析检测干扰的可行性。

更新时间: 2025-04-08 09:15:53

领域: cs.CR,cs.RO

下载: http://arxiv.org/abs/2504.05832v1

Human Activity Recognition using RGB-Event based Sensors: A Multi-modal Heat Conduction Model and A Benchmark Dataset

Human Activity Recognition (HAR) primarily relied on traditional RGB cameras to achieve high-performance activity recognition. However, the challenging factors in real-world scenarios, such as insufficient lighting and rapid movements, inevitably degrade the performance of RGB cameras. To address these challenges, biologically inspired event cameras offer a promising solution to overcome the limitations of traditional RGB cameras. In this work, we rethink human activity recognition by combining the RGB and event cameras. The first contribution is the proposed large-scale multi-modal RGB-Event human activity recognition benchmark dataset, termed HARDVS 2.0, which bridges the dataset gaps. It contains 300 categories of everyday real-world actions with a total of 107,646 paired videos covering various challenging scenarios. Inspired by the physics-informed heat conduction model, we propose a novel multi-modal heat conduction operation framework for effective activity recognition, termed MMHCO-HAR. More in detail, given the RGB frames and event streams, we first extract the feature embeddings using a stem network. Then, multi-modal Heat Conduction blocks are designed to fuse the dual features, the key module of which is the multi-modal Heat Conduction Operation layer. We integrate RGB and event embeddings through a multi-modal DCT-IDCT layer while adaptively incorporating the thermal conductivity coefficient via FVEs into this module. After that, we propose an adaptive fusion module based on a policy routing strategy for high-performance classification. Comprehensive experiments demonstrate that our method consistently performs well, validating its effectiveness and robustness. The source code and benchmark dataset will be released on https://github.com/Event-AHU/HARDVS/tree/HARDVSv2

Updated: 2025-04-08 09:14:24

标题: 使用RGB-事件传感器进行人类活动识别：多模热传导模型和基准数据集

摘要: 人类活动识别（HAR）主要依赖传统的RGB摄像头来实现高性能的活动识别。然而，在现实世界的情景中，诸如光照不足和快速移动等挑战因素不可避免地降低了RGB摄像头的性能。为了解决这些挑战，受生物启发的事件摄像头提供了一个有前途的解决方案，以克服传统RGB摄像头的局限性。在这项工作中，我们通过结合RGB和事件摄像头重新思考人类活动识别。第一个贡献是提出了一个大规模的多模式RGB-Event人类活动识别基准数据集，称为HARDVS 2.0，弥合了数据集间的差距。它包含了300种日常生活中真实动作的类别，总共包括107,646个配对视频，涵盖了各种具有挑战性的情景。受受到物理知识的传热模型的启发，我们提出了一个新颖的多模式传热操作框架，用于有效的活动识别，称为MMHCO-HAR。更详细地讲，给定RGB帧和事件流，我们首先使用一个干扰网络提取特征嵌入。然后，设计了多模式传热块来融合双重特征，其中的关键模块是多模式传热操作层。我们通过一个多模式DCT-IDCT层将RGB和事件嵌入整合起来，同时通过FVEs自适应地将热导率系数纳入该模块。之后，我们提出了一个基于策略路由策略的自适应融合模块，用于高性能分类。全面的实验表明，我们的方法始终表现良好，验证了其有效性和鲁棒性。源代码和基准数据集将发布在https://github.com/Event-AHU/HARDVS/tree/HARDVSv2。

更新时间: 2025-04-08 09:14:24

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.05830v1

PFML: Self-Supervised Learning of Time-Series Data Without Representation Collapse

Self-supervised learning (SSL) is a data-driven learning approach that utilizes the innate structure of the data to guide the learning process. In contrast to supervised learning, which depends on external labels, SSL utilizes the inherent characteristics of the data to produce its own supervisory signal. However, one frequent issue with SSL methods is representation collapse, where the model outputs a constant input-invariant feature representation. This issue hinders the potential application of SSL methods to new data modalities, as trying to avoid representation collapse wastes researchers' time and effort. This paper introduces a novel SSL algorithm for time-series data called Prediction of Functionals from Masked Latents (PFML). Instead of predicting masked input signals or their latent representations directly, PFML operates by predicting statistical functionals of the input signal corresponding to masked embeddings, given a sequence of unmasked embeddings. The algorithm is designed to avoid representation collapse, rendering it straightforwardly applicable to different time-series data domains, such as novel sensor modalities in clinical data. We demonstrate the effectiveness of PFML through complex, real-life classification tasks across three different data modalities: infant posture and movement classification from multi-sensor inertial measurement unit data, emotion recognition from speech data, and sleep stage classification from EEG data. The results show that PFML is superior to a conceptually similar SSL method and a contrastive learning-based SSL method. Additionally, PFML is on par with the current state-of-the-art SSL method, while also being conceptually simpler and without suffering from representation collapse.

Updated: 2025-04-08 09:13:33

标题: PFML：无表示崩溃的时间序列数据的自监督学习

摘要: 自监督学习（SSL）是一种数据驱动的学习方法，利用数据的固有结构来引导学习过程。与监督学习相比，后者依赖于外部标签，SSL利用数据的固有特性产生自己的监督信号。然而，SSL方法经常出现的一个问题是表示坍塌，即模型输出恒定的输入不变特征表示。这个问题阻碍了SSL方法应用于新数据模态的潜力，因为试图避免表示坍塌会浪费研究人员的时间和精力。本文介绍了一种新颖的用于时间序列数据的SSL算法，称为PFML（从掩码潜变量预测功能）。PFML不是直接预测掩码输入信号或其潜在表示，而是通过预测与掩码嵌入对应的输入信号的统计功能来运行，给定一系列未掩码的嵌入。该算法设计旨在避免表示坍塌，使其可直接应用于不同的时间序列数据领域，如临床数据中的新型传感器模态。我们通过三种不同的数据模态展示了PFML的有效性：从多传感器惯性测量单位数据中对婴儿姿势和运动进行分类，从语音数据中对情绪进行识别，以及从脑电图数据中对睡眠阶段进行分类。结果表明，PFML优于一个在概念上类似的SSL方法和一个基于对比学习的SSL方法。此外，PFML与当前最先进的SSL方法不相上下，同时在概念上更简单，且不会出现表示坍塌。

更新时间: 2025-04-08 09:13:33

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2411.10087v3

Federated Unlearning Made Practical: Seamless Integration via Negated Pseudo-Gradients

The right to be forgotten is a fundamental principle of privacy-preserving regulations and extends to Machine Learning (ML) paradigms such as Federated Learning (FL). While FL enhances privacy by enabling collaborative model training without sharing private data, trained models still retain the influence of training data. Federated Unlearning (FU) methods recently proposed often rely on impractical assumptions for real-world FL deployments, such as storing client update histories or requiring access to a publicly available dataset. To address these constraints, this paper introduces a novel method that leverages negated Pseudo-gradients Updates for Federated Unlearning (PUF). Our approach only uses standard client model updates, anyway employed during regular FL rounds, and interprets them as pseudo-gradients. When a client needs to be forgotten, we apply the negated of their pseudo-gradients, appropriately scaled, to the global model. Unlike state-of-the-art mechanisms, PUF seamlessly integrates with FL workflows, incurs no additional computational and communication overhead beyond standard FL rounds, and supports concurrent unlearning requests. We extensively evaluated the proposed method on two well-known benchmark image classification datasets (CIFAR-10 and CIFAR-100) and a real-world medical imaging dataset for segmentation (ProstateMRI), using three different neural architectures: two residual networks and a vision transformer. The experimental results across various settings demonstrate that PUF achieves state-of-the-art forgetting effectiveness and recovery time, without relying on any additional assumptions, thus underscoring its practical applicability.

Updated: 2025-04-08 09:05:33

标题: 联邦式遗忘实用化：通过否定伪梯度实现无缝集成

摘要: 被遗忘的权利是隐私保护法规的基本原则，也适用于机器学习（ML）范式，如联邦学习（FL）。虽然FL通过实现协作模型训练而无需共享私人数据来增强隐私保护，但训练模型仍保留训练数据的影响。最近提出的联邦遗忘（FU）方法通常依赖于对于现实世界FL部署来说不切实际的假设，比如存储客户端更新历史记录或需要访问公开可用的数据集。为了解决这些限制，本文介绍了一种利用否定伪梯度更新进行联邦遗忘（PUF）的新方法。我们的方法仅使用标准客户端模型更新，这些更新在常规FL轮次中已经被使用，并将它们解释为伪梯度。当一个客户端需要被遗忘时，我们将其伪梯度的否定适当缩放后应用于全局模型。与最先进的机制不同，PUF与FL工作流程无缝集成，不会产生额外的计算和通信开销，超出标准FL轮次，同时支持并发遗忘请求。我们在两个著名的基准图像分类数据集（CIFAR-10和CIFAR-100）以及一个用于分割的真实医学成像数据集（ProstateMRI）上对所提出的方法进行了广泛评估，使用了三种不同的神经架构：两个残差网络和一个视觉变换器。在各种设置下的实验结果表明，PUF实现了最先进的遗忘效果和恢复时间，而无需依赖任何额外的假设，突显了其实际适用性。

更新时间: 2025-04-08 09:05:33

领域: cs.LG

下载: http://arxiv.org/abs/2504.05822v1

MMTEB: Massive Multilingual Text Embedding Benchmark

Text embeddings are typically evaluated on a limited set of tasks, which are constrained by language, domain, and task diversity. To address these limitations and provide a more comprehensive evaluation, we introduce the Massive Multilingual Text Embedding Benchmark (MMTEB) - a large-scale, community-driven expansion of MTEB, covering over 500 quality-controlled evaluation tasks across 250+ languages. MMTEB includes a diverse set of challenging, novel tasks such as instruction following, long-document retrieval, and code retrieval, representing the largest multilingual collection of evaluation tasks for embedding models to date. Using this collection, we develop several highly multilingual benchmarks, which we use to evaluate a representative set of models. We find that while large language models (LLMs) with billions of parameters can achieve state-of-the-art performance on certain language subsets and task categories, the best-performing publicly available model is multilingual-e5-large-instruct with only 560 million parameters. To facilitate accessibility and reduce computational cost, we introduce a novel downsampling method based on inter-task correlation, ensuring a diverse selection while preserving relative model rankings. Furthermore, we optimize tasks such as retrieval by sampling hard negatives, creating smaller but effective splits. These optimizations allow us to introduce benchmarks that drastically reduce computational demands. For instance, our newly introduced zero-shot English benchmark maintains a ranking order similar to the full-scale version but at a fraction of the computational cost.

Updated: 2025-04-08 08:57:22

标题: MMTEB：大规模多语言文本嵌入基准

摘要: 文本嵌入通常在有限的一组任务上进行评估，这些任务受到语言、领域和任务多样性的限制。为了解决这些限制并提供更全面的评估，我们引入了大规模多语言文本嵌入基准测试（MMTEB）- 这是MTEB的一个大规模、社区驱动的扩展，涵盖了超过250种语言的500多个经过质量控制的评估任务。MMTEB包括一系列具有挑战性的、新颖的任务，如按照指示操作、长文档检索和代码检索，代表迄今为止最大的多语言嵌入模型评估任务集合。利用这个集合，我们开发了几个高度多语言的基准测试，用于评估一组代表性的模型。我们发现，虽然具有数十亿参数的大型语言模型（LLMs）可以在某些语言子集和任务类别上实现最先进的性能，但表现最佳的公开可用模型是仅具有5.6亿参数的多语言-e5-large-instruct。为了促进可访问性并减少计算成本，我们引入了一种基于任务之间相关性的新型降采样方法，确保多样化选择的同时保持相对模型排名。此外，我们通过采样难例来优化检索等任务，创建更小但有效的数据集。这些优化使我们能够推出大大减少计算需求的基准测试。例如，我们新引入的零-shot英语基准测试保持了类似完整版本的排名顺序，但计算成本只是一小部分。

更新时间: 2025-04-08 08:57:22

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2502.13595v2

Sample-efficient Unsupervised Policy Cloning from Ensemble Self-supervised Labeled Videos

Current advanced policy learning methodologies have demonstrated the ability to develop expert-level strategies when provided enough information. However, their requirements, including task-specific rewards, action-labeled expert trajectories, and huge environmental interactions, can be expensive or even unavailable in many scenarios. In contrast, humans can efficiently acquire skills within a few trials and errors by imitating easily accessible internet videos, in the absence of any other supervision. In this paper, we try to let machines replicate this efficient watching-and-learning process through Unsupervised Policy from Ensemble Self-supervised labeled Videos (UPESV), a novel framework to efficiently learn policies from action-free videos without rewards and any other expert supervision. UPESV trains a video labeling model to infer the expert actions in expert videos through several organically combined self-supervised tasks. Each task performs its duties, and they together enable the model to make full use of both action-free videos and reward-free interactions for robust dynamics understanding and advanced action prediction. Simultaneously, UPESV clones a policy from the labeled expert videos, in turn collecting environmental interactions for self-supervised tasks. After a sample-efficient, unsupervised, and iterative training process, UPESV obtains an advanced policy based on a robust video labeling model. Extensive experiments in sixteen challenging procedurally generated environments demonstrate that the proposed UPESV achieves state-of-the-art interaction-limited policy learning performance (outperforming five current advanced baselines on 12/16 tasks) without exposure to any other supervision except for videos.

Updated: 2025-04-08 08:54:33

标题: 高效样本无监督策略克隆：来自集成自监督标记视频的方法

摘要: 当前的先进政策学习方法已经证明在提供足够信息时能够开发专家级别的策略。然而，它们的要求，包括任务特定的奖励、动作标记的专家轨迹和大量的环境交互，在许多情况下可能是昂贵的甚至不可用的。相比之下，人类可以通过模仿易于获取的互联网视频，在几次试验和错误中高效地获得技能，而无需任何其他监督。在本文中，我们尝试让机器通过非监督的Ensemble Self-supervised labeled Videos (UPESV)框架复制这种高效的观看和学习过程，该框架可以从无动作视频中高效地学习政策，而无需奖励和任何其他专家监督。UPESV训练一个视频标注模型，通过几个有机结合的自监督任务推断专家视频中的专家动作。每个任务都完成其任务，它们共同使模型充分利用无动作视频和无奖励互动，以实现稳健的动态理解和高级动作预测。同时，UPESV从标记的专家视频中克隆一个策略，进而收集环境交互以进行自监督任务。经过高效的样本、无监督和迭代训练过程，UPESV基于稳健的视频标注模型获得了一个先进的策略。在十六个具有挑战性的程序生成环境中的广泛实验表明，所提出的UPESV实现了最先进的受限互动政策学习性能（在12/16个任务上优于五个当前的先进基线），而不需要接受任何其他监督，除了视频。

更新时间: 2025-04-08 08:54:33

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2412.10778v2

Parasite: A Steganography-based Backdoor Attack Framework for Diffusion Models

Recently, the diffusion model has gained significant attention as one of the most successful image generation models, which can generate high-quality images by iteratively sampling noise. However, recent studies have shown that diffusion models are vulnerable to backdoor attacks, allowing attackers to enter input data containing triggers to activate the backdoor and generate their desired output. Existing backdoor attack methods primarily focused on target noise-to-image and text-to-image tasks, with limited work on backdoor attacks in image-to-image tasks. Furthermore, traditional backdoor attacks often rely on a single, conspicuous trigger to generate a fixed target image, lacking concealability and flexibility. To address these limitations, we propose a novel backdoor attack method called "Parasite" for image-to-image tasks in diffusion models, which not only is the first to leverage steganography for triggers hiding, but also allows attackers to embed the target content as a backdoor trigger to achieve a more flexible attack. "Parasite" as a novel attack method effectively bypasses existing detection frameworks to execute backdoor attacks. In our experiments, "Parasite" achieved a 0 percent backdoor detection rate against the mainstream defense frameworks. In addition, in the ablation study, we discuss the influence of different hiding coefficients on the attack results. You can find our code at https://anonymous.4open.science/r/Parasite-1715/.

Updated: 2025-04-08 08:53:47

标题: 《寄生虫：一种基于隐写术的扩散模型后门攻击框架》

摘要: 最近，扩散模型作为最成功的图像生成模型之一，已经引起了重大关注，可以通过迭代地采样噪声生成高质量的图像。然而，最近的研究表明，扩散模型容易受到后门攻击的影响，允许攻击者输入包含触发器的数据来激活后门并生成他们想要的输出。现有的后门攻击方法主要集中在目标噪声到图像和文本到图像任务上，对图像到图像任务中的后门攻击工作有限。此外，传统的后门攻击往往依赖于单一明显的触发器来生成固定目标图像，缺乏隐蔽性和灵活性。为了解决这些限制，我们提出了一种新颖的后门攻击方法，称为“寄生虫”，用于扩散模型中的图像到图像任务，不仅是第一个利用隐写术进行触发器隐藏的方法，还允许攻击者将目标内容嵌入后门触发器以实现更灵活的攻击。作为一种新颖的攻击方法，“寄生虫”有效地绕过了现有的检测框架来执行后门攻击。在我们的实验中，“寄生虫”在主流防御框架中实现了0％的后门检测率。此外，在消融研究中，我们讨论了不同隐藏系数对攻击结果的影响。您可以在https://anonymous.4open.science/r/Parasite-1715/找到我们的代码。

更新时间: 2025-04-08 08:53:47

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.05815v1

Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization

While large language models (LLMs) have demonstrated exceptional capabilities in challenging tasks such as mathematical reasoning, existing methods to enhance reasoning ability predominantly rely on supervised fine-tuning (SFT) followed by reinforcement learning (RL) on reasoning-specific data after pre-training. However, these approaches critically depend on external supervisions--such as human labelled reasoning traces, verified golden answers, or pre-trained reward models--which limits scalability and practical applicability. In this work, we propose Entropy Minimized Policy Optimization (EMPO), which makes an early attempt at fully unsupervised LLM reasoning incentivization. EMPO does not require any supervised information for incentivizing reasoning capabilities (i.e., neither verifiable reasoning traces, problems with golden answers, nor additional pre-trained reward models). By continuously minimizing the predictive entropy of LLMs on unlabeled user queries in a latent semantic space, EMPO enables purely self-supervised evolution of reasoning capabilities with strong flexibility and practicality. Our experiments demonstrate competitive performance of EMPO on both mathematical reasoning and free-form commonsense reasoning tasks. Specifically, without any supervised signals, EMPO boosts the accuracy of Qwen2.5-Math-7B Base from 30.7\% to 48.1\% on mathematical benchmarks and improves truthfulness accuracy of Qwen2.5-7B Instruct from 87.16\% to 97.25\% on TruthfulQA.

Updated: 2025-04-08 08:48:51

标题: 正确的问题已经是答案的一半：完全无监督的LLM推理激励化

摘要: 大型语言模型(LLMs)在挑战性任务如数学推理方面展示了出色的能力，现有的增强推理能力方法主要依赖于在预训练后进行监督微调(SFT)，然后在推理特定数据上进行强化学习(RL)。然而，这些方法关键依赖于外部监督，如人类标记的推理轨迹、验证的黄金答案或预训练奖励模型，这限制了可扩展性和实际应用性。在这项工作中，我们提出熵最小化策略优化(EMPO)，它尝试完全无监督的LLM推理激励。EMPO不需要任何监督信息来激励推理能力(即，既没有可验证的推理轨迹，也没有带有黄金答案的问题，也没有额外的预训练奖励模型)。通过在潜在语义空间中不断最小化LLMs在未标记用户查询上的预测熵，EMPO实现了推理能力的纯自我监督演进，具有强大的灵活性和实用性。我们的实验表明，EMPO在数学推理和自由形式常识推理任务上表现竞争力。具体而言，在没有任何监督信号的情况下，EMPO将Qwen2.5-Math-7B Base的准确率从30.7%提高到48.1%，并将Qwen2.5-7B Instruct的真实性准确率从87.16%提高到97.25%。

更新时间: 2025-04-08 08:48:51

领域: cs.LG

下载: http://arxiv.org/abs/2504.05812v1

Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?

Large language models (LLMs) are increasingly contributing to the creation of content on the Internet. This creates a feedback loop as subsequent generations of models will be trained on this generated, synthetic data. This phenomenon is receiving increasing interest, in particular because previous studies have shown that it may lead to distribution shift - models misrepresent and forget the true underlying distributions of human data they are expected to approximate (e.g. resulting in a drastic loss of quality). In this study, we study the impact of human data properties on distribution shift dynamics in iterated training loops. We first confirm that the distribution shift dynamics greatly vary depending on the human data by comparing four datasets (two based on Twitter and two on Reddit). We then test whether data quality may influence the rate of this shift. We find that it does on the twitter, but not on the Reddit datasets. We then focus on a Reddit dataset and conduct a more exhaustive evaluation of a large set of dataset properties. This experiment associated lexical diversity with larger, and semantic diversity with smaller detrimental shifts, suggesting that incorporating text with high lexical (but limited semantic) diversity could exacerbate the degradation of generated text. We then focus on the evolution of political bias, and find that the type of shift observed (bias reduction, amplification or inversion) depends on the political lean of the human (true) distribution. Overall, our work extends the existing literature on the consequences of recursive fine-tuning by showing that this phenomenon is highly dependent on features of the human data on which training occurs. This suggests that different parts of internet (e.g. GitHub, Reddit) may undergo different types of shift depending on their properties.

Updated: 2025-04-08 08:45:26

标题: LLMs中的递归训练循环：训练数据属性如何调节生成数据中的分布转移？

摘要: 大型语言模型（LLMs）越来越多地为互联网上内容的创作做出贡献。这导致了一个反馈循环，因为随后的模型世代将在这些生成的合成数据上进行训练。这种现象越来越受到关注，特别是因为以前的研究表明，这可能导致分布转移 - 模型错误地表示和忘记了其应该逼近的人类数据的真实基础分布（例如导致质量急剧下降）。在这项研究中，我们研究了人类数据属性对迭代训练循环中的分布转移动态的影响。我们首先通过比较四个数据集（两个基于Twitter，两个基于Reddit）确认，分布转移动态在很大程度上取决于人类数据。然后我们测试数据质量是否可能影响这种转移的速率。我们发现在Twitter数据集上是可以的，但在Reddit数据集上不是。然后我们专注于一个Reddit数据集，并对大量数据集属性进行更全面的评估。这个实验将词汇多样性与更大的、语义多样性与较小的有害转移联系起来，表明将具有高词汇（但有限语义）多样性的文本纳入可能会加剧生成文本的退化。然后我们关注政治偏见的演变，并发现观察到的转移类型（偏见减少、增强或反转）取决于人类（真实）分布的政治倾向。总的来说，我们的工作通过展示这种现象高度依赖于训练所在的人类数据的特征，扩展了现有关于递归微调后果的文献。这表明互联网的不同部分（例如GitHub、Reddit）可能会根据其属性经历不同类型的转移。

更新时间: 2025-04-08 08:45:26

领域: cs.LG,cs.AI,cs.CL,68T50,I.2.7

下载: http://arxiv.org/abs/2504.03814v2

Meta-Continual Learning of Neural Fields

Neural Fields (NF) have gained prominence as a versatile framework for complex data representation. This work unveils a new problem setting termed \emph{Meta-Continual Learning of Neural Fields} (MCL-NF) and introduces a novel strategy that employs a modular architecture combined with optimization-based meta-learning. Focused on overcoming the limitations of existing methods for continual learning of neural fields, such as catastrophic forgetting and slow convergence, our strategy achieves high-quality reconstruction with significantly improved learning speed. We further introduce Fisher Information Maximization loss for neural radiance fields (FIM-NeRF), which maximizes information gains at the sample level to enhance learning generalization, with proved convergence guarantee and generalization bound. We perform extensive evaluations across image, audio, video reconstruction, and view synthesis tasks on six diverse datasets, demonstrating our method's superiority in reconstruction quality and speed over existing MCL and CL-NF approaches. Notably, our approach attains rapid adaptation of neural fields for city-scale NeRF rendering with reduced parameter requirement.

Updated: 2025-04-08 08:38:37

标题: 元学习神经场的持续学习

摘要: 神经场（NF）作为一种多功能的数据表示框架已经备受关注。本研究揭示了一个名为\emph{神经场的元持续学习}（MCL-NF）的新问题设定，并引入了一种采用模块化架构结合基于优化的元学习的新策略。我们的策略专注于克服现有方法在神经场持续学习方面的局限性，如灾难性遗忘和收敛速度慢，实现了高质量重建并显著提高了学习速度。我们进一步引入了用于神经辐射场的Fisher信息最大化损失（FIM-NeRF），该损失在样本级别最大化信息增益，以增强学习泛化能力，并证明了其收敛保证和泛化界。我们在六个不同数据集上进行了广泛评估，包括图像、音频、视频重建和视图合成任务，展示了我们的方法在重建质量和速度方面优于现有的MCL和CL-NF方法。值得注意的是，我们的方法实现了对城市规模NeRF渲染的快速适应，同时减少了参数要求。

更新时间: 2025-04-08 08:38:37

领域: cs.AI

下载: http://arxiv.org/abs/2504.05806v1

StealthRank: LLM Ranking Manipulation via Stealthy Prompt Optimization

The integration of large language models (LLMs) into information retrieval systems introduces new attack surfaces, particularly for adversarial ranking manipulations. We present StealthRank, a novel adversarial ranking attack that manipulates LLM-driven product recommendation systems while maintaining textual fluency and stealth. Unlike existing methods that often introduce detectable anomalies, StealthRank employs an energy-based optimization framework combined with Langevin dynamics to generate StealthRank Prompts (SRPs)-adversarial text sequences embedded within product descriptions that subtly yet effectively influence LLM ranking mechanisms. We evaluate StealthRank across multiple LLMs, demonstrating its ability to covertly boost the ranking of target products while avoiding explicit manipulation traces that can be easily detected. Our results show that StealthRank consistently outperforms state-of-the-art adversarial ranking baselines in both effectiveness and stealth, highlighting critical vulnerabilities in LLM-driven recommendation systems.

Updated: 2025-04-08 08:36:18

标题: StealthRank：通过隐秘提示优化进行LLM排名操纵

摘要: 将大型语言模型（LLM）集成到信息检索系统中引入了新的攻击面，特别是针对敌对排名操纵。我们提出了StealthRank，一种新颖的敌对排名攻击，可以操纵由LLM驱动的产品推荐系统，同时保持文本流畅和隐秘性。与现有方法不同，通常会引入可检测的异常，StealthRank采用基于能量的优化框架结合Langevin动力学来生成StealthRank提示（SRPs）-嵌入产品描述中的敌对文本序列，微妙而有效地影响LLM排名机制。我们在多个LLM上评估了StealthRank，展示了其能够在避免易被检测到的明显操纵痕迹的情况下，秘密提升目标产品的排名。我们的结果表明，StealthRank在有效性和隐秘性方面始终优于最先进的敌对排名基线，突显了LLM驱动的推荐系统中的关键漏洞。

更新时间: 2025-04-08 08:36:18

领域: cs.IR,cs.AI,cs.CL,cs.LG,stat.ML

下载: http://arxiv.org/abs/2504.05804v1

Your Image Generator Is Your New Private Dataset

Generative diffusion models have emerged as powerful tools to synthetically produce training data, offering potential solutions to data scarcity and reducing labelling costs for downstream supervised deep learning applications. However, effectively leveraging text-conditioned image generation for building classifier training sets requires addressing key issues: constructing informative textual prompts, adapting generative models to specific domains, and ensuring robust performance. This paper proposes the Text-Conditioned Knowledge Recycling (TCKR) pipeline to tackle these challenges. TCKR combines dynamic image captioning, parameter-efficient diffusion model fine-tuning, and Generative Knowledge Distillation techniques to create synthetic datasets tailored for image classification. The pipeline is rigorously evaluated on ten diverse image classification benchmarks. The results demonstrate that models trained solely on TCKR-generated data achieve classification accuracies on par with (and in several cases exceeding) models trained on real images. Furthermore, the evaluation reveals that these synthetic-data-trained models exhibit substantially enhanced privacy characteristics: their vulnerability to Membership Inference Attacks is significantly reduced, with the membership inference AUC lowered by 5.49 points on average compared to using real training data, demonstrating a substantial improvement in the performance-privacy trade-off. These findings indicate that high-fidelity synthetic data can effectively replace real data for training classifiers, yielding strong performance whilst simultaneously providing improved privacy protection as a valuable emergent property. The code and trained models are available in the accompanying open-source repository.

Updated: 2025-04-08 08:35:53

标题: 您的图像生成器是您的新私人数据集。

摘要: 生成扩散模型已经成为合成生成训练数据的强大工具，为解决数据稀缺和减少下游监督式深度学习应用的标注成本提供了潜在解决方案。然而，有效利用文本条件图像生成来构建分类器训练集需要解决关键问题：构建信息丰富的文本提示，将生成模型调整到特定领域，并确保稳健性能。本文提出了文本条件知识回收（TCKR）管道来解决这些挑战。TCKR结合了动态图像字幕、参数高效扩散模型微调以及生成知识蒸馏技术，以创建适用于图像分类的合成数据集。该管道在十个不同的图像分类基准上进行了严格评估。结果表明，仅在TCKR生成的数据上训练的模型实现了与（在几种情况下超过）在真实图像上训练的模型相当的分类准确性。此外，评估显示，这些经合成数据训练的模型表现出明显增强的隐私特性：它们对成员推断攻击的脆弱性显著降低，成员推断AUC平均降低了5.49个点，相比于使用真实训练数据，表明在性能-隐私权衡方面取得了实质性改进。这些发现表明，高保真度的合成数据可以有效替代真实数据用于训练分类器，产生强大的性能，同时提供改进的隐私保护作为一种有价值的新兴属性。相关代码和训练模型可在随附的开源存储库中获得。

更新时间: 2025-04-08 08:35:53

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.04582v2

Large Language Models for Knowledge Graph Embedding: A Survey

Large language models (LLMs) have garnered significant attention for their superior performance in many knowledge-driven applications on the world wide web.These models are designed to train hundreds of millions or more parameters on large amounts of text data, enabling them to understand and generate naturallanguage effectively. As the superior performance of LLMs becomes apparent,they are increasingly being applied to knowledge graph embedding (KGE) related tasks to improve the processing results. Traditional KGE representation learning methods map entities and relations into a low-dimensional vector space, enablingthe triples in the knowledge graph to satisfy a specific scoring function in thevector space. However, based on the powerful language understanding and seman-tic modeling capabilities of LLMs, that have recently been invoked to varying degrees in different types of KGE related scenarios such as multi-modal KGE andopen KGE according to their task characteristics. In this paper, we investigate awide range of approaches for performing LLMs-related tasks in different types of KGE scenarios. To better compare the various approaches, we summarize each KGE scenario in a classification. Finally, we discuss the applications in which the methods are mainly used and suggest several forward-looking directions for the development of this new research area.

Updated: 2025-04-08 08:33:49

标题: 大型语言模型用于知识图嵌入：一项调查

摘要: 大型语言模型(LLMs)因其在世界范围内许多知识驱动应用中的卓越性能而受到重视。这些模型旨在训练数亿甚至更多参数，使它们能够有效地理解和生成自然语言。随着LLMs卓越性能的显现，它们越来越多地被应用于知识图嵌入(KGE)相关任务，以改善处理结果。传统的KGE表示学习方法将实体和关系映射到低维向量空间，使知识图中的三元组在向量空间中满足特定的评分函数。然而，基于最近LLMs强大的语言理解和语义建模能力，已经在不同类型的KGE相关场景中以不同程度被调用，如多模态KGE和开放KGE等。在本文中，我们研究了在不同类型的KGE场景中执行LLMs相关任务的各种方法。为了更好地比较各种方法，我们总结了每个KGE场景的分类。最后，我们讨论了这些方法主要用于的应用，并提出了这一新研究领域发展的几个前瞻方向。

更新时间: 2025-04-08 08:33:49

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2501.07766v2

From Superficial to Deep: Integrating External Knowledge for Follow-up Question Generation Using Knowledge Graph and LLM

In a conversational system, dynamically generating follow-up questions based on context can help users explore information and provide a better user experience. Humans are usually able to ask questions that involve some general life knowledge and demonstrate higher order cognitive skills. However, the questions generated by existing methods are often limited to shallow contextual questions that are uninspiring and have a large gap to the human level. In this paper, we propose a three-stage external knowledge-enhanced follow-up question generation method, which generates questions by identifying contextual topics, constructing a knowledge graph (KG) online, and finally combining these with a large language model to generate the final question. The model generates information-rich and exploratory follow-up questions by introducing external common sense knowledge and performing a knowledge fusion operation. Experiments show that compared to baseline models, our method generates questions that are more informative and closer to human questioning levels while maintaining contextual relevance.

Updated: 2025-04-08 08:31:03

标题: 从表层到深度：利用知识图谱和LLM集成外部知识进行后续问题生成

摘要: 在对话系统中，基于上下文动态生成后续问题可以帮助用户探索信息并提供更好的用户体验。人类通常能够提出涉及一些一般生活知识并展示较高层次认知能力的问题。然而，现有方法生成的问题往往局限于肤浅的上下文问题，缺乏灵感，并与人类水平存在较大差距。在本文中，我们提出了一种三阶段的外部知识增强后续问题生成方法，该方法通过识别上下文主题、在线构建知识图谱（KG）并最终将其与大型语言模型结合生成最终问题。该模型通过引入外部常识知识和执行知识融合操作生成信息丰富和探索性的后续问题。实验证明，与基准模型相比，我们的方法生成的问题更具信息量，更接近人类提问水平，同时保持上下文相关性。

更新时间: 2025-04-08 08:31:03

领域: cs.AI

下载: http://arxiv.org/abs/2504.05801v1

Storybooth: Training-free Multi-Subject Consistency for Improved Visual Storytelling

Training-free consistent text-to-image generation depicting the same subjects across different images is a topic of widespread recent interest. Existing works in this direction predominantly rely on cross-frame self-attention; which improves subject-consistency by allowing tokens in each frame to pay attention to tokens in other frames during self-attention computation. While useful for single subjects, we find that it struggles when scaling to multiple characters. In this work, we first analyze the reason for these limitations. Our exploration reveals that the primary-issue stems from self-attention-leakage, which is exacerbated when trying to ensure consistency across multiple-characters. This happens when tokens from one subject pay attention to other characters, causing them to appear like each other (e.g., a dog appearing like a duck). Motivated by these findings, we propose StoryBooth: a training-free approach for improving multi-character consistency. In particular, we first leverage multi-modal chain-of-thought reasoning and region-based generation to apriori localize the different subjects across the desired story outputs. The final outputs are then generated using a modified diffusion model which consists of two novel layers: 1) a bounded cross-frame self-attention layer for reducing inter-character attention leakage, and 2) token-merging layer for improving consistency of fine-grain subject details. Through both qualitative and quantitative results we find that the proposed approach surpasses prior state-of-the-art, exhibiting improved consistency across both multiple-characters and fine-grain subject details.

Updated: 2025-04-08 08:30:55

标题: Storybooth：无需训练的多主题一致性，提高视觉叙事效果

摘要: 无需训练的一致文本到图像生成，描绘相同主题在不同图像中是一个近期广泛关注的话题。目前在这个方向上的工作主要依赖于跨帧自注意力；通过允许每帧中的标记在自注意力计算过程中关注其他帧中的标记，从而提高主题一致性。虽然对于单个主题很有用，但我们发现在扩展到多个角色时存在困难。在这项工作中，我们首先分析了这些限制的原因。我们的探索揭示了主要问题源于自注意力泄漏，尤其是在试图确保跨多个角色的一致性时更加严重。当来自一个主题的标记关注其他角色时，会导致它们相互之间看起来相似（例如，一只狗看起来像一只鸭子）。受到这些发现的启发，我们提出了StoryBooth：一个用于提高多角色一致性的无需训练的方法。具体地，我们首先利用多模态思维链和基于区域的生成来预先定位所需故事输出中的不同主题。最终的输出是使用修改后的扩散模型生成的，该模型包括两个新颖的层：1）用于减少角色间关注泄漏的有界跨帧自注意力层，和2）用于提高细节主题一致性的标记合并层。通过定性和定量结果，我们发现提出的方法超越了先前的最新技术，展示了跨多个角色和细节主题的一致性得到了改善。

更新时间: 2025-04-08 08:30:55

领域: cs.CV,cs.LG,cs.MM

下载: http://arxiv.org/abs/2504.05800v1

On Rollouts in Model-Based Reinforcement Learning

Model-based reinforcement learning (MBRL) seeks to enhance data efficiency by learning a model of the environment and generating synthetic rollouts from it. However, accumulated model errors during these rollouts can distort the data distribution, negatively impacting policy learning and hindering long-term planning. Thus, the accumulation of model errors is a key bottleneck in current MBRL methods. We propose Infoprop, a model-based rollout mechanism that separates aleatoric from epistemic model uncertainty and reduces the influence of the latter on the data distribution. Further, Infoprop keeps track of accumulated model errors along a model rollout and provides termination criteria to limit data corruption. We demonstrate the capabilities of Infoprop in the Infoprop-Dyna algorithm, reporting state-of-the-art performance in Dyna-style MBRL on common MuJoCo benchmark tasks while substantially increasing rollout length and data quality.

Updated: 2025-04-08 08:24:38

标题: 关于模型驱动强化学习中的Rollouts

摘要: 基于模型的强化学习（MBRL）旨在通过学习环境模型并从中生成合成数据来增强数据效率。然而，在这些数据生成过程中积累的模型错误可能扭曲数据分布，对策略学习产生负面影响，并阻碍长期规划。因此，模型错误的积累是当前MBRL方法的关键瓶颈。我们提出了Infoprop，这是一种基于模型的数据生成机制，可以将模型不确定性分为偶然性和认知性，并减少后者对数据分布的影响。此外，Infoprop会跟踪模型数据生成过程中的积累错误，并提供终止准则以限制数据损坏。我们在Infoprop-Dyna算法中展示了Infoprop的能力，在常见的MuJoCo基准任务中报告了最先进的性能，同时显著增加了数据生成长度和质量。

更新时间: 2025-04-08 08:24:38

领域: cs.LG

下载: http://arxiv.org/abs/2501.16918v2

REFIND at SemEval-2025 Task 3: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models

Hallucinations in large language model (LLM) outputs severely limit their reliability in knowledge-intensive tasks such as question answering. To address this challenge, we introduce REFIND (Retrieval-augmented Factuality hallucINation Detection), a novel framework that detects hallucinated spans within LLM outputs by directly leveraging retrieved documents. As part of the REFIND, we propose the Context Sensitivity Ratio (CSR), a novel metric that quantifies the sensitivity of LLM outputs to retrieved evidence. This innovative approach enables REFIND to efficiently and accurately detect hallucinations, setting it apart from existing methods. In the evaluation, REFIND demonstrated robustness across nine languages, including low-resource settings, and significantly outperformed baseline models, achieving superior IoU scores in identifying hallucinated spans. This work highlights the effectiveness of quantifying context sensitivity for hallucination detection, thereby paving the way for more reliable and trustworthy LLM applications across diverse languages. Our code is available at https://github.com/oneonlee/REFIND.

Updated: 2025-04-08 08:17:49

标题: SemEval-2025任务3中的REFIND：大型语言模型中的检索增强事实性幻觉检测

摘要: 大型语言模型（LLM）输出中的幻觉严重限制了它们在诸如问答等知识密集型任务中的可靠性。为了解决这一挑战，我们引入了REFIND（检索增强事实性幻觉检测），这是一个新颖的框架，通过直接利用检索到的文档来检测LLM输出中的幻觉片段。作为REFIND的一部分，我们提出了上下文敏感性比率（CSR），这是一个新颖的度量标准，用于量化LLM输出对检索证据的敏感性。这种创新性方法使得REFIND能够高效准确地检测幻觉，使其与现有方法有所区别。在评估中，REFIND跨越九种语言展现了稳健性，包括低资源环境，并且明显优于基线模型，在识别幻觉片段方面取得了更高的IoU分数。这项工作突显了量化上下文敏感性对于幻觉检测的有效性，从而为跨多种语言的更可靠和可信任的LLM应用铺平了道路。我们的代码可在https://github.com/oneonlee/REFIND获取。

更新时间: 2025-04-08 08:17:49

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.13622v2

POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding

Multi-agent reinforcement learning (MARL) has recently excelled in solving challenging cooperative and competitive multi-agent problems in various environments, typically involving a small number of agents and full observability. Moreover, a range of crucial robotics-related tasks, such as multi-robot pathfinding, which have traditionally been approached with classical non-learnable methods (e.g., heuristic search), are now being suggested for solution using learning-based or hybrid methods. However, in this domain, it remains difficult, if not impossible, to conduct a fair comparison between classical, learning-based, and hybrid approaches due to the lack of a unified framework that supports both learning and evaluation. To address this, we introduce POGEMA, a comprehensive set of tools that includes a fast environment for learning, a problem instance generator, a collection of predefined problem instances, a visualization toolkit, and a benchmarking tool for automated evaluation. We also introduce and define an evaluation protocol that specifies a range of domain-related metrics, computed based on primary evaluation indicators (such as success rate and path length), enabling a fair multi-fold comparison. The results of this comparison, which involves a variety of state-of-the-art MARL, search-based, and hybrid methods, are presented.

Updated: 2025-04-08 08:14:39

标题: POGEMA: 一个用于合作多智能体路径规划的基准平台

摘要: 多智能体强化学习（MARL）最近在解决各种环境中具有挑战性的合作和竞争多智能体问题方面取得了显著的成就，通常涉及少量智能体和完全可观测性。此外，一系列关键的与机器人相关的任务，如传统上使用经典的非可学习方法（例如启发式搜索）来处理的多机器人路径规划，现在被建议使用基于学习或混合方法来解决。然而，在这个领域，由于缺乏支持学习和评估的统一框架，要进行经典、基于学习和混合方法之间的公平比较仍然困难，甚至不可能。为了解决这个问题，我们引入了POGEMA，这是一个全面的工具集，包括用于学习的快速环境、问题实例生成器、一系列预定义的问题实例、可视化工具包和用于自动评估的基准测试工具。我们还介绍和定义了一个评估协议，该协议规定了一系列与领域相关的指标，这些指标是基于主要评估指标（如成功率和路径长度）计算的，从而实现了公平的多重比较。这个比较涉及各种最先进的MARL、基于搜索的方法和混合方法的结果被呈现。

更新时间: 2025-04-08 08:14:39

领域: cs.LG,cs.AI,cs.MA

下载: http://arxiv.org/abs/2407.14931v3

How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM

3D spatial understanding is essential in real-world applications such as robotics, autonomous vehicles, virtual reality, and medical imaging. Recently, Large Language Models (LLMs), having demonstrated remarkable success across various domains, have been leveraged to enhance 3D understanding tasks, showing potential to surpass traditional computer vision methods. In this survey, we present a comprehensive review of methods integrating LLMs with 3D spatial understanding. We propose a taxonomy that categorizes existing methods into three branches: image-based methods deriving 3D understanding from 2D visual data, point cloud-based methods working directly with 3D representations, and hybrid modality-based methods combining multiple data streams. We systematically review representative methods along these categories, covering data representations, architectural modifications, and training strategies that bridge textual and 3D modalities. Finally, we discuss current limitations, including dataset scarcity and computational challenges, while highlighting promising research directions in spatial perception, multi-modal fusion, and real-world applications.

Updated: 2025-04-08 08:11:39

标题: 怎样激活带有3D容量的LLM？对LLM中的空间推理进行调查

摘要: 3D空间理解在现实世界的应用中至关重要，如机器人技术、自动驾驶车辆、虚拟现实和医学成像。最近，大型语言模型（LLMs）在各个领域展现出了显著的成功，并被利用来增强3D理解任务，显示出超越传统计算机视觉方法的潜力。在这项调查中，我们提出了一项综合评估，涵盖了将LLMs与3D空间理解相结合的方法。我们提出了一个分类法，将现有方法分为三个分支：从2D视觉数据中获取3D理解的基于图像的方法，直接使用3D表示的点云方法，以及结合多个数据流的混合模态方法。我们系统地审查了这些类别中的代表性方法，涵盖了数据表示、架构修改和训练策略，以建立文本和3D模态之间的桥梁。最后，我们讨论了当前的限制，包括数据集稀缺性和计算挑战，同时强调了在空间感知、多模态融合和现实应用领域的有前景的研究方向。

更新时间: 2025-04-08 08:11:39

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.05786v1

Penalising the biases in norm regularisation enforces sparsity

Controlling the parameters' norm often yields good generalisation when training neural networks. Beyond simple intuitions, the relation between regularising parameters' norm and obtained estimators remains theoretically misunderstood. For one hidden ReLU layer networks with unidimensional data, this work shows the parameters' norm required to represent a function is given by the total variation of its second derivative, weighted by a $\sqrt{1+x^2}$ factor. Notably, this weighting factor disappears when the norm of bias terms is not regularised. The presence of this additional weighting factor is of utmost significance as it is shown to enforce the uniqueness and sparsity (in the number of kinks) of the minimal norm interpolator. Conversely, omitting the bias' norm allows for non-sparse solutions. Penalising the bias terms in the regularisation, either explicitly or implicitly, thus leads to sparse estimators.

Updated: 2025-04-08 08:10:47

标题: 惩罚规范化中的偏差，强制稀疏性

摘要: 控制参数的规范往往在训练神经网络时产生良好的泛化效果。除了简单的直觉之外，正则化参数规范与所获得的估计量之间的关系在理论上仍然存在误解。对于具有一层隐藏的ReLU网络和一维数据，本研究表明表示函数所需的参数规范由其二阶导数的总变化加权得到，权重为$\sqrt{1+x^2}$。值得注意的是，当不对偏置项的规范进行规范化时，这种加权因子会消失。这个额外的加权因子的存在非常重要，因为它被证明可以强制实现最小规范插值器的唯一性和稀疏性（在kinks数量上）。相反，省略偏置的规范允许出现非稀疏解。对偏置项进行正则化，无论是显式还是隐式，都会导致稀疏估计量。

更新时间: 2025-04-08 08:10:47

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2303.01353v4

MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices

Existing neural head avatars methods have achieved significant progress in the image quality and motion range of portrait animation. However, these methods neglect the computational overhead, and to the best of our knowledge, none is designed to run on mobile devices. This paper presents MobilePortrait, a lightweight one-shot neural head avatars method that reduces learning complexity by integrating external knowledge into both the motion modeling and image synthesis, enabling real-time inference on mobile devices. Specifically, we introduce a mixed representation of explicit and implicit keypoints for precise motion modeling and precomputed visual features for enhanced foreground and background synthesis. With these two key designs and using simple U-Nets as backbones, our method achieves state-of-the-art performance with less than one-tenth the computational demand. It has been validated to reach speeds of over 100 FPS on mobile devices and support both video and audio-driven inputs.

Updated: 2025-04-08 08:10:07

标题: MobilePortrait：移动设备上实时一拍即生成的神经头像

摘要: 现有的神经头像方法在肖像动画的图像质量和运动范围方面取得了显著进展。然而，这些方法忽视了计算开销，据我们所知，没有一种是设计用于移动设备上运行的。本文介绍了MobilePortrait，一种轻量级的一次性神经头像方法，通过将外部知识整合到动作建模和图像合成中，降低了学习复杂性，实现了移动设备上的实时推断。具体地，我们引入了明确和隐式关键点的混合表示，用于精确的动作建模，以及预先计算的视觉特征，用于增强前景和背景合成。通过这两个关键设计，并使用简单的U-Net作为主干，我们的方法以不到十分之一的计算需求实现了最先进的性能。已验证在移动设备上达到100 FPS以上的速度，并支持视频和音频驱动输入。

更新时间: 2025-04-08 08:10:07

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.05712v3

Video Flow as Time Series: Discovering Temporal Consistency and Variability for VideoQA

Video Question Answering (VideoQA) is a complex video-language task that demands a sophisticated understanding of both visual content and temporal dynamics. Traditional Transformer-style architectures, while effective in integrating multimodal data, often simplify temporal dynamics through positional encoding and fail to capture non-linear interactions within video sequences. In this paper, we introduce the Temporal Trio Transformer (T3T), a novel architecture that models time consistency and time variability. The T3T integrates three key components: Temporal Smoothing (TS), Temporal Difference (TD), and Temporal Fusion (TF). The TS module employs Brownian Bridge for capturing smooth, continuous temporal transitions, while the TD module identifies and encodes significant temporal variations and abrupt changes within the video content. Subsequently, the TF module synthesizes these temporal features with textual cues, facilitating a deeper contextual understanding and response accuracy. The efficacy of the T3T is demonstrated through extensive testing on multiple VideoQA benchmark datasets. Our results underscore the importance of a nuanced approach to temporal modeling in improving the accuracy and depth of video-based question answering.

Updated: 2025-04-08 08:08:03

标题: 视频流作为时间序列：为视频问答发现时间一致性和变化性

摘要: 视频问答（VideoQA）是一项复杂的视频语言任务，要求对视觉内容和时间动态都有深入的理解。传统的Transformer风格架构虽然在整合多模态数据方面很有效，但通常通过位置编码简化时间动态，并未捕捉视频序列中的非线性交互。在本文中，我们介绍了时间三重Transformer（T3T），这是一种新颖的架构，可以建模时间一致性和时间变异性。T3T整合了三个关键组件：时间平滑（TS）、时间差异（TD）和时间融合（TF）。TS模块采用布朗桥来捕捉平滑连续的时间过渡，而TD模块则识别和编码视频内容中重要的时间变化和突变。随后，TF模块将这些时间特征与文本线索综合起来，促进更深入的上下文理解和响应准确性。通过在多个VideoQA基准数据集上进行广泛测试，我们展示了T3T的有效性。我们的结果强调了在改善基于视频的问答的准确性和深度方面，对时间建模采取细致的方法的重要性。

更新时间: 2025-04-08 08:08:03

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.05783v1

DeepGDel: Deep Learning-based Gene Deletion Prediction Framework for Growth-Coupled Production in Genome-Scale Metabolic Models

In genome-scale constraint-based metabolic models, gene deletion strategies are crucial for achieving growth-coupled production, where cell growth and target metabolite production are simultaneously achieved. While computational methods for calculating gene deletions have been widely explored and contribute to developing gene deletion strategy databases, current approaches are limited in leveraging new data-driven paradigms, such as machine learning, for more efficient strain design. Therefore, it is necessary to propose a fundamental framework for this objective. In this study, we first formulate the problem of gene deletion strategy prediction and then propose a framework for predicting gene deletion strategies for growth-coupled production in genome-scale metabolic models. The proposed framework leverages deep learning algorithms to learn and integrate sequential gene and metabolite data representation, enabling the automatic gene deletion strategy prediction. Computational experiment results demonstrate the feasibility of the proposed framework, showing substantial improvements over the baseline method. Specifically, the proposed framework achieves a 17.64%, 27.15%, and 18.07% increase in overall accuracy across three metabolic models of different scales under study, while maintaining balanced precision and recall in predicting gene deletion statuses. The source code and examples for the framework are publicly available at https://github.com/MetNetComp/DeepGDel.

Updated: 2025-04-08 08:07:59

标题: DeepGDel: 基于深度学习的基因删除预测框架，用于基因组规模代谢模型中的生长耦合生产

摘要: 在基因组规模的基于约束的代谢模型中，基因删除策略对于实现生长耦合生产至关重要，其中细胞生长和目标代谢产物生产同时实现。虽然计算基因删除的方法已被广泛研究并有助于开发基因删除策略数据库，但目前的方法在利用新的数据驱动范式（如机器学习）以实现更有效的菌株设计方面存在局限性。因此，有必要提出一个基本框架来实现这一目标。在本研究中，我们首先阐述基因删除策略预测的问题，然后提出了一个在基因组规模代谢模型中预测生长耦合生产的基因删除策略的框架。所提出的框架利用深度学习算法学习和集成顺序基因和代谢物数据表示，实现自动基因删除策略预测。计算实验结果证明了所提出框架的可行性，显示出相比基准方法的实质性改进。具体而言，所提出的框架在研究中三种不同规模的代谢模型中分别实现了总体准确率的17.64％、27.15％和18.07％的增加，同时在预测基因删除状态方面保持了平衡的精确度和召回率。该框架的源代码和示例可在https://github.com/MetNetComp/DeepGDel 上公开获取。

更新时间: 2025-04-08 08:07:59

领域: q-bio.QM,cs.LG

下载: http://arxiv.org/abs/2504.06316v1

MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

Multimodal reasoning, which integrates language and visual cues into problem solving and decision making, is a fundamental aspect of human intelligence and a crucial step toward artificial general intelligence. However, the evaluation of multimodal reasoning capabilities in Multimodal Large Language Models (MLLMs) remains inadequate. Most existing reasoning benchmarks are constrained by limited data size, narrow domain coverage, and unstructured knowledge distribution. To close these gaps, we introduce MDK12-Bench, a multi-disciplinary benchmark assessing the reasoning capabilities of MLLMs via real-world K-12 examinations. Spanning six disciplines (math, physics, chemistry, biology, geography, and information science), our benchmark comprises 140K reasoning instances across diverse difficulty levels from primary school to 12th grade. It features 6,827 instance-level knowledge point annotations based on a well-organized knowledge structure, detailed answer explanations, difficulty labels and cross-year partitions, providing a robust platform for comprehensive evaluation. Additionally, we present a novel dynamic evaluation framework to mitigate data contamination issues by bootstrapping question forms, question types, and image styles during evaluation. Extensive experiment on MDK12-Bench reveals the significant limitation of current MLLMs in multimodal reasoning. The findings on our benchmark provide insights into the development of the next-generation models. Our data and codes are available at https://github.com/LanceZPF/MDK12.

Updated: 2025-04-08 08:06:53

标题: MDK12-Bench：用于评估多模态大型语言模型推理能力的多学科基准

摘要: 多模态推理，将语言和视觉线索整合到问题解决和决策中，是人类智能的基本方面，也是通往人工通用智能的关键一步。然而，在多模态大型语言模型（MLLMs）中评估多模态推理能力仍然不足。大多数现有的推理基准受限于有限的数据规模、狭窄的领域覆盖范围和无结构的知识分布。为了弥补这些差距，我们引入了MDK12-Bench，一个跨学科基准，通过真实的K-12考试评估MLLMs的推理能力。我们的基准涵盖了六个学科（数学、物理、化学、生物、地理和信息科学），包括从小学到12年级的各种难度级别的140,000个推理实例。它包括基于良好组织的知识结构的6,827个实例级知识点注释，详细的答案解释，难度标签和跨年度分区，为全面评估提供了稳健的平台。此外，我们提出了一种新颖的动态评估框架，通过在评估过程中引导问题形式、问题类型和图像风格来减轻数据污染问题。对MDK12-Bench的广泛实验揭示了当前MLLMs在多模态推理中的显著局限性。我们基准的发现为下一代模型的发展提供了见解。我们的数据和代码可在https://github.com/LanceZPF/MDK12 上获得。

更新时间: 2025-04-08 08:06:53

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.05782v1

Neural Architecture Search: Two Constant Shared Weights Initialisations

In the last decade, zero-cost metrics have gained prominence in neural architecture search (NAS) due to their ability to evaluate architectures without training. These metrics are significantly faster and less computationally expensive than traditional NAS methods and provide insights into neural architectures' internal workings. This paper introduces epsinas, a novel zero-cost NAS metric that assesses architecture potential using two constant shared weight initialisations and the statistics of their outputs. We show that the dispersion of raw outputs, normalised by their average magnitude, strongly correlates with trained accuracy. This effect holds across image classification and language tasks on NAS-Bench-101, NAS-Bench-201, and NAS-Bench-NLP. Our method requires no data labels, operates on a single minibatch, and eliminates the need for gradient computation, making it independent of training hyperparameters, loss metrics, and human annotations. It evaluates a network in a fraction of a GPU second and integrates seamlessly into existing NAS frameworks. The code supporting this study can be found on GitHub at https://github.com/egracheva/epsinas.

Updated: 2025-04-08 07:57:20

标题: 神经架构搜索：两种常数共享权重初始化

摘要: 在过去的十年中，零成本指标在神经架构搜索（NAS）中日益受到重视，因为它们能够在不训练的情况下评估架构。这些指标比传统的NAS方法显著更快、计算成本更低，并且能够提供有关神经架构内部运作的见解。本文介绍了epsinas，一种新颖的零成本NAS指标，通过使用两个共享权重初始化和它们的输出统计来评估架构潜力。我们展示了原始输出的离散性，通过其平均幅度进行归一化，与训练准确度强相关。这种效应在NAS-Bench-101、NAS-Bench-201和NAS-Bench-NLP上的图像分类和语言任务中都成立。我们的方法不需要数据标签，在单个小批量上运行，并消除了梯度计算的需要，使其独立于训练超参数、损失指标和人工注释。它能够在一小部分GPU秒内评估一个网络，并且可以无缝集成到现有的NAS框架中。支持本研究的代码可在GitHub上找到：https://github.com/egracheva/epsinas。

更新时间: 2025-04-08 07:57:20

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2302.04406v3

Confidence-calibrated covariate shift correction for few-shot classification in Vision-Language Models

Since the establishment of vision-language foundation models as the new mainstay in low-shot vision classification tasks, the question of domain generalization arising from insufficient target data is assuming more importance. This scarcity challenge induces sampling bias and amplifies model sensitivity to variations and shifts in data distributions. While fine-tuning on multiple domains could mitigate such domain generalization issues, it is resource-intensive and demands diverse data sources. In this work, we systematically analyze two critical challenges: (1) covariate shift between the pre-training distribution and the underspecified target distribution, and (2) confidence misalignment, where predictions on novel data are overconfident. To address both challenges simultaneously, we introduce \textbf{Confidence-Calibrated Covariate Shift Correction (CalShift)} -- a unified approach that combines a Fisher information penalty to mitigate covariate shift and a Confidence Misalignment Penalty (CMP) to reduce overconfidence in misclassified examples. Experimental evaluations across various vision and covariate shift benchmarks demonstrate that CalShift significantly improves model calibration, achieving up to a 5.82\% reduction in Expected Calibration Error (ECE). Furthermore, CalShift enhances robustness, improving accuracy by 3.5\% on challenging datasets impacted by covariate shifts. Our results highlight CalShift as a promising strategy for building robust and reliable low-shot vision-language systems for real-world applications.

Updated: 2025-04-08 07:54:30

标题: 视觉-语言模型中用于少样本分类的置信度校准协变量转移校正

摘要: 自视觉-语言基础模型成为低样本视觉分类任务的新支柱以来，由于目标数据不足而产生的域泛化问题变得更加重要。这种稀缺性挑战引发了采样偏差，并增加了模型对数据分布变化和偏移的敏感性。虽然在多个领域上微调可以缓解这种域泛化问题，但这需要大量资源，并且需要多样化的数据来源。在这项工作中，我们系统地分析了两个关键挑战：(1) 预训练分布与不充分指定的目标分布之间的协变量偏移，以及 (2) 置信度不一致，即对新数据的预测过于自信。为了同时解决这两个挑战，我们引入了\textbf{置信度校准协变量偏移修正（CalShift）} - 一种统一方法，结合了费舍尔信息惩罚来缓解协变量偏移，以及置信度不一致惩罚（CMP）来减少误分类示例的过度自信。通过在各种视觉和协变量偏移基准上进行实验评估，我们发现CalShift显著提高了模型的校准性，将期望校准误差降低了高达5.82%。此外，CalShift提高了鲁棒性，在受到协变量偏移影响的具有挑战性数据集上，准确率提高了3.5%。我们的结果突显了CalShift作为建立健壮可靠的低样本视觉-语言系统用于实际应用的有前途的策略。

更新时间: 2025-04-08 07:54:30

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2502.07847v2

Predicting Fetal Birthweight from High Dimensional Data using Advanced Machine Learning

Birth weight serves as a fundamental indicator of neonatal health, closely linked to both early medical interventions and long-term developmental risks. Traditional predictive models, often constrained by limited feature selection and incomplete datasets, struggle to achieve overlooking complex maternal and fetal interactions in diverse clinical settings. This research explores machine learning to address these limitations, utilizing a structured methodology that integrates advanced imputation strategies, supervised feature selection techniques, and predictive modeling. Given the constraints of the dataset, the research strengthens the role of data preprocessing in improving the model performance. Among the various methodologies explored, tree-based feature selection methods demonstrated superior capability in identifying the most relevant predictors, while ensemble-based regression models proved highly effective in capturing non-linear relationships and complex maternal-fetal interactions within the data. Beyond model performance, the study highlights the clinical significance of key physiological determinants, offering insights into maternal and fetal health factors that influence birth weight, offering insights that extend over statistical modeling. By bridging computational intelligence with perinatal research, this work underscores the transformative role of machine learning in enhancing predictive accuracy, refining risk assessment and informing data-driven decision-making in maternal and neonatal care. Keywords: Birth weight prediction, maternal-fetal health, MICE, BART, Gradient Boosting, neonatal outcomes, Clinipredictive.

Updated: 2025-04-08 07:54:17

标题: 利用先进的机器学习方法从高维数据预测胎儿出生体重

摘要: 出生体重是新生儿健康的基本指标，与早期医疗干预和长期发展风险密切相关。传统的预测模型通常受限于有限的特征选择和不完整的数据集，往往难以忽视多样临床环境中复杂的母胎互动。本研究探讨了机器学习来解决这些局限，利用结构化方法整合先进的插补策略、监督特征选择技术和预测建模。鉴于数据集的限制，研究加强了数据预处理在提高模型性能方面的作用。在探索的各种方法中，基于树的特征选择方法展现出较强的能力，能够识别最相关的预测因子，而基于集成的回归模型在捕捉数据中的非线性关系和复杂的母胎互动方面表现出高效。除了模型性能外，该研究突出了关键生理决定因素的临床意义，为影响出生体重的母婴健康因素提供了洞察，这些洞察超越了统计建模。通过将计算智能与围产期研究结合起来，本研究强调了机器学习在提高预测准确性、完善风险评估和指导母婴护理中数据驱动决策制定方面的转变性作用。关键词：出生体重预测，母婴健康，MICE，BART，Gradient Boosting，新生儿结果，Clinipredictive。

更新时间: 2025-04-08 07:54:17

领域: cs.LG

下载: http://arxiv.org/abs/2502.14270v2

Transferable Mask Transformer: Cross-domain Semantic Segmentation with Region-adaptive Transferability Estimation

Recent advances in Vision Transformers (ViTs) have set new benchmarks in semantic segmentation. However, when adapting pretrained ViTs to new target domains, significant performance degradation often occurs due to distribution shifts, resulting in suboptimal global attention. Since self-attention mechanisms are inherently data-driven, they may fail to effectively attend to key objects when source and target domains exhibit differences in texture, scale, or object co-occurrence patterns. While global and patch-level domain adaptation methods provide partial solutions, region-level adaptation with dynamically shaped regions is crucial due to spatial heterogeneity in transferability across different image areas. We present Transferable Mask Transformer (TMT), a novel region-level adaptation framework for semantic segmentation that aligns cross-domain representations through spatial transferability analysis. TMT consists of two key components: (1) An Adaptive Cluster-based Transferability Estimator (ACTE) that dynamically segments images into structurally and semantically coherent regions for localized transferability assessment, and (2) A Transferable Masked Attention (TMA) module that integrates region-specific transferability maps into ViTs' attention mechanisms, prioritizing adaptation in regions with low transferability and high semantic uncertainty. Comprehensive evaluations across 20 cross-domain pairs demonstrate TMT's superiority, achieving an average 2% MIoU improvement over vanilla fine-tuning and a 1.28% increase compared to state-of-the-art baselines. The source code will be publicly available.

Updated: 2025-04-08 07:53:51

标题: 可转移的Mask Transformer：具有区域自适应可转移性估计的跨域语义分割

摘要: 最近在Vision Transformers（ViTs）方面取得的进展在语义分割领域创造了新的基准。然而，当将预训练的ViTs适应到新的目标领域时，由于分布转移，往往会出现显著的性能下降，导致全局关注不足。由于自我关注机制本质上是数据驱动的，当源领域和目标领域在纹理、尺度或对象共现模式方面存在差异时，它们可能无法有效地关注关键对象。虽然全局和补丁级域适应方法提供了部分解决方案，但由于在不同图像区域之间的可传递性存在空间异质性，所以基于区域级别的适应对于动态形状的区域至关重要。我们提出了一种新颖的区域级适应框架Transferable Mask Transformer（TMT），用于语义分割，通过空间可传递性分析来对齐跨领域表示。TMT包括两个关键组件：（1）一种自适应基于聚类的可传递性估计器（ACTE），用于将图像动态分割为结构和语义上连贯的区域，以进行局部可传递性评估；（2）一种可传递的遮罩关注（TMA）模块，将区域特定的可传递性映射集成到ViTs的注意机制中，优先考虑那些可传递性低、语义不确定性高的区域进行适应。对20个跨领域对的全面评估显示TMT的优越性，相比普通微调平均MIoU提高了2％，相比最先进的基线提高了1.28％。源代码将公开提供。

更新时间: 2025-04-08 07:53:51

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.05774v1

A Lightweight Multi-Module Fusion Approach for Korean Character Recognition

Optical Character Recognition (OCR) is essential in applications such as document processing, license plate recognition, and intelligent surveillance. However, existing OCR models often underperform in real-world scenarios due to irregular text layouts, poor image quality, character variability, and high computational costs. This paper introduces SDA-Net (Stroke-Sensitive Attention and Dynamic Context Encoding Network), a lightweight and efficient architecture designed for robust single-character recognition. SDA-Net incorporates: (1) a Dual Attention Mechanism to enhance stroke-level and spatial feature extraction; (2) a Dynamic Context Encoding module that adaptively refines semantic information using a learnable gating mechanism; (3) a U-Net-inspired Feature Fusion Strategy for combining low-level and high-level features; and (4) a highly optimized lightweight backbone that reduces memory and computational demands. Experimental results show that SDA-Net achieves state-of-the-art accuracy on challenging OCR benchmarks, with significantly faster inference, making it well-suited for deployment in real-time and edge-based OCR systems.

Updated: 2025-04-08 07:50:19

标题: 一种轻量级的多模块融合方法用于韩文字符识别

摘要: 光学字符识别（OCR）在文档处理、车牌识别和智能监控等应用中至关重要。然而，现有的OCR模型在现实场景中往往表现不佳，原因包括不规则的文本布局、图像质量不佳、字符变化性大和高计算成本等。本文介绍了SDA-Net（Stroke-Sensitive Attention and Dynamic Context Encoding Network），这是一种轻量级、高效的架构，旨在进行稳健的单字符识别。SDA-Net包括：（1）双重注意力机制，用于增强笔画级和空间特征提取；（2）动态上下文编码模块，使用可学习的门控机制自适应地优化语义信息；（3）受U-Net启发的特征融合策略，用于结合低级和高级特征；以及（4）高度优化的轻量级骨干网络，降低内存和计算需求。实验结果显示，SDA-Net在具有挑战性的OCR基准测试中实现了最先进的准确性，推理速度显著更快，使其非常适合部署在实时和边缘OCR系统中。

更新时间: 2025-04-08 07:50:19

领域: cs.CV,cs.AI,68T07,I.2.10

下载: http://arxiv.org/abs/2504.05770v1

Temporal Dynamic Embedding for Irregularly Sampled Time Series

In several practical applications, particularly healthcare, clinical data of each patient is individually recorded in a database at irregular intervals as required. This causes a sparse and irregularly sampled time series, which makes it difficult to handle as a structured representation of the prerequisites of neural network models. We therefore propose temporal dynamic embedding (TDE), which enables neural network models to receive data that change the number of variables over time. TDE regards each time series variable as an embedding vector evolving over time, instead of a conventional fixed structured representation, which causes a critical missing problem. For each time step, TDE allows for the selective adoption and aggregation of only observed variable subsets and represents the current status of patient based on current observations. The experiment was conducted on three clinical datasets: PhysioNet 2012, MIMIC-III, and PhysioNet 2019. The TDE model performed competitively or better than the imputation-based baseline and several recent state-of-the-art methods with reduced training runtime.

Updated: 2025-04-08 07:49:22

标题: 不规则采样时间序列的时间动态嵌入

摘要: 在几个实际应用中，特别是在医疗保健领域，每个患者的临床数据按照需要以不规则的时间间隔单独记录在数据库中。这导致了稀疏且不规则采样的时间序列，使得很难将其作为神经网络模型的先决条件的结构化表示来处理。因此，我们提出了时间动态嵌入（TDE），它使神经网络模型能够接收随时间改变变量数量的数据。TDE将每个时间序列变量视为随时间演变的嵌入向量，而不是传统的固定结构表示，这造成了一个关键的缺失问题。对于每个时间步长，TDE允许选择性采用和聚合仅观察到的变量子集，并基于当前观察表示患者的当前状态。实验在三个临床数据集上进行：PhysioNet 2012、MIMIC-III和PhysioNet 2019。TDE模型的性能优于基于插补的基线和几种最新的前沿方法，并且训练运行时间更短。

更新时间: 2025-04-08 07:49:22

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.05768v1

AiGAS-dEVL-RC: An Adaptive Growing Neural Gas Model for Recurrently Drifting Unsupervised Data Streams

Concept drift and extreme verification latency pose significant challenges in data stream learning, particularly when dealing with recurring concept changes in dynamic environments. This work introduces a novel method based on the Growing Neural Gas (GNG) algorithm, designed to effectively handle abrupt recurrent drifts while adapting to incrementally evolving data distributions (incremental drifts). Leveraging the self-organizing and topological adaptability of GNG, the proposed approach maintains a compact yet informative memory structure, allowing it to efficiently store and retrieve knowledge of past or recurring concepts, even under conditions of delayed or sparse stream supervision. Our experiments highlight the superiority of our approach over existing data stream learning methods designed to cope with incremental non-stationarities and verification latency, demonstrating its ability to quickly adapt to new drifts, robustly manage recurring patterns, and maintain high predictive accuracy with a minimal memory footprint. Unlike other techniques that fail to leverage recurring knowledge, our proposed approach is proven to be a robust and efficient online learning solution for unsupervised drifting data flows.

Updated: 2025-04-08 07:42:50

标题: AiGAS-dEVL-RC：一种用于递归漂移无监督数据流的自适应增长神经气模型

摘要: Concept drift和极端的验证延迟在数据流学习中构成了重大挑战，特别是在处理动态环境中的重复概念变化时。本文介绍了一种基于Growing Neural Gas（GNG）算法的新方法，旨在有效处理突发的重复漂移，并适应逐渐演变的数据分布（增量漂移）。利用GNG的自组织和拓扑适应性，所提出的方法保持了紧凑而丰富的记忆结构，使其能够高效地存储和检索过去或重复概念的知识，即使在延迟或稀疏流监督的条件下也是如此。我们的实验突显了我们的方法相对于现有的旨在应对增量非稳态性和验证延迟的数据流学习方法的优越性，展示了其快速适应新漂移、稳健管理重复模式并通过最小的内存占用维持高预测准确性的能力。与其他未能利用重复知识的技术不同，我们提出的方法已被证明是一种强大且高效的在线学习解决方案，适用于无监督漂移数据流。

更新时间: 2025-04-08 07:42:50

领域: cs.LG,cs.NE,68T05 (Primary) 68T07 (Secondary)

下载: http://arxiv.org/abs/2504.05761v1

Addressing Class Imbalance with Probabilistic Graphical Models and Variational Inference

This study proposes a method for imbalanced data classification based on deep probabilistic graphical models (DPGMs) to solve the problem that traditional methods have insufficient learning ability for minority class samples. To address the classification bias caused by class imbalance, we introduce variational inference optimization probability modeling, which enables the model to adaptively adjust the representation ability of minority classes and combines the class-aware weight adjustment strategy to enhance the classifier's sensitivity to minority classes. In addition, we combine the adversarial learning mechanism to generate minority class samples in the latent space so that the model can better characterize the category boundary in the high-dimensional feature space. The experiment is evaluated on the Kaggle "Credit Card Fraud Detection" dataset and compared with a variety of advanced imbalanced classification methods (such as GAN-based sampling, BRF, XGBoost-Cost Sensitive, SAAD, HAN). The results show that the method in this study has achieved the best performance in AUC, Precision, Recall and F1-score indicators, effectively improving the recognition rate of minority classes and reducing the false alarm rate. This method can be widely used in imbalanced classification tasks such as financial fraud detection, medical diagnosis, and anomaly detection, providing a new solution for related research.

Updated: 2025-04-08 07:38:30

标题: 使用概率图模型和变分推断解决类别不平衡问题

摘要: 本研究提出了一种基于深度概率图模型（DPGMs）的不平衡数据分类方法，以解决传统方法对少数类样本学习能力不足的问题。为了解决由类别不平衡导致的分类偏差，我们引入了变分推断优化概率建模，使模型能够自适应调整少数类的表示能力，并结合了类感知权重调整策略，以增强分类器对少数类的敏感性。此外，我们结合对抗学习机制在潜在空间中生成少数类样本，使模型能够更好地表征高维特征空间中的类别边界。实验在Kaggle的“信用卡欺诈检测”数据集上进行评估，并与多种先进的不平衡分类方法（如基于GAN的采样、BRF、XGBoost-Cost Sensitive、SAAD、HAN）进行比较。结果表明，本研究方法在AUC、Precision、Recall和F1-score指标上取得了最佳表现，有效提高了少数类的识别率，降低了误报率。该方法可以广泛应用于金融欺诈检测、医学诊断和异常检测等不平衡分类任务，为相关研究提供了新的解决方案。

更新时间: 2025-04-08 07:38:30

领域: cs.LG

下载: http://arxiv.org/abs/2504.05758v1

Interpretable Non-linear Survival Analysis with Evolutionary Symbolic Regression

Survival Regression (SuR) is a key technique for modeling time to event in important applications such as clinical trials and semiconductor manufacturing. Currently, SuR algorithms belong to one of three classes: non-linear black-box -- allowing adaptability to many datasets but offering limited interpretability (e.g., tree ensembles); linear glass-box -- being easier to interpret but limited to modeling only linear interactions (e.g., Cox proportional hazards); and non-linear glass-box -- allowing adaptability and interpretability, but empirically found to have several limitations (e.g., explainable boosting machines, survival trees). In this work, we investigate whether Symbolic Regression (SR), i.e., the automated search of mathematical expressions from data, can lead to non-linear glass-box survival models that are interpretable and accurate. We propose an evolutionary, multi-objective, and multi-expression implementation of SR adapted to SuR. Our empirical results on five real-world datasets show that SR consistently outperforms traditional glass-box methods for SuR in terms of accuracy per number of dimensions in the model, while exhibiting comparable accuracy with black-box methods. Furthermore, we offer qualitative examples to assess the interpretability potential of SR models for SuR. Code at: https://github.com/lurovi/SurvivalMultiTree-pyNSGP.

Updated: 2025-04-08 07:37:37

标题: 具有进化符号回归的可解释非线性生存分析

摘要: 生存回归（SuR）是建模事件发生时间的关键技术，在临床试验和半导体制造等重要应用中发挥作用。目前，SuR算法属于三种类型之一：非线性黑匣子 - 允许适应许多数据集，但提供有限的可解释性（例如，树集成）；线性玻璃箱 - 更容易解释，但仅限于建模线性交互作用（例如，Cox比例危险）；非线性玻璃箱 - 允许适应性和可解释性，但实证发现存在一些局限性（例如，可解释增强机器，生存树）。在这项工作中，我们调查了符号回归（SR），即从数据中自动搜索数学表达式，是否可以导致可解释且准确的非线性玻璃箱生存模型。我们提出了一种适应于SuR的进化、多目标和多表达式的SR实现。我们在五个真实世界数据集上的实证结果表明，SR在模型维度数量方面始终优于传统玻璃箱方法，同时在准确性方面与黑匣子方法相当。此外，我们提供定性示例以评估SR模型在SuR中的可解释性潜力。源代码地址：https://github.com/lurovi/SurvivalMultiTree-pyNSGP.

更新时间: 2025-04-08 07:37:37

领域: cs.LG,cs.NE

下载: http://arxiv.org/abs/2504.05756v1

Unraveling Human-AI Teaming: A Review and Outlook

Artificial Intelligence (AI) is advancing at an unprecedented pace, with clear potential to enhance decision-making and productivity. Yet, the collaborative decision-making process between humans and AI remains underdeveloped, often falling short of its transformative possibilities. This paper explores the evolution of AI agents from passive tools to active collaborators in human-AI teams, emphasizing their ability to learn, adapt, and operate autonomously in complex environments. This paradigm shifts challenges traditional team dynamics, requiring new interaction protocols, delegation strategies, and responsibility distribution frameworks. Drawing on Team Situation Awareness (SA) theory, we identify two critical gaps in current human-AI teaming research: the difficulty of aligning AI agents with human values and objectives, and the underutilization of AI's capabilities as genuine team members. Addressing these gaps, we propose a structured research outlook centered on four key aspects of human-AI teaming: formulation, coordination, maintenance, and training. Our framework highlights the importance of shared mental models, trust-building, conflict resolution, and skill adaptation for effective teaming. Furthermore, we discuss the unique challenges posed by varying team compositions, goals, and complexities. This paper provides a foundational agenda for future research and practical design of sustainable, high-performing human-AI teams.

Updated: 2025-04-08 07:37:25

标题: 揭示人工智能与人类协作团队：回顾与展望

摘要: 人工智能（AI）正在以前所未有的速度发展，具有明显的增强决策和提高生产力的潜力。然而，人与AI之间的协作决策过程仍然不够成熟，往往未能充分发挥其变革性可能性。本文探讨了AI代理从被动工具发展为人-AI团队中的积极合作者的演变过程，强调它们在复杂环境中学习、适应和自主运作的能力。这种范式转变挑战了传统团队动态，需要新的互动协议、委托策略和责任分配框架。基于团队情境感知（SA）理论，我们确定了当前人-AI团队研究中的两个关键空白：将AI代理与人类价值观和目标进行对齐的困难，以及未充分利用AI作为真正团队成员的能力。解决这些空白，我们提出了一个以人-AI团队的四个关键方面为中心的结构化研究展望：制定、协调、维护和训练。我们的框架强调了共享的心智模型、建立信任、解决冲突和技能适应对于有效团队合作的重要性。此外，我们讨论了由不同团队构成、目标和复杂性带来的独特挑战。本文为未来研究和可持续、高绩效人-AI团队的实际设计提供了基础议程。

更新时间: 2025-04-08 07:37:25

领域: cs.HC,cs.AI,econ.GN,q-fin.EC

下载: http://arxiv.org/abs/2504.05755v1

MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale

Multi-agent pathfinding (MAPF) is a problem that generally requires finding collision-free paths for multiple agents in a shared environment. Solving MAPF optimally, even under restrictive assumptions, is NP-hard, yet efficient solutions for this problem are critical for numerous applications, such as automated warehouses and transportation systems. Recently, learning-based approaches to MAPF have gained attention, particularly those leveraging deep reinforcement learning. Typically, such learning-based MAPF solvers are augmented with additional components like single-agent planning or communication. Orthogonally, in this work we rely solely on imitation learning that leverages a large dataset of expert MAPF solutions and transformer-based neural network to create a foundation model for MAPF called MAPF-GPT. The latter is capable of generating actions without additional heuristics or communication. MAPF-GPT demonstrates zero-shot learning abilities when solving the MAPF problems that are not present in the training dataset. We show that MAPF-GPT notably outperforms the current best-performing learnable MAPF solvers on a diverse range of problem instances and is computationally efficient during inference.

Updated: 2025-04-08 07:32:56

标题: MAPF-GPT: 规模化多智能体路径规划的模仿学习

摘要: 多智能体路径规划（MAPF）是一个通常需要在共享环境中为多个智能体找到无碰撞路径的问题。即使在限制性假设下，解决MAPF的最佳方法也是NP难的，然而对于许多应用，如自动仓库和运输系统，这个问题的高效解决方案至关重要。最近，基于学习的MAPF方法受到了关注，特别是那些利用深度强化学习的方法。通常，这种基于学习的MAPF求解器会使用额外的组件，如单智能体规划或通信。在这项工作中，我们完全依赖于模仿学习，利用大量专家MAPF解决方案的数据集和基于转换器的神经网络来创建一个名为MAPF-GPT的MAPF基础模型。后者能够在不需要额外启发式或通信的情况下生成动作。MAPF-GPT在解决训练数据集中不存在的MAPF问题时表现出了零-shot学习的能力。我们展示了MAPF-GPT在各种问题实例上明显优于当前表现最佳的可学习MAPF求解器，并且在推理过程中具有计算效率。

更新时间: 2025-04-08 07:32:56

领域: cs.MA,cs.AI,cs.LG

下载: http://arxiv.org/abs/2409.00134v5

Secure Text Mail Encryption with Generative Adversarial Networks

This work presents an encryption model based on Generative Adversarial Networks (GANs). Encryption of RTF-8 data is realized by dynamically generating decimal numbers that lead to the encryption and decryption of alphabetic strings in integer representation by simple addition rules, the modulus of the dimension of the considered alphabet. The binary numbers for the private dynamical keys correlate with the binary numbers of public reference keys from a mapping defined by the specific GAN configuration. For reversible encryption with bijective mapping between dynamic and reference keys as defined by the GAN encryptor with random combinations of NOT logical gates between bitwise subcomponents of the transmitted text signal, secure text encryption can be realized by transferring a GAN-encrypted public key with encrypted text from a sender to a receiver. Using the technique described above, secure text mail transfer can be realized from component-wise encryption of text mail strings with total key sizes of up to $10^{8}$ bits that define random decimal numbers obtained from the GAN. From the present model, we assert that encrypted texts can be transmitted more efficiently and securely than from RSA encryption, as long as users of the specific configuration of the GAN encryption model are unaware of the GAN encryptor circuit.

Updated: 2025-04-08 07:27:57

标题: 使用生成对抗网络进行安全文本邮件加密

摘要: 这项工作提出了一种基于生成对抗网络（GANs）的加密模型。通过动态生成导致字母字符串在整数表示中通过简单的加法规则进行加密和解密的十进制数来实现RTF-8数据的加密。私有动态密钥的二进制数与公共参考密钥的二进制数相关联，后者是由特定GAN配置定义的映射确定的。对于具有双射映射的可逆加密，由GAN加密器定义，在传输文本信号的位子组件之间使用NOT逻辑门的随机组合，可以通过从发送方向接收方传输包含GAN加密的公钥和加密文本来实现安全文本加密。使用上述描述的技术，可以通过对文本邮件字符串进行逐个分量加密，总密钥大小高达$10^{8}$位，这些密钥是从GAN获得的随机十进制数。根据目前的模型，我们断言，只要特定配置的GAN加密模型的用户不知道GAN加密器电路，加密文本可以比RSA加密更有效地和安全地传输。

更新时间: 2025-04-08 07:27:57

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2504.07140v1

Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision

We present Lunima-OmniLV (abbreviated as OmniLV), a universal multimodal multi-task framework for low-level vision that addresses over 100 sub-tasks across four major categories: image restoration, image enhancement, weak-semantic dense prediction, and stylization. OmniLV leverages both textual and visual prompts to offer flexible and user-friendly interactions. Built on Diffusion Transformer (DiT)-based generative priors, our framework supports arbitrary resolutions -- achieving optimal performance at 1K resolution -- while preserving fine-grained details and high fidelity. Through extensive experiments, we demonstrate that separately encoding text and visual instructions, combined with co-training using shallow feature control, is essential to mitigate task ambiguity and enhance multi-task generalization. Our findings also reveal that integrating high-level generative tasks into low-level vision models can compromise detail-sensitive restoration. These insights pave the way for more robust and generalizable low-level vision systems.

Updated: 2025-04-08 07:26:50

标题: Lumina-OmniLV：一种统一的通用低级视觉多模态框架

摘要: 我们提出了Lunima-OmniLV（简称为OmniLV），这是一个通用的多模态多任务框架，用于处理低级视觉中的100多个子任务，涵盖了图像恢复、图像增强、弱语义密集预测和风格化四个主要类别。OmniLV利用文本和视觉提示，提供灵活且用户友好的交互。基于扩散变压器（DiT）的生成先验，我们的框架支持任意分辨率 - 在1K分辨率下实现最佳性能 - 同时保留了细节和高保真度。通过大量实验，我们证明了分别编码文本和视觉指令，并结合使用浅层特征控制进行联合训练，对于减轻任务歧义和增强多任务泛化至关重要。我们的发现还表明，将高级生成任务整合到低级视觉模型中可能会损害对细节敏感的恢复。这些见解为更强大和可泛化的低级视觉系统铺平了道路。

更新时间: 2025-04-08 07:26:50

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.04903v2

DDT: Decoupled Diffusion Transformer

Diffusion transformers have demonstrated remarkable generation quality, albeit requiring longer training iterations and numerous inference steps. In each denoising step, diffusion transformers encode the noisy inputs to extract the lower-frequency semantic component and then decode the higher frequency with identical modules. This scheme creates an inherent optimization dilemma: encoding low-frequency semantics necessitates reducing high-frequency components, creating tension between semantic encoding and high-frequency decoding. To resolve this challenge, we propose a new \textbf{\color{ddt}D}ecoupled \textbf{\color{ddt}D}iffusion \textbf{\color{ddt}T}ransformer~(\textbf{\color{ddt}DDT}), with a decoupled design of a dedicated condition encoder for semantic extraction alongside a specialized velocity decoder. Our experiments reveal that a more substantial encoder yields performance improvements as model size increases. For ImageNet $256\times256$, Our DDT-XL/2 achieves a new state-of-the-art performance of {1.31 FID}~(nearly $4\times$ faster training convergence compared to previous diffusion transformers). For ImageNet $512\times512$, Our DDT-XL/2 achieves a new state-of-the-art FID of 1.28. Additionally, as a beneficial by-product, our decoupled architecture enhances inference speed by enabling the sharing self-condition between adjacent denoising steps. To minimize performance degradation, we propose a novel statistical dynamic programming approach to identify optimal sharing strategies.

Updated: 2025-04-08 07:17:45

标题: DDT：解耦扩散变压器

摘要: 扩散变压器展示了出色的生成质量，尽管需要较长的训练迭代和大量的推理步骤。在每个去噪步骤中，扩散变压器对嘈杂的输入进行编码，提取低频语义组件，然后使用相同的模块解码高频部分。这种方案产生了一个固有的优化困境：编码低频语义需要减少高频组件，在语义编码和高频解码之间产生张力。为了解决这一挑战，我们提出了一种新的解耦扩散变压器（DDT），采用专用的条件编码器进行语义提取，以及专门的速度解码器的解耦设计。我们的实验表明，随着模型大小的增加，更强大的编码器会带来性能改进。对于ImageNet $256\times256$，我们的DDT-XL/2实现了{1.31 FID}的新的最先进性能（与以前的扩散变压器相比，训练收敛速度提高了近4倍）。对于ImageNet $512\times512$，我们的DDT-XL/2实现了1.28的最新最先进FID。此外，作为一个有益的副产品，我们的解耦架构通过在相邻去噪步骤之间共享自身条件来提高推理速度。为了最小化性能降级，我们提出了一种新颖的统计动态规划方法来确定最佳共享策略。

更新时间: 2025-04-08 07:17:45

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2504.05741v1

Rank-Then-Score: Enhancing Large Language Models for Automated Essay Scoring

In recent years, large language models (LLMs) achieve remarkable success across a variety of tasks. However, their potential in the domain of Automated Essay Scoring (AES) remains largely underexplored. Moreover, compared to English data, the methods for Chinese AES is not well developed. In this paper, we propose Rank-Then-Score (RTS), a fine-tuning framework based on large language models to enhance their essay scoring capabilities. Specifically, we fine-tune the ranking model (Ranker) with feature-enriched data, and then feed the output of the ranking model, in the form of a candidate score set, with the essay content into the scoring model (Scorer) to produce the final score. Experimental results on two benchmark datasets, HSK and ASAP, demonstrate that RTS consistently outperforms the direct prompting (Vanilla) method in terms of average QWK across all LLMs and datasets, and achieves the best performance on Chinese essay scoring using the HSK dataset.

Updated: 2025-04-08 07:10:51

标题: 排名-然后-评分：增强大型语言模型用于自动化作文评分

摘要: 近年来，大型语言模型（LLMs）在各种任务中取得了显著的成功。然而，它们在自动作文评分（AES）领域的潜力仍然未被充分挖掘。此外，与英文数据相比，中文AES的方法尚未得到很好的发展。在本文中，我们提出了一种基于大型语言模型的微调框架Rank-Then-Score（RTS），以增强其作文评分能力。具体来说，我们使用特征丰富的数据对排名模型（Ranker）进行微调，然后将排名模型的输出（候选得分集合）与作文内容一起输入评分模型（Scorer）以产生最终得分。在两个基准数据集HSK和ASAP上的实验结果表明，RTS在所有LLMs和数据集上始终优于直接提示（Vanilla）方法，且在使用HSK数据集进行中文作文评分时表现最佳。

更新时间: 2025-04-08 07:10:51

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.05736v1

RiemannGFM: Learning a Graph Foundation Model from Riemannian Geometry

The foundation model has heralded a new era in artificial intelligence, pretraining a single model to offer cross-domain transferability on different datasets. Graph neural networks excel at learning graph data, the omnipresent non-Euclidean structure, but often lack the generalization capacity. Hence, graph foundation model is drawing increasing attention, and recent efforts have been made to leverage Large Language Models. On the one hand, existing studies primarily focus on text-attributed graphs, while a wider range of real graphs do not contain fruitful textual attributes. On the other hand, the sequential graph description tailored for the Large Language Model neglects the structural complexity, which is a predominant characteristic of the graph. Such limitations motivate an important question: Can we go beyond Large Language Models, and pretrain a universal model to learn the structural knowledge for any graph? The answer in the language or vision domain is a shared vocabulary. We observe the fact that there also exist shared substructures underlying graph domain, and thereby open a new opportunity of graph foundation model with structural vocabulary. The key innovation is the discovery of a simple yet effective structural vocabulary of trees and cycles, and we explore its inherent connection to Riemannian geometry. Herein, we present a universal pretraining model, RiemannGFM. Concretely, we first construct a novel product bundle to incorporate the diverse geometries of the vocabulary. Then, on this constructed space, we stack Riemannian layers where the structural vocabulary, regardless of specific graph, is learned in Riemannian manifold offering cross-domain transferability. Extensive experiments show the effectiveness of RiemannGFM on a diversity of real graphs.

Updated: 2025-04-08 07:04:29

标题: RiemannGFM：从黎曼几何中学习图基础模型

摘要: 基础模型为人工智能开启了一个新时代，通过预训练单一模型实现在不同数据集上的跨领域可转移性。图神经网络擅长于学习图数据，这种无处不在的非欧几里得结构，但通常缺乏泛化能力。因此，图基础模型引起了越来越多的关注，最近的努力致力于利用大型语言模型。一方面，现有研究主要集中在文本属性图上，而更广泛的真实图并不包含丰富的文本属性。另一方面，为大型语言模型量身定制的顺序图描述忽视了图的结构复杂性，这是图的主要特征。这些限制激发了一个重要问题：我们是否可以超越大型语言模型，预训练一个通用模型来学习任何图的结构知识？在语言或视觉领域，答案是共享词汇。我们观察到，图领域也存在共享的子结构，从而为具有结构词汇的图基础模型开辟了新的机会。关键创新在于发现了树和循环的简单而有效的结构词汇，并探索了它与黎曼几何的内在联系。在此，我们提出了一个通用的预训练模型，RiemannGFM。具体而言，我们首先构建一个新颖的产品捆绑包，以整合词汇的多样几何。然后，在这个构建的空间上，我们堆叠黎曼层，其中结构词汇，无论具体图形如何，都在黎曼流形中学习，提供跨领域可转移性。大量实验显示了RiemannGFM在各种真实图上的有效性。

更新时间: 2025-04-08 07:04:29

领域: cs.LG

下载: http://arxiv.org/abs/2502.03251v2

Efficient and Accurate Pneumonia Detection Using a Novel Multi-Scale Transformer Approach

Pneumonia, a prevalent respiratory infection, remains a leading cause of morbidity and mortality worldwide, particularly among vulnerable populations. Chest X-rays serve as a primary tool for pneumonia detection; however, variations in imaging conditions and subtle visual indicators complicate consistent interpretation. Automated tools can enhance traditional methods by improving diagnostic reliability and supporting clinical decision-making. In this study, we propose a novel multi-scale transformer approach for pneumonia detection that integrates lung segmentation and classification into a unified framework. Our method introduces a lightweight transformer-enhanced TransUNet for precise lung segmentation, achieving a Dice score of 95.68% on the "Chest X-ray Masks and Labels" dataset with fewer parameters than traditional transformers. For classification, we employ pre-trained ResNet models (ResNet-50 and ResNet-101) to extract multi-scale feature maps, which are then processed through a modified transformer module to enhance pneumonia detection. This integration of multi-scale feature extraction and lightweight transformer modules ensures robust performance, making our method suitable for resource-constrained clinical environments. Our approach achieves 93.75% accuracy on the "Kermany" dataset and 96.04% accuracy on the "Cohen" dataset, outperforming existing methods while maintaining computational efficiency. This work demonstrates the potential of multi-scale transformer architectures to improve pneumonia diagnosis, offering a scalable and accurate solution to global healthcare challenges."https://github.com/amirrezafateh/Multi-Scale-Transformer-Pneumonia"

Updated: 2025-04-08 07:00:02

标题: 高效准确的肺炎检测：一种新型多尺度Transformer方法

摘要: 肺炎是一种普遍的呼吸道感染，仍然是全球主要的致病因素和死亡原因，尤其在易受伤害的人群中更为突出。胸部X射线被用作肺炎检测的主要工具；然而，影像条件的变化和微妙的视觉指标使得解释结果变得复杂。自动化工具可以通过提高诊断的可靠性和支持临床决策来增强传统方法。在这项研究中，我们提出了一种新颖的多尺度变压器方法，用于将肺部分割和分类整合到一个统一的框架中。我们的方法引入了一种轻量级的变压器增强TransUNet来进行精确的肺部分割，在“胸部X射线掩膜和标签”数据集上实现了95.68%的Dice分数，比传统变压器的参数更少。对于分类，我们使用预训练的ResNet模型（ResNet-50和ResNet-101）来提取多尺度特征图，然后通过修改后的变压器模块来增强肺炎检测。多尺度特征提取和轻量级变压器模块的集成确保了稳健的性能，使我们的方法适用于资源受限的临床环境。我们的方法在“Kermany”数据集上实现了93.75%的准确率，在“Cohen”数据集上实现了96.04%的准确率，优于现有方法同时保持计算效率。这项工作展示了多尺度变压器架构改善肺炎诊断的潜力，为全球卫生保健挑战提供了可扩展且准确的解决方案。"https://github.com/amirrezafateh/Multi-Scale-Transformer-Pneumonia"

更新时间: 2025-04-08 07:00:02

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2408.04290v4

AI-Driven Prognostics for State of Health Prediction in Li-ion Batteries: A Comprehensive Analysis with Validation

This paper presents a comprehensive review of AI-driven prognostics for State of Health (SoH) prediction in lithium-ion batteries. We compare the effectiveness of various AI algorithms, including FFNN, LSTM, and BiLSTM, across multiple datasets (CALCE, NASA, UDDS) and scenarios (e.g., varying temperatures and driving conditions). Additionally, we analyze the factors influencing SoH fluctuations, such as temperature and charge-discharge rates, and validate our findings through simulations. The results demonstrate that BiLSTM achieves the highest accuracy, with an average RMSE reduction of 15% compared to LSTM, highlighting its robustness in real-world applications.

Updated: 2025-04-08 06:58:39

标题: 基于人工智能的锂离子电池健康状态预测：具验证的全面分析

摘要: 本文提出了一个全面的人工智能驱动的锂离子电池健康状态（SoH）预测的综述。我们比较了各种人工智能算法的有效性，包括FFNN，LSTM和BiLSTM，在多个数据集（CALCE，NASA，UDDS）和场景（例如，不同温度和驾驶条件）中的表现。此外，我们分析了影响SoH波动的因素，如温度和充放电速率，并通过模拟验证了我们的发现。结果表明，BiLSTM实现了最高的准确性，与LSTM相比，平均RMSE减少了15％，突显了其在实际应用中的稳健性。

更新时间: 2025-04-08 06:58:39

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.05728v1

GSON: A Group-based Social Navigation Framework with Large Multimodal Model

With the increasing presence of service robots and autonomous vehicles in human environments, navigation systems need to evolve beyond simple destination reach to incorporate social awareness. This paper introduces GSON, a novel group-based social navigation framework that leverages Large Multimodal Models (LMMs) to enhance robots' social perception capabilities. Our approach uses visual prompting to enable zero-shot extraction of social relationships among pedestrians and integrates these results with robust pedestrian detection and tracking pipelines to overcome the inherent inference speed limitations of LMMs. The planning system incorporates a mid-level planner that sits between global path planning and local motion planning, effectively preserving both global context and reactive responsiveness while avoiding disruption of the predicted social group. We validate GSON through extensive real-world mobile robot navigation experiments involving complex social scenarios such as queuing, conversations, and photo sessions. Comparative results show that our system significantly outperforms existing navigation approaches in minimizing social perturbations while maintaining comparable performance on traditional navigation metrics.

Updated: 2025-04-08 06:45:53

标题: GSON：一个具有大型多模态模型的基于群体的社交导航框架

摘要: 随着服务机器人和自动驾驶车辆在人类环境中的增加，导航系统需要超越简单的目的地到达，融入社会意识。本文介绍了一种新颖的基于群体的社交导航框架GSON，利用大型多模型模型（LMMs）来增强机器人的社交感知能力。我们的方法利用视觉提示，实现对行人之间社交关系的零-shot提取，并将这些结果与强大的行人检测和跟踪管道集成，以克服LMMs固有的推理速度限制。规划系统包含一个中级规划器，位于全局路径规划和本地运动规划之间，有效地保留全局上下文和反应灵敏性，同时避免破坏预测的社交群体。我们通过涉及排队、对话和拍照等复杂社交场景的大量实际移动机器人导航实验验证了GSON。比较结果显示，我们的系统在最小化社交干扰方面明显优于现有的导航方法，同时在传统导航指标上保持可比的性能。

更新时间: 2025-04-08 06:45:53

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2409.18084v2

StateAct: Enhancing LLM Base Agents via Self-prompting and State-tracking

Large language models (LLMs) are increasingly used as autonomous agents, tackling tasks from robotics to web navigation. Their performance depends on the underlying base agent. Existing methods, however, struggle with long-context reasoning and goal adherence. We introduce StateAct, a novel and efficient base agent that enhances decision-making through (1) self-prompting, which reinforces task goals at every step, and (2) chain-of-states, an extension of chain-of-thought that tracks state information over time. StateAct outperforms ReAct, the previous best base agent, by over 10% on Alfworld, 30% on Textcraft, and 7% on Webshop across multiple frontier LLMs. We also demonstrate that StateAct can be used as a drop-in replacement for ReAct with advanced LLM agent methods such as test-time scaling, yielding an additional 12% gain on Textcraft. By improving efficiency and long-range reasoning without requiring additional training or retrieval, StateAct provides a scalable foundation for LLM agents. We open source our code to support further research at https://github.com/ai-nikolai/stateact .

Updated: 2025-04-08 06:37:51

标题: StateAct：通过自我提示和状态跟踪增强LLM基础代理

摘要: 大型语言模型（LLMs）越来越被用作自主代理，从机器人到网络导航的任务。它们的性能取决于基础代理。然而，现有方法在长期上下文推理和目标遵从方面存在困难。我们引入了StateAct，一种新颖而高效的基础代理，通过（1）自我提示，强化每一步的任务目标，以及（2）状态链，一种跟踪随时间变化的状态信息的思维链扩展，增强决策制定。StateAct在多个前沿LLMs上比之前最佳基础代理ReAct表现更好，在Alfworld上提高了超过10％，在Textcraft上提高了30％，在Webshop上提高了7％。我们还证明了StateAct可以作为ReAct的替代品与先进的LLM代理方法（如测试时间缩放）一起使用，在Textcraft上额外获得12％的收益。通过提高效率和远程推理能力，无需额外训练或检索，StateAct为LLM代理提供了一个可扩展的基础。我们开源我们的代码以支持进一步研究：https://github.com/ai-nikolai/stateact。

更新时间: 2025-04-08 06:37:51

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.02810v3

Single-Agent vs. Multi-Agent LLM Strategies for Automated Student Reflection Assessment

We explore the use of Large Language Models (LLMs) for automated assessment of open-text student reflections and prediction of academic performance. Traditional methods for evaluating reflections are time-consuming and may not scale effectively in educational settings. In this work, we employ LLMs to transform student reflections into quantitative scores using two assessment strategies (single-agent and multi-agent) and two prompting techniques (zero-shot and few-shot). Our experiments, conducted on a dataset of 5,278 reflections from 377 students over three academic terms, demonstrate that the single-agent with few-shot strategy achieves the highest match rate with human evaluations. Furthermore, models utilizing LLM-assessed reflection scores outperform baselines in both at-risk student identification and grade prediction tasks. These findings suggest that LLMs can effectively automate reflection assessment, reduce educators' workload, and enable timely support for students who may need additional assistance. Our work emphasizes the potential of integrating advanced generative AI technologies into educational practices to enhance student engagement and academic success.

Updated: 2025-04-08 06:34:15

标题: 单一代理人与多代理人LLM策略在自动化学生反思评估中的比较

摘要: 我们探讨了大型语言模型（LLMs）用于自动评估开放文本学生反思并预测学术表现的应用。传统的评估反思方法耗时且在教育环境中可能不具有有效扩展性。在这项工作中，我们利用LLMs将学生反思转化为定量分数，采用两种评估策略（单一代理和多代理）和两种提示技术（零射击和少射击）。我们在一个包含来自377名学生的5,278个反思的数据集上进行的实验表明，少射击策略的单一代理实现了与人工评估的最高匹配率。此外，利用LLM评估的反思分数的模型在风险学生识别和成绩预测任务中均优于基线。这些发现表明LLMs可以有效地自动化反思评估，减轻教育工作者的工作量，并为可能需要额外帮助的学生提供及时支持。我们的工作强调了将先进的生成AI技术整合到教育实践中以增强学生参与度和学术成功的潜力。

更新时间: 2025-04-08 06:34:15

领域: cs.LG,cs.CY,I.2; I.6; K.3

下载: http://arxiv.org/abs/2504.05716v1

AIJIM: A Theoretical Model for Real-Time, Crowdsourced Environmental Journalism with AI

Environmental journalism is vital for raising awareness of ecological crises and supporting evidence-based policy, yet traditional methods suffer from delays, limited scalability, and lack of coverage in under-monitored regions. This paper introduces the Artificial Intelligence Journalism Integration Model (AIJIM), a conceptual and transferable theoretical model that structures real-time, AI-supported environmental journalism workflows. AIJIM combines citizen-sourced image data, automated hazard detection, dual-level validation (visual and textual), and AI-generated reporting. Validated through a pilot study in Mallorca, AIJIM achieved significant improvements in reporting speed and accuracy, while maintaining transparency and ethical oversight through Explainable AI (XAI), GDPR compliance, and community review. The model demonstrates high transferability and offers a new benchmark for scalable, responsible, and participatory journalism at the intersection of environmental communication and artificial intelligence.

Updated: 2025-04-08 06:26:24

标题: AIJIM：一个基于AI的实时、众包环境新闻的理论模型

摘要: 环境新闻对提高对生态危机的认识和支持基于证据的政策至关重要，然而传统方法存在延迟、可扩展性有限以及在监测不足的地区缺乏覆盖的问题。本文介绍了人工智能新闻整合模型（AIJIM），这是一个概念性和可转移的理论模型，用于构建实时的、由人工智能支持的环境新闻工作流程。 AIJIM结合了公民提供的图像数据、自动化危险检测、双层验证（视觉和文本）以及人工智能生成的报道。通过在马略卡岛进行的试点研究进行验证，AIJIM在报道速度和准确性方面取得了显著的改进，同时通过可解释人工智能（XAI）、GDPR合规性和社区审查来维护透明度和道德监督。该模型展示了高可转移性，并为环境沟通和人工智能交汇处的可扩展、负责任和参与式新闻设立了新的基准。

更新时间: 2025-04-08 06:26:24

领域: cs.CY,cs.AI,cs.HC,68T45,I.2.10; H.3.5; J.4

下载: http://arxiv.org/abs/2503.17401v3

CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning

Deep learning (e.g., Transformer) has been widely and successfully used in multivariate time series forecasting (MTSF). Unlike existing methods that focus on training models from a single modal of time series input, large language models (LLMs) based MTSF methods with cross-modal text and time series input have recently shown great superiority, especially with limited temporal data. However, current LLM-based MTSF methods usually focus on adapting and fine-tuning LLMs, while neglecting the distribution discrepancy between textual and temporal input tokens, thus leading to sub-optimal performance. To address this issue, we propose a novel Cross-Modal LLM Fine-Tuning (CALF) framework for MTSF by reducing the distribution discrepancy between textual and temporal data, which mainly consists of the temporal target branch with temporal input and the textual source branch with aligned textual input. To reduce the distribution discrepancy, we develop the cross-modal match module to first align cross-modal input distributions. Additionally, to minimize the modality distribution gap in both feature and output spaces, feature regularization loss is developed to align the intermediate features between the two branches for better weight updates, while output consistency loss is introduced to allow the output representations of both branches to correspond effectively. Thanks to the modality alignment, CALF establishes state-of-the-art performance for both long-term and short-term forecasting tasks with low computational complexity, and exhibiting favorable few-shot and zero-shot abilities similar to that in LLMs. Code is available at https://github.com/Hank0626/LLaTA.

Updated: 2025-04-08 06:21:53

标题: CALF：通过跨模态微调对齐LLMs用于时间序列预测

摘要: 深度学习（例如Transformer）已被广泛和成功地应用于多变量时间序列预测（MTSF）。与现有方法不同，这些方法侧重于从单个时间序列输入模态训练模型，基于大型语言模型（LLMs）的MTSF方法利用跨模态文本和时间序列输入，最近展现出极大的优势，特别是在有限的时间数据情况下。然而，当前基于LLM的MTSF方法通常侧重于调整和微调LLMs，而忽视了文本和时间输入令牌之间的分布差异，从而导致次优的性能。为了解决这个问题，我们提出了一种新颖的交叉模态LLM微调（CALF）框架，通过减少文本和时间数据之间的分布差异，主要包括具有时间输入的时间目标分支和具有对齐文本输入的文本源分支。为了减少分布差异，我们开发了交叉模态匹配模块，首先对齐跨模态输入分布。此外，为了最小化特征和输出空间中的模态分布差距，我们开发了特征正则化损失，用于调整两个分支之间的中间特征，以便更好地更新权重，同时引入了输出一致性损失，以使两个分支的输出表示能够有效对应。由于模态对齐，CALF 在长期和短期预测任务中建立了最先进的性能，计算复杂度低，并且展示了类似LLMs中的有利的少样本和零样本能力。代码可在https://github.com/Hank0626/LLaTA 上找到。

更新时间: 2025-04-08 06:21:53

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2403.07300v3

LoopGen: Training-Free Loopable Music Generation

Loops--short audio segments designed for seamless repetition--are central to many music genres, particularly those rooted in dance and electronic styles. However, current generative music models struggle to produce truly loopable audio, as generating a short waveform alone does not guarantee a smooth transition from its endpoint back to its start, often resulting in audible discontinuities. Loops--short audio segments designed for seamless repetition--are central to many music genres, particularly those rooted in dance and electronic styles. However, current generative music models struggle to produce truly loopable audio, as generating a short waveform alone does not guarantee a smooth transition from its endpoint back to its start, often resulting in audible discontinuities. We address this gap by modifying a non-autoregressive model (MAGNeT) to generate tokens in a circular pattern, letting the model attend to the beginning of the audio when creating its ending. This inference-only approach results in generations that are aware of future context and loop naturally, without the need for any additional training or data. We evaluate the consistency of loop transitions by computing token perplexity around the seam of the loop, observing a 55% improvement. Blind listening tests further confirm significant perceptual gains over baseline methods, improving mean ratings by 70%. Taken together, these results highlight the effectiveness of inference-only approaches in improving generative models and underscore the advantages of non-autoregressive methods for context-aware music generation.

Updated: 2025-04-08 06:13:10

标题: LoopGen: 无需训练的可循环音乐生成

摘要: 循环-设计用于无缝重复的短音频片段-在许多音乐流派中都起着重要作用，特别是那些根植于舞蹈和电子风格的流派。然而，当前的生成音乐模型很难产生真正可循环的音频，因为仅生成短波形并不能保证其端点回到起点时平稳过渡，通常会导致可听见的不连续性。我们通过修改一个非自回归模型（MAGNeT）来解决这一问题，使其以循环模式生成令牌，让模型在创建结束时关注音频的开头。这种仅推断的方法导致生成物能够意识到未来的上下文并自然地循环，无需任何额外的训练或数据。我们通过计算在循环接缝周围的令牌困惑度来评估循环转换的一致性，观察到55%的改善。盲听测试进一步确认相比基准方法有显著的知觉增益，平均评分提高了70%。综合这些结果，这些结果突显了仅推断方法在改进生成模型方面的有效性，并强调了非自回归方法在上下文感知音乐生成中的优势。

更新时间: 2025-04-08 06:13:10

领域: cs.SD,cs.AI

下载: http://arxiv.org/abs/2504.04466v2

YOLOv8 to YOLO11: A Comprehensive Architecture In-depth Comparative Review

In the field of deep learning-based computer vision, YOLO is revolutionary. With respect to deep learning models, YOLO is also the one that is evolving the most rapidly. Unfortunately, not every YOLO model possesses scholarly publications. Moreover, there exists a YOLO model that lacks a publicly accessible official architectural diagram. Naturally, this engenders challenges, such as complicating the understanding of how the model operates in practice. Furthermore, the review articles that are presently available do not delve into the specifics of each model. The objective of this study is to present a comprehensive and in-depth architecture comparison of the four most recent YOLO models, specifically YOLOv8 through YOLO11, thereby enabling readers to quickly grasp not only how each model functions, but also the distinctions between them. To analyze each YOLO version's architecture, we meticulously examined the relevant academic papers, documentation, and scrutinized the source code. The analysis reveals that while each version of YOLO has improvements in architecture and feature extraction, certain blocks remain unchanged. The lack of scholarly publications and official diagrams presents challenges for understanding the model's functionality and future enhancement. Future developers are encouraged to provide these resources.

Updated: 2025-04-08 06:11:13

标题: YOLOv8到YOLO11：一项全面的架构深度比较评述

摘要: 在基于深度学习的计算机视觉领域，YOLO是具有革命性的。就深度学习模型而言，YOLO也是发展最迅速的一个。不幸的是，并非每个YOLO模型都有学术论文支持。此外，还存在一个缺乏公开可访问的官方架构图的YOLO模型。自然而然地，这带来了挑战，比如复杂化模型在实践中的运行方式的理解。此外，目前可用的评论文章并未深入探讨每个模型的具体情况。本研究旨在对最近的四个YOLO模型进行全面深入的架构比较，具体包括YOLOv8到YOLO11，从而使读者能够快速了解每个模型的功能以及它们之间的区别。为了分析每个YOLO版本的架构，我们仔细研究了相关学术论文、文档，并审查了源代码。分析表明，虽然每个YOLO版本在架构和特征提取方面都有改进，但某些模块保持不变。缺乏学术出版物和官方图表对于理解模型的功能和未来增强提出了挑战。鼓励未来的开发人员提供这些资源。

更新时间: 2025-04-08 06:11:13

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2501.13400v2

Automated Archival Descriptions with Federated Intelligence of LLMs

Enforcing archival standards requires specialized expertise, and manually creating metadata descriptions for archival materials is a tedious and error-prone task. This work aims at exploring the potential of agentic AI and large language models (LLMs) in addressing the challenges of implementing a standardized archival description process. To this end, we introduce an agentic AI-driven system for automated generation of high-quality metadata descriptions of archival materials. We develop a federated optimization approach that unites the intelligence of multiple LLMs to construct optimal archival metadata. We also suggest methods to overcome the challenges associated with using LLMs for consistent metadata generation. To evaluate the feasibility and effectiveness of our techniques, we conducted extensive experiments using a real-world dataset of archival materials, which covers a variety of document types and data formats. The evaluation results demonstrate the feasibility of our techniques and highlight the superior performance of the federated optimization approach compared to single-model solutions in metadata quality and reliability.

Updated: 2025-04-08 06:11:05

标题: 使用联合智能的LLM自动档案描述

摘要: 执行档案标准需要专业知识，手动为档案材料创建元数据描述是一项繁琐且容易出错的任务。本文旨在探讨代理人工智能和大型语言模型（LLMs）在解决实施标准档案描述流程挑战方面的潜力。为此，我们引入了一个基于代理人工智能驱动的系统，用于自动生成高质量的档案材料元数据描述。我们开发了一个联合多个LLMs智能构建最佳档案元数据的联邦优化方法。我们还提出了克服使用LLMs进行一致元数据生成所面临挑战的方法。为了评估我们技术的可行性和有效性，我们使用一个覆盖各种文档类型和数据格式的真实世界档案材料数据集进行了广泛实验。评估结果证明了我们技术的可行性，并突出了联邦优化方法在元数据质量和可靠性方面相对于单一模型解决方案的卓越表现。

更新时间: 2025-04-08 06:11:05

领域: cs.AI,cs.DL,cs.IR,cs.LG,I.2

下载: http://arxiv.org/abs/2504.05711v1

Cryptomania v.s. Minicrypt in a Quantum World

We prove that it is impossible to construct perfect-complete quantum public-key encryption (QPKE) with classical keys from quantumly secure one-way functions (OWFs) in a black-box manner, resolving a long-standing open question in quantum cryptography. Specifically, in the quantum random oracle model (QROM), no perfect-complete QPKE scheme with classical keys, and classical/quantum ciphertext can be secure. This improves the previous works which require either unproven conjectures or imposed restrictions on key generation algorithms. This impossibility even extends to QPKE with quantum public key if the public key can be uniquely determined by the secret key, and thus is tight to all existing QPKE constructions.

Updated: 2025-04-08 06:07:40

标题: 密码狂热与量子世界中的微型密码学

摘要: 我们证明了无法通过经典密钥构建完美完备的量子公钥加密（QPKE），从而解决了量子密码学中长期存在的一个问题。具体而言，在量子随机预言模型（QROM）中，没有使用经典密钥和经典/量子密文的完美完备的QPKE方案可以保证安全。这一结果改进了先前的研究，这些研究要么需要未经证实的猜想，要么对密钥生成算法施加了限制。这种不可能性甚至延伸到具有量子公钥的QPKE，如果公钥可以由秘密密钥唯一确定，这与所有现有的QPKE构造都密切相关。

更新时间: 2025-04-08 06:07:40

领域: quant-ph,cs.CR

下载: http://arxiv.org/abs/2504.05710v1

Diabetic Retinopathy Detection Based on Convolutional Neural Networks with SMOTE and CLAHE Techniques Applied to Fundus Images

Diabetic retinopathy (DR) is one of the major complications in diabetic patients' eyes, potentially leading to permanent blindness if not detected timely. This study aims to evaluate the accuracy of artificial intelligence (AI) in diagnosing DR. The method employed is the Synthetic Minority Over-sampling Technique (SMOTE) algorithm, applied to identify DR and its severity stages from fundus images using the public dataset "APTOS 2019 Blindness Detection." Literature was reviewed via ScienceDirect, ResearchGate, Google Scholar, and IEEE Xplore. Classification results using Convolutional Neural Network (CNN) showed the best performance for the binary classes normal (0) and DR (1) with an accuracy of 99.55%, precision of 99.54%, recall of 99.54%, and F1-score of 99.54%. For the multiclass classification No_DR (0), Mild (1), Moderate (2), Severe (3), Proliferate_DR (4), the accuracy was 95.26%, precision 95.26%, recall 95.17%, and F1-score 95.23%. Evaluation using the confusion matrix yielded results of 99.68% for binary classification and 96.65% for multiclass. This study highlights the significant potential in enhancing the accuracy of DR diagnosis compared to traditional human analysis

Updated: 2025-04-08 05:38:53

标题: 基于卷积神经网络结合SMOTE和CLAHE技术应用于眼底图像的糖尿病视网膜病变检测

摘要: 糖尿病性视网膜病变（DR）是糖尿病患者眼睛中的一种主要并发症，如果未及时检测可能导致永久性失明。本研究旨在评估人工智能（AI）在诊断DR中的准确性。采用的方法是使用Synthetic Minority Over-sampling Technique（SMOTE）算法，应用于利用公共数据集“APTOS 2019 Blindness Detection”中的眼底图像以识别DR及其严重程度阶段。通过ScienceDirect、ResearchGate、Google Scholar和IEEE Xplore等途径对文献进行了回顾。使用卷积神经网络（CNN）进行分类的结果显示，在二元类别正常（0）和DR（1）方面，准确率为99.55％，精确率为99.54％，召回率为99.54％，F1分数为99.54％。对于多类别分类No_DR（0）、Mild（1）、Moderate（2）、Severe（3）、Proliferate_DR（4），准确率为95.26％，精确率为95.26％，召回率为95.17％，F1分数为95.23％。使用混淆矩阵进行评估，二元分类的结果为99.68％，多类别分类为96.65％。本研究突显了AI在提高DR诊断准确性方面与传统人类分析相比的重要潜力。

更新时间: 2025-04-08 05:38:53

领域: eess.IV,cs.CV,cs.LG,q-bio.NC

下载: http://arxiv.org/abs/2504.05696v1

Architecture independent generalization bounds for overparametrized deep ReLU networks

We prove that overparametrized neural networks are able to generalize with a test error that is independent of the level of overparametrization, and independent of the Vapnik-Chervonenkis (VC) dimension. We prove explicit bounds that only depend on the metric geometry of the test and training sets, on the regularity properties of the activation function, and on the operator norms of the weights and norms of biases. For overparametrized deep ReLU networks with a training sample size bounded by the input space dimension, we explicitly construct zero loss minimizers without use of gradient descent, and prove that the generalization error is independent of the network architecture.

Updated: 2025-04-08 05:37:38

标题: 深度ReLU网络过参数化的与架构无关的泛化界限

摘要: 我们证明，过度参数化的神经网络能够在泛化时表现出与过度参数化水平无关的测试错误，且与Vapnik-Chervonenkis（VC）维度无关。我们证明了明确的界限，仅取决于测试集和训练集的度量几何、激活函数的正则性属性，以及权重和偏差的算子范数。对于训练样本大小受输入空间维数限制的过度参数化深度ReLU网络，我们明确构建了零损失最小化器，而无需使用梯度下降，并证明泛化误差与网络结构无关。

更新时间: 2025-04-08 05:37:38

领域: cs.LG,cs.AI,math.AP,math.OC,stat.ML,57R70, 62M45

下载: http://arxiv.org/abs/2504.05695v1

Large Language Models Enhanced Hyperbolic Space Recommender Systems

Large Language Models (LLMs) have attracted significant attention in recommender systems for their excellent world knowledge capabilities. However, existing methods that rely on Euclidean space struggle to capture the rich hierarchical information inherent in textual and semantic data, which is essential for capturing user preferences. The geometric properties of hyperbolic space offer a promising solution to address this issue. Nevertheless, integrating LLMs-based methods with hyperbolic space to effectively extract and incorporate diverse hierarchical information is non-trivial. To this end, we propose a model-agnostic framework, named HyperLLM, which extracts and integrates hierarchical information from both structural and semantic perspectives. Structurally, HyperLLM uses LLMs to generate multi-level classification tags with hierarchical parent-child relationships for each item. Then, tag-item and user-item interactions are jointly learned and aligned through contrastive learning, thereby providing the model with clear hierarchical information. Semantically, HyperLLM introduces a novel meta-optimized strategy to extract hierarchical information from semantic embeddings and bridge the gap between the semantic and collaborative spaces for seamless integration. Extensive experiments show that HyperLLM significantly outperforms recommender systems based on hyperbolic space and LLMs, achieving performance improvements of over 40%. Furthermore, HyperLLM not only improves recommender performance but also enhances training stability, highlighting the critical role of hierarchical information in recommender systems.

Updated: 2025-04-08 05:35:38

标题: 大型语言模型增强的双曲空间推荐系统

摘要: 大型语言模型（LLMs）因其出色的世界知识能力而在推荐系统中引起了重大关注。然而，依赖欧几里德空间的现有方法难以捕捉文本和语义数据中固有的丰富层次信息，这对于捕捉用户偏好至关重要。双曲空间的几何特性提供了一个有前景的解决方案来解决这个问题。然而，将基于LLMs的方法与双曲空间有效地结合起来以提取和整合多样的层次信息是非常困难的。为此，我们提出了一个名为HyperLLM的模型无关框架，从结构和语义两个角度提取和整合层次信息。在结构上，HyperLLM使用LLMs为每个项目生成具有层次父子关系的多级分类标签。然后，通过对比学习共同学习和对齐标签-项目和用户-项目的交互作用，从而为模型提供清晰的层次信息。在语义上，HyperLLM引入了一种新颖的元优化策略，从语义嵌入中提取层次信息，并弥合语义空间和协作空间之间的差距，实现无缝整合。大量实验证明，HyperLLM明显优于基于双曲空间和LLMs的推荐系统，性能提高超过40%。此外，HyperLLM不仅提高了推荐性能，还增强了训练稳定性，突显了层次信息在推荐系统中的关键作用。

更新时间: 2025-04-08 05:35:38

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2504.05694v1

STRIVE: A Think & Improve Approach with Iterative Refinement for Enhancing Question Quality Estimation

Automatically assessing question quality is crucial for educators as it saves time, ensures consistency, and provides immediate feedback for refining teaching materials. We propose a novel methodology called STRIVE (Structured Thinking and Refinement with multiLLMs for Improving Verified Question Estimation) using a series of Large Language Models (LLMs) for automatic question evaluation. This approach aims to improve the accuracy and depth of question quality assessment, ultimately supporting diverse learners and enhancing educational practices. The method estimates question quality in an automated manner by generating multiple evaluations based on the strengths and weaknesses of the provided question and then choosing the best solution generated by the LLM. Then the process is improved by iterative review and response with another LLM until the evaluation metric values converge. This sophisticated method of evaluating question quality improves the estimation of question quality by automating the task of question quality evaluation. Correlation scores show that using this proposed method helps to improve correlation with human judgments compared to the baseline method. Error analysis shows that metrics like relevance and appropriateness improve significantly relative to human judgments by using STRIVE.

Updated: 2025-04-08 05:34:38

标题: STRIVE：一种基于迭代细化的思考和改进方法，用于提高问题质量估计。

摘要: 自动评估问题质量对教育者至关重要，因为它节省时间，确保一致性，并为改进教材提供即时反馈。我们提出了一种名为STRIVE（使用多个大型语言模型进行结构化思考和改进以提高验证问题估计的方法）的新方法，利用一系列大型语言模型（LLMs）进行自动问题评估。这种方法旨在提高问题质量评估的准确性和深度，最终支持不同学习者，并增强教育实践。该方法通过根据提供的问题的优势和劣势生成多个评估，然后选择由LLM生成的最佳解决方案的自动方式来估计问题质量。然后，通过与另一个LLM的迭代审查和响应来改进该过程，直到评估指标值收敛。这种评估问题质量的复杂方法通过自动化问题质量评估的任务来改进问题质量的估计。相关性分数表明，使用这种提出的方法有助于提高与基线方法相比与人类判断的相关性。错误分析表明，使用STRIVE相对于人类判断明显提高了相关性和适当性等指标。

更新时间: 2025-04-08 05:34:38

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.05693v1

StayLTC: A Cost-Effective Multimodal Framework for Hospital Length of Stay Forecasting

Accurate prediction of Length of Stay (LOS) in hospitals is crucial for improving healthcare services, resource management, and cost efficiency. This paper presents StayLTC, a multimodal deep learning framework developed to forecast real-time hospital LOS using Liquid Time-Constant Networks (LTCs). LTCs, with their continuous-time recurrent dynamics, are evaluated against traditional models using structured data from Electronic Health Records (EHRs) and clinical notes. Our evaluation, conducted on the MIMIC-III dataset, demonstrated that LTCs significantly outperform most of the other time series models, offering enhanced accuracy, robustness, and efficiency in resource utilization. Additionally, LTCs demonstrate a comparable performance in LOS prediction compared to time series large language models, while requiring significantly less computational power and memory, underscoring their potential to advance Natural Language Processing (NLP) tasks in healthcare.

Updated: 2025-04-08 05:27:53

标题: StayLTC：一种成本效益的医院住院天数预测多模态框架

摘要: 准确预测住院时间（LOS）对于改善医疗服务、资源管理和成本效率至关重要。本文介绍了StayLTC，这是一个多模态深度学习框架，旨在利用液态时间恒定网络（LTCs）预测实时医院LOS。通过使用来自电子健康记录（EHRs）和临床笔记的结构化数据，评估了具有连续时间循环动态的LTCs与传统模型的差异。我们在MIMIC-III数据集上进行的评估表明，LTCs明显优于大多数其他时间序列模型，提供了增强的准确性、鲁棒性和资源利用效率。此外，与时间序列大型语言模型相比，LTCs在LOS预测中表现出可比较的性能，同时需要较少的计算资源和内存，突显了它们在推进医疗保健中自然语言处理（NLP）任务的潜力。

更新时间: 2025-04-08 05:27:53

领域: cs.AI

下载: http://arxiv.org/abs/2504.05691v1

Automated Trustworthiness Oracle Generation for Machine Learning Text Classifiers

Machine learning (ML) for text classification has been widely used in various domains. These applications can significantly impact ethics, economics, and human behavior, raising serious concerns about trusting ML decisions. Studies indicate that conventional metrics are insufficient to build human trust in ML models. These models often learn spurious correlations and predict based on them. In the real world, their performance can deteriorate significantly. To avoid this, a common practice is to test whether predictions are reasonable based on valid patterns in the data. Along with this, a challenge known as the trustworthiness oracle problem has been introduced. Due to the lack of automated trustworthiness oracles, the assessment requires manual validation of the decision process disclosed by explanation methods. However, this is time-consuming, error-prone, and unscalable. We propose TOKI, the first automated trustworthiness oracle generation method for text classifiers. TOKI automatically checks whether the words contributing the most to a prediction are semantically related to the predicted class. Specifically, we leverage ML explanations to extract the decision-contributing words and measure their semantic relatedness with the class based on word embeddings. We also introduce a novel adversarial attack method that targets trustworthiness vulnerabilities identified by TOKI. To evaluate their alignment with human judgement, experiments are conducted. We compare TOKI with a naive baseline based solely on model confidence and TOKI-guided adversarial attack method with A2T, a SOTA adversarial attack method. Results show that relying on prediction uncertainty cannot effectively distinguish between trustworthy and untrustworthy predictions, TOKI achieves 142% higher accuracy than the naive baseline, and TOKI-guided attack method is more effective with fewer perturbations than A2T.

Updated: 2025-04-08 05:22:52

标题: 自动生成可信赖的神经网络文本分类器信任度测试器

摘要: 文本分类的机器学习（ML）已被广泛应用于各个领域。这些应用可能会对伦理、经济和人类行为产生重大影响，引发人们对信任ML决策的严重担忧。研究表明，传统的评估指标不足以建立人类对ML模型的信任。这些模型经常学习到虚假相关性，并基于这些相关性进行预测。在现实世界中，它们的性能可能会显著下降。为了避免这种情况，一种常见的做法是通过在数据中寻找有效模式来测试预测是否合理。除此之外，还出现了一个被称为可信赖预测的问题。由于缺乏自动的可信赖预测方法，评估需要通过解释方法披露的决策过程进行手动验证。然而，这种方法耗时、容易出错且不可扩展。我们提出了TOKI，这是第一个针对文本分类器的自动可信赖预测方法。TOKI会自动检查对预测起到最大贡献的词是否与预测的类别在语义上相关。具体来说，我们利用ML解释来提取决策贡献词，并基于词嵌入衡量它们与类别的语义相关性。我们还引入了一种新颖的对抗攻击方法，该方法针对TOKI识别出的可信赖性漏洞。为了评估它们与人类判断的一致性，进行了实验。我们将TOKI与仅基于模型置信度的天真基线以及TOKI引导的对抗攻击方法与A2T（一种SOTA对抗攻击方法）进行比较。结果表明，依赖预测的不确定性无法有效区分可信赖和不可信赖的预测，TOKI的准确率比天真基线高出142％，而TOKI引导的攻击方法比A2T更有效，扰动更少。

更新时间: 2025-04-08 05:22:52

领域: cs.SE,cs.CL,cs.CR

下载: http://arxiv.org/abs/2410.22663v2

Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators

Conversational large language models (LLMs) have gained widespread attention due to their instruction-following capabilities. To ensure conversational LLMs follow instructions, role separators are employed to distinguish between different participants in a conversation. However, incorporating role separators introduces potential vulnerabilities. Misusing roles can lead to prompt injection attacks, which can easily misalign the model's behavior with the user's intentions, raising significant security concerns. Although various prompt injection attacks have been proposed, recent research has largely overlooked the impact of role separators on safety. This highlights the critical need to thoroughly understand the systemic weaknesses in dialogue systems caused by role separators. This paper identifies modeling weaknesses caused by role separators. Specifically, we observe a strong positional bias associated with role separators, which is inherent in the format of dialogue modeling and can be triggered by the insertion of role separators. We further develop the Separators Injection Attack (SIA), a new orthometric attack based on role separators. The experiment results show that SIA is efficient and extensive in manipulating model behavior with an average gain of 18.2% for manual methods and enhances the attack success rate to 100% with automatic methods.

Updated: 2025-04-08 05:20:56

标题: 分隔符注入攻击：揭示由角色分隔符引起的大型语言模型中的对话偏见

摘要: 会话式大型语言模型（LLMs）因其遵循指示的能力而受到广泛关注。为确保会话式LLMs遵循指示，角色分隔符被用来区分对话中的不同参与者。然而，整合角色分隔符会引入潜在的漏洞。滥用角色可能会导致提示注入攻击，这可能会导致模型的行为与用户意图不一致，引发重大安全问题。尽管已提出各种提示注入攻击，但最近的研究在很大程度上忽视了角色分隔符对安全性的影响。这突显了深入了解对话系统中由角色分隔符引起的系统性弱点的迫切需要。本文确定了由角色分隔符引起的建模弱点。具体而言，我们观察到与角色分隔符相关的强烈位置偏见，这是对话建模格式固有的，可以通过插入角色分隔符来触发。我们进一步开发了基于角色分隔符的新的正交攻击——分隔符注入攻击（SIA）。实验结果显示，SIA在操纵模型行为方面高效而广泛，平均获得18.2%的手动方法，并将攻击成功率提高到100%的自动方法。

更新时间: 2025-04-08 05:20:56

领域: cs.CL,cs.CR

下载: http://arxiv.org/abs/2504.05689v1

Large Language Model for Patent Concept Generation

In traditional innovation practices, concept and IP generation are often iteratively integrated. Both processes demand an intricate understanding of advanced technical domain knowledge. Existing large language models (LLMs), while possessing massive pre-trained knowledge, often fall short in the innovative concept generation due to a lack of specialized knowledge necessary for the generation. To bridge this critical gap, we propose a novel knowledge finetuning (KFT) framework to endow LLM-based AI with the ability to autonomously mine, understand, and apply domain-specific knowledge and concepts for invention generation, i.e., concept and patent generation together. Our proposed PatentGPT integrates knowledge injection pre-training (KPT), domain-specific supervised finetuning (SFT), and reinforcement learning from human feedback (RLHF). Extensive evaluation shows that PatentGPT significantly outperforms the state-of-the-art models on patent-related benchmark tests. Our method not only provides new insights into data-driven innovation but also paves a new path to fine-tune LLMs for applications in the context of technology. We also discuss the managerial and policy implications of AI-generating inventions in the future.

Updated: 2025-04-08 05:07:10

标题: 专利概念生成的大型语言模型

摘要: 在传统的创新实践中，概念和知识产权的生成通常是迭代集成的。这两个过程都需要对先进技术领域知识的复杂理解。现有的大型语言模型（LLMs），虽然具有大量的预训练知识，但在创新概念生成方面往往表现不佳，因为缺乏生成所需的专业知识。为了弥补这一关键差距，我们提出了一种新颖的知识微调（KFT）框架，赋予基于LLM的人工智能能力，自主地挖掘、理解和应用领域特定的知识和概念，用于发明生成，即概念和专利生成。我们提出的PatentGPT集成了知识注入预训练（KPT）、领域特定的监督微调（SFT）以及从人类反馈中进行强化学习（RLHF）。广泛的评估表明，PatentGPT在专利相关基准测试中明显优于最先进的模型。我们的方法不仅为数据驱动的创新提供了新的见解，还为在技术背景下调整LLMs应用铺平了一条新道路。我们还讨论了未来AI生成发明的管理和政策影响。

更新时间: 2025-04-08 05:07:10

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.00092v3

kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization

Robustness is critical in zero-shot singing voice conversion (SVC). This paper introduces two novel methods to strengthen the robustness of the kNN-VC framework for SVC. First, kNN-VC's core representation, WavLM, lacks harmonic emphasis, resulting in dull sounds and ringing artifacts. To address this, we leverage the bijection between WavLM, pitch contours, and spectrograms to perform additive synthesis, integrating the resulting waveform into the model to mitigate these issues. Second, kNN-VC overlooks concatenative smoothness, a key perceptual factor in SVC. To enhance smoothness, we propose a new distance metric that filters out unsuitable kNN candidates and optimize the summing weights of the candidates during inference. Although our techniques are built on the kNN-VC framework for implementation convenience, they are broadly applicable to general concatenative neural synthesis models. Experimental results validate the effectiveness of these modifications in achieving robust SVC. Demo: http://knnsvc.com Code: https://github.com/SmoothKen/knn-svc

Updated: 2025-04-08 04:59:56

标题: kNN-SVC：具有加法合成和连接平滑优化的稳健零样本歌声转换

摘要: 鲁棒性在零样本歌声转换（SVC）中至关重要。本文介绍了两种新方法，以加强kNN-VC框架在SVC中的鲁棒性。首先，kNN-VC的核心表示，WavLM，缺乏和声强调，导致声音沉闷且带有回响痕迹。为了解决这个问题，我们利用WavLM、音高轮廓和谱图之间的双射关系进行加法合成，将生成的波形整合到模型中以减轻这些问题。其次，kNN-VC忽视了串联平滑性，这是SVC中的一个关键感知因素。为了增强平滑性，我们提出了一种新的距离度量，用于过滤不合适的kNN候选者，并在推理过程中优化候选者的求和权重。尽管我们的技术是建立在kNN-VC框架上以实现方便性，但它们广泛适用于一般的串联神经合成模型。实验结果验证了这些修改在实现鲁棒SVC方面的有效性。演示：http://knnsvc.com 代码：https://github.com/SmoothKen/knn-svc

更新时间: 2025-04-08 04:59:56

领域: cs.SD,cs.AI,cs.LG,cs.MM,eess.AS

下载: http://arxiv.org/abs/2504.05686v1

Safe at the Margins: A General Approach to Safety Alignment in Low-Resource English Languages -- A Singlish Case Study

Ensuring the safety of Large Language Models (LLMs) in diverse linguistic settings remains challenging, particularly for low-resource languages. Existing safety alignment methods are English-centric, limiting their effectiveness. We systematically compare Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Kahneman-Tversky Optimization (KTO) for aligning SEA-Lion-v2.1-Instruct, a Llama 3-8B variant, to reduce toxicity in Singlish. Our results show that SFT+KTO achieves superior safety alignment with higher sample efficiency than DPO. Additionally, we introduce KTO-S, which enhances stability via improved KL divergence regularization. Our approach reduces Singlish toxicity by 99\%, generalizes to TOXIGEN, and maintains strong performance on standard LLM benchmarks, providing a scalable framework for safer AI deployment in multilingual contexts.

Updated: 2025-04-08 04:50:41

标题: 在边缘处安全：低资源英语语言中安全对齐的一般方法--以新加坡英语为例研究

摘要: 确保大型语言模型（LLMs）在不同语言环境中的安全性仍然具有挑战性，尤其是对于资源匮乏的语言。现有的安全对齐方法以英语为中心，限制了它们的有效性。我们系统地比较了监督微调（SFT）、直接偏好优化（DPO）和卡内曼-特沃斯基优化（KTO）用于对齐SEA-Lion-v2.1-Instruct，一种Llama 3-8B变种，以减少Singlish中的毒性。我们的结果表明，SFT+KTO在样本效率更高的情况下实现了卓越的安全对齐，胜过DPO。此外，我们引入了KTO-S，通过改进的KL散度正则化增强了稳定性。我们的方法将Singlish的毒性降低了99％，并且在TOXIGEN上具有泛化能力，并在标准LLM基准测试中保持了强大的性能，为多语境下更安全的AI部署提供了可扩展的框架。

更新时间: 2025-04-08 04:50:41

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.12485v2

TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis

This paper introduces Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning (TARO), a novel framework for high-fidelity and temporally coherent video-to-audio synthesis. Built upon flow-based transformers, which offer stable training and continuous transformations for enhanced synchronization and audio quality, TARO introduces two key innovations: (1) Timestep-Adaptive Representation Alignment (TRA), which dynamically aligns latent representations by adjusting alignment strength based on the noise schedule, ensuring smooth evolution and improved fidelity, and (2) Onset-Aware Conditioning (OAC), which integrates onset cues that serve as sharp event-driven markers of audio-relevant visual moments to enhance synchronization with dynamic visual events. Extensive experiments on the VGGSound and Landscape datasets demonstrate that TARO outperforms prior methods, achieving relatively 53\% lower Frechet Distance (FD), 29% lower Frechet Audio Distance (FAD), and a 97.19% Alignment Accuracy, highlighting its superior audio quality and synchronization precision.

Updated: 2025-04-08 04:49:36

标题: TARO：基于时间步自适应表示对齐和以起始时间为条件的同步视频到音频合成

摘要: 这篇论文介绍了Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning (TARO)，这是一个用于高保真度和时间连贯的视频到音频合成的新框架。建立在基于流的变压器之上，这些变压器提供稳定的训练和连续的转换，以增强同步性和音频质量，TARO引入了两个关键创新：(1) Timestep-Adaptive Representation Alignment (TRA)，通过根据噪声计划调整对齐强度来动态对齐潜在表示，确保平滑演变和改进保真度；(2) Onset-Aware Conditioning (OAC)，集成了起始提示，这些提示作为音频相关视觉时刻的尖锐事件驱动标记，以增强与动态视觉事件的同步。对VGGSound和Landscape数据集进行的大量实验表明，TARO优于先前的方法，实现了相对53%更低的Frechet距离（FD）、29%更低的Frechet音频距离（FAD）和97.19%的对齐准确度，突显了其优越的音频质量和同步精度。

更新时间: 2025-04-08 04:49:36

领域: cs.SD,cs.AI,cs.CV

下载: http://arxiv.org/abs/2504.05684v1

STNAGNN: Data-driven Spatio-temporal Brain Connectivity beyond FC

In recent years, graph neural networks (GNNs) have been widely applied in the analysis of brain fMRI, yet defining the connectivity between ROIs remains a challenge in noisy fMRI data. Among all approaches, Functional Connectome (FC) is the most popular method. Computed by the correlation coefficients between ROI time series, FC is a powerful and computationally efficient way to estimate ROI connectivity. However, it is well known for neglecting structural connections and causality in ROI interactions. Also, FC becomes much more noisy in the short spatio-temporal sliding-window subsequences of fMRI. Effective Connectome (EC) is proposed as a directional alternative, but is difficult to accurately estimate. Furthermore, for optimal GNN performance, usually only a small percentage of the strongest connections are selected as sparse edges, resulting in oversimplification of complex brain connections. To tackle these challenges, we propose the Spatio-Temporal Node Attention Graph Neural Network (STNAGNN) as a data-driven alternative that combines sparse predefined FC with dense data-driven spatio-temporal connections, allowing for flexible and spatio-temporal learning of ROI interaction patterns.

Updated: 2025-04-08 04:47:57

标题: STNAGNN: 数据驱动的超越功能连接的时空脑连接

摘要: 最近几年，图神经网络（GNN）广泛应用于大脑fMRI分析，然而在嘈杂的fMRI数据中定义ROI之间的连接仍然是一个挑战。在所有方法中，功能连接组（FC）是最流行的方法。通过计算ROI时间序列之间的相关系数，FC是估计ROI连接的一种强大且计算效率高的方式。然而，众所周知，FC忽略了ROI交互中的结构连接和因果关系。此外，在fMRI的短时空滑动窗口子序列中，FC变得更加嘈杂。提出了有效连接组（EC）作为一个方向性的替代方案，但很难准确估计。此外，为了获得最佳的GNN性能，通常只选择最强连接的一小部分作为稀疏边，导致对复杂脑连接的过度简化。为了解决这些挑战，我们提出了时空节点注意力图神经网络（STNAGNN）作为一种数据驱动的替代方案，将稀疏预定义的FC与密集的数据驱动时空连接相结合，允许灵活和时空学习ROI交互模式。

更新时间: 2025-04-08 04:47:57

领域: cs.LG,q-bio.NC

下载: http://arxiv.org/abs/2406.12065v2

Towards Smarter Hiring: Are Zero-Shot and Few-Shot Pre-trained LLMs Ready for HR Spoken Interview Transcript Analysis?

This research paper presents a comprehensive analysis of the performance of prominent pre-trained large language models (LLMs), including GPT-4 Turbo, GPT-3.5 Turbo, text-davinci-003, text-babbage-001, text-curie-001, text-ada-001, llama-2-7b-chat, llama-2-13b-chat, and llama-2-70b-chat, in comparison to expert human evaluators in providing scores, identifying errors, and offering feedback and improvement suggestions to candidates during mock HR (Human Resources) interviews. We introduce a dataset called HURIT (Human Resource Interview Transcripts), which comprises 3,890 HR interview transcripts sourced from real-world HR interview scenarios. Our findings reveal that pre-trained LLMs, particularly GPT-4 Turbo and GPT-3.5 Turbo, exhibit commendable performance and are capable of producing evaluations comparable to those of expert human evaluators. Although these LLMs demonstrate proficiency in providing scores comparable to human experts in terms of human evaluation metrics, they frequently fail to identify errors and offer specific actionable advice for candidate performance improvement in HR interviews. Our research suggests that the current state-of-the-art pre-trained LLMs are not fully conducive for automatic deployment in an HR interview assessment. Instead, our findings advocate for a human-in-the-loop approach, to incorporate manual checks for inconsistencies and provisions for improving feedback quality as a more suitable strategy.

Updated: 2025-04-08 04:46:10

标题: 走向更聪明的招聘：零射和少射预训练LLMs是否准备好用于人力资源口头面试文本分析？

摘要: 这项研究论文对知名的预训练大型语言模型（LLMs），包括GPT-4 Turbo、GPT-3.5 Turbo、text-davinci-003、text-babbage-001、text-curie-001、text-ada-001、llama-2-7b-chat、llama-2-13b-chat和llama-2-70b-chat的性能进行了全面分析，与专业人类评估者在模拟人力资源（HR）面试中提供评分、识别错误和提供反馈和改进建议的能力进行了比较。我们引入了一个名为HURIT（人力资源面试转录）的数据集，包括来源于真实世界HR面试场景的3,890份HR面试转录。我们的研究结果显示，预训练的LLMs，特别是GPT-4 Turbo和GPT-3.5 Turbo，表现出令人称赞的性能，能够产生与专业人类评估者相媲美的评估结果。尽管这些LLMs在提供得分方面表现出与人类专家相当的熟练度，但它们经常无法识别错误并为候选人在HR面试中提供具体可行的改进建议。我们的研究表明，目前的最先进的预训练LLMs并不完全适合自动部署在HR面试评估中。相反，我们的研究主张采用人机协同的方法，以纳入手动检查不一致性和改进反馈质量的条款作为更合适的策略。

更新时间: 2025-04-08 04:46:10

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.05683v1

Dual Boost-Driven Graph-Level Clustering Network

Graph-level clustering remains a pivotal yet formidable challenge in graph learning. Recently, the integration of deep learning with representation learning has demonstrated notable advancements, yielding performance enhancements to a certain degree. However, existing methods suffer from at least one of the following issues: 1. the original graph structure has noise, and 2. during feature propagation and pooling processes, noise is gradually aggregated into the graph-level embeddings through information propagation. Consequently, these two limitations mask clustering-friendly information, leading to suboptimal graph-level clustering performance. To this end, we propose a novel Dual Boost-Driven Graph-Level Clustering Network (DBGCN) to alternately promote graph-level clustering and filtering out interference information in a unified framework. Specifically, in the pooling step, we evaluate the contribution of features at the global and optimize them using a learnable transformation matrix to obtain high-quality graph-level representation, such that the model's reasoning capability can be improved. Moreover, to enable reliable graph-level clustering, we first identify and suppress information detrimental to clustering by evaluating similarities between graph-level representations, providing more accurate guidance for multi-view fusion. Extensive experiments demonstrated that DBGCN outperforms the state-of-the-art graph-level clustering methods on six benchmark datasets.

Updated: 2025-04-08 04:32:46

标题: 双向增强驱动的图级聚类网络

摘要: 图级聚类仍然是图学习中一个重要但艰巨的挑战。最近，深度学习与表示学习的整合已经展示出显著的进展，从而在一定程度上提高了性能。然而，现有方法存在以下至少一种问题：1.原始图结构存在噪音，2.在特征传播和池化过程中，噪音逐渐聚集到图级嵌入中，通过信息传播。因此，这两个限制掩盖了有利于聚类的信息，导致次优的图级聚类性能。因此，我们提出了一种新颖的双增强驱动图级聚类网络（DBGCN），在一个统一的框架中交替促进图级聚类并过滤干扰信息。具体来说，在池化步骤中，我们评估全局特征的贡献，并使用可学习的转换矩阵优化它们，以获得高质量的图级表示，从而提高模型的推理能力。此外，为了实现可靠的图级聚类，我们首先通过评估图级表示之间的相似性来识别和抑制对聚类有害的信息，为多视图融合提供更准确的指导。大量实验证明，DBGCN在六个基准数据集上优于最先进的图级聚类方法。

更新时间: 2025-04-08 04:32:46

领域: cs.LG

下载: http://arxiv.org/abs/2504.05670v1

RBFleX-NAS: Training-Free Neural Architecture Search Using Radial Basis Function Kernel and Hyperparameter Detection

Neural Architecture Search (NAS) is an automated technique to design optimal neural network architectures for a specific workload. Conventionally, evaluating candidate networks in NAS involves extensive training, which requires significant time and computational resources. To address this, training-free NAS has been proposed to expedite network evaluation with minimal search time. However, state-of-the-art training-free NAS algorithms struggle to precisely distinguish well-performing networks from poorly-performing networks, resulting in inaccurate performance predictions and consequently sub-optimal top-1 network accuracy. Moreover, they are less effective in activation function exploration. To tackle the challenges, this paper proposes RBFleX-NAS, a novel training-free NAS framework that accounts for both activation outputs and input features of the last layer with a Radial Basis Function (RBF) kernel. We also present a detection algorithm to identify optimal hyperparameters using the obtained activation outputs and input feature maps. We verify the efficacy of RBFleX-NAS over a variety of NAS benchmarks. RBFleX-NAS significantly outperforms state-of-the-art training-free NAS methods in terms of top-1 accuracy, achieving this with short search time in NAS-Bench-201 and NAS-Bench-SSS. In addition, it demonstrates higher Kendall correlation compared to layer-based training-free NAS algorithms. Furthermore, we propose NAFBee, a new activation design space that extends the activation type to encompass various commonly used functions. In this extended design space, RBFleX-NAS demonstrates its superiority by accurately identifying the best-performing network during activation function search, providing a significant advantage over other NAS algorithms.

Updated: 2025-04-08 04:25:57

标题: RBFleX-NAS：使用径向基函数核和超参数检测的无需训练的神经架构搜索

摘要: 神经架构搜索（NAS）是一种自动化技术，用于为特定工作负载设计最佳神经网络架构。传统上，在NAS中评估候选网络涉及大量训练，这需要大量时间和计算资源。为了解决这个问题，提出了无需训练的NAS，以最小搜索时间加快网络评估。然而，最先进的无训练NAS算法很难准确区分表现良好的网络和表现不佳的网络，导致性能预测不准确，进而导致子优化的top-1网络准确性。此外，它们在激活函数探索方面效果较差。为了解决这些挑战，本文提出了RBFleX-NAS，这是一个新颖的无训练NAS框架，考虑了最后一层的激活输出和输入特征，使用径向基函数（RBF）核。我们还提出了一种检测算法，利用获得的激活输出和输入特征图来识别最佳超参数。我们验证了RBFleX-NAS在各种NAS基准测试中的有效性。在NAS-Bench-201和NAS-Bench-SSS中，RBFleX-NAS在top-1准确性方面明显优于最先进的无训练NAS方法，并且在短时间内实现了这一点。此外，与基于层的无训练NAS算法相比，它展示了更高的Kendall相关性。此外，我们提出了NAFBee，一个新的激活设计空间，扩展了激活类型，包括各种常用函数。在这个扩展的设计空间中，RBFleX-NAS通过准确识别激活函数搜索中表现最佳的网络，展示了其优势，比其他NAS算法具有显著优势。

更新时间: 2025-04-08 04:25:57

领域: cs.LG

下载: http://arxiv.org/abs/2503.22733v2

HypRL: Reinforcement Learning of Control Policies for Hyperproperties

We study the problem of learning control policies for complex tasks whose requirements are given by a hyperproperty. The use of hyperproperties is motivated by their significant power to formally specify requirements of multi-agent systems as well as those that need expressiveness in terms of multiple execution traces (e.g., privacy and fairness). Given a Markov decision process M with unknown transitions (representing the environment) and a HyperLTL formula $\varphi$, our approach first employs Skolemization to handle quantifier alternations in $\varphi$. We introduce quantitative robustness functions for HyperLTL to define rewards of finite traces of M with respect to $\varphi$. Finally, we utilize a suitable reinforcement learning algorithm to learn (1) a policy per trace quantifier in $\varphi$, and (2) the probability distribution of transitions of M that together maximize the expected reward and, hence, probability of satisfaction of $\varphi$ in M. We present a set of case studies on (1) safety-preserving multi-agent path planning, (2) fairness in resource allocation, and (3) the post-correspondence problem (PCP).

Updated: 2025-04-08 04:19:02

标题: HypRL：用于超属性控制策略的强化学习

摘要: 我们研究了学习控制策略的问题，这些策略针对复杂任务的要求由超性质给出。使用超性质的动机在于它们显著地能够形式化地指定多智能体系统的要求，以及那些需要在多个执行轨迹方面具有表达能力的要求（例如隐私和公平性）。给定一个具有未知转移的马尔可夫决策过程M（表示环境）和一个HyperLTL公式ϕ，我们的方法首先使用Skolemization来处理ϕ中的量词替换。我们为HyperLTL引入了量化稳健函数，以定义有限M轨迹对ϕ的奖励。最后，我们利用适当的强化学习算法来学习（1）ϕ中每个轨迹量词的策略，以及（2）M转移的概率分布，这两者共同最大化预期奖励和因此M中ϕ满足概率。我们展示了一组案例研究：（1）保持安全的多智能体路径规划，（2）资源分配中的公平性，以及（3）后对应问题（PCP）。

更新时间: 2025-04-08 04:19:02

领域: cs.AI,cs.LO

下载: http://arxiv.org/abs/2504.04675v2

Nes2Net: A Lightweight Nested Architecture for Foundation Model Driven Speech Anti-spoofing

Speech foundation models have significantly advanced various speech-related tasks by providing exceptional representation capabilities. However, their high-dimensional output features often create a mismatch with downstream task models, which typically require lower-dimensional inputs. A common solution is to apply a dimensionality reduction (DR) layer, but this approach increases parameter overhead, computational costs, and risks losing valuable information. To address these issues, we propose Nested Res2Net (Nes2Net), a lightweight back-end architecture designed to directly process high-dimensional features without DR layers. The nested structure enhances multi-scale feature extraction, improves feature interaction, and preserves high-dimensional information. We first validate Nes2Net on CtrSVDD, a singing voice deepfake detection dataset, and report a 22% performance improvement and an 87% back-end computational cost reduction over the state-of-the-art baseline. Additionally, extensive testing across four diverse datasets: ASVspoof 2021, ASVspoof 5, PartialSpoof, and In-the-Wild, covering fully spoofed speech, adversarial attacks, partial spoofing, and real-world scenarios, consistently highlights Nes2Net's superior robustness and generalization capabilities. The code package and pre-trained models are available at https://github.com/Liu-Tianchi/Nes2Net.

Updated: 2025-04-08 04:11:28

标题: Nes2Net：基于基础模型驱动的语音反欺诈的轻量级嵌套架构

摘要: 语音基础模型通过提供出色的表示能力，显著推进了各种与语音相关的任务。然而，它们的高维输出特征经常与下游任务模型不匹配，后者通常需要低维输入。一种常见解决方案是应用降维（DR）层，但这种方法会增加参数开销、计算成本，并有可能丢失宝贵信息。为了解决这些问题，我们提出了Nested Res2Net（Nes2Net），这是一种轻量级的后端架构，旨在直接处理高维特征而无需降维层。嵌套结构增强了多尺度特征提取，改善了特征交互，并保留了高维信息。我们首先在CtrSVDD上验证了Nes2Net，这是一个歌声深度伪造检测数据集，并报告了相对于最先进基线的22%性能提升和87%的后端计算成本降低。此外，在四个不同的数据集上进行了广泛测试：ASVspoof 2021、ASVspoof 5、PartialSpoof和In-the-Wild，涵盖了完全伪造的语音、对抗性攻击、部分伪造和真实世界场景，始终突显了Nes2Net卓越的鲁棒性和泛化能力。代码包和预训练模型可在https://github.com/Liu-Tianchi/Nes2Net上获得。

更新时间: 2025-04-08 04:11:28

领域: eess.AS,cs.AI,cs.SD

下载: http://arxiv.org/abs/2504.05657v1

Curved representational Bregman divergences and their applications

By analogy to curved exponential families, we define curved Bregman divergences as restrictions of Bregman divergences to sub-dimensional parameter subspaces, and prove that the barycenter of a finite weighted parameter set with respect to a curved Bregman divergence amounts to the Bregman projection onto the subspace induced by the constraint of the barycenter with respect to the unconstrained full Bregman divergence. We demonstrate the significance of curved Bregman divergences with two examples: (1) symmetrized Bregman divergences and (2) the Kullback-Leibler divergence between circular complex normal distributions. We then consider monotonic embeddings to define representational curved Bregman divergences and show that the $\alpha$-divergences are representational curved Bregman divergences with respect to $\alpha$-embeddings of the probability simplex into the positive measure cone. As an application, we report an efficient method to calculate the intersection of a finite set of $\alpha$-divergence spheres.

Updated: 2025-04-08 04:05:12

标题: 曲线表示的Bregman散度及其应用

摘要: 通过对曲线指数族的类比，我们定义了曲线Bregman散度，将Bregman散度限制在子维参数子空间上，并证明了有限加权参数集关于曲线Bregman散度的质心等于关于未约束的完整Bregman散度的约束质心的Bregman投影。我们通过两个示例展示了曲线Bregman散度的重要性：(1)对称化的Bregman散度和(2)圆形复杂正态分布之间的Kullback-Leibler散度。然后，我们考虑单调嵌入来定义表征性曲线Bregman散度，并展示了$\alpha$-散度是关于$\alpha$-嵌入概率单纯形到正测度锥的表征性曲线Bregman散度。作为一个应用，我们报告了一种高效计算有限集合$\alpha$-散度球体的交集的方法。

更新时间: 2025-04-08 04:05:12

领域: cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2504.05654v1

Modulated Differentiable STFT and Balanced Spectrum Metric for Freight Train Wheelset Bearing Cross-machine Transfer Fault Diagnosis under Speed Fluctuations

The service conditions of wheelset bearings has a direct impact on the safe operation of railway heavy haul freight trains as the key components. However, speed fluctuation of the trains and few fault samples are the two main problems that restrict the accuracy of bearing fault diagnosis. Therefore, a cross-machine transfer diagnosis (pyDSN) network coupled with interpretable modulated differentiable short-time Fourier transform (STFT) and physics-informed balanced spectrum quality metric is proposed to learn domain-invariant and discriminative features under time-varying speeds. Firstly, due to insufficiency in extracting extract frequency components of time-varying speed signals using fixed windows, a modulated differentiable STFT (MDSTFT) that is interpretable with STFT-informed theoretical support, is proposed to extract the robust time-frequency spectrum (TFS). During training process, multiple windows with different lengths dynamically change. Also, in addition to the classification metric and domain discrepancy metric, we creatively introduce a third kind of metric, referred to as the physics-informed metric, to enhance transferable TFS. A physics-informed balanced spectrum quality (BSQ) regularization loss is devised to guide an optimization direction for MDSTFT and model. With it, not only can model acquire high-quality TFS, but also a physics-restricted domain adaptation network can be also acquired, making it learn real-world physics knowledge, ultimately diminish the domain discrepancy across different datasets. The experiment is conducted in the scenario of migrating from the laboratory datasets to the freight train dataset, indicating that the hybrid-driven pyDSN outperforms existing methods and has practical value.

Updated: 2025-04-08 04:01:43

标题: 调制可微STFT和平衡频谱度量在速度波动下用于货运列车轮对轴承跨机故障诊断

摘要: 轮对轴承的服务条件对铁路重载货运列车的安全运行有直接影响，因为它们是关键部件。然而，列车速度波动和故障样本不足是限制轴承故障诊断准确性的两个主要问题。因此，提出了一个结合可解释的调制可微短时傅里叶变换（STFT）和物理信息平衡频谱质量度量的跨机器转移诊断（pyDSN）网络，以在时间变化的速度下学习域不变和区分特征。首先，由于使用固定窗口提取时间变化速度信号的频率成分不足，提出了一个具有STFT信息支持的可解释的调制可微STFT（MDSTFT），用于提取稳健的时频谱（TFS）。在训练过程中，多个具有不同长度的窗口动态变化。此外，除了分类度量和域差异度量，我们还创造性地引入了第三种度量，称为物理信息度量，以增强可转移的TFS。设计了一种物理信息平衡频谱质量（BSQ）正则化损失，用于引导MDSTFT和模型的优化方向。通过它，模型不仅可以获得高质量的TFS，还可以获得一个受物理限制的域自适应网络，使其学习真实世界的物理知识，最终减小不同数据集之间的域差异。实验在从实验室数据集迁移到货运列车数据集的情景中进行，表明混合驱动的pyDSN优于现有方法，并具有实际价值。

更新时间: 2025-04-08 04:01:43

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2406.11917v2

Vision Transformers with Autoencoders and Explainable AI for Cancer Patient Risk Stratification Using Whole Slide Imaging

Cancer remains one of the leading causes of mortality worldwide, necessitating accurate diagnosis and prognosis. Whole Slide Imaging (WSI) has become an integral part of clinical workflows with advancements in digital pathology. While various studies have utilized WSIs, their extracted features may not fully capture the most relevant pathological information, and their lack of interpretability limits clinical adoption. In this paper, we propose PATH-X, a framework that integrates Vision Transformers (ViT) and Autoencoders with SHAP (Shapley Additive Explanations) to enhance model explainability for patient stratification and risk prediction using WSIs from The Cancer Genome Atlas (TCGA). A representative image slice is selected from each WSI, and numerical feature embeddings are extracted using Google's pre-trained ViT. These features are then compressed via an autoencoder and used for unsupervised clustering and classification tasks. Kaplan-Meier survival analysis is applied to evaluate stratification into two and three risk groups. SHAP is used to identify key contributing features, which are mapped onto histopathological slices to provide spatial context. PATH-X demonstrates strong performance in breast and glioma cancers, where a sufficient number of WSIs enabled robust stratification. However, performance in lung cancer was limited due to data availability, emphasizing the need for larger datasets to enhance model reliability and clinical applicability.

Updated: 2025-04-08 03:59:22

标题: 视觉变换器与自动编码器和可解释的人工智能在使用全幻灯片成像进行癌症患者风险分层方面的应用

摘要: 癌症仍然是全球主要死因之一，需要准确的诊断和预后。随着数字病理学的发展，全切片成像（WSI）已经成为临床工作流程的一个重要组成部分。尽管各种研究已经利用了WSI，但它们提取的特征可能无法完全捕获最相关的病理信息，而它们缺乏可解释性限制了临床应用。本文提出了PATH-X，这是一个集成了Vision Transformers（ViT）和Autoencoders与SHAP（Shapley Additive Explanations）的框架，用于增强对来自癌症基因组图谱（TCGA）的WSI进行患者分层和风险预测的模型可解释性。从每个WSI中选择代表性图像切片，并利用Google预训练的ViT提取数值特征嵌入。然后，通过自编码器对这些特征进行压缩，并用于无监督聚类和分类任务。Kaplan-Meier生存分析被应用于评估分层成两个和三个风险组。SHAP用于识别关键的贡献特征，并将其映射到组织病理切片上，以提供空间上下文。 PATH-X在乳腺癌和胶质瘤方面表现出较强的性能，其中足够数量的WSI实现了稳健的分层。然而，在肺癌方面的表现受到数据可用性的限制，强调了需要更大数据集以增强模型的可靠性和临床适用性。

更新时间: 2025-04-08 03:59:22

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2504.04749v2

Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models

Recent advancements in large language models (LLMs) have revolutionized their ability to handle single-turn tasks, yet real-world applications demand sophisticated multi-turn interactions. This survey provides a comprehensive review of recent advancements in evaluating and enhancing multi-turn interactions in LLMs. Focusing on task-specific scenarios, from instruction following in diverse domains such as math and coding to complex conversational engagements in roleplay, healthcare, education, and even adversarial jailbreak settings, we systematically examine the challenges of maintaining context, coherence, fairness, and responsiveness over prolonged dialogues. The paper organizes current benchmarks and datasets into coherent categories that reflect the evolving landscape of multi-turn dialogue evaluation. In addition, we review a range of enhancement methodologies under multi-turn settings, including model-centric strategies (contextual learning, supervised fine-tuning, reinforcement learning, and new architectures), external integration approaches (memory-augmented, retrieval-based methods, and knowledge graph), and agent-based techniques for collaborative interactions. Finally, we discuss open challenges and propose future directions for research to further advance the robustness and effectiveness of multi-turn interactions in LLMs. Related resources and papers are available at https://github.com/yubol-cmu/Awesome-Multi-Turn-LLMs.

Updated: 2025-04-08 03:58:37

标题: 超越单轮交互：关于大型语言模型多轮交互的调查

摘要: 最近大型语言模型（LLMs）的进展已经彻底改变了它们处理单轮任务的能力，然而现实世界中的应用需求复杂的多轮交互。本调查综合评估了LLMs中评估和增强多轮交互的最新进展。重点关注任务特定场景，从数学和编码等各种领域的指导遵循到角色扮演中复杂的对话互动，以及在医疗保健、教育甚至对抗性越狱设置中进行系统检查，我们系统地研究了在长时间对话中保持上下文、连贯性、公平性和响应性的挑战。本文将当前的基准和数据集组织成连贯的类别，反映了多轮对话评估的不断发展的格局。此外，我们还回顾了多轮设置下一系列增强方法，包括模型中心策略（上下文学习、监督微调、强化学习和新架构）、外部整合方法（记忆增强、检索式方法和知识图）以及基于代理的协作交互技术。最后，我们讨论了开放挑战，并提出未来研究方向，以进一步提升LLMs中多轮交互的稳健性和效果。相关资源和论文可在https://github.com/yubol-cmu/Awesome-Multi-Turn-LLMs 上找到。

更新时间: 2025-04-08 03:58:37

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.04717v2

ToM-RL: Reinforcement Learning Unlocks Theory of Mind in Small LLMs

Recent advancements in rule-based reinforcement learning (RL), applied during the post-training phase of large language models (LLMs), have significantly enhanced their capabilities in structured reasoning tasks such as mathematics and logical inference. However, the effectiveness of RL in social reasoning, particularly in Theory of Mind (ToM), the ability to infer others' mental states, remains largely unexplored. In this study, we demonstrate that RL methods effectively unlock ToM reasoning capabilities even in small-scale LLMs (0.5B to 7B parameters). Using a modest dataset comprising 3200 questions across diverse scenarios, our RL-trained 7B model achieves 84.50\% accuracy on the Hi-ToM benchmark, surpassing models like GPT-4o and DeepSeek-v3 despite significantly fewer parameters. While smaller models ($\leq$3B parameters) suffer from reasoning collapse, larger models (7B parameters) maintain stable performance through consistent belief tracking. Additionally, our RL-based models demonstrate robust generalization to higher-order, out-of-distribution ToM problems, novel textual presentations, and previously unseen datasets. These findings highlight RL's potential to enhance social cognitive reasoning, bridging the gap between structured problem-solving and nuanced social inference in LLMs.

Updated: 2025-04-08 03:58:20

标题: ToM-RL：强化学习在小型LLMs中解锁心智理论

摘要: 最近在大型语言模型（LLMs）的后训练阶段应用基于规则的强化学习（RL）取得了显著进展，在数学和逻辑推理等结构化推理任务方面显著增强了它们的能力。然而，RL在社会推理，特别是在心智理论（ToM）方面，即推断他人的心智状态的能力，仍然大部分未被探索。在这项研究中，我们展示RL方法有效地解锁了即使在小规模LLMs（0.5B至7B参数）中的ToM推理能力。使用包含3200个问题跨越不同情境的适度数据集，我们的RL训练的7B模型在Hi-ToM基准测试中实现了84.50\%的准确率，超越了像GPT-4o和DeepSeek-v3这样的模型，尽管参数显著较少。尽管较小模型（≤3B参数）遭受推理崩溃，但较大模型（7B参数）通过一致的信念跟踪保持稳定性能。此外，我们基于RL的模型展示了对高阶、超出分布的ToM问题、新颖的文本呈现方式和之前未见的数据集的强大泛化能力。这些发现突显了RL增强社会认知推理的潜力，弥合了LLMs中结构化问题解决和微妙社会推理之间的差距。

更新时间: 2025-04-08 03:58:20

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.01698v2

Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking

Large Language Models (LLMs) have become increasingly integral to a wide range of applications. However, they still remain the threat of jailbreak attacks, where attackers manipulate designed prompts to make the models elicit malicious outputs. Analyzing jailbreak methods can help us delve into the weakness of LLMs and improve it. In this paper, We reveal a vulnerability in large language models (LLMs), which we term Defense Threshold Decay (DTD), by analyzing the attention weights of the model's output on input and subsequent output on prior output: as the model generates substantial benign content, its attention weights shift from the input to prior output, making it more susceptible to jailbreak attacks. To demonstrate the exploitability of DTD, we propose a novel jailbreak attack method, Sugar-Coated Poison (SCP), which induces the model to generate substantial benign content through benign input and adversarial reasoning, subsequently producing malicious content. To mitigate such attacks, we introduce a simple yet effective defense strategy, POSD, which significantly reduces jailbreak success rates while preserving the model's generalization capabilities.

Updated: 2025-04-08 03:57:09

标题: 糖衣毒药：良性生成解锁LLM越狱

摘要: 大型语言模型（LLMs）已经越来越成为各种应用的核心。然而，它们仍然存在越狱攻击的威胁，攻击者通过操纵设计好的提示来使模型产生恶意输出。分析越狱方法可以帮助我们深入研究LLMs的弱点并改进它。在本文中，我们揭示了大型语言模型（LLMs）中的一个漏洞，我们称之为防御阈值衰减（DTD），通过分析模型在输入上的输出和前一个输出上的后续输出的注意力权重：随着模型生成大量良性内容，其注意力权重从输入转移到前一个输出，使其更容易受到越狱攻击的影响。为了展示DTD的可利用性，我们提出了一种新颖的越狱攻击方法，称为糖衣毒素（SCP），通过良性输入和敌对推理诱使模型生成大量良性内容，随后产生恶意内容。为了减轻这种攻击，我们引入了一种简单而有效的防御策略，POSD，可以显著降低越狱成功率同时保留模型的泛化能力。

更新时间: 2025-04-08 03:57:09

领域: cs.CR,cs.CL

下载: http://arxiv.org/abs/2504.05652v1

Measuring Déjà vu Memorization Efficiently

Recent research has shown that representation learning models may accidentally memorize their training data. For example, the d\'ej\`a vu method shows that for certain representation learning models and training images, it is sometimes possible to correctly predict the foreground label given only the representation of the background - better than through dataset-level correlations. However, their measurement method requires training two models - one to estimate dataset-level correlations and the other to estimate memorization. This multiple model setup becomes infeasible for large open-source models. In this work, we propose alternative simple methods to estimate dataset-level correlations, and show that these can be used to approximate an off-the-shelf model's memorization ability without any retraining. This enables, for the first time, the measurement of memorization in pre-trained open-source image representation and vision-language representation models. Our results show that different ways of measuring memorization yield very similar aggregate results. We also find that open-source models typically have lower aggregate memorization than similar models trained on a subset of the data. The code is available both for vision and vision language models.

Updated: 2025-04-08 03:55:20

标题: 高效测量“既视感”记忆

摘要: 最近的研究表明，表示学习模型可能会意外地记住它们的训练数据。例如，d\'ej\`a vu方法表明，对于某些表示学习模型和训练图像，有时可以仅通过背景的表示正确预测前景标签 - 这比通过数据集级别的相关性更好。然而，他们的测量方法需要训练两个模型 - 一个用于估计数据集级别的相关性，另一个用于估计记忆。这种多模型设置对于大型开源模型来说是不可行的。在这项工作中，我们提出了替代的简单方法来估计数据集级别的相关性，并展示这些方法可以用来近似现成模型的记忆能力，而无需重新训练。这使得首次能够测量预训练的开源图像表示和视觉-语言表示模型的记忆。我们的结果表明，不同的记忆测量方式产生非常相似的总体结果。我们还发现，开源模型通常比在数据子集上训练的类似模型具有较低的总体记忆。该代码适用于视觉和视觉语言模型。

更新时间: 2025-04-08 03:55:20

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2504.05651v1

Lattice: Learning to Efficiently Compress the Memory

Attention mechanisms have revolutionized sequence learning but suffer from quadratic computational complexity. This paper introduces Lattice, a novel recurrent neural network (RNN) mechanism that leverages the inherent low-rank structure of K-V matrices to efficiently compress the cache into a fixed number of memory slots, achieving sub-quadratic complexity. We formulate this compression as an online optimization problem and derive a dynamic memory update rule based on a single gradient descent step. The resulting recurrence features a state- and input-dependent gating mechanism, offering an interpretable memory update process. The core innovation is the orthogonal update: each memory slot is updated exclusively with information orthogonal to its current state hence incorporation of only novel, non-redundant data, which minimizes the interference with previously stored information. The experimental results show that Lattice achieves the best perplexity compared to all baselines across diverse context lengths, with performance improvement becoming more pronounced as the context length increases.

Updated: 2025-04-08 03:48:43

标题: 智能压缩内存的学习技术：Lattice

摘要: 注意机制已经彻底改变了序列学习，但受到二次计算复杂性的困扰。本文介绍了一种新颖的循环神经网络（RNN）机制Lattice，利用K-V矩阵固有的低秩结构，将缓存有效地压缩到固定数量的内存槽中，实现了次二次复杂度。我们将这种压缩形式化为在线优化问题，并基于单个梯度下降步骤推导出动态内存更新规则。由此产生的循环特性具有状态和输入相关的门控机制，提供了可解释的内存更新过程。核心创新是正交更新：每个内存槽仅使用与其当前状态正交的信息进行更新，因此仅包含新颖、非冗余的数据，最小化与先前存储信息的干扰。实验结果表明，与各种上下文长度的基线相比，Lattice在所有基线中实现了最佳困惑度，性能改善随着上下文长度的增加而更加显著。

更新时间: 2025-04-08 03:48:43

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.05646v1

Improved Inference of Inverse Ising Problems under Missing Observations in Restricted Boltzmann Machines

Restricted Boltzmann machines (RBMs) are energy-based models analogous to the Ising model and are widely applied in statistical machine learning. The standard inverse Ising problem with a complete dataset requires computing both data and model expectations and is computationally challenging because model expectations have a combinatorial explosion. Furthermore, in many applications, the available datasets are partially incomplete, making it difficult to compute even data expectations. In this study, we propose a approximation framework for these expectations in the practical inverse Ising problems that integrates mean-field approximation or persistent contrastive divergence to generate refined initial points and spatial Monte Carlo integration to enhance estimator accuracy. We demonstrate that the proposed method effectively and accurately tunes the model parameters in comparison to the conventional method.

Updated: 2025-04-08 03:39:56

标题: 在受限玻尔兹曼机中缺失观测下改进逆伊辛问题推断

摘要: 限制玻尔兹曼机（RBMs）是类似于伊辛模型的基于能量的模型，在统计机器学习中被广泛应用。完整数据集的标准逆伊辛问题需要计算数据和模型期望，由于模型期望存在组合爆炸，因此具有挑战性的计算。此外，在许多应用中，可用数据集是部分不完整的，这使得计算数据期望也变得困难。在本研究中，我们提出了一个用于逆伊辛问题的近似框架，该框架整合了平均场逼近或持续对比散度来生成精细的初始点，以及空间蒙特卡罗积分来提高估计器的准确性。我们证明，与传统方法相比，所提出的方法有效地和准确地调整了模型参数。

更新时间: 2025-04-08 03:39:56

领域: stat.ML,cond-mat.dis-nn,cs.LG,physics.data-an

下载: http://arxiv.org/abs/2504.05643v1

DBOT: Artificial Intelligence for Systematic Long-Term Investing

Long-term investing was previously seen as requiring human judgment. With the advent of generative artificial intelligence (AI) systems, automated systematic long-term investing is now feasible. In this paper, we present DBOT, a system whose goal is to reason about valuation like Aswath Damodaran, who is a unique expert in the investment arena in terms of having published thousands of valuations on companies in addition to his numerous writings on the topic, which provide ready training data for an AI system. DBOT can value any publicly traded company. DBOT can also be back-tested, making its behavior and performance amenable to scientific inquiry. We compare DBOT to its analytic parent, Damodaran, and highlight the research challenges involved in raising its current capability to that of Damodaran's. Finally, we examine the implications of DBOT-like AI agents for the financial industry, especially how they will impact the role of human analysts in valuation.

Updated: 2025-04-08 03:34:22

标题: DBOT：系统性长期投资的人工智能

摘要: 长期投资以前被认为需要人类判断。随着生成式人工智能（AI）系统的出现，自动系统化的长期投资现在是可行的。在本文中，我们介绍了DBOT，一个系统，其目标是像Aswath Damodaran一样推理估值，他是投资领域独特的专家，因为他不仅在公司上发布了成千上万次估值，而且在这个主题上还有大量著作，为AI系统提供了现成的训练数据。DBOT可以对任何上市公司进行估值。DBOT还可以进行回测，使其行为和表现可以接受科学研究。我们将DBOT与其分析父级Damodaran进行比较，并强调提升其当前能力至Damodaran水平所涉及的研究挑战。最后，我们探讨了类似DBOT的AI代理对金融行业的影响，特别是它们将如何影响人类分析师在估值中的角色。

更新时间: 2025-04-08 03:34:22

领域: cs.CL,cs.AI,q-fin.PR

下载: http://arxiv.org/abs/2504.05639v1

TAGC: Optimizing Gradient Communication in Distributed Transformer Training

The increasing complexity of large language models (LLMs) necessitates efficient training strategies to mitigate the high computational costs associated with distributed training. A significant bottleneck in this process is gradient synchronization across multiple GPUs, particularly in the zero-redundancy parallelism mode. In this paper, we introduce Transformer-Aware Gradient Compression (TAGC), an optimized gradient compression algorithm designed specifically for transformer-based models. TAGC extends the lossless homomorphic compression method by adapting it for sharded models and incorporating transformer-specific optimizations, such as layer-selective compression and dynamic sparsification. Our experimental results demonstrate that TAGC accelerates training by up to 15% compared to the standard Fully Sharded Data Parallel (FSDP) approach, with minimal impact on model quality. We integrate TAGC into the PyTorch FSDP framework, the implementation is publicly available at https://github.com/ipolyakov/TAGC.

Updated: 2025-04-08 03:33:39

标题: TAGC：在分布式Transformer训练中优化梯度通信

摘要: 大型语言模型（LLMs）日益复杂的增长需要高效的训练策略来减轻与分布式训练相关的高计算成本。在这个过程中的一个重要瓶颈是在多个GPU之间进行梯度同步，特别是在零冗余并行模式中。在本文中，我们介绍了Transformer-Aware Gradient Compression（TAGC），这是一种专门为基于transformer的模型设计的优化梯度压缩算法。TAGC通过将无损同态压缩方法适应于分片模型，并结合transformer特定的优化，如层选择性压缩和动态稀疏化，扩展了该方法。我们的实验结果表明，与标准的Fully Sharded Data Parallel（FSDP）方法相比，TAGC可以将训练加速高达15%，对模型质量影响最小。我们将TAGC集成到PyTorch FSDP框架中，实现公开可用于https://github.com/ipolyakov/TAGC。

更新时间: 2025-04-08 03:33:39

领域: cs.LG,cs.DC,I.2.6; C.2.4; I.2.11

下载: http://arxiv.org/abs/2504.05638v1

Attention-Augmented Inverse Reinforcement Learning with Graph Convolutions for Multi-Agent Task Allocation

Multi-agent task allocation (MATA) plays a vital role in cooperative multi-agent systems, with significant implications for applications such as logistics, search and rescue, and robotic coordination. Although traditional deep reinforcement learning (DRL) methods have been shown to be promising, their effectiveness is hindered by a reliance on manually designed reward functions and inefficiencies in dynamic environments. In this paper, an inverse reinforcement learning (IRL)-based framework is proposed, in which multi-head self-attention (MHSA) and graph attention mechanisms are incorporated to enhance reward function learning and task execution efficiency. Expert demonstrations are utilized to infer optimal reward densities, allowing dependence on handcrafted designs to be reduced and adaptability to be improved. Extensive experiments validate the superiority of the proposed method over widely used multi-agent reinforcement learning (MARL) algorithms in terms of both cumulative rewards and task execution efficiency.

Updated: 2025-04-08 03:33:08

标题: 使用图卷积的注意力增强逆强化学习用于多智能体任务分配

摘要: 多智能体任务分配（MATA）在合作多智能体系统中发挥着至关重要的作用，对于物流、搜救和机器人协调等应用具有重要意义。尽管传统的深度强化学习（DRL）方法表现出有希望的前景，但其有效性受到手动设计的奖励函数依赖和动态环境中的低效率的阻碍。本文提出了一种基于逆强化学习（IRL）的框架，其中融入了多头自注意力（MHSA）和图注意力机制，以增强奖励函数学习和任务执行效率。专家演示被用来推断最佳奖励密度，从而减少对手工设计的依赖并提高适应性。大量实验证实了所提方法在累积奖励和任务执行效率方面优于广泛使用的多智能体强化学习（MARL）算法。

更新时间: 2025-04-08 03:33:08

领域: cs.LG,cs.MA

下载: http://arxiv.org/abs/2504.05045v2

DMol: A Schedule-Driven Diffusion Model for Highly Efficient and Versatile Molecule Generation

We introduce a new graph diffusion model for small molecule generation, \emph{DMol}, which outperforms the state-of-the-art DiGress model in terms of validity by roughly $1.5\%$ across all benchmarking datasets while reducing the number of diffusion steps by at least $10$-fold, and the running time to roughly one half. The performance improvements are a result of a careful change in the objective function and a ``graph noise" scheduling approach which, at each diffusion step, allows one to only change a subset of nodes of varying size in the molecule graph. Another relevant property of the method is that it can be easily combined with junction-tree-like graph representations that arise by compressing a collection of relevant ring structures into supernodes. Unlike classical junction-tree techniques that involve VAEs and require complicated reconstruction steps, compressed DMol directly performs graph diffusion on a graph that compresses only a carefully selected set of frequent carbon rings into supernodes, which results in straightforward sample generation. This compressed DMol method offers additional validity improvements over generic DMol of roughly $2\%$, increases the novelty of the method, and further improves the running time due to reductions in the graph size.

Updated: 2025-04-08 03:31:21

标题: DMol：一种基于计划驱动的扩散模型，用于高效灵活的分子生成

摘要: 我们引入了一种新的图扩散模型用于小分子生成，即\emph{DMol}，在所有基准数据集上，它的有效性比最先进的DiGress模型提高了约1.5％，同时将扩散步数减少至至少10倍，并将运行时间缩短到大约一半。性能改进是通过仔细改变目标函数和“图噪声”调度方法实现的，该方法允许在每个扩散步骤中只改变分子图中各种大小的节点子集。该方法的另一个相关特性是，它可以轻松与类似联结树的图表示相结合，这些图表示通过将一组相关环结构压缩为超节点而产生。与涉及VAEs并需要复杂重建步骤的经典联结树技术不同，压缩的DMol直接在仅将一组精心选择的常见碳环压缩为超节点的图上执行图扩散，从而实现了直接的样本生成。这种压缩的DMol方法相对于通用DMol提供了约2％的有效性改进，增加了方法的新颖性，并由于图大小的减小进一步改进了运行时间。

更新时间: 2025-04-08 03:31:21

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.06312v1

Towards Optimal Heterogeneous Client Sampling in Multi-Model Federated Learning

Federated learning (FL) allows edge devices to collaboratively train models without sharing local data. As FL gains popularity, clients may need to train multiple unrelated FL models, but communication constraints limit their ability to train all models simultaneously. While clients could train FL models sequentially, opportunistically having FL clients concurrently train different models -- termed multi-model federated learning (MMFL) -- can reduce the overall training time. Prior work uses simple client-to-model assignments that do not optimize the contribution of each client to each model over the course of its training. Prior work on single-model FL shows that intelligent client selection can greatly accelerate convergence, but na\"ive extensions to MMFL can violate heterogeneous resource constraints at both the server and the clients. In this work, we develop a novel convergence analysis of MMFL with arbitrary client sampling methods, theoretically demonstrating the strengths and limitations of previous well-established gradient-based methods. Motivated by this analysis, we propose MMFL-LVR, a loss-based sampling method that minimizes training variance while explicitly respecting communication limits at the server and reducing computational costs at the clients. We extend this to MMFL-StaleVR, which incorporates stale updates for improved efficiency and stability, and MMFL-StaleVRE, a lightweight variant suitable for low-overhead deployment. Experiments show our methods improve average accuracy by up to 19.1% over random sampling, with only a 5.4% gap from the theoretical optimum (full client participation).

Updated: 2025-04-08 03:29:49

标题: 朝向多模型联邦学习中最佳异质客户抽样

摘要: 联邦学习（FL）允许边缘设备协作训练模型，而无需共享本地数据。随着FL的流行，客户端可能需要训练多个不相关的FL模型，但通信限制限制了他们同时训练所有模型的能力。虽然客户端可以顺序训练FL模型，但机会主义地让FL客户端同时训练不同模型 - 称为多模型联邦学习（MMFL）- 可以减少总体训练时间。之前的研究使用简单的客户端到模型分配，不会优化每个客户端在其训练过程中对每个模型的贡献。之前关于单一模型FL的工作表明，智能客户端选择可以极大加速收敛，但对MMFL的天真扩展可能会违反服务器和客户端的异质资源约束。在这项工作中，我们开发了一种新颖的MMFL收敛分析，理论上展示了先前成熟的基于梯度的方法的优势和局限性。受到这一分析的启发，我们提出了MMFL-LVR，一种基于损失的采样方法，可以最小化训练方差，同时明确尊重服务器的通信限制，并减少客户端的计算成本。我们将此扩展到MMFL-StaleVR，它结合了陈旧的更新以提高效率和稳定性，以及MMFL-StaleVRE，一种轻量级变体，适用于低开销部署。实验表明，我们的方法将平均准确度提高了高达19.1%，仅与理论最优值（完整客户端参与）相差5.4%。

更新时间: 2025-04-08 03:29:49

领域: cs.LG,cs.DC,I.2.11

下载: http://arxiv.org/abs/2504.05138v2

A Multi-Modal AI System for Screening Mammography: Integrating 2D and 3D Imaging to Improve Breast Cancer Detection in a Prospective Clinical Study

Although digital breast tomosynthesis (DBT) improves diagnostic performance over full-field digital mammography (FFDM), false-positive recalls remain a concern in breast cancer screening. We developed a multi-modal artificial intelligence system integrating FFDM, synthetic mammography, and DBT to provide breast-level predictions and bounding-box localizations of suspicious findings. Our AI system, trained on approximately 500,000 mammography exams, achieved 0.945 AUROC on an internal test set. It demonstrated capacity to reduce recalls by 31.7% and radiologist workload by 43.8% while maintaining 100% sensitivity, underscoring its potential to improve clinical workflows. External validation confirmed strong generalizability, reducing the gap to a perfect AUROC by 35.31%-69.14% relative to strong baselines. In prospective deployment across 18 sites, the system reduced recall rates for low-risk cases. An improved version, trained on over 750,000 exams with additional labels, further reduced the gap by 18.86%-56.62% across large external datasets. Overall, these results underscore the importance of utilizing all available imaging modalities, demonstrate the potential for clinical impact, and indicate feasibility of further reduction of the test error with increased training set when using large-capacity neural networks.

Updated: 2025-04-08 03:29:40

标题: 一种用于乳腺X线摄影筛查的多模式人工智能系统：整合2D和3D成像以改善乳腺癌检测的前瞻性临床研究

摘要: 尽管数字乳腺断层摄影（DBT）在诊断性能方面优于全数字乳腺X线摄影（FFDM），但在乳腺癌筛查中，假阳性复查仍然是一个令人担忧的问题。我们开发了一个多模态人工智能系统，集成了FFDM、合成乳腺X线摄影和DBT，以提供可疑发现的乳腺级别预测和边界框定位。我们的人工智能系统在约500,000个乳房X线检查的训练下，在内部测试集上实现了0.945的AUROC。它展示了减少复查率31.7%和减少放射科医生工作量43.8%的能力，同时保持100%的灵敏度，突显了其改善临床工作流程的潜力。外部验证证实了其强大的泛化能力，相对于强基线，将AUROC提高了35.31%-69.14%。在18个站点的前瞻性部署中，该系统降低了低风险病例的复查率。一个经过改进的版本，在超过750,000个检查中进行了训练，并进一步减少了18.86%-56.62%的差距，跨大型外部数据集。总的来说，这些结果强调了利用所有可用的成像模态的重要性，展示了临床影响的潜力，并表明在使用大容量神经网络时，通过增加训练集来进一步减少测试错误的可行性。

更新时间: 2025-04-08 03:29:40

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2504.05636v1

IMPersona: Evaluating Individual Level LM Impersonation

As language models achieve increasingly human-like capabilities in conversational text generation, a critical question emerges: to what extent can these systems simulate the characteristics of specific individuals? To evaluate this, we introduce IMPersona, a framework for evaluating LMs at impersonating specific individuals' writing style and personal knowledge. Using supervised fine-tuning and a hierarchical memory-inspired retrieval system, we demonstrate that even modestly sized open-source models, such as Llama-3.1-8B-Instruct, can achieve impersonation abilities at concerning levels. In blind conversation experiments, participants (mis)identified our fine-tuned models with memory integration as human in 44.44% of interactions, compared to just 25.00% for the best prompting-based approach. We analyze these results to propose detection methods and defense strategies against such impersonation attempts. Our findings raise important questions about both the potential applications and risks of personalized language models, particularly regarding privacy, security, and the ethical deployment of such technologies in real-world contexts.

Updated: 2025-04-08 03:29:25

标题: IMPersona：评估个体级别的LM冒充

摘要: 随着语言模型在对话文本生成方面越来越接近人类的能力，一个关键问题浮现出来：这些系统能够多大程度地模拟特定个体的特征？为了评估这一点，我们引入了IMPersona，一个用于评估语言模型在模仿特定个体写作风格和个人知识方面的框架。通过使用监督微调和层次记忆灵感检索系统，我们展示了即使是像Llama-3.1-8B-Instruct这样规模适中的开源模型也能在模仿能力方面达到令人担忧的水平。在盲目对话实验中，参与者在44.44%的互动中将我们微调过的带有记忆集成的模型误认为是人类，而最佳提示型方法仅为25.00%。我们分析这些结果，提出检测方法和防御策略来抵御这种模拟尝试。我们的发现引发了关于个性化语言模型潜在应用和风险的重要问题，特别是关于隐私、安全和在现实环境中道德部署这种技术的问题。

更新时间: 2025-04-08 03:29:25

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.04332v2

To Start Up a Start-Up$-$Embedding Strategic Demand Development in Operational On-Demand Fulfillment via Reinforcement Learning with Information Shaping

The last few years have witnessed rapid growth in the on-demand delivery market, with many start-ups entering the field. However, not all of these start-ups have succeeded due to various reasons, among others, not being able to establish a large enough customer base. In this paper, we address this problem that many on-demand transportation start-ups face: how to establish themselves in a new market. When starting, such companies often have limited fleet resources to serve demand across a city. Depending on the use of the fleet, varying service quality is observed in different areas of the city, and in turn, the service quality impacts the respective growth of demand in each area. Thus, operational fulfillment decisions drive the longer-term demand development. To integrate strategic demand development into real-time fulfillment operations, we propose a two-step approach. First, we derive analytical insights into optimal allocation decisions for a stylized problem. Second, we use these insights to shape the training data of a reinforcement learning strategy for operational real-time fulfillment. Our experiments demonstrate that combining operational efficiency with long-term strategic planning is highly advantageous. Further, we show that the careful shaping of training data is essential for the successful development of demand.

Updated: 2025-04-08 03:25:37

标题: 创业初创企业的战略需求开发嵌入运营按需履行中的强化学习与信息塑造

摘要: 近几年来，随着许多初创公司进入领域，即时交付市场迅速增长。然而，并非所有这些初创公司都成功，原因有很多，其中之一是无法建立足够大的客户群。在本文中，我们解决了许多即时交通初创公司面临的问题：如何在新市场中建立自己的地位。当初创公司开始时，通常有限的车队资源无法满足城市各地的需求。根据车队的使用情况，在城市不同地区观察到不同的服务质量，而服务质量反过来影响着每个地区需求的增长。因此，运营履行决策推动着长期需求的发展。为了将战略需求发展整合到实时履行操作中，我们提出了一种两步方法。首先，我们为一个简化的问题推导出最优分配决策的分析见解。其次，我们利用这些见解来塑造实时履行操作的强化学习策略的训练数据。我们的实验证明，将运营效率与长期战略规划结合起来是非常有利的。此外，我们展示了精心塑造训练数据对于成功发展需求至关重要。

更新时间: 2025-04-08 03:25:37

领域: cs.LG

下载: http://arxiv.org/abs/2504.05633v1

Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning

Recent advances in large-scale generative language models have shown that reasoning capabilities can significantly improve model performance across a variety of tasks. However, the impact of reasoning on a model's ability to mitigate stereotypical responses remains largely underexplored. In this work, we investigate the crucial relationship between a model's reasoning ability and fairness, and ask whether improved reasoning capabilities can mitigate harmful stereotypical responses, especially those arising due to shallow or flawed reasoning. We conduct a comprehensive evaluation of multiple open-source LLMs, and find that larger models with stronger reasoning abilities exhibit substantially lower stereotypical bias on existing fairness benchmarks. Building on this insight, we introduce ReGiFT -- Reasoning Guided Fine-Tuning, a novel approach that extracts structured reasoning traces from advanced reasoning models and infuses them into models that lack such capabilities. We use only general-purpose reasoning and do not require any fairness-specific supervision for bias mitigation. Notably, we see that models fine-tuned using ReGiFT not only improve fairness relative to their non-reasoning counterparts but also outperform advanced reasoning models on fairness benchmarks. We also analyze how variations in the correctness of the reasoning traces and their length influence model fairness and their overall performance. Our findings highlight that enhancing reasoning capabilities is an effective, fairness-agnostic strategy for mitigating stereotypical bias caused by reasoning flaws.

Updated: 2025-04-08 03:21:51

标题: 朝向公平的推理：通过推理引导微调减少语言模型中的偏见

摘要: 最近大规模生成式语言模型的进展表明，推理能力可以显著提高模型在各种任务中的性能。然而，推理对模型缓解刻板印象性回应的影响仍然大部分未被探索。在这项工作中，我们调查了模型推理能力和公平性之间的关键关系，并询问改进的推理能力是否可以缓解有害的刻板印象性回应，特别是那些由浅薄或有缺陷的推理引起的回应。我们对多个开源LLM进行了全面评估，并发现具有更强推理能力的更大模型在现有的公平性基准测试中表现出明显较低的刻板印象性偏见。基于这一发现，我们引入了ReGiFT -- 推理引导微调，一种从先进推理模型中提取结构化推理痕迹并将其注入缺乏这种能力的模型的新方法。我们仅使用通用推理，不需要任何特定于公平性的监督来减轻偏见。值得注意的是，我们发现使用ReGiFT进行微调的模型不仅相对于其非推理对应物改善了公平性，而且在公平性基准测试中也优于先进推理模型。我们还分析了推理痕迹的正确性和长度变化如何影响模型的公平性和整体性能。我们的发现强调，增强推理能力是一种有效的、与公平性无关的策略，可以缓解由推理缺陷引起的刻板印象性偏见。

更新时间: 2025-04-08 03:21:51

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2504.05632v1

DataMan: Data Manager for Pre-training Large Language Models

The performance emergence of large language models (LLMs) driven by data scaling laws makes the selection of pre-training data increasingly important. However, existing methods rely on limited heuristics and human intuition, lacking comprehensive and clear guidelines. To address this, we are inspired by ``reverse thinking'' -- prompting LLMs to self-identify which criteria benefit its performance. As its pre-training capabilities are related to perplexity (PPL), we derive 14 quality criteria from the causes of text perplexity anomalies and introduce 15 common application domains to support domain mixing. In this paper, we train a Data Manager (DataMan) to learn quality ratings and domain recognition from pointwise rating, and use it to annotate a 447B token pre-training corpus with 14 quality ratings and domain type. Our experiments validate our approach, using DataMan to select 30B tokens to train a 1.3B-parameter language model, demonstrating significant improvements in in-context learning (ICL), perplexity, and instruction-following ability over the state-of-the-art baseline. The best-performing model, based on the Overall Score l=5 surpasses a model trained with 50% more data using uniform sampling. We continue pre-training with high-rated, domain-specific data annotated by DataMan to enhance domain-specific ICL performance and thus verify DataMan's domain mixing ability. Our findings emphasize the importance of quality ranking, the complementary nature of quality criteria, and their low correlation with perplexity, analyzing misalignment between PPL and ICL performance. We also thoroughly analyzed our pre-training dataset, examining its composition, the distribution of quality ratings, and the original document sources.

Updated: 2025-04-08 03:21:10

标题: DataMan：用于预训练大型语言模型的数据管理器

摘要: 由数据缩放定律驱动的大型语言模型（LLMs）的性能出现使得预训练数据的选择变得越来越重要。然而，现有方法依赖有限的启发式和人类直觉，缺乏全面和清晰的指导方针。为了解决这个问题，我们受到“逆向思维”的启发--促使LLMs自我识别哪些标准有利于其性能。由于其预训练能力与困惑度（PPL）有关，我们从文本困惑度异常的原因中推导出14个质量标准，并引入15个常见应用领域以支持领域混合。在本文中，我们训练了一个数据管理器（DataMan）来从单点评分中学习质量评分和领域识别，并用它来用14个质量评分和领域类型注释一个447B令牌的预训练语料库。我们的实验验证了我们的方法，使用DataMan选择30B令牌来训练一个1.3B参数的语言模型，展示了在上下文学习（ICL），困惑度和遵循指令能力方面明显的改进超过了最先进的基线。基于综合评分l=5的表现最佳的模型超过了使用均匀抽样训练50%更多数据的模型。我们继续使用DataMan注释的高评分，领域特定数据进行高评分的预训练，以增强领域特定ICL性能，并因此验证DataMan的领域混合能力。我们的研究结果强调了质量排名的重要性，质量标准的互补性质，以及它们与困惑度之间的低相关性，分析了PPL和ICL性能之间的不一致性。我们还彻底分析了我们的预训练数据集，考察了其组成，质量评分的分布和原始文档来源。

更新时间: 2025-04-08 03:21:10

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.19363v3

Crowdsourcing-Based Knowledge Graph Construction for Drug Side Effects Using Large Language Models with an Application on Semaglutide

Social media is a rich source of real-world data that captures valuable patient experience information for pharmacovigilance. However, mining data from unstructured and noisy social media content remains a challenging task. We present a systematic framework that leverages large language models (LLMs) to extract medication side effects from social media and organize them into a knowledge graph (KG). We apply this framework to semaglutide for weight loss using data from Reddit. Using the constructed knowledge graph, we perform comprehensive analyses to investigate reported side effects across different semaglutide brands over time. These findings are further validated through comparison with adverse events reported in the FAERS database, providing important patient-centered insights into semaglutide's side effects that complement its safety profile and current knowledge base of semaglutide for both healthcare professionals and patients. Our work demonstrates the feasibility of using LLMs to transform social media data into structured KGs for pharmacovigilance.

Updated: 2025-04-08 03:11:32

标题: 基于众包的知识图构建：利用大型语言模型构建药物副作用知识图，并在塞马鲁肽上应用

摘要: 社交媒体是一个丰富的真实世界数据来源，捕捉了有价值的患者体验信息，用于药物警戒。然而，从非结构化和嘈杂的社交媒体内容中挖掘数据仍然是一项具有挑战性的任务。我们提出了一个系统框架，利用大型语言模型(LLMs)从社交媒体中提取药物副作用并将其组织成知识图(KG)。我们将这一框架应用于使用Reddit数据的塞麦格列汀用于减肥。利用构建的知识图，我们进行了全面分析，以调查不同塞麦格列汀品牌随时间报告的副作用。这些发现进一步通过与FAERS数据库中报告的不良事件进行比较进行验证，为医疗保健专业人员和患者提供了关于塞麦格列汀副作用的重要以患者为中心的见解，这些见解补充了其安全概况和当前塞麦格列汀知识库。我们的工作证明了利用LLMs将社交媒体数据转化为结构化KGs用于药物警戒的可行性。

更新时间: 2025-04-08 03:11:32

领域: cs.AI,cs.SI,J.4

下载: http://arxiv.org/abs/2504.04346v2

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

We present VAPO, Value-based Augmented Proximal Policy Optimization framework for reasoning models., a novel framework tailored for reasoning models within the value-based paradigm. Benchmarked the AIME 2024 dataset, VAPO, built on the Qwen 32B pre-trained model, attains a state-of-the-art score of $\mathbf{60.4}$. In direct comparison under identical experimental settings, VAPO outperforms the previously reported results of DeepSeek-R1-Zero-Qwen-32B and DAPO by more than 10 points. The training process of VAPO stands out for its stability and efficiency. It reaches state-of-the-art performance within a mere 5,000 steps. Moreover, across multiple independent runs, no training crashes occur, underscoring its reliability. This research delves into long chain-of-thought (long-CoT) reasoning using a value-based reinforcement learning framework. We pinpoint three key challenges that plague value-based methods: value model bias, the presence of heterogeneous sequence lengths, and the sparsity of reward signals. Through systematic design, VAPO offers an integrated solution that effectively alleviates these challenges, enabling enhanced performance in long-CoT reasoning tasks.

Updated: 2025-04-08 03:06:22

标题: VAPO：高效可靠的用于高级推理任务的强化学习

摘要: 我们提出了VAPO，即基于值的增强近端策略优化框架，专为基于值的范式中推理模型而设计。在AIME 2024数据集上进行了基准测试，VAPO基于Qwen 32B预训练模型，获得了60.4的最先进得分。在相同的实验设置下直接比较，VAPO的表现优于之前报告的DeepSeek-R1-Zero-Qwen-32B和DAPO超过10个点。VAPO的训练过程以其稳定性和效率脱颖而出。仅在5000步内达到了最先进的性能。此外，在多次独立运行中，没有发生训练崩溃，突出了其可靠性。这项研究探讨了使用基于值的强化学习框架进行长逻辑推理（long-CoT）的问题。我们指出了困扰基于值方法的三个关键挑战：值模型偏差、异构序列长度的存在以及奖励信号的稀疏性。通过系统设计，VAPO提供了一个集成解决方案，有效缓解了这些挑战，实现了长逻辑推理任务的增强性能。

更新时间: 2025-04-08 03:06:22

领域: cs.AI

下载: http://arxiv.org/abs/2504.05118v2

The Amenability Framework: Rethinking Causal Ordering Without Estimating Causal Effects

Who should we prioritize for intervention when we cannot estimate intervention effects? In many applied domains -- such as advertising, customer retention, and behavioral nudging -- prioritization is guided by predictive models that estimate outcome probabilities rather than causal effects. This paper investigates when these predictions (scores) can effectively rank individuals by their intervention effects, particularly when direct effect estimation is infeasible or unreliable. We propose a conceptual framework based on amenability -- an individual's latent proclivity to be influenced by an intervention -- and formalize conditions under which predictive scores serve as effective proxies for amenability. These conditions justify using non-causal scores for intervention prioritization, even when the scores do not directly estimate effects. We further show that, under plausible assumptions, predictive models can outperform causal effect estimators in ranking individuals by intervention effects. Empirical evidence from an advertising context supports our theoretical findings, demonstrating that predictive modeling can offer a more robust approach to targeting than effect estimation. Our framework suggests a shift in focus -- from estimating effects to inferring who is amenable -- as a practical and theoretically grounded strategy for prioritizing interventions in resource-constrained environments.

Updated: 2025-04-08 03:03:32

标题: 《服从性框架：重新思考无需估计因果效应的因果排序》

摘要: 当我们无法估计干预效果时，我们应该优先考虑谁进行干预？在许多应用领域，如广告、客户保留和行为劝导，优先考虑是由预测模型指导的，这些模型估计结果概率而不是因果效果。本文研究了在直接效果估计不可行或不可靠时，这些预测（分数）何时能有效地按其干预效果对个体进行排名。我们提出了一个基于适应性的概念框架，即一个个体对干预的潜在倾向，以及规定了预测分数何时作为适应性的有效代理的条件。这些条件证明了在干预优先考虑时使用非因果分数是合理的，即使这些分数并不直接估计效果。我们进一步展示，在合理假设下，预测模型可以在排名个体的干预效果方面胜过因果效果估计器。来自广告背景的实证证据支持我们的理论发现，证明了预测建模可以提供比效果估计更稳健的定位方法。我们的框架建议，将重点从估计效果转向推断谁是适应的，作为在资源受限环境中优先考虑干预的实用和理论基础策略。

更新时间: 2025-04-08 03:03:32

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2206.12532v7

Maternal and Fetal Health Status Assessment by Using Machine Learning on Optical 3D Body Scans

Monitoring maternal and fetal health during pregnancy is crucial for preventing adverse outcomes. While tests such as ultrasound scans offer high accuracy, they can be costly and inconvenient. Telehealth and more accessible body shape information provide pregnant women with a convenient way to monitor their health. This study explores the potential of 3D body scan data, captured during the 18-24 gestational weeks, to predict adverse pregnancy outcomes and estimate clinical parameters. We developed a novel algorithm with two parallel streams which are used for extract body shape features: one for supervised learning to extract sequential abdominal circumference information, and another for unsupervised learning to extract global shape descriptors, alongside a branch for demographic data. Our results indicate that 3D body shape can assist in predicting preterm labor, gestational diabetes mellitus (GDM), gestational hypertension (GH), and in estimating fetal weight. Compared to other machine learning models, our algorithm achieved the best performance, with prediction accuracies exceeding 88% and fetal weight estimation accuracy of 76.74% within a 10% error margin, outperforming conventional anthropometric methods by 22.22%.

Updated: 2025-04-08 03:02:26

标题: 利用机器学习在光学3D人体扫描中评估母婴健康状况

摘要: 在怀孕期间监测母婴健康对预防不良结局至关重要。虽然像超声波扫描这样的检测方法具有高准确性，但它们可能成本高且不方便。远程医疗和更易获取的体形信息为孕妇提供了一种便捷的监测方式。本研究探讨了在孕18-24周期间捕获的3D身体扫描数据的潜力，用于预测不良妊娠结局并估计临床参数。我们开发了一种新算法，其中有两个并行流用于提取体形特征：一个用于监督学习提取顺序腹围信息，另一个用于无监督学习提取全局形状描述符，同时还有一个用于人口统计数据的分支。我们的结果表明，3D体形可以帮助预测早产、妊娠糖尿病（GDM）、妊娠高血压（GH），并估计胎儿体重。与其他机器学习模型相比，我们的算法取得了最佳表现，预测准确率超过88%，在10%误差范围内的胎儿体重估计准确率为76.74%，比传统人体测量方法高出22.22%。

更新时间: 2025-04-08 03:02:26

领域: cs.LG

下载: http://arxiv.org/abs/2504.05627v1

Safe Screening Rules for Group OWL Models

Group Ordered Weighted $L_{1}$-Norm (Group OWL) regularized models have emerged as a useful procedure for high-dimensional sparse multi-task learning with correlated features. Proximal gradient methods are used as standard approaches to solving Group OWL models. However, Group OWL models usually suffer huge computational costs and memory usage when the feature size is large in the high-dimensional scenario. To address this challenge, in this paper, we are the first to propose the safe screening rule for Group OWL models by effectively tackling the structured non-separable penalty, which can quickly identify the inactive features that have zero coefficients across all the tasks. Thus, by removing the inactive features during the training process, we may achieve substantial computational gain and memory savings. More importantly, the proposed screening rule can be directly integrated with the existing solvers both in the batch and stochastic settings. Theoretically, we prove our screening rule is safe and also can be safely applied to the existing iterative optimization algorithms. Our experimental results demonstrate that our screening rule can effectively identify the inactive features and leads to a significant computational speedup without any loss of accuracy.

Updated: 2025-04-08 02:59:56

标题: 团体OWL模型的安全筛查规则

摘要: Group Ordered Weighted $L_{1}$-Norm (Group OWL)正则化模型已经成为处理具有相关特征的高维稀疏多任务学习的有效方法。近端梯度方法被用作解决Group OWL模型的标准方法。然而，在高维情况下，Group OWL模型通常会遭受巨大的计算成本和内存使用。为了解决这一挑战，在本文中，我们首次提出了Group OWL模型的安全筛选规则，通过有效处理结构化不可分离的惩罚，可以快速识别在所有任务中具有零系数的非活跃特征。因此，在训练过程中去除非活跃特征，我们可能实现大幅计算节约和内存节省。更重要的是，所提出的筛选规则可以直接与现有的批处理和随机设置的求解器集成。从理论上讲，我们证明了我们的筛选规则是安全的，并且可以安全地应用于现有的迭代优化算法。我们的实验结果表明，我们的筛选规则可以有效识别非活跃特征，并在不损失准确性的情况下显著提高计算速度。

更新时间: 2025-04-08 02:59:56

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2504.03152v2

Model-Agnostic Policy Explanations with Large Language Models

Intelligent agents, such as robots, are increasingly deployed in real-world, human-centric environments. To foster appropriate human trust and meet legal and ethical standards, these agents must be able to explain their behavior. However, state-of-the-art agents are typically driven by black-box models like deep neural networks, limiting their interpretability. We propose a method for generating natural language explanations of agent behavior based only on observed states and actions -- without access to the agent's underlying model. Our approach learns a locally interpretable surrogate model of the agent's behavior from observations, which then guides a large language model to generate plausible explanations with minimal hallucination. Empirical results show that our method produces explanations that are more comprehensible and correct than those from baselines, as judged by both language models and human evaluators. Furthermore, we find that participants in a user study more accurately predicted the agent's future actions when given our explanations, suggesting improved understanding of agent behavior.

Updated: 2025-04-08 02:56:02

标题: 使用大型语言模型的模型无关政策解释

摘要: 智能代理，如机器人，越来越多地部署在现实世界的以人为中心的环境中。为了培养适当的人类信任并满足法律和道德标准，这些代理必须能够解释它们的行为。然而，目前的代理通常由深度神经网络等黑盒模型驱动，这限制了它们的可解释性。我们提出了一种方法，基于观察到的状态和行动，仅生成代理行为的自然语言解释，而无需访问代理的基础模型。我们的方法通过观察学习代理行为的局部可解释代理模型，然后引导一个大型语言模型生成具有最小幻觉的合理解释。实证结果表明，我们的方法产生的解释比基准方法更易理解和更正确，这是由语言模型和人类评估者判断的。此外，我们发现，在用户研究中，参与者在给予我们的解释后更准确地预测了代理的未来行动，表明对代理行为的理解得到了改善。

更新时间: 2025-04-08 02:56:02

领域: cs.LG

下载: http://arxiv.org/abs/2504.05625v1

SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments

Large vision-language models (LVLMs) have shown remarkable capabilities in interpreting visual content. While existing works demonstrate these models' vulnerability to deliberately placed adversarial texts, such texts are often easily identifiable as anomalous. In this paper, we present the first approach to generate scene-coherent typographic adversarial attacks that mislead advanced LVLMs while maintaining visual naturalness through the capability of the LLM-based agent. Our approach addresses three critical questions: what adversarial text to generate, where to place it within the scene, and how to integrate it seamlessly. We propose a training-free, multi-modal LLM-driven scene-coherent typographic adversarial planning (SceneTAP) that employs a three-stage process: scene understanding, adversarial planning, and seamless integration. The SceneTAP utilizes chain-of-thought reasoning to comprehend the scene, formulate effective adversarial text, strategically plan its placement, and provide detailed instructions for natural integration within the image. This is followed by a scene-coherent TextDiffuser that executes the attack using a local diffusion mechanism. We extend our method to real-world scenarios by printing and placing generated patches in physical environments, demonstrating its practical implications. Extensive experiments show that our scene-coherent adversarial text successfully misleads state-of-the-art LVLMs, including ChatGPT-4o, even after capturing new images of physical setups. Our evaluations demonstrate a significant increase in attack success rates while maintaining visual naturalness and contextual appropriateness. This work highlights vulnerabilities in current vision-language models to sophisticated, scene-coherent adversarial attacks and provides insights into potential defense mechanisms.

Updated: 2025-04-08 02:54:58

标题: SceneTAP: 场景一致的排版对抗规划器，用于应对现实世界环境中的视觉语言模型

摘要: 大型视觉语言模型（LVLM）展现出在解释视觉内容方面的卓越能力。尽管现有研究表明这些模型对有意放置的对抗性文本具有脆弱性，但这些文本通常很容易识别为异常。本文介绍了第一个生成具有场景一致性的印刷对抗攻击的方法，该方法通过基于LLM的代理的能力维持视觉自然性来误导先进的LVLMs。我们的方法解决了三个关键问题：生成什么对抗性文本，将其放置在场景的哪里，以及如何无缝集成它。我们提出了一种无需训练的、多模态的LLM驱动的场景一致性印刷对抗规划（SceneTAP）方法，该方法采用三阶段过程：场景理解、对抗规划和无缝集成。SceneTAP利用思维链推理来理解场景，制定有效的对抗性文本，战略性地规划其放置位置，并为在图像中自然集成提供详细说明。然后是一个场景一致的TextDiffuser，通过局部扩散机制执行攻击。我们将我们的方法扩展到现实场景中，通过在物理环境中打印和放置生成的补丁来展示其实际意义。广泛的实验表明，我们的场景一致性对抗性文本成功地误导了最先进的LVLMs，包括ChatGPT-4o，即使在捕捉新的物理设置图像之后也是如此。我们的评估表明，攻击成功率显著增加，同时保持视觉自然性和上下文适当性。这项工作突出了当前视觉语言模型对复杂、场景一致的对抗性攻击的脆弱性，并提供了潜在防御机制的见解。

更新时间: 2025-04-08 02:54:58

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2412.00114v2

PEAKS: Selecting Key Training Examples Incrementally via Prediction Error Anchored by Kernel Similarity

As deep learning continues to be driven by ever-larger datasets, understanding which examples are most important for generalization has become a critical question. While progress in data selection continues, emerging applications require studying this problem in dynamic contexts. To bridge this gap, we pose the Incremental Data Selection (IDS) problem, where examples arrive as a continuous stream, and need to be selected without access to the full data source. In this setting, the learner must incrementally build a training dataset of predefined size while simultaneously learning the underlying task. We find that in IDS, the impact of a new sample on the model state depends fundamentally on both its geometric relationship in the feature space and its prediction error. Leveraging this insight, we propose PEAKS (Prediction Error Anchored by Kernel Similarity), an efficient data selection method tailored for IDS. Our comprehensive evaluations demonstrate that PEAKS consistently outperforms existing selection strategies. Furthermore, PEAKS yields increasingly better performance returns than random selection as training data size grows on real-world datasets.

Updated: 2025-04-08 02:48:22

标题: PEAKS: 通过核相似性锚定预测误差逐步选择关键训练示例

摘要: 随着深度学习继续受到越来越大的数据集驱动，理解哪些示例对泛化最为重要已经成为一个关键问题。尽管在数据选择方面取得了进展，新兴应用需要在动态环境中研究这个问题。为了弥合这一差距，我们提出了增量数据选择（IDS）问题，其中示例以连续流的形式到达，并且需要在没有访问完整数据源的情况下进行选择。在这种情况下，学习者必须在同时学习潜在任务的情况下，逐步构建一个预定义大小的训练数据集。我们发现在IDS中，新样本对模型状态的影响在很大程度上取决于其在特征空间中的几何关系和其预测误差。利用这一洞察力，我们提出了PEAKS（由核相似性锚定的预测误差），这是一种专为IDS定制的高效数据选择方法。我们的全面评估表明，PEAKS始终优于现有的选择策略。此外，在真实数据集上，随着训练数据规模的增长，PEAKS的性能回报也比随机选择更好。

更新时间: 2025-04-08 02:48:22

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2504.05250v2

Batched Stochastic Bandit for Nondegenerate Functions

This paper studies batched bandit learning problems for nondegenerate functions. We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally. More specifically, we introduce an algorithm, called Geometric Narrowing (GN), whose regret bound is of order $\widetilde{{\mathcal{O}}} ( A_{+}^d \sqrt{T} )$. In addition, GN only needs $\mathcal{O} (\log \log T)$ batches to achieve this regret. We also provide lower bound analysis for this problem. More specifically, we prove that over some (compact) doubling metric space of doubling dimension $d$: 1. For any policy $\pi$, there exists a problem instance on which $\pi$ admits a regret of order ${\Omega} ( A_-^d \sqrt{T})$; 2. No policy can achieve a regret of order $ A_-^d \sqrt{T} $ over all problem instances, using less than $ \Omega ( \log \log T ) $ rounds of communications. Our lower bound analysis shows that the GN algorithm achieves near optimal regret with minimal number of batches.

Updated: 2025-04-08 02:42:12

标题: 批处理随机赌博机在非退化函数中的应用

摘要: 这篇论文研究了非退化函数的批处理赌博学习问题。我们提出了一种算法，可以在非退化函数的情况下近乎最优地解决批处理赌博问题。更具体地说，我们引入了一种称为几何缩小（GN）的算法，其遗憾界为$\widetilde{{\mathcal{O}}} ( A_{+}^d \sqrt{T} )$。此外，GN仅需要$\mathcal{O} (\log \log T)$批次即可实现这种遗憾。我们还为这个问题提供了下界分析。更具体地说，我们证明了在某些（紧致的）加倍维度为$d$的加倍度量空间上：1. 对于任何策略$\pi$，存在一个问题实例，在该实例上$\pi$的遗憾为${\Omega} ( A_-^d \sqrt{T})$；2. 没有任何策略可以在所有问题实例上实现$ A_-^d \sqrt{T} $的遗憾，并且少于$ \Omega ( \log \log T ) $轮通信。我们的下界分析表明，GN算法以最少的批次实现了接近最优的遗憾。

更新时间: 2025-04-08 02:42:12

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.05733v3

Continual Learning of Multiple Cognitive Functions with Brain-inspired Temporal Development Mechanism

Cognitive functions in current artificial intelligence networks are tied to the exponential increase in network scale, whereas the human brain can continuously learn hundreds of cognitive functions with remarkably low energy consumption. This advantage is in part due to the brain cross-regional temporal development mechanisms, where the progressive formation, reorganization, and pruning of connections from basic to advanced regions, facilitate knowledge transfer and prevent network redundancy. Inspired by these, we propose the Continual Learning of Multiple Cognitive Functions with Brain-inspired Temporal Development Mechanism(TD-MCL), enabling cognitive enhancement from simple to complex in Perception-Motor-Interaction(PMI) multiple cognitive task scenarios. The TD-MCL model proposes the sequential evolution of long-range connections between different cognitive modules to promote positive knowledge transfer, while using feedback-guided local connection inhibition and pruning to effectively eliminate redundancies in previous tasks, reducing energy consumption while preserving acquired knowledge. Experiments show that the proposed method can achieve continual learning capabilities while reducing network scale, without introducing regularization, replay, or freezing strategies, and achieving superior accuracy on new tasks compared to direct learning. The proposed method shows that the brain's developmental mechanisms offer a valuable reference for exploring biologically plausible, low-energy enhancements of general cognitive abilities.

Updated: 2025-04-08 02:36:36

标题: 用脑启发的时间发展机制持续学习多种认知功能

摘要: 当前人工智能网络中的认知功能与网络规模的指数增长密切相关，而人类大脑可以以极低的能量消耗持续学习数百种认知功能。这一优势部分归因于大脑跨区域时间发展机制，其中从基础到高级区域的连接逐渐形成、重组和修剪，促进知识转移并防止网络冗余。受此启发，我们提出了受大脑启发的时间发展机制(TD-MCL)下的多认知功能的持续学习方法，使感知-运动-交互(PMI)多认知任务场景中的认知能力得到从简单到复杂的增强。TD-MCL模型提出了不同认知模块之间长程连接的顺序演化，以促进正向知识转移，同时利用反馈引导的局部连接抑制和修剪，有效消除先前任务中的冗余，降低能量消耗同时保留已获得的知识。实验证明，所提出的方法可以实现持续学习能力，同时减小网络规模，而不引入正则化、重播或冻结策略，并在新任务上实现比直接学习更高的准确性。所提出的方法表明，大脑的发展机制为探索生物学合理、低能量的普遍认知能力增强提供了有价值的参考。

更新时间: 2025-04-08 02:36:36

领域: cs.AI

下载: http://arxiv.org/abs/2504.05621v1

An In-depth Evaluation of Large Language Models in Sentence Simplification with Error-based Human Assessment

Recent studies have used both automatic metrics and human evaluations to assess the simplification abilities of LLMs. However, the suitability of existing evaluation methodologies for LLMs remains in question. First, the suitability of current automatic metrics on LLMs' simplification evaluation is still uncertain. Second, current human evaluation approaches in sentence simplification often fall into two extremes: they are either too superficial, failing to offer a clear understanding of the models' performance, or overly detailed, making the annotation process complex and prone to inconsistency, which in turn affects the evaluation's reliability. To address these problems, this study provides in-depth insights into LLMs' performance while ensuring the reliability of the evaluation. We design an error-based human annotation framework to assess the LLMs' simplification capabilities. We select both closed-source and open-source LLMs, including GPT-4, Qwen2.5-72B, and Llama-3.2-3B. We believe that these models offer a representative selection across large, medium, and small sizes of LLMs. Results show that GPT-4 generally generates fewer erroneous simplification outputs compared to the current state-of-the-art. However, LLMs have their limitations, as seen in GPT-4's struggles with lexical paraphrasing. Results show that LLMs generally generate fewer erroneous simplification outputs compared to the previous state-of-the-art. However, LLMs have their limitations, as seen in GPT-4's and Qwen2.5-72B's struggle with lexical paraphrasing. Furthermore, we conduct meta-evaluations on widely used automatic metrics using our human annotations. We find that these metrics lack sufficient sensitivity to assess the overall high-quality simplifications, particularly those generated by high-performance LLMs.

Updated: 2025-04-08 02:31:31

标题: 一个对大型语言模型在句子简化中进行深入评估的基于错误的人类评估

摘要: 最近的研究使用自动指标和人类评估来评估LLMs的简化能力。然而，现有评估方法对LLMs的适用性仍存在疑问。首先，目前自动指标在LLMs简化评估上的适用性仍不确定。其次，在句子简化方面，目前的人类评估方法往往分为两个极端：要么过于肤浅，无法清晰理解模型的性能，要么过于详细，使注释过程复杂且容易不一致，进而影响评估的可靠性。为解决这些问题，本研究提供了对LLMs性能的深入见解，同时确保评估的可靠性。我们设计了基于错误的人类注释框架来评估LLMs的简化能力。我们选择了闭源和开源的LLMs，包括GPT-4、Qwen2.5-72B和Llama-3.2-3B。我们认为这些模型代表了大、中、小型LLMs的典型选择。结果显示，与当前的最新技术相比，GPT-4通常生成的错误简化输出较少。然而，LLMs也有其局限性，如GPT-4在词汇改写方面的困难。结果显示，与先前的最新技术相比，LLMs通常生成的错误简化输出更少。然而，LLMs也有其局限性，如GPT-4和Qwen2.5-72B在词汇改写方面的挣扎。此外，我们使用我们的人类注释对广泛使用的自动指标进行元评估。我们发现这些指标缺乏足够的灵敏度来评估整体高质量的简化，特别是由高性能LLMs生成的简化。

更新时间: 2025-04-08 02:31:31

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.04963v2

Model Extrapolation Expedites Alignment

Given the high computational cost of preference alignment training of large language models (LLMs), exploring efficient methods to reduce the training overhead remains an important and compelling research problem. Motivated by the observation that alignment training typically involves only small parameter changes without injecting new knowledge into models, we propose a straightforward method called ExPO (model extrapolation) to expedite LLMs' alignment with human preferences. Given a partially-trained model and its initial SFT checkpoint, ExPO improves the implicit optimization objective of alignment training by simply amplifying the parameter change based on a first-order approximation, without any additional training overhead. Through controlled experiments, we demonstrate that ExPO boosts a DPO model trained with only 20% steps to outperform the fully-trained one. Moreover, we show that ExPO notably improves existing open-source LLMs (ranging from 1.8B to 70B parameters) on the leading AlpacaEval 2.0 and MT-Bench benchmarks, which highlights ExPO's broader utility in efficiently enhancing LLM alignment.

Updated: 2025-04-08 02:27:00

标题: 模型外推加快对齐

摘要: 考虑到大型语言模型（LLMs）的偏好调整训练的计算成本较高，探索有效的方法来减少训练开销仍然是一个重要且引人注目的研究问题。受到对齐训练通常仅涉及对模型进行小参数更改而不注入新知识的观察的启发，我们提出了一种称为ExPO（模型外推）的直接方法，以加快LLMs与人类偏好的对齐。给定一个部分训练的模型及其初始SFT检查点，ExPO通过简单地放大基于一阶近似的参数更改来改善对齐训练的隐式优化目标，而无需任何额外的训练开销。通过对照实验，我们证明ExPO将一个仅经过20%步骤训练的DPO模型推动超越完全训练的模型。此外，我们展示ExPO显著改进了现有的开源LLMs（参数范围从18亿到700亿）在领先的AlpacaEval 2.0和MT-Bench基准测试中的表现，这突显了ExPO在有效增强LLMs对齐方面的更广泛实用性。

更新时间: 2025-04-08 02:27:00

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2404.16792v3

Technical Report: Full Version of Analyzing and Optimizing Perturbation of DP-SGD Geometrically

Differential privacy (DP) has become a prevalent privacy model in a wide range of machine learning tasks, especially after the debut of DP-SGD. However, DP-SGD, which directly perturbs gradients in the training iterations, fails to mitigate the negative impacts of noise on gradient direction. As a result, DP-SGD is often inefficient. Although various solutions (e.g., clipping to reduce the sensitivity of gradients and amplifying privacy bounds to save privacy budgets) are proposed to trade privacy for model efficiency, the root cause of its inefficiency is yet unveiled. In this work, we first generalize DP-SGD and theoretically derive the impact of DP noise on the training process. Our analysis reveals that, in terms of a perturbed gradient, only the noise on direction has eminent impact on the model efficiency while that on magnitude can be mitigated by optimization techniques, i.e., fine-tuning gradient clipping and learning rate. Besides, we confirm that traditional DP introduces biased noise on the direction when adding unbiased noise to the gradient itself. Overall, the perturbation of DP-SGD is actually sub-optimal from a geometric perspective. Motivated by this, we design a geometric perturbation strategy GeoDP within the DP framework, which perturbs the direction and the magnitude of a gradient, respectively. By directly reducing the noise on the direction, GeoDP mitigates the negative impact of DP noise on model efficiency with the same DP guarantee. Extensive experiments on two public datasets (i.e., MNIST and CIFAR-10), one synthetic dataset and three prevalent models (i.e., Logistic Regression, CNN and ResNet) confirm the effectiveness and generality of our strategy.

Updated: 2025-04-08 02:26:10

标题: 技术报告：几何分析和优化DP-SGD扰动的完整版本

摘要: 差分隐私（DP）已经成为广泛应用于各种机器学习任务的隐私模型，特别是在DP-SGD推出之后。然而，DP-SGD直接扰动训练迭代中的梯度，未能减轻噪音对梯度方向的负面影响。因此，DP-SGD通常效率低下。虽然提出了各种解决方案（例如剪切以减少梯度的灵敏度和增加隐私边界以节省隐私预算）来交换隐私以换取模型效率，但其低效性的根本原因尚未揭示。在这项工作中，我们首先概括了DP-SGD，并在理论上推导了DP噪音对训练过程的影响。我们的分析揭示了在扰动梯度方面，只有方向上的噪音对模型效率有显著影响，而在幅度上的噪音可以通过优化技术来减轻，即微调梯度剪切和学习率。此外，我们确认传统的DP在向梯度本身添加无偏噪音时会在方向上引入偏置噪音。总体而言，从几何角度来看，DP-SGD的扰动实际上是次优的。受此启发，我们设计了一个几何扰动策略GeoDP，在DP框架内，分别扰动梯度的方向和幅度。通过直接减少方向上的噪音，GeoDP减轻了DP噪音对模型效率的负面影响，同时保持相同的DP保证。在两个公共数据集（即MNIST和CIFAR-10）、一个合成数据集和三个流行模型（即逻辑回归、CNN和ResNet）上进行的大量实验验证了我们策略的有效性和普适性。

更新时间: 2025-04-08 02:26:10

领域: cs.LG,cs.AI,cs.CV,cs.DB

下载: http://arxiv.org/abs/2504.05618v1

Evaluating and Enhancing LLMs for Multi-turn Text-to-SQL with Multiple Question Types

Recent advancements in large language models (LLMs) have significantly advanced text-to-SQL systems. However, most LLM-based methods often narrowly focus on SQL generation, neglecting the complexities of real-world conversational queries. This oversight can lead to unreliable responses, particularly for ambiguous questions that cannot be directly addressed with SQL. To bridge this gap, we propose MMSQL, a comprehensive test suite designed to evaluate the question classification and SQL generation capabilities of LLMs by simulating real-world scenarios with diverse question types and multi-turn Q&A interactions. Using MMSQL, we assessed the performance of popular LLMs, including both open-source and closed-source models, and identified key factors impacting their performance in such scenarios. Moreover, we introduce an LLM-based multi-agent framework that employs specialized agents to identify question types and determine appropriate answering strategies. Our experiments demonstrate that this approach significantly enhances the model's ability to navigate the complexities of conversational dynamics, effectively handling the diverse and complex nature of user queries. Our dataset and code are publicly available at https://mcxiaoxiao.github.io/MMSQL.

Updated: 2025-04-08 02:23:17

标题: 评估和增强用于多轮文本到SQL的LLMs，涵盖多种问题类型

摘要: 最近大型语言模型（LLMs）的最新进展显著推动了文本到SQL系统的发展。然而，大多数基于LLM的方法往往狭隘地专注于SQL生成，忽略了真实世界对话查询的复杂性。这一疏忽可能导致不可靠的响应，特别是对于无法直接用SQL解决的模糊问题。为了弥补这一差距，我们提出了MMSQL，这是一个全面的测试套件，旨在通过模拟具有多样化问题类型和多轮问答交互的真实场景，评估LLMs的问题分类和SQL生成能力。使用MMSQL，我们评估了流行的LLMs的性能，包括开源和闭源模型，并确定了影响它们在这种场景下性能的关键因素。此外，我们提出了一个基于LLM的多代理框架，利用专门的代理来识别问题类型并确定适当的回答策略。我们的实验表明，这种方法显著增强了模型处理对话动态复杂性的能力，有效处理用户查询的多样化和复杂性。我们的数据集和代码可以在https://mcxiaoxiao.github.io/MMSQL上公开获取。

更新时间: 2025-04-08 02:23:17

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2412.17867v4

Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching: With Insights into Other Permutation Search Methods

Recently, Ainsworth et al. showed that using weight matching (WM) to minimize the $L^2$ distance in a permutation search of model parameters effectively identifies permutations that satisfy linear mode connectivity (LMC), where the loss along a linear path between two independently trained models with different seeds remains nearly constant. This paper analyzes LMC using WM, which is useful for understanding stochastic gradient descent's effectiveness and its application in areas like model merging. We first empirically show that permutations found by WM do not significantly reduce the $L^2$ distance between two models, and the occurrence of LMC is not merely due to distance reduction by WM itself. We then demonstrate that permutations can change the directions of the singular vectors, but not the singular values, of the weight matrices in each layer. This finding shows that permutations found by WM primarily align the directions of singular vectors associated with large singular values across models. This alignment brings the singular vectors with large singular values, which determine the model's functionality, closer between the original and merged models, allowing the merged model to retain functionality similar to the original models, thereby satisfying LMC. This paper also analyzes activation matching (AM) in terms of singular vectors and finds that the principle of AM is likely the same as that of WM. Finally, we analyze the difference between WM and the straight-through estimator (STE), a dataset-dependent permutation search method, and show that WM can be more advantageous than STE in achieving LMC among three or more models.

Updated: 2025-04-08 02:23:05

标题: 使用基于排列的加权匹配分析线性模式连通性：揭示其他排列搜索方法的见解

摘要: 最近，Ainsworth等人表明，使用权重匹配（WM）来最小化模型参数排列搜索中的$L^2$距离有效地识别满足线性模式连接（LMC）的排列，其中沿着两个不同种子独立训练的模型之间的线性路径的损失保持几乎恒定。本文分析了使用WM的LMC，这对于理解随机梯度下降的有效性及其在模型合并等领域的应用很有用。我们首先通过实证研究表明，通过WM找到的排列并不能显著减少两个模型之间的$L^2$距离，并且LMC的发生并不仅仅是由于WM本身的距离减小。然后我们证明，排列可以改变各层权重矩阵的奇异向量的方向，但不能改变奇异值。这一发现表明，通过WM找到的排列主要是在模型之间对齐与大奇异值相关联的奇异向量的方向。这种对齐使具有大奇异值的奇异向量更接近原始模型和合并模型之间，允许合并模型保持与原始模型类似的功能，从而满足LMC。本文还从奇异向量的角度分析了激活匹配（AM），并发现AM的原则可能与WM的原则相同。最后，我们分析了WM和直通估计器（STE），一种依赖数据集的排列搜索方法之间的区别，并表明在实现三个或更多模型之间的LMC方面，WM可能比STE更有优势。

更新时间: 2025-04-08 02:23:05

领域: cs.LG

下载: http://arxiv.org/abs/2402.04051v5

FedEFC: Federated Learning Using Enhanced Forward Correction Against Noisy Labels

Federated Learning (FL) is a powerful framework for privacy-preserving distributed learning. It enables multiple clients to collaboratively train a global model without sharing raw data. However, handling noisy labels in FL remains a major challenge due to heterogeneous data distributions and communication constraints, which can severely degrade model performance. To address this issue, we propose FedEFC, a novel method designed to tackle the impact of noisy labels in FL. FedEFC mitigates this issue through two key techniques: (1) prestopping, which prevents overfitting to mislabeled data by dynamically halting training at an optimal point, and (2) loss correction, which adjusts model updates to account for label noise. In particular, we develop an effective loss correction tailored to the unique challenges of FL, including data heterogeneity and decentralized training. Furthermore, we provide a theoretical analysis, leveraging the composite proper loss property, to demonstrate that the FL objective function under noisy label distributions can be aligned with the clean label distribution. Extensive experimental results validate the effectiveness of our approach, showing that it consistently outperforms existing FL techniques in mitigating the impact of noisy labels, particularly under heterogeneous data settings (e.g., achieving up to 41.64% relative performance improvement over the existing loss correction method).

Updated: 2025-04-08 02:14:50

标题: FedEFC：使用增强的前向校正对抗嘈杂标签的联邦学习

摘要: 联邦学习（FL）是一种强大的框架，用于保护隐私的分布式学习。它使多个客户端能够协作训练一个全局模型，而无需共享原始数据。然而，在FL中处理噪声标签仍然是一个主要挑战，这是由于异构数据分布和通信约束造成的，这可能严重降低模型性能。为了解决这个问题，我们提出了FedEFC，这是一种旨在应对FL中噪声标签影响的新方法。FedEFC通过两种关键技术减轻了这个问题：（1）预停止，通过在最佳点动态停止训练，防止过度拟合误标记数据，和（2）损失校正，调整模型更新以考虑标签噪声。特别是，我们开发了一种针对FL独特挑战的有效损失校正，包括数据异质性和分散式训练。此外，我们提供了一个理论分析，利用复合适当损失属性，以证明在噪声标签分布下的FL目标函数可以与干净标签分布对齐。广泛的实验结果验证了我们方法的有效性，显示它在减轻噪声标签影响方面始终优于现有的FL技术，特别是在异构数据设置下（例如，相对性能提高高达41.64%，超过现有损失校正方法）。

更新时间: 2025-04-08 02:14:50

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.05615v1

Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG

Retrieval models typically rely on costly human-labeled query-document relevance annotations for training and evaluation. To reduce this cost and leverage the potential of Large Language Models (LLMs) in relevance judgments, we aim to explore whether LLM-generated annotations can effectively replace human annotations in training retrieval models. Retrieval usually emphasizes relevance, which indicates "topic-relatedness" of a document to a query, while in RAG, the value of a document (or utility) depends on how it contributes to answer generation. Recognizing this mismatch, some researchers use LLM performance on downstream tasks with documents as labels, but this approach requires manual answers for specific tasks, leading to high costs and limited generalization. In another line of work, prompting LLMs to select useful documents as RAG references eliminates the need for human annotation and is not task-specific. If we leverage LLMs' utility judgments to annotate retrieval data, we may retain cross-task generalization without human annotation in large-scale corpora. Therefore, we investigate utility-focused annotation via LLMs for large-scale retriever training data across both in-domain and out-of-domain settings on the retrieval and RAG tasks. To reduce the impact of low-quality positives labeled by LLMs, we design a novel loss function, i.e., Disj-InfoNCE. Our experiments reveal that: (1) Retrievers trained on utility-focused annotations significantly outperform those trained on human annotations in the out-of-domain setting on both tasks, demonstrating superior generalization capabilities. (2) LLM annotation does not replace human annotation in the in-domain setting. However, incorporating just 20% human-annotated data enables retrievers trained with utility-focused annotations to match the performance of models trained entirely with human annotations.

Updated: 2025-04-08 02:11:05

标题: 利用LLMs进行以效用为中心的注释：减少检索和RAG的手动工作

摘要: 检索模型通常依赖于昂贵的人工标记的查询文档相关性注释来进行训练和评估。为了降低这种成本并利用大型语言模型（LLMs）在相关性判断中的潜力，我们旨在探讨LLM生成的注释是否能有效地取代人工注释来训练检索模型。检索通常强调相关性，这表明文档与查询的“主题相关性”，而在RAG中，文档的价值（或效用）取决于它对答案生成的贡献。认识到这种不匹配，一些研究人员使用LLM在带有文档标签的下游任务上的表现，但这种方法需要特定任务的手动答案，导致成本高昂且通用性有限。在另一条研究方向中，提示LLMs选择有用的文档作为RAG参考消除了对人工注释的需求并且不是特定于任务的。如果我们利用LLMs的效用判断来注释检索数据，我们可能在大规模语料库中保留跨任务的通用性而无需人工注释。因此，我们通过LLMs进行基于效用的注释调查在检索和RAG任务中的大规模检索器训练数据在领域内和领域外设置。为了减少LLMs标记的低质量正例的影响，我们设计了一种新颖的损失函数，即Disj-InfoNCE。我们的实验揭示了：（1）在领域外设置中，基于效用注释训练的检索器在两个任务上明显优于基于人工注释训练的模型，展示了出色的泛化能力。（2）在领域内设置中，LLM注释不能替代人工注释。然而，仅结合20%的人工注释数据就能使基于效用注释训练的检索器与完全基于人工注释训练的模型性能相匹配。

更新时间: 2025-04-08 02:11:05

领域: cs.IR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2504.05220v2

Deep Learning-Based Approach for Identification of Potato Leaf Diseases Using Wrapper Feature Selection and Feature Concatenation

The potato is a widely grown crop in many regions of the world. In recent decades, potato farming has gained incredible traction in the world. Potatoes are susceptible to several illnesses that stunt their development. This plant seems to have significant leaf disease. Early Blight and Late Blight are two prevalent leaf diseases that affect potato plants. The early detection of these diseases would be beneficial for enhancing the yield of this crop. The ideal solution is to use image processing to identify and analyze these disorders. Here, we present an autonomous method based on image processing and machine learning to detect late blight disease affecting potato leaves. The proposed method comprises four different phases: (1) Histogram Equalization is used to improve the quality of the input image; (2) feature extraction is performed using a Deep CNN model, then these extracted features are concatenated; (3) feature selection is performed using wrapper-based feature selection; (4) classification is performed using an SVM classifier and its variants. This proposed method achieves the highest accuracy of 99% using SVM by selecting 550 features.

Updated: 2025-04-08 02:06:27

标题: 基于深度学习的土豆叶病识别方法：使用包装特征选择和特征串联

摘要: 马铃薯是世界许多地区广泛种植的作物。近几十年来，马铃薯种植在世界范围内获得了不可思议的增长。马铃薯容易受到一些影响其生长的疾病的侵害。这种植物似乎有显著的叶病。早疫病和晚疫病是影响马铃薯植株的两种常见叶病。及早检测这些疾病有助于增加这种作物的产量。理想的解决方案是利用图像处理来识别和分析这些疾病。在这里，我们提出了一种基于图像处理和机器学习的自动方法，用于检测影响马铃薯叶片的晚疫病。该方法包括四个不同阶段：（1）使用直方图均衡化来改善输入图像的质量；（2）使用深度CNN模型进行特征提取，然后将这些提取的特征连接起来；（3）使用基于包装的特征选择进行特征选择；（4）使用SVM分类器及其变种进行分类。该提出的方法通过选择550个特征，使用SVM实现了99%的最高准确度。

更新时间: 2025-04-08 02:06:27

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2502.03370v2

Artificial Intelligence Index Report 2025

Welcome to the eighth edition of the AI Index report. The 2025 Index is our most comprehensive to date and arrives at an important moment, as AI's influence across society, the economy, and global governance continues to intensify. New in this year's report are in-depth analyses of the evolving landscape of AI hardware, novel estimates of inference costs, and new analyses of AI publication and patenting trends. We also introduce fresh data on corporate adoption of responsible AI practices, along with expanded coverage of AI's growing role in science and medicine. Since its founding in 2017 as an offshoot of the One Hundred Year Study of Artificial Intelligence, the AI Index has been committed to equipping policymakers, journalists, executives, researchers, and the public with accurate, rigorously validated, and globally sourced data. Our mission has always been to help these stakeholders make better-informed decisions about the development and deployment of AI. In a world where AI is discussed everywhere - from boardrooms to kitchen tables - this mission has never been more essential. The AI Index continues to lead in tracking and interpreting the most critical trends shaping the field - from the shifting geopolitical landscape and the rapid evolution of underlying technologies, to AI's expanding role in business, policymaking, and public life. Longitudinal tracking remains at the heart of our mission. In a domain advancing at breakneck speed, the Index provides essential context - helping us understand where AI stands today, how it got here, and where it may be headed next. Recognized globally as one of the most authoritative resources on artificial intelligence, the AI Index has been cited in major media outlets such as The New York Times, Bloomberg, and The Guardian; referenced in hundreds of academic papers; and used by policymakers and government agencies around the world.

Updated: 2025-04-08 02:01:37

标题: 人工智能指数报告2025

摘要: 欢迎阅读第八版AI指数报告。2025年的指数是迄今为止最全面的，正值AI在社会、经济和全球治理领域的影响不断加剧的重要时刻。今年报告中新增的内容包括对AI硬件发展格局的深入分析、推理成本的新估计以及对AI出版和专利趋势的新分析。我们还介绍了企业采用负责任AI实践的最新数据，并扩大了对AI在科学和医学领域日益增长作用的覆盖范围。自2017年作为人工智能一百年研究的衍生项目成立以来，AI指数一直致力于为决策者、记者、高管、研究人员和公众提供准确、经过严格验证且全球数据来源的数据。我们的使命始终是帮助这些利益相关者做出更明智的关于AI发展和部署的决策。在一个人人谈论AI的世界中 - 无论是在董事会还是在餐桌上 - 这一使命变得更加重要。AI指数继续领先追踪和解释塑造该领域最关键趋势的工作 - 从地缘政治格局的变化和基础技术的快速发展，到AI在商业、政策制定和公共生活中的不断扩大作用。纵向追踪仍然是我们使命的核心。在一个飞速发展的领域中，指数提供了必要的背景 - 帮助我们了解AI今天的发展状况，它是如何到达这里的，以及下一步可能的发展方向。作为全球公认的最权威的人工智能资源之一，AI指数已经被《纽约时报》、彭博社和《卫报》等主要媒体引用；被数百篇学术论文参考；并被世界各地的政策制定者和政府机构使用。

更新时间: 2025-04-08 02:01:37

领域: cs.AI

下载: http://arxiv.org/abs/2504.07139v1

Fairness in Machine Learning-based Hand Load Estimation: A Case Study on Load Carriage Tasks

Predicting external hand load from sensor data is essential for ergonomic exposure assessments, as obtaining this information typically requires direct observation or supplementary data. While machine learning methods have been used to estimate external hand load from worker postures or force exertion data, our findings reveal systematic bias in these predictions due to individual differences such as age and biological sex. To explore this issue, we examined bias in hand load prediction by varying the sex ratio in the training dataset. We found substantial sex disparity in predictive performance, especially when the training dataset is more sex-imbalanced. To address this bias, we developed and evaluated a fair predictive model for hand load estimation that leverages a Variational Autoencoder (VAE) with feature disentanglement. This approach is designed to separate sex-agnostic and sex-specific latent features, minimizing feature overlap. The disentanglement capability enables the model to make predictions based solely on sex-agnostic features of motion patterns, ensuring fair prediction for both biological sexes. Our proposed fair algorithm outperformed conventional machine learning methods (e.g., Random Forests) in both fairness and predictive accuracy, achieving a lower mean absolute error (MAE) difference across male and female sets and improved fairness metrics such as statistical parity (SP) and positive and negative residual differences (PRD and NRD), even when trained on imbalanced sex datasets. These findings emphasize the importance of fairness-aware machine learning algorithms to prevent potential disadvantages in workplace health and safety for certain worker populations.

Updated: 2025-04-08 01:55:40

标题: 机器学习在手部负荷估计中的公平性：负载搬运任务的案例研究

摘要: 从传感器数据中预测外部手部负载对于人体工程学暴露评估至关重要，因为获取这些信息通常需要直接观察或补充数据。虽然机器学习方法已被用于从工人姿势或施加的力量数据估计外部手部负载，但我们的研究发现这些预测存在系统偏差，原因是个体差异，如年龄和生理性别。为了探讨这个问题，我们通过改变训练数据集中的性别比例来研究手部负载预测中的偏差。我们发现在性别不平衡的训练数据集中，预测性能存在显著的性别差距。为了解决这种偏差，我们开发并评估了一种公平的手部负载估计预测模型，利用具有特征解缠的变分自动编码器（VAE）。这种方法旨在分离性别不可知和性别特定的潜在特征，最小化特征重叠。解缠能力使模型能够仅基于运动模式的性别不可知特征进行预测，确保对两种生理性别都进行公平预测。我们提出的公平算法在公平性和预测准确性方面优于传统的机器学习方法（例如随机森林），实现了男性和女性集之间较低的平均绝对误差（MAE）差异，并改善了公平度指标，如统计平衡（SP）和正负残差差异（PRD和NRD），甚至在训练不平衡性别数据集时也是如此。这些发现强调了意识到公平性的机器学习算法对于防止某些工人群体在工作场所健康和安全方面潜在的劣势的重要性。

更新时间: 2025-04-08 01:55:40

领域: cs.LG

下载: http://arxiv.org/abs/2504.05610v1

FactGuard: Leveraging Multi-Agent Systems to Generate Answerable and Unanswerable Questions for Enhanced Long-Context LLM Extraction

Extractive reading comprehension systems are designed to locate the correct answer to a question within a given text. However, a persistent challenge lies in ensuring these models maintain high accuracy in answering questions while reliably recognizing unanswerable queries. Despite significant advances in large language models (LLMs) for reading comprehension, this issue remains critical, particularly as the length of supported contexts continues to expand. To address this challenge, we propose an innovative data augmentation methodology grounded in a multi-agent collaborative framework. Unlike traditional methods, such as the costly human annotation process required for datasets like SQuAD 2.0, our method autonomously generates evidence-based question-answer pairs and systematically constructs unanswerable questions. Using this methodology, we developed the FactGuard-Bench dataset, which comprises 25,220 examples of both answerable and unanswerable question scenarios, with context lengths ranging from 8K to 128K. Experimental evaluations conducted on seven popular LLMs reveal that even the most advanced models achieve only 61.79% overall accuracy. Furthermore, we emphasize the importance of a model's ability to reason about unanswerable questions to avoid generating plausible but incorrect answers. By implementing efficient data selection and generation within the multi-agent collaborative framework, our method significantly reduces the traditionally high costs associated with manual annotation and provides valuable insights for the training and optimization of LLMs.

Updated: 2025-04-08 01:45:16

标题: 事实守护：利用多智能体系统生成可回答和不可回答问题，以增强长文本LLM提取

摘要: 提取式阅读理解系统旨在在给定文本中定位问题的正确答案。然而，一个持久的挑战在于确保这些模型在回答问题时保持高准确性，同时可靠地识别无法回答的查询。尽管在大型语言模型（LLMs）用于阅读理解方面取得了显著进展，但这个问题仍然至关重要，特别是随着支持的上下文长度不断扩大。为了应对这一挑战，我们提出了一种基于多代理协作框架的创新数据增强方法。与传统方法不同，比如为SQuAD 2.0等数据集所需的昂贵人工标注过程，我们的方法自主生成基于证据的问题-答案对，并系统地构建无法回答的问题。利用这种方法，我们开发了FactGuard-Bench数据集，包括25,220个可回答和无法回答的问题场景示例，上下文长度范围从8K到128K不等。对七种流行的LLMs进行的实验评估显示，即使最先进的模型也仅达到61.79%的总体准确度。此外，我们强调模型能够推理无法回答的问题的重要性，以避免生成看似正确但实际错误的答案。通过在多代理协作框架中实施高效的数据选择和生成，我们的方法显著降低了传统上与手动标注相关的高成本，并为LLMs的训练和优化提供了有价值的见解。

更新时间: 2025-04-08 01:45:16

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.05607v1

Fox-1: Open Small Language Model for Cloud and Edge

We present Fox-1, a series of small language models (SLMs) consisting of Fox-1-1.6B and Fox-1-1.6B-Instruct-v0.1. These models are pre-trained on 3 trillion tokens of web-scraped document data and fine-tuned with 5 billion tokens of instruction-following and multi-turn conversation data. Aiming to improve the pre-training efficiency, Fox-1-1.6B model introduces a novel 3-stage data curriculum across all the training data with 2K-8K sequence length. In architecture design, Fox-1 features a deeper layer structure, an expanded vocabulary, and utilizes Grouped Query Attention (GQA), offering a performant and efficient architecture compared to other SLMs. Fox-1 achieves better or on-par performance in various benchmarks compared to StableLM-2-1.6B, Gemma-2B, Qwen1.5-1.8B, and OpenELM1.1B, with competitive inference speed and throughput. The model weights have been released under the Apache 2.0 license, where we aim to promote the democratization of LLMs and make them fully accessible to the whole open-source community.

Updated: 2025-04-08 01:39:22

标题: Fox-1：用于云和边缘的开放小型语言模型

摘要: 我们介绍了Fox-1，一个由Fox-1-1.6B和Fox-1-1.6B-Instruct-v0.1组成的一系列小语言模型（SLMs）。这些模型在抓取的网页文档数据中预训练了3万亿个标记，并通过5亿个指令跟随和多轮对话数据进行了微调。为了提高预训练效率，Fox-1-1.6B模型引入了一种新颖的3阶段数据课程，覆盖所有2K-8K序列长度的训练数据。在架构设计方面，Fox-1具有更深的层次结构，扩展的词汇表，并利用了分组查询注意力（GQA），相比其他SLM，提供了高性能和高效的架构。与StableLM-2-1.6B、Gemma-2B、Qwen1.5-1.8B和OpenELM1.1B相比，Fox-1在各种基准测试中表现更好或相当，并具有竞争性的推理速度和吞吐量。模型权重已根据Apache 2.0许可发布，我们旨在推动LLM的民主化并使其完全开放给整个开源社区。

更新时间: 2025-04-08 01:39:22

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2411.05281v3

ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs

Chain-of-Thought (CoT) enhances an LLM's ability to perform complex reasoning tasks, but it also introduces new security issues. In this work, we present ShadowCoT, a novel backdoor attack framework that targets the internal reasoning mechanism of LLMs. Unlike prior token-level or prompt-based attacks, ShadowCoT directly manipulates the model's cognitive reasoning path, enabling it to hijack multi-step reasoning chains and produce logically coherent but adversarial outcomes. By conditioning on internal reasoning states, ShadowCoT learns to recognize and selectively disrupt key reasoning steps, effectively mounting a self-reflective cognitive attack within the target model. Our approach introduces a lightweight yet effective multi-stage injection pipeline, which selectively rewires attention pathways and perturbs intermediate representations with minimal parameter overhead (only 0.15% updated). ShadowCoT further leverages reinforcement learning and reasoning chain pollution (RCP) to autonomously synthesize stealthy adversarial CoTs that remain undetectable to advanced defenses. Extensive experiments across diverse reasoning benchmarks and LLMs show that ShadowCoT consistently achieves high Attack Success Rate (94.4%) and Hijacking Success Rate (88.4%) while preserving benign performance. These results reveal an emergent class of cognition-level threats and highlight the urgent need for defenses beyond shallow surface-level consistency.

Updated: 2025-04-08 01:36:16

标题: ShadowCoT：在LLMs中潜在推理后门的认知劫持

摘要: Chain-of-Thought (CoT) enhances an LLM's ability to perform complex reasoning tasks, but it also introduces new security issues. In this work, we present ShadowCoT, a novel backdoor attack framework that targets the internal reasoning mechanism of LLMs. Unlike prior token-level or prompt-based attacks, ShadowCoT directly manipulates the model's cognitive reasoning path, enabling it to hijack multi-step reasoning chains and produce logically coherent but adversarial outcomes. By conditioning on internal reasoning states, ShadowCoT learns to recognize and selectively disrupt key reasoning steps, effectively mounting a self-reflective cognitive attack within the target model. Our approach introduces a lightweight yet effective multi-stage injection pipeline, which selectively rewires attention pathways and perturbs intermediate representations with minimal parameter overhead (only 0.15% updated). ShadowCoT further leverages reinforcement learning and reasoning chain pollution (RCP) to autonomously synthesize stealthy adversarial CoTs that remain undetectable to advanced defenses. Extensive experiments across diverse reasoning benchmarks and LLMs show that ShadowCoT consistently achieves high Attack Success Rate (94.4%) and Hijacking Success Rate (88.4%) while preserving benign performance. These results reveal an emergent class of cognition-level threats and highlight the urgent need for defenses beyond shallow surface-level consistency.

更新时间: 2025-04-08 01:36:16

领域: cs.CR,cs.CL

下载: http://arxiv.org/abs/2504.05605v1

On the Impact of Language Nuances on Sentiment Analysis with Large Language Models: Paraphrasing, Sarcasm, and Emojis

Large Language Models (LLMs) have demonstrated impressive performance across various tasks, including sentiment analysis. However, data quality--particularly when sourced from social media--can significantly impact their accuracy. This research explores how textual nuances, including emojis and sarcasm, affect sentiment analysis, with a particular focus on improving data quality through text paraphrasing techniques. To address the lack of labeled sarcasm data, the authors created a human-labeled dataset of 5929 tweets that enabled the assessment of LLM in various sarcasm contexts. The results show that when topic-specific datasets, such as those related to nuclear power, are used to finetune LLMs these models are not able to comprehend accurate sentiment in presence of sarcasm due to less diverse text, requiring external interventions like sarcasm removal to boost model accuracy. Sarcasm removal led to up to 21% improvement in sentiment accuracy, as LLMs trained on nuclear power-related content struggled with sarcastic tweets, achieving only 30% accuracy. In contrast, LLMs trained on general tweet datasets, covering a broader range of topics, showed considerable improvements in predicting sentiment for sarcastic tweets (60% accuracy), indicating that incorporating general text data can enhance sarcasm detection. The study also utilized adversarial text augmentation, showing that creating synthetic text variants by making minor changes significantly increased model robustness and accuracy for sarcastic tweets (approximately 85%). Additionally, text paraphrasing of tweets with fragmented language transformed around 40% of the tweets with low-confidence labels into high-confidence ones, improving LLMs sentiment analysis accuracy by 6%.

Updated: 2025-04-08 01:29:58

标题: 关于语言细微差别对大型语言模型情感分析的影响：释义、讽刺和表情符号

摘要: 大型语言模型(LLMs)已经展示了在各种任务中的出色性能，包括情感分析。然而，数据质量--特别是来自社交媒体的数据--可以显著影响它们的准确性。这项研究探讨了文本细微差异，包括表情符号和讽刺，如何影响情感分析，特别关注通过文本改写技术来提高数据质量。为了解决缺乏标记的讽刺数据，作者们创建了一个由5929条推文人工标记的数据集，这使得能够评估LLM在各种讽刺语境中的表现。结果显示，当使用特定主题数据集，比如与核能相关的数据集来微调LLMs时，这些模型在存在讽刺的情况下无法理解准确的情感，因为文本多样性较少，需要外部干预，比如删除讽刺来提高模型的准确性。删除讽刺导致了情感准确性的提高，LLMs在核能相关内容上训练时困扰于讽刺推文，仅达到30%的准确性。相比之下，训练在一般推文数据集上的LLMs，涵盖了更广泛的主题范围，对于讽刺推文的情感预测显示出了明显的改进(60%的准确性)，表明整合一般文本数据可以增强讽刺检测。该研究还利用对抗文本增强，显示通过进行微小改变创建合成文本变体显著增加了模型对讽刺推文的稳健性和准确性(约85%)。另外，对于使用片段化语言的推文的文本改写将约40%的标签置信度低的推文转换为高置信度的推文，将LLMs的情感分析准确性提高了6%。

更新时间: 2025-04-08 01:29:58

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2504.05603v1

DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding

Speculative Decoding (SD) is a widely used approach to accelerate the inference of large language models (LLMs) without reducing generation quality. It operates by first using a compact model to draft multiple tokens efficiently, followed by parallel verification using the target LLM. This approach leads to faster inference compared to auto-regressive decoding. While there are multiple approaches to create a draft model, one promising approach is to use early-exit methods. These methods draft candidate tokens by using a subset of layers of the primary model and applying the remaining layers for verification, allowing a single model to handle both drafting and verification. While this technique reduces memory usage and computational cost, its performance relies on the choice of the exit layer for drafting and the number of tokens drafted (speculation length) in each SD round. Prior works use hyperparameter exploration to statically select these values. However, our evaluations show that these hyperparameter values are task-specific, and even within a task they are dependent on the current sequence context. We introduce DEL, a plug-and-play method that adaptively selects the exit layer and speculation length during inference. DEL dynamically tracks the token acceptance rate if the tokens are drafted at each layer of an LLM and uses that knowledge to heuristically select the optimal exit layer and speculation length. Our experiments across a broad range of models and downstream tasks show that DEL achieves overall speedups of $2.16\times$$\sim$$2.50\times$ over vanilla auto-regressive decoding and improves upon the state-of-the-art SD methods by up to $0.27\times$.

Updated: 2025-04-08 01:12:59

标题: DEL：上下文感知动态退出层用于高效自我推测解码

摘要: Speculative Decoding（SD）是一种广泛使用的方法，用于加速大型语言模型（LLMs）的推理，而不会降低生成质量。它通过首先使用紧凑模型高效地起草多个标记，然后通过并行验证使用目标LLM。与自回归解码相比，这种方法导致更快的推理速度。虽然有多种方法可以创建起草模型，但一种有前途的方法是使用早期退出方法。这些方法通过使用主要模型的一部分层来起草候选标记，并应用剩余层进行验证，从而允许单个模型处理起草和验证。虽然这种技术减少了内存使用和计算成本，但其性能取决于选择用于起草的退出层以及每个SD轮中起草的标记数量（推测长度）。以前的工作使用超参数探索静态地选择这些值。然而，我们的评估表明，这些超参数值是特定于任务的，并且甚至在任务内部它们取决于当前序列上下文。我们介绍了DEL，一种自适应选择退出层和推测长度的即插即用方法。DEL动态跟踪在LLM的每一层起草标记时标记接受率，并利用这一知识启发式地选择最佳退出层和推测长度。我们在广泛的模型和下游任务上进行的实验表明，DEL相对于基本的自回归解码实现了2.16倍至2.50倍的整体加速，并且在现有技术中改进了高达0.27倍。

更新时间: 2025-04-08 01:12:59

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2504.05598v1

Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference

Large Language Models (LLMs) have been used as experts to infer causal graphs, often by repeatedly applying a pairwise prompt that asks about the causal relationship of each variable pair. However, such experts, including human domain experts, cannot distinguish between direct and indirect effects given a pairwise prompt. Therefore, instead of the graph, we propose that causal order be used as a more stable output interface for utilizing expert knowledge. Even when querying a perfect expert with a pairwise prompt, we show that the inferred graph can have significant errors whereas the causal order is always correct. In practice, however, LLMs are imperfect experts and we find that pairwise prompts lead to multiple cycles. Hence, we propose the triplet method, a novel querying strategy that introduces an auxiliary variable for every variable pair and instructs the LLM to avoid cycles within this triplet. It then uses a voting-based ensemble method that results in higher accuracy and fewer cycles while ensuring cost efficiency. Across multiple real-world graphs, such a triplet-based method yields a more accurate order than the pairwise prompt, using both LLMs and human annotators. The triplet method enhances robustness by repeatedly querying an expert with different auxiliary variables, enabling smaller models like Phi-3 and Llama-3 8B Instruct to surpass GPT-4 with pairwise prompting. For practical usage, we show how the expert-provided causal order from the triplet method can be used to reduce error in downstream graph discovery and effect inference tasks.

Updated: 2025-04-08 01:03:56

标题: 因果顺序：在因果推断中利用不完美专家的关键

摘要: 大型语言模型（LLMs）已被用作专家来推断因果图，通常是通过反复应用一个询问关于每对变量之间因果关系的提示。然而，这样的专家，包括人类领域专家，无法区分直接和间接效应，即使给定一个成对提示。因此，我们建议使用因果顺序作为更稳定的专家知识利用输出界面，而不是图形。即使使用成对提示查询完美专家，我们发现推断的图形可能存在重大错误，而因果顺序始终是正确的。然而，在实践中，LLMs是不完美的专家，我们发现成对提示会导致多个循环。因此，我们提出了三元组方法，一种引入辅助变量的新型查询策略，用于每对变量，并指示LLM避免这个三元组内的循环。然后使用基于投票的集成方法，结果准确性更高，循环更少，同时确保成本效率。在多个现实世界的图形中，这种基于三元组的方法比成对提示，无论是使用LLMs还是人类注释者，都产生更准确的顺序。三元组方法通过反复查询具有不同辅助变量的专家来增强鲁棒性，使Phi-3和Llama-3 8B Instruct等较小的模型超越GPT-4的成对提示。对于实际用途，我们展示了如何利用三元组方法提供的专家提供的因果顺序来减少下游图形发现和效果推断任务中的错误。

更新时间: 2025-04-08 01:03:56

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2310.15117v2

Class Imbalance Correction for Improved Universal Lesion Detection and Tagging in CT

Radiologists routinely detect and size lesions in CT to stage cancer and assess tumor burden. To potentially aid their efforts, multiple lesion detection algorithms have been developed with a large public dataset called DeepLesion (32,735 lesions, 32,120 CT slices, 10,594 studies, 4,427 patients, 8 body part labels). However, this dataset contains missing measurements and lesion tags, and exhibits a severe imbalance in the number of lesions per label category. In this work, we utilize a limited subset of DeepLesion (6\%, 1331 lesions, 1309 slices) containing lesion annotations and body part label tags to train a VFNet model to detect lesions and tag them. We address the class imbalance by conducting three experiments: 1) Balancing data by the body part labels, 2) Balancing data by the number of lesions per patient, and 3) Balancing data by the lesion size. In contrast to a randomly sampled (unbalanced) data subset, our results indicated that balancing the body part labels always increased sensitivity for lesions >= 1cm for classes with low data quantities (Bone: 80\% vs. 46\%, Kidney: 77\% vs. 61\%, Soft Tissue: 70\% vs. 60\%, Pelvis: 83\% vs. 76\%). Similar trends were seen for three other models tested (FasterRCNN, RetinaNet, FoveaBox). Balancing data by lesion size also helped the VFNet model improve recalls for all classes in contrast to an unbalanced dataset. We also provide a structured reporting guideline for a ``Lesions'' subsection to be entered into the ``Findings'' section of a radiology report. To our knowledge, we are the first to report the class imbalance in DeepLesion, and have taken data-driven steps to address it in the context of joint lesion detection and tagging.

Updated: 2025-04-08 00:58:26

标题: CT中改良的通用病变检测和标记的类别不平衡校正

摘要: 放射科医生通常在CT中检测和测量病变，以进行癌症分期和评估肿瘤负担。为了潜在地帮助他们的工作，已经开发了多种病变检测算法，其中一个使用了一个名为DeepLesion的大型公共数据集（32,735个病变，32,120个CT切片，10,594个研究，4,427名患者，8个身体部位标签）。然而，这个数据集存在缺失的测量和病变标签，并且在每个标签类别下病变数量方面存在严重的不平衡。在这项工作中，我们利用DeepLesion的一个有限子集（6％，1331个病变，1309个切片），其中包含病变注释和身体部位标签，来训练一个VFNet模型来检测病变并标记它们。我们通过进行三个实验来解决类别不平衡问题：1）按照身体部位标签平衡数据，2）按照每位患者的病变数量平衡数据，3）按病变大小平衡数据。与随机抽样的（不平衡）数据子集相比，我们的结果表明，按照身体部位标签平衡数据总是增加了对病变>= 1cm的敏感性，特别是对于数据量较低的类别（骨骼：80％ vs. 46％，肾脏：77％ vs. 61％，软组织：70％ vs. 60％，骨盆：83％ vs. 76％）。对另外三个测试的模型（FasterRCNN，RetinaNet，FoveaBox）也出现了类似的趋势。按病变大小平衡数据也帮助VFNet模型提高了对所有类别的召回率，相比于不平衡数据集。我们还提供了一个结构化的报告指南，用于将“病变”子部分输入到放射学报告的“发现”部分。据我们所知，我们是第一个报告DeepLesion中类别不平衡问题，并采取了数据驱动的措施来解决在联合病变检测和标记的情境下。

更新时间: 2025-04-08 00:58:26

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2504.05591v1

Multi-fidelity Reinforcement Learning Control for Complex Dynamical Systems

Controlling instabilities in complex dynamical systems is challenging in scientific and engineering applications. Deep reinforcement learning (DRL) has seen promising results for applications in different scientific applications. The many-query nature of control tasks requires multiple interactions with real environments of the underlying physics. However, it is usually sparse to collect from the experiments or expensive to simulate for complex dynamics. Alternatively, controlling surrogate modeling could mitigate the computational cost issue. However, a fast and accurate learning-based model by offline training makes it very hard to get accurate pointwise dynamics when the dynamics are chaotic. To bridge this gap, the current work proposes a multi-fidelity reinforcement learning (MFRL) framework that leverages differentiable hybrid models for control tasks, where a physics-based hybrid model is corrected by limited high-fidelity data. We also proposed a spectrum-based reward function for RL learning. The effect of the proposed framework is demonstrated on two complex dynamics in physics. The statistics of the MFRL control result match that computed from many-query evaluations of the high-fidelity environments and outperform other SOTA baselines.

Updated: 2025-04-08 00:50:15

标题: 复杂动力系统的多信度强化学习控制

摘要: 在科学和工程应用中，控制复杂动态系统中的不稳定性是具有挑战性的。深度强化学习（DRL）在不同科学应用中已经取得了令人期待的结果。控制任务的多次查询性质要求与底层物理环境进行多次交互。然而，从实验中收集数据通常是稀缺的，或者在复杂动态中进行模拟是昂贵的。相反，控制替代建模可以减轻计算成本问题。然而，通过离线训练获得快速和准确的基于学习的模型在动态混沌时很难获得准确的点动态。为了弥合这一差距，当前工作提出了一个多保真度强化学习（MFRL）框架，利用可微分的混合模型进行控制任务，其中基于物理的混合模型通过有限的高保真度数据进行校正。我们还提出了一种基于频谱的奖励函数用于强化学习。所提出的框架在物理学中的两个复杂动态上展示了效果。MFRL控制结果的统计数据与高保真度环境的多次查询评估计算结果相匹配，并优于其他SOTA基线。

更新时间: 2025-04-08 00:50:15

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.05588v1

Linearized Wasserstein Barycenters: Synthesis, Analysis, Representational Capacity, and Applications

We propose the linear barycentric coding model (LBCM) which utilizes the linear optimal transport (LOT) metric for analysis and synthesis of probability measures. We provide a closed-form solution to the variational problem characterizing the probability measures in the LBCM and establish equivalence of the LBCM to the set of 2-Wasserstein barycenters in the special case of compatible measures. Computational methods for synthesizing and analyzing measures in the LBCM are developed with finite sample guarantees. One of our main theoretical contributions is to identify an LBCM, expressed in terms of a simple family, which is sufficient to express all probability measures on the closed unit interval. We show that a natural analogous construction of an LBCM in 2 dimensions fails, and we leave it as an open problem to identify the proper extension in more than 1 dimension. We conclude by demonstrating the utility of LBCM for covariance estimation and data imputation.

Updated: 2025-04-08 00:49:14

标题: 线性化Wasserstein重心：合成、分析、表示能力和应用

摘要: 我们提出了线性重心编码模型（LBCM），该模型利用线性最优输运（LOT）度量对概率测度进行分析和合成。我们提供了描述LBCM中概率测度的变分问题的封闭形式解，并建立了在兼容测度的特殊情况下，LBCM与2-Wasserstein重心集合的等价性。我们开发了在LBCM中合成和分析测度的计算方法，并提供了有限样本保证。我们的主要理论贡献之一是确定了一个简单家族表示的LBCM，足以表示在单位闭区间上的所有概率测度。我们展示了在二维空间中对LBCM进行自然类似的构造失败，我们将其作为一个开放问题，在超过1维空间中确定适当的扩展。最后，我们展示了LBCM在协方差估计和数据插补中的实用性。

更新时间: 2025-04-08 00:49:14

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.23602v2

Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations

Sparsely activated Mixture-of-Experts (SMoE) has shown promise in scaling up the learning capacity of neural networks. However, vanilla SMoEs have issues such as expert redundancy and heavy memory requirements, making them inefficient and non-scalable, especially for resource-constrained scenarios. Expert-level sparsification of SMoEs involves pruning the least important experts to address these limitations. In this work, we aim to address three questions: (1) What is the best recipe to identify the least knowledgeable subset of experts that can be dropped with minimal impact on performance? (2) How should we perform expert dropping (one-shot or iterative), and what correction measures can we undertake to minimize its drastic impact on SMoE subnetwork capabilities? (3) What capabilities of full-SMoEs are severely impacted by the removal of the least dominant experts, and how can we recover them? Firstly, we propose MoE Experts Compression Suite (MC-Suite), which is a collection of some previously explored and multiple novel recipes to provide a comprehensive benchmark for estimating expert importance from diverse perspectives, as well as unveil numerous valuable insights for SMoE experts. Secondly, unlike prior works with a one-shot expert pruning approach, we explore the benefits of iterative pruning with the re-estimation of the MC-Suite criterion. Moreover, we introduce the benefits of task-agnostic fine-tuning as a correction mechanism during iterative expert dropping, which we term MoE Lottery Subnetworks. Lastly, we present an experimentally validated conjecture that, during expert dropping, SMoEs' instruction-following capabilities are predominantly hurt, which can be restored to a robust level subject to external augmentation of instruction-following capabilities using k-shot examples and supervised fine-tuning.

Updated: 2025-04-08 00:49:08

标题: 在MoEs中找到出色的专家：专家淘汰策略和观察的统一研究

摘要: 稀疏激活的专家混合模型（SMoE）在扩展神经网络的学习能力方面表现出潜力。然而，普通的SMoE存在专家冗余和大量内存需求等问题，使其在资源受限的情况下效率低下且不可扩展。对SMoE进行专家级稀疏化涉及修剪最不重要的专家以解决这些限制。在这项工作中，我们旨在回答三个问题：（1）如何确定可以最小影响性能的最不熟悉的专家子集的最佳方法？（2）我们应该如何进行专家删除（一次性或迭代），以及我们可以采取什么校正措施来最小化其对SMoE子网络能力的严重影响？（3）在去除最不占主导地位的专家后，完整的SMoE的能力受到严重影响，我们如何恢复它们？首先，我们提出了MoE专家压缩套件（MC-Suite），这是一组一些先前探索过的和多种新颖方法的集合，为从各种角度评估专家重要性提供了全面的基准，并揭示了许多有价值的见解供SMoE专家使用。其次，与先前采用一次性专家修剪方法的作品不同，我们探索了通过重新估计MC-Suite标准进行迭代修剪的好处。此外，我们介绍了任务不可知微调作为在迭代专家删除过程中的校正机制的优势，我们称之为MoE Lottery子网络。最后，我们提出了一个经过实验证实的猜想，即在专家删除过程中，SMoE的遵循指令能力受到严重伤害，可以通过使用k-shot示例和监督微调进行外部增强的指令遵循能力，将其恢复到稳固水平。

更新时间: 2025-04-08 00:49:08

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.05586v1

TW-CRL: Time-Weighted Contrastive Reward Learning for Efficient Inverse Reinforcement Learning

Episodic tasks in Reinforcement Learning (RL) often pose challenges due to sparse reward signals and high-dimensional state spaces, which hinder efficient learning. Additionally, these tasks often feature hidden "trap states" -- irreversible failures that prevent task completion but do not provide explicit negative rewards to guide agents away from repeated errors. To address these issues, we propose Time-Weighted Contrastive Reward Learning (TW-CRL), an Inverse Reinforcement Learning (IRL) framework that leverages both successful and failed demonstrations. By incorporating temporal information, TW-CRL learns a dense reward function that identifies critical states associated with success or failure. This approach not only enables agents to avoid trap states but also encourages meaningful exploration beyond simple imitation of expert trajectories. Empirical evaluations on navigation tasks and robotic manipulation benchmarks demonstrate that TW-CRL surpasses state-of-the-art methods, achieving improved efficiency and robustness.

Updated: 2025-04-08 00:48:29

标题: TW-CRL：时间加权对比奖励学习用于高效的逆强化学习

摘要: 强化学习（RL）中的情节任务经常面临挑战，因为奖励信号稀疏且状态空间维度高，这阻碍了有效学习。此外，这些任务经常包含隐藏的“陷阱状态” - 不可逆的失败，阻止任务完成但不提供明确的负面奖励来引导代理人避免重复错误。为了解决这些问题，我们提出了一种时间加权对比奖励学习（TW-CRL）方法，这是一种利用成功和失败演示的逆强化学习（IRL）框架。通过结合时间信息，TW-CRL学习到一个密集的奖励函数，识别与成功或失败相关的关键状态。这种方法不仅使代理人避免陷阱状态，还鼓励有意义的探索，超出对专家轨迹的简单模仿。在导航任务和机器人操作基准测试上的实证评估表明，TW-CRL超越了最先进的方法，实现了更高的效率和稳健性。

更新时间: 2025-04-08 00:48:29

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.05585v1

Estimating the Probability of Sampling a Trained Neural Network at Random

We present and analyze an algorithm for estimating the size, under a Gaussian or uniform measure, of a localized neighborhood in neural network parameter space with behavior similar to an ``anchor'' point. We refer to this as the "local volume" of the anchor. We adapt an existing basin-volume estimator, which is very fast but in many cases only provides a lower bound. We show that this lower bound can be improved with an importance-sampling method using gradient information that is already provided by popular optimizers. The negative logarithm of local volume can also be interpreted as a measure of the anchor network's information content. As expected for a measure of complexity, this quantity increases during language model training. We find that overfit, badly-generalizing neighborhoods are smaller, indicating a more complex learned behavior. This smaller volume can also be interpreted in an MDL sense as suboptimal compression. Our results are consistent with a picture of generalization we call the "volume hypothesis": that neural net training produces good generalization primarily because the architecture gives simple functions more volume in parameter space, and the optimizer samples from the low-loss manifold in a volume-sensitive way. We believe that fast local-volume estimators are a promising practical metric of network complexity and architectural inductive bias for interpretability purposes.

Updated: 2025-04-08 00:36:01

标题: 估计随机抽样到训练好的神经网络的概率

摘要: 我们提出并分析了一种算法，用于估计神经网络参数空间中类似于“锚点”的行为的局部邻域在高斯或均匀度量下的大小。我们将这称为锚点的“局部体积”。我们改进了现有的盆地体积估计器，该估计器非常快速，但在许多情况下只提供一个下界。我们展示了可以利用梯度信息改进这个下界的重要性抽样方法，这些信息已经由流行的优化器提供。局部体积的负对数也可以解释为锚网络的信息内容的度量。正如预期的那样，作为一种复杂性度量，这个数量在语言模型训练期间增加。我们发现，过拟合、泛化能力差的邻域较小，表明学习行为更复杂。这种较小的体积也可以从MDL的角度解释为次优压缩。我们的结果与一种我们称之为“体积假设”的泛化图像一致：神经网络训练主要是因为架构在参数空间中给予简单函数更多的体积，而优化器以体积敏感的方式从低损失流形中进行抽样，从而产生良好的泛化。我们相信快速的局部体积估计器是网络复杂性和架构归纳偏差的有前景的实用指标，用于解释性目的。

更新时间: 2025-04-08 00:36:01

领域: cs.LG

下载: http://arxiv.org/abs/2501.18812v2

Mitigating Communication Costs in Neural Networks: The Role of Dendritic Nonlinearity

Our understanding of biological neuronal networks has profoundly influenced the development of artificial neural networks (ANNs). However, neurons utilized in ANNs differ considerably from their biological counterparts, primarily due to the absence of complex dendritic trees with local nonlinearities. Early studies have suggested that dendritic nonlinearities could substantially improve the learning capabilities of neural network models. In this study, we systematically examined the role of nonlinear dendrites within neural networks. Utilizing machine-learning methodologies, we assessed how dendritic nonlinearities influence neural network performance. Our findings demonstrate that dendritic nonlinearities do not substantially affect learning capacity; rather, their primary benefit lies in enabling network capacity expansion while minimizing communication costs through effective localized feature aggregation. This research provides critical insights with significant implications for designing future neural network accelerators aimed at reducing communication overhead during neural network training and inference.

Updated: 2025-04-08 00:33:27

标题: 减轻神经网络中的通信成本：树突非线性的作用

摘要: 我们对生物神经元网络的理解深刻影响了人工神经网络（ANNs）的发展。然而，在ANNs中使用的神经元与其生物对应物存在显著差异，主要是由于缺乏具有局部非线性的复杂树突。早期研究表明，树突非线性可以显著提高神经网络模型的学习能力。在本研究中，我们系统地研究了非线性树突在神经网络中的作用。利用机器学习方法，我们评估了树突非线性对神经网络性能的影响。我们的发现表明，树突非线性并不显著影响学习能力；相反，它们的主要好处在于通过有效的局部特征聚合实现网络容量扩展的同时最小化通信成本。这项研究提供了关键见解，对设计旨在减少神经网络训练和推断过程中通信开销的未来神经网络加速器具有重要意义。

更新时间: 2025-04-08 00:33:27

领域: cs.NE,cs.LG,q-bio.NC

下载: http://arxiv.org/abs/2306.11950v2

SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding

We introduce SoundVista, a method to generate the ambient sound of an arbitrary scene at novel viewpoints. Given a pre-acquired recording of the scene from sparsely distributed microphones, SoundVista can synthesize the sound of that scene from an unseen target viewpoint. The method learns the underlying acoustic transfer function that relates the signals acquired at the distributed microphones to the signal at the target viewpoint, using a limited number of known recordings. Unlike existing works, our method does not require constraints or prior knowledge of sound source details. Moreover, our method efficiently adapts to diverse room layouts, reference microphone configurations and unseen environments. To enable this, we introduce a visual-acoustic binding module that learns visual embeddings linked with local acoustic properties from panoramic RGB and depth data. We first leverage these embeddings to optimize the placement of reference microphones in any given scene. During synthesis, we leverage multiple embeddings extracted from reference locations to get adaptive weights for their contribution, conditioned on target viewpoint. We benchmark the task on both publicly available data and real-world settings. We demonstrate significant improvements over existing methods.

Updated: 2025-04-08 00:22:16

标题: SoundVista：通过视听绑定实现新视角环境声音合成

摘要: 我们介绍了一种名为SoundVista的方法，用于在新颖的视点生成任意场景的环境声音。给定稀疏分布的麦克风预先获取的场景录音，SoundVista可以合成从未见过的目标视点处的场景声音。该方法学习了在有限数量已知录音的情况下将分布式麦克风获取的信号与目标视点处信号相关联的基础声学传递函数。与现有作品不同，我们的方法不需要声源细节的约束或先验知识。此外，我们的方法能够有效地适应各种房间布局、参考麦克风配置和未知环境。为实现这一点，我们引入了一个视觉声学绑定模块，该模块从全景RGB和深度数据中学习与局部声学特性相关联的视觉嵌入。我们首先利用这些嵌入来优化在任何给定场景中参考麦克风的放置。在合成过程中，我们利用从参考位置提取的多个嵌入来获得适应性权重，这些权重取决于目标视点。我们在公开可用数据和实际环境中对该任务进行了基准测试，并展示了相对于现有方法的显著改进。

更新时间: 2025-04-08 00:22:16

领域: cs.SD,cs.AI,cs.CV,cs.MM

下载: http://arxiv.org/abs/2504.05576v1

A Lightweight Large Vision-language Model for Multimodal Medical Images

Medical Visual Question Answering (VQA) enhances clinical decision-making by enabling systems to interpret medical images and answer clinical queries. However, developing efficient, high-performance VQA models is challenging due to the complexity of medical imagery and diverse modalities. In this paper, we introduce a lightweight, multimodal VQA model integrating BiomedCLIP for image feature extraction and LLaMA-3 for text processing. Designed for medical VQA tasks, our model achieves state-of-the-art performance on the OmniMedVQA dataset. With approximately 8 billion parameters, it requires only two NVIDIA 40 GB A100 GPUs, demonstrating superior efficiency over larger models. Our results show 73.4% accuracy for open-end questions, surpassing existing models and validating its potential for real-world medical applications. Key contributions include a specialized multimodal VQA model, a resource-efficient architecture, and strong performance in answering open-ended clinical questions.

Updated: 2025-04-08 00:19:48

标题: 一个轻量级大视觉语言模型用于多模态医学图像

摘要: 医学可视化问答（VQA）通过使系统能够解释医学图像并回答临床查询，增强了临床决策能力。然而，由于医学图像的复杂性和多样性模态，开发高效、高性能的VQA模型具有挑战性。在本文中，我们介绍了一种轻量级、多模态VQA模型，集成了BiomedCLIP用于图像特征提取和LLaMA-3用于文本处理。设计用于医学VQA任务，我们的模型在OmniMedVQA数据集上实现了最先进的性能。拥有约80亿个参数，该模型仅需要两个NVIDIA 40 GB A100 GPU，表现出比较大模型更高的效率。我们的结果显示，对于开放式问题，准确率为73.4%，超过了现有模型，并验证了其在现实世界医学应用中的潜力。关键贡献包括专门的多模态VQA模型、资源高效的架构以及在回答开放式临床问题方面的强大性能。

更新时间: 2025-04-08 00:19:48

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2504.05575v1

Automated radiotherapy treatment planning guided by GPT-4Vision

Objective: Radiotherapy treatment planning is a time-consuming and potentially subjective process that requires the iterative adjustment of model parameters to balance multiple conflicting objectives. Recent advancements in frontier Artificial Intelligence (AI) models offer promising avenues for addressing the challenges in planning and clinical decision-making. This study introduces GPT-RadPlan, an automated treatment planning framework that integrates radiation oncology knowledge with the reasoning capabilities of large multi-modal models, such as GPT-4Vision (GPT-4V) from OpenAI. Approach: Via in-context learning, we incorporate clinical requirements and a few (3 in our experiments) approved clinical plans with their optimization settings, enabling GPT-4V to acquire treatment planning domain knowledge. The resulting GPT-RadPlan system is integrated into our in-house inverse treatment planning system through an application programming interface (API). For a given patient, GPT-RadPlan acts as both plan evaluator and planner, first assessing dose distributions and dose-volume histograms (DVHs), and then providing textual feedback on how to improve the plan to match the physician's requirements. In this manner, GPT-RadPlan iteratively refines the plan by adjusting planning parameters, such as weights and dose objectives, based on its suggestions. Main results: The efficacy of the automated planning system is showcased across 17 prostate cancer and 13 head and neck cancer VMAT plans with prescribed doses of 70.2 Gy and 72 Gy, respectively, where we compared GPT-RadPlan results to clinical plans produced by human experts. In all cases, GPT-RadPlan either outperformed or matched the clinical plans, demonstrating superior target coverage and reducing organ-at-risk doses by 5 Gy on average (15 percent for prostate and 10-15 percent for head and neck).

Updated: 2025-04-08 00:19:04

标题: 由GPT-4Vision引导的自动化放疗治疗规划

摘要: 目标：放射治疗计划是一个耗时且潜在主观的过程，需要通过迭代调整模型参数来平衡多个冲突的目标。最近前沿人工智能（AI）模型的进展为解决规划和临床决策中的挑战提供了有希望的途径。本研究介绍了GPT-RadPlan，这是一个自动化治疗计划框架，将放射肿瘤知识与GPT-4Vision（GPT-4V）等大型多模态模型的推理能力整合在一起。方法：通过上下文学习，我们结合临床要求和少量（在我们的实验中为3个）经批准的临床计划及其优化设置，使GPT-4V能够获得治疗计划领域知识。由此产生的GPT-RadPlan系统通过应用程序编程接口（API）集成到我们内部反向治疗计划系统中。对于给定的患者，GPT-RadPlan既充当计划评估者又充当规划者，首先评估剂量分布和剂量体积直方图（DVHs），然后提供文本反馈以改进计划以符合医师的要求。通过这种方式，GPT-RadPlan通过根据其建议调整计划参数（如权重和剂量目标）来迭代地完善计划。主要结果：自动化规划系统的有效性在17例前列腺癌和13例头颈癌VMAT计划中展示，分别处方剂量为70.2 Gy和72 Gy，我们将GPT-RadPlan的结果与人类专家制定的临床计划进行了比较。在所有情况下，GPT-RadPlan要么胜过要么与临床计划匹配，展示了优越的靶区覆盖率，并平均减少了5 Gy的器官风险剂量（前列腺为15%，头颈为10-15%）。

更新时间: 2025-04-08 00:19:04

领域: physics.med-ph,cs.AI

下载: http://arxiv.org/abs/2406.15609v3

NutriBench: A Dataset for Evaluating Large Language Models on Nutrition Estimation from Meal Descriptions

Accurate nutrition estimation helps people make informed dietary choices and is essential in the prevention of serious health complications. We present NutriBench, the first publicly available natural language meal description nutrition benchmark. NutriBench consists of 11,857 meal descriptions generated from real-world global dietary intake data. The data is human-verified and annotated with macro-nutrient labels, including carbohydrates, proteins, fats, and calories. We conduct an extensive evaluation of NutriBench on the task of carbohydrate estimation, testing twelve leading Large Language Models (LLMs), including GPT-4o, Llama3.1, Qwen2, Gemma2, and OpenBioLLM models, using standard, Chain-of-Thought and Retrieval-Augmented Generation strategies. Additionally, we present a study involving professional nutritionists, finding that LLMs can provide comparable but significantly faster estimates. Finally, we perform a real-world risk assessment by simulating the effect of carbohydrate predictions on the blood glucose levels of individuals with diabetes. Our work highlights the opportunities and challenges of using LLMs for nutrition estimation, demonstrating their potential to aid professionals and laypersons and improve health outcomes. Our benchmark is publicly available at: https://mehak126.github.io/nutribench.html

Updated: 2025-04-08 00:17:27

标题: NutriBench：一个用于评估大型语言模型在餐食描述中营养估计的数据集

摘要: 准确的营养估计有助于人们做出明智的饮食选择，并在预防严重健康并发症中至关重要。我们提出了NutriBench，这是第一个公开可用的自然语言餐饮描述营养基准。NutriBench由11,857个由真实世界全球饮食摄入数据生成的餐饮描述组成。数据经过人工验证，并用宏量营养标签进行了标注，包括碳水化合物、蛋白质、脂肪和卡路里。我们对NutriBench在碳水化合物估计任务上进行了广泛评估，测试了十二个领先的大型语言模型（LLMs），包括GPT-4o、Llama3.1、Qwen2、Gemma2和OpenBioLLM模型，使用标准、思维链和检索增强生成策略。此外，我们进行了一项涉及专业营养师的研究，发现LLMs可以提供可比但明显更快的估计。最后，我们通过模拟碳水化合物预测对患有糖尿病的个体的血糖水平的影响，进行了实际风险评估。我们的工作突显了使用LLMs进行营养估计的机遇和挑战，展示了它们帮助专业人士和普通人改善健康结果的潜力。我们的基准可在以下网址公开访问：https://mehak126.github.io/nutribench.html

更新时间: 2025-04-08 00:17:27

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.12843v5

Fully-inductive Node Classification on Arbitrary Graphs

One fundamental challenge in graph machine learning is generalizing to new graphs. Many existing methods following the inductive setup can generalize to test graphs with new structures, but assuming the feature and label spaces remain the same as the training ones. This paper introduces a fully-inductive setup, where models should perform inference on arbitrary test graphs with new structures, feature and label spaces. We propose GraphAny as the first attempt at this challenging setup. GraphAny models inference on a new graph as an analytical solution to a LinearGNN, which can be naturally applied to graphs with any feature and label spaces. To further build a stronger model with learning capacity, we fuse multiple LinearGNN predictions with learned inductive attention scores. Specifically, the attention module is carefully parameterized as a function of the entropy-normalized distance features between pairs of LinearGNN predictions to ensure generalization to new graphs. Empirically, GraphAny trained on a single Wisconsin dataset with only 120 labeled nodes can generalize to 30 new graphs with an average accuracy of 67.26%, surpassing not only all inductive baselines, but also strong transductive methods trained separately on each of the 30 test graphs.

Updated: 2025-04-08 00:15:02

标题: 在任意图上的全归纳节点分类

摘要: 图机器学习中的一个基本挑战是推广到新的图形。许多现有的方法遵循归纳设置，可以推广到具有新结构的测试图形，但假设特征和标签空间与训练图形相同。本文介绍了一个完全归纳设置，其中模型应该在具有新结构、特征和标签空间的任意测试图形上执行推理。我们提出GraphAny作为对这种具有挑战性设置的第一次尝试。GraphAny模型将对新图形的推理建模为LinearGNN的解析解，可以自然地应用于具有任何特征和标签空间的图形。为了进一步构建一个具有学习能力的更强大模型，我们利用学习的归纳注意分数融合多个LinearGNN预测。具体来说，注意模块被小心地参数化为LinearGNN预测对之间的熵归一化距离特征的函数，以确保对新图形的推广。在实证上，只在一个拥有120个标记节点的威斯康星数据集上训练的GraphAny可以推广到30个新图形，平均准确率为67.26%，不仅超过了所有归纳基线，而且还超过了单独在每个30个测试图形上训练的强大传导方法。

更新时间: 2025-04-08 00:15:02

领域: cs.LG,cs.SI

下载: http://arxiv.org/abs/2405.20445v5

MicroNN: An On-device Disk-resident Updatable Vector Database

Nearest neighbour search over dense vector collections has important applications in information retrieval, retrieval augmented generation (RAG), and content ranking. Performing efficient search over large vector collections is a well studied problem with many existing approaches and open source implementations. However, most state-of-the-art systems are generally targeted towards scenarios using large servers with an abundance of memory, static vector collections that are not updatable, and nearest neighbour search in isolation of other search criteria. We present Micro Nearest Neighbour (MicroNN), an embedded nearest-neighbour vector search engine designed for scalable similarity search in low-resource environments. MicroNN addresses the problem of on-device vector search for real-world workloads containing updates and hybrid search queries that combine nearest neighbour search with structured attribute filters. In this scenario, memory is highly constrained and disk-efficient index structures and algorithms are required, as well as support for continuous inserts and deletes. MicroNN is an embeddable library that can scale to large vector collections with minimal resources. MicroNN is used in production and powers a wide range of vector search use-cases on-device. MicroNN takes less than 7 ms to retrieve the top-100 nearest neighbours with 90% recall on publicly available million-scale vector benchmark while using ~10 MB of memory.

Updated: 2025-04-08 00:05:58

标题: MicroNN：一种在设备上磁盘驻留可更新的向量数据库

摘要: 密集向量集合中的最近邻搜索在信息检索、检索增强生成（RAG）和内容排名中具有重要应用。在大型向量集合上执行有效搜索是一个经过深入研究的问题，有许多现有方法和开源实现。然而，大多数最先进的系统通常针对使用大型服务器、具有丰富内存、不可更新的静态向量集合以及孤立于其他搜索条件的最近邻搜索的情景。我们提出了Micro Nearest Neighbour（MicroNN），这是一个专为低资源环境中可伸缩相似性搜索而设计的嵌入式最近邻向量搜索引擎。MicroNN解决了包含更新和将最近邻搜索与结构化属性过滤器相结合的混合搜索查询的实际工作负载中的设备上向量搜索问题。在这种情况下，内存受到严格限制，需要磁盘高效索引结构和算法，以及支持连续插入和删除。MicroNN是一个可嵌入的库，可以在最小资源下扩展到大型向量集合。MicroNN已在生产中使用，并在设备上支持各种向量搜索用例。MicroNN在使用约10 MB内存时，在公开可用的百万级向量基准测试中，检索前100个最近邻居的时间不到7毫秒，召回率为90%。

更新时间: 2025-04-08 00:05:58

领域: cs.DB,cs.AI,cs.IR

下载: http://arxiv.org/abs/2504.05573v1

Knowledge-Instruct: Effective Continual Pre-training from Limited Data using Instructions

While Large Language Models (LLMs) acquire vast knowledge during pre-training, they often lack domain-specific, new, or niche information. Continual pre-training (CPT) attempts to address this gap but suffers from catastrophic forgetting and inefficiencies in low-data regimes. We introduce Knowledge-Instruct, a novel approach to efficiently inject knowledge from limited corpora through pure instruction-tuning. By generating information-dense synthetic instruction data, it effectively integrates new knowledge while preserving general reasoning and instruction-following abilities. Knowledge-Instruct demonstrates superior factual memorization, minimizes catastrophic forgetting, and remains scalable by leveraging synthetic data from relatively small language models. Additionally, it enhances contextual understanding, including complex multi-hop reasoning, facilitating integration with retrieval systems. We validate its effectiveness across diverse benchmarks, including Companies, a new dataset that we release to measure knowledge injection capabilities.

Updated: 2025-04-08 00:00:36

标题: Knowledge-Instruct: 利用指导有效地从有限数据中进行持续预训练

摘要: 大型语言模型（LLMs）在预训练期间获得了广泛的知识，但通常缺乏领域特定、新颖或小众信息。持续预训练（CPT）试图填补这一差距，但在低数据环境中存在灾难性遗忘和低效率的问题。我们引入了一种名为Knowledge-Instruct的新方法，通过纯指导调整有效地注入有限语料库中的知识。通过生成信息密集的合成指导数据，它有效地整合了新知识，同时保留了一般推理和遵循指导的能力。Knowledge-Instruct展示了卓越的事实记忆能力，最大限度地减少了灾难性遗忘，并通过利用相对较小的语言模型的合成数据保持了可扩展性。此外，它增强了上下文理解，包括复杂的多跳推理，有助于与检索系统的集成。我们验证了其在各种基准测试中的有效性，包括我们发布的一个新数据集Companies，用于衡量知识注入能力。

更新时间: 2025-04-08 00:00:36

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2504.05571v1