    _              _         ____              
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                          |___/ 

Articles: 32

Last Updated: 2025-03-29 23:56:32 (+00:00)

FIESTA: Fisher Information-based Efficient Selective Test-time Adaptation

Robust facial expression recognition in unconstrained, "in-the-wild" environments remains challenging due to significant domain shifts between training and testing distributions. Test-time adaptation (TTA) offers a promising solution by adapting pre-trained models during inference without requiring labeled test data. However, existing TTA approaches typically rely on manually selecting which parameters to update, potentially leading to suboptimal adaptation and high computational costs. This paper introduces a novel Fisher-driven selective adaptation framework that dynamically identifies and updates only the most critical model parameters based on their importance as quantified by Fisher information. By integrating this principled parameter selection approach with temporal consistency constraints, our method enables efficient and effective adaptation specifically tailored for video-based facial expression recognition. Experiments on the challenging AffWild2 benchmark demonstrate that our approach significantly outperforms existing TTA methods, achieving a 7.7% improvement in F1 score over the base model while adapting only 22,000 parameters, more than 20 times fewer than comparable methods. Our ablation studies further reveal that parameter importance can be effectively estimated from minimal data, with sampling just 1-3 frames sufficient for substantial performance gains. The proposed approach not only enhances recognition accuracy but also dramatically reduces computational overhead, making test-time adaptation more practical for real-world affective computing applications.
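
To make the mechanism concrete, here is a minimal PyTorch sketch of Fisher-driven parameter selection for test-time adaptation. All names are illustrative, and the entropy-style loss and squared-gradient Fisher proxy are assumptions; the paper's actual estimator and update rule may differ.

    import torch

    def entropy_loss(logits):
        # Confidence maximization over unlabeled frames (a common TTA
        # objective; an assumption -- the paper's loss may differ).
        p = torch.softmax(logits, dim=-1)
        return -(p * torch.log(p + 1e-8)).sum(-1).mean()

    def fisher_scores(model, frames, loss_fn=entropy_loss):
        # Accumulate squared gradients as an empirical diagonal Fisher proxy.
        scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        for x in frames:  # the paper reports 1-3 frames already suffice
            model.zero_grad()
            loss_fn(model(x)).backward()
            for n, p in model.named_parameters():
                if p.grad is not None:
                    scores[n] += p.grad.detach() ** 2
        return scores

    def topk_masks(scores, k=22_000):
        # Boolean masks that keep only the k highest-Fisher parameters trainable.
        flat = torch.cat([s.flatten() for s in scores.values()])
        threshold = flat.topk(k).values.min()
        return {n: s >= threshold for n, s in scores.items()}

    # During adaptation, zero the gradients of unselected parameters before
    # each optimizer step:
    #   for n, p in model.named_parameters():
    #       if p.grad is not None:
    #           p.grad *= masks[n]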

Updated: 2025-03-29 23:56:32

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2503.23257v1

TransNet: Transfer Knowledge for Few-shot Knowledge Graph Completion

Knowledge graphs (KGs) are ubiquitous and widely used in various applications. However, most real-world knowledge graphs are incomplete, which significantly degrades their performance on downstream tasks. Additionally, the relationships in real-world knowledge graphs often follow a long-tail distribution, meaning that most relations are represented by only a few training triplets. To address these challenges, few-shot learning has been introduced. Few-shot KG completion aims to make accurate predictions for triplets involving novel relations when only a limited number of training triplets are available. Although many methods have been proposed, they typically learn each relation individually, overlooking the correlations between different tasks and the relevant information in previously trained tasks. In this paper, we propose a transfer learning-based few-shot KG completion method (TransNet). By learning the relationships between different tasks, TransNet effectively transfers knowledge from similar tasks to improve the current task's performance. Furthermore, by employing meta-learning, TransNet can generalize effectively to new, unseen relations. Extensive experiments on benchmark datasets demonstrate the superiority of TransNet over state-of-the-art methods. Code can be found at https://github.com/lihuiliullh/TransNet/tree/main

Updated: 2025-03-29 23:39:11

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2504.03720v1

Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers

In this paper, I introduce the retrieval problem, a simple yet common reasoning task that can be solved only by transformers with a minimum number of layers, which grows logarithmically with the input size. I empirically show that large language models can solve the task under different prompting formulations without any fine-tuning. To understand how transformers solve the retrieval problem, I train several transformers on a minimal formulation. Successful learning occurs only under the presence of an implicit curriculum. I uncover the learned mechanisms by studying the attention maps in the trained transformers. I also study the training process, uncovering that attention heads always emerge in a specific sequence guided by the implicit curriculum.

Updated: 2025-03-29 23:29:51

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2411.12118v4

Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions

Security threats like prompt injection attacks pose significant risks to applications that integrate Large Language Models (LLMs), potentially leading to unauthorized actions such as API misuse. Unlike previous approaches that aim to detect these attacks on a best-effort basis, this paper introduces a novel method that appends an Encrypted Prompt to each user prompt, embedding the current permissions. These permissions are verified before any action (such as an API call) generated by the LLM is executed. If the permissions are insufficient, the LLM's actions are not executed, ensuring safety. This approach guarantees that only actions within the scope of the current permissions can proceed. In scenarios where adversarial prompts are introduced to mislead the LLM, the method verifies the permissions embedded in the Encrypted Prompt and ensures that unauthorized actions are never executed. Thus, threats like prompt injection attacks that trigger the LLM to generate harmful actions can be effectively mitigated.
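
A minimal sketch of the gating idea in Python, assuming an HMAC-signed permission blob. The paper does not specify its encryption scheme or action format, so every name below is hypothetical.

    import hashlib, hmac, json

    SERVER_KEY = b"server-secret"  # held by the application, never by the LLM

    def make_encrypted_prompt(user_prompt: str, permissions: set[str]) -> dict:
        # Sign the permissions so an adversarial prompt cannot forge them.
        payload = json.dumps(sorted(permissions)).encode()
        tag = hmac.new(SERVER_KEY, payload, hashlib.sha256).hexdigest()
        return {"prompt": user_prompt, "permissions": payload, "tag": tag}

    def execute_action(action: str, blob: dict) -> str:
        # Verify integrity first; then check the action against permissions.
        expected = hmac.new(SERVER_KEY, blob["permissions"],
                            hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, blob["tag"]):
            return "rejected: tampered permissions"
        allowed = set(json.loads(blob["permissions"]))
        if action not in allowed:
            return f"rejected: '{action}' outside current permissions"
        return f"executed: {action}"

    blob = make_encrypted_prompt("Summarize my inbox", {"mail.read"})
    print(execute_action("mail.read", blob))    # executed
    print(execute_action("mail.delete", blob))  # rejected even if the LLM asks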

Updated: 2025-03-29 23:26:57

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2503.23250v1

Effective Skill Unlearning through Intervention and Abstention

Large Language Models (LLMs) have demonstrated remarkable skills across various domains. Understanding the mechanisms behind their abilities and implementing controls over them is becoming increasingly important for developing better models. In this paper, we focus on skill unlearning in LLMs, specifically unlearning a particular skill while retaining their overall capabilities. We introduce two lightweight, training-free skill unlearning techniques for LLMs. First, we observe that the pre-activation distribution of neurons in each Feed-Forward Layer (FFL) differs when the model demonstrates different skills. Additionally, we find that queries triggering the same skill cluster within the FFL key space and can be separated from other queries using a hypercube. Based on these observations, our two methods operate via \textit{intervention} and \textit{abstention} respectively: \texttt{Neuron Adjust} and \texttt{Key Space Detection}. We evaluate our methods on unlearning math-solving, Python-coding, and comprehension skills across seven different languages. The results demonstrate their strong unlearning capabilities for the designated skills. Specifically, \texttt{Key Space Detection} achieves over 80\% relative performance drop on the forgetting skill and less than 10\% relative performance drop on other skills and the model's general knowledge (MMLU) for most unlearning tasks. Our code is available at https://github.com/Trustworthy-ML-Lab/effective_skill_unlearning
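
The abstention half lends itself to a short sketch. Below, an axis-aligned hypercube (with a hypothetical margin parameter) is fitted around the key-space vectors of queries that trigger the target skill, and the model abstains when a new query falls inside. This is a schematic reading of Key Space Detection, not the authors' implementation.

    import numpy as np

    class KeySpaceDetector:
        """Axis-aligned hypercube over FFL key-space activations."""

        def fit(self, skill_keys: np.ndarray, margin: float = 0.05):
            # skill_keys: (num_queries, dim) key vectors of the target skill
            span = skill_keys.max(0) - skill_keys.min(0)
            self.low = skill_keys.min(0) - margin * span
            self.high = skill_keys.max(0) + margin * span
            return self

        def should_abstain(self, key: np.ndarray) -> bool:
            # Inside the hypercube on every dimension => query hits the skill.
            return bool(np.all((key >= self.low) & (key <= self.high)))

    rng = np.random.default_rng(0)
    math_keys = rng.normal(2.0, 0.1, size=(64, 8))  # clustered "math" queries
    other_key = rng.normal(0.0, 0.1, size=8)
    det = KeySpaceDetector().fit(math_keys)
    print(det.should_abstain(math_keys[0]))  # True  -> abstain (unlearned skill)
    print(det.should_abstain(other_key))     # False -> answer normally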

Updated: 2025-03-29 23:21:44

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2503.21730v2

Simulation of Non-Ordinary Consciousness

The symbolic architecture of non-ordinary consciousness remains largely unmapped in cognitive science and artificial intelligence. While conventional models prioritize rational coherence, altered states such as those induced by psychedelics reveal distinct symbolic regimes characterized by recursive metaphor, ego dissolution, and semantic destabilization. We present \textit{Glyph}, a generative symbolic interface designed to simulate psilocybin-like symbolic cognition in large language models. Rather than modeling perception or mood, Glyph enacts symbolic transformation through recursive reentry, metaphoric modulation, and entropy-scaled destabilization -- a triadic operator formalized within a tensorial linguistic framework. Experimental comparison with baseline GPT-4o reveals that Glyph consistently generates high-entropy, metaphor-saturated, and ego-dissolving language across diverse symbolic prompt categories. These results validate the emergence of non-ordinary cognitive patterns and support a new paradigm for simulating altered consciousness through language. Glyph opens novel pathways for modeling symbolic cognition, exploring metaphor theory, and encoding knowledge in recursively altered semantic spaces.

Updated: 2025-03-29 23:04:04

Categories: q-bio.NC,cs.AI,91E45, 03B70, 00A30, 68T05,I.2.4; I.2.7; I.1.1; F.4.1; H.5.2; J.5

Download: http://arxiv.org/abs/2503.23245v1

TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model

Vision-Language Models (VLMs) demand substantial computational resources during inference, largely due to the extensive visual input tokens for representing visual information. Previous studies have noted that visual tokens tend to receive less attention than text tokens, suggesting their lower importance during inference and potential for pruning. However, their methods encounter several challenges: reliance on greedy heuristic criteria for token importance and incompatibility with FlashAttention and KV cache. To address these issues, we introduce \textbf{TopV}, a compatible \textbf{TO}ken \textbf{P}runing with inference Time Optimization for fast and low-memory \textbf{V}LM, achieving efficient pruning without additional training or fine-tuning. Instead of relying on attention scores, we formulate token pruning as an optimization problem, accurately identifying important visual tokens while remaining compatible with FlashAttention. Additionally, since we only perform this pruning once during the prefilling stage, it effectively reduces KV cache size. Our optimization framework incorporates a visual-aware cost function considering factors such as Feature Similarity, Relative Spatial Distance, and Absolute Central Distance, to measure the importance of each source visual token, enabling effective pruning of low-importance tokens. Extensive experiments demonstrate that our method outperforms previous token pruning methods, validating the effectiveness and efficiency of our approach.
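
The cost function lends itself to a compact sketch. Only the three ingredients (feature similarity, relative spatial distance, absolute central distance) come from the abstract; the weights and the exact combination below are assumptions.

    import torch
    import torch.nn.functional as F

    def visual_token_cost(feats, positions, alpha=1.0, beta=0.5, gamma=0.5):
        # Redundancy: tokens similar to many others carry less unique signal.
        sim = F.cosine_similarity(feats.unsqueeze(0), feats.unsqueeze(1), dim=-1)
        redundancy = sim.mean(dim=1)
        # Relative spatial distance to all other tokens.
        pairwise = torch.cdist(positions, positions)
        isolation = pairwise.mean(dim=1)
        # Absolute distance to the image centre.
        centre = positions.mean(dim=0, keepdim=True)
        central = torch.cdist(positions, centre).squeeze(1)
        # Lower cost = more important.
        return alpha * redundancy + beta * isolation + gamma * central

    def prune_at_prefill(feats, positions, keep_ratio=0.25):
        cost = visual_token_cost(feats, positions)
        k = max(1, int(keep_ratio * feats.size(0)))
        keep = cost.topk(k, largest=False).indices.sort().values
        return feats[keep], positions[keep]  # smaller KV cache from here on

    feats = torch.randn(196, 768)  # 14x14 patch features
    ys, xs = torch.meshgrid(torch.arange(14.), torch.arange(14.), indexing="ij")
    positions = torch.stack([ys.flatten(), xs.flatten()], dim=1)
    kept_feats, kept_pos = prune_at_prefill(feats, positions)
    print(kept_feats.shape)  # torch.Size([49, 768])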

Updated: 2025-03-29 23:00:27

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2503.18278v2

Evaluating how LLM annotations represent diverse views on contentious topics

Researchers have proposed the use of generative large language models (LLMs) to label data for both research and applied settings. This literature emphasizes the improved performance of LLMs relative to other natural language models, noting that LLMs typically outperform other models on standard metrics such as accuracy, precision, recall, and F1 score. However, previous literature has also highlighted the bias embedded in language models, particularly around contentious topics such as potentially toxic content. This bias could result in labels applied by LLMs that disproportionately align with majority groups over a more diverse set of viewpoints. In this paper, we evaluate how LLMs represent diverse viewpoints on these contentious tasks. Across four annotation tasks on four datasets, we show that LLMs do not show substantial disagreement with annotators on the basis of demographics. Instead, the model, prompt, and disagreement between human annotators on the labeling task are far more predictive of LLM agreement. Our findings suggest that when using LLMs to annotate data, under-representing the views of particular groups is not a substantial concern. We conclude with a discussion of the implications for researchers and practitioners.

Updated: 2025-03-29 22:53:15

Categories: cs.CL,cs.AI,cs.CY

Download: http://arxiv.org/abs/2503.23243v1

Beyond speculation: Measuring the growing presence of LLM-generated texts in multilingual disinformation

Increased sophistication of large language models (LLMs) and the consequent quality of generated multilingual text raises concerns about potential disinformation misuse. While humans struggle to distinguish LLM-generated content from human-written texts, the scholarly debate about their impact remains divided. Some argue that heightened fears are overblown due to natural ecosystem limitations, while others contend that specific "longtail" contexts face overlooked risks. Our study bridges this debate by providing the first empirical evidence of LLM presence in the latest real-world disinformation datasets, documenting the increase of machine-generated content following ChatGPT's release, and revealing crucial patterns across languages, platforms, and time periods.

Updated: 2025-03-29 22:47:53

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2503.23242v1

Beyond Contrastive Learning: Synthetic Data Enables List-wise Training with Multiple Levels of Relevance

Recent advancements in large language models (LLMs) have allowed the augmentation of information retrieval (IR) pipelines with synthetic data in various ways. Yet, the main training paradigm remains: contrastive learning with binary relevance labels and the InfoNCE loss, where one positive document is compared against one or more negatives. This objective treats all documents that are not explicitly annotated as relevant on an equally negative footing, regardless of their actual degree of relevance, thus (a) missing subtle nuances that are useful for ranking and (b) being susceptible to annotation noise. To overcome this limitation, in this work we forgo real training documents and annotations altogether and use open-source LLMs to directly generate synthetic documents that answer real user queries according to several different levels of relevance. This fully synthetic ranking context of graduated relevance, together with an appropriate list-wise loss (Wasserstein distance), enables us to train dense retrievers in a way that better captures the ranking task. Experiments on various IR datasets show that our proposed approach outperforms conventional training with InfoNCE by a large margin. Without using any real documents for training, our dense retriever significantly outperforms the same retriever trained through self-supervision. More importantly, it matches the performance of the same retriever trained on real, labeled training documents of the same dataset, while being more robust to distribution shift and clearly outperforming it when evaluated zero-shot on the BEIR dataset collection.
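
As a sketch of the list-wise objective, the following computes a 1-D Wasserstein distance between the retriever's softmax distribution over a candidate list and a target distribution built from graded relevance labels. The target construction and normalization are assumptions, not the paper's exact loss.

    import torch

    def listwise_wasserstein(scores, relevance):
        # Sort candidates by relevance level so positions carry ranking order.
        order = torch.argsort(relevance, descending=True)
        p = torch.softmax(scores[order], dim=0)   # model's list distribution
        q = relevance[order] / relevance.sum()    # graded-relevance target
        # W1 on the line = L1 distance between the two CDFs.
        return (p.cumsum(0) - q.cumsum(0)).abs().sum()

    # Query vs. 4 synthetic documents with relevance levels 3 > 2 > 1 > 0:
    scores = torch.tensor([2.0, 1.0, 0.5, -1.0], requires_grad=True)
    relevance = torch.tensor([3.0, 2.0, 1.0, 0.0])
    loss = listwise_wasserstein(scores, relevance)
    loss.backward()  # gradients push scores toward the graded ordering
    print(float(loss))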

Updated: 2025-03-29 22:33:22

Categories: cs.IR,cs.CL,cs.LG

Download: http://arxiv.org/abs/2503.23239v1

Wagner's Algorithm Provably Runs in Subexponential Time for SIS$^\infty$

At CRYPTO 2015, Kirchner and Fouque claimed that a carefully tuned variant of the Blum-Kalai-Wasserman (BKW) algorithm (JACM 2003) should solve the Learning with Errors problem (LWE) in slightly subexponential time for modulus $q=\mathrm{poly}(n)$ and narrow error distribution, when given enough LWE samples. Taking a modular view, one may regard BKW as a combination of Wagner's algorithm (CRYPTO 2002), run over the corresponding dual problem, and the Aharonov-Regev distinguisher (JACM 2005). Hence the subexponential Wagner step alone should be of interest for solving this dual problem - namely, the Short Integer Solution problem (SIS) - but this appears to be undocumented so far. We re-interpret this Wagner step as walking backward through a chain of projected lattices, zigzagging through some auxiliary superlattices. We further randomize the bucketing step using Gaussian randomized rounding to exploit the powerful discrete Gaussian machinery. This approach avoids sample amplification and turns Wagner's algorithm into an approximate discrete Gaussian sampler for $q$-ary lattices. For an SIS lattice with $n$ equations modulo $q$, this algorithm runs in subexponential time $\exp(O(n/\log \log n))$ to reach a Gaussian width parameter $s = q/\mathrm{polylog}(n)$ only requiring $m = n + \omega(n/\log \log n)$ many SIS variables. This directly provides a provable algorithm for solving the Short Integer Solution problem in the infinity norm ($\mathrm{SIS}^\infty$) for norm bounds $\beta = q/\mathrm{polylog}(n)$. This variant of SIS underlies the security of the NIST post-quantum cryptography standard Dilithium. Despite its subexponential complexity, Wagner's algorithm does not appear to threaten Dilithium's concrete security.

Updated: 2025-03-29 22:32:59

Categories: cs.CR,cs.DS

Download: http://arxiv.org/abs/2503.23238v1

UP-ROM : Uncertainty-Aware and Parametrised dynamic Reduced-Order Model, application to unsteady flows

Reduced order models (ROMs) play a critical role in fluid mechanics by providing low-cost predictions, making them an attractive tool for engineering applications. However, for ROMs to be widely applicable, they must not only generalise well across different regimes, but also provide a measure of confidence in their predictions. While recent data-driven approaches have begun to address nonlinear reduction techniques to improve predictions in transient environments, challenges remain in terms of robustness and parametrisation. In this work, we present a nonlinear reduction strategy specifically designed for transient flows that incorporates parametrisation and uncertainty quantification. Our reduction strategy features a variational auto-encoder (VAE) that uses variational inference for confidence measurement. We use a latent space transformer that incorporates recent advances in attention mechanisms to predict dynamical systems. Attention's versatility in learning sequences and capturing their dependence on external parameters enhances generalisation across a wide range of dynamics. Prediction, coupled with confidence, enables more informed decision making and addresses the need for more robust models. In addition, this confidence is used to cost-effectively sample the parameter space, improving model performance a priori across the entire parameter space without requiring evaluation data for the entire domain.

Updated: 2025-03-29 22:17:36

Categories: cs.LG,physics.flu-dyn

Download: http://arxiv.org/abs/2503.23236v1

Reachable Polyhedral Marching (RPM): An Exact Analysis Tool for Deep-Learned Control Systems

Neural networks are increasingly used in robotics as policies, state transition models, state estimation models, or all of the above. With these components being learned from data, it is important to be able to analyze what behaviors were learned and how this affects closed-loop performance. In this paper we take steps toward this goal by developing methods for computing control invariant sets and regions of attraction (ROAs) of dynamical systems represented as neural networks. We focus our attention on feedforward neural networks with the rectified linear unit (ReLU) activation, which are known to implement continuous piecewise-affine (PWA) functions. We describe the Reachable Polyhedral Marching (RPM) algorithm for enumerating the affine pieces of a neural network through an incremental connected walk. We then use this algorithm to compute exact forward and backward reachable sets, from which we provide methods for computing control invariant sets and ROAs. Our approach is unique in that we find these sets incrementally, without Lyapunov-based tools. In our examples we demonstrate the ability of our approach to find non-convex control invariant sets and ROAs on tasks with learned van der Pol oscillator and pendulum models. Further, we provide an accelerated algorithm for computing ROAs that leverages the incremental and connected enumeration of affine regions that RPM provides. We show this acceleration to lead to a 15x speedup in our examples. Finally, we apply our methods to find a set of states that are stabilized by an image-based controller for an aircraft runway control problem.
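
The piecewise-affine view underlying RPM can be sketched directly: fixing the ReLU activation pattern at an input x yields the affine map valid on x's entire linear region. The sketch below extracts that map; it does not perform RPM's incremental region walk itself.

    import numpy as np

    def local_affine_map(weights, biases, x):
        """Return (W, b) with f(z) = W z + b on x's linear region,
        for a feedforward ReLU net with a linear output layer."""
        W = np.eye(len(x))
        b = np.zeros(len(x))
        a = x.astype(float)
        for Wl, bl in zip(weights[:-1], biases[:-1]):
            pre = Wl @ a + bl
            D = np.diag((pre > 0).astype(float))  # activation pattern mask
            W, b = D @ Wl @ W, D @ (Wl @ b + bl)
            a = np.maximum(pre, 0.0)
        Wl, bl = weights[-1], biases[-1]          # linear output layer
        return Wl @ W, Wl @ b + bl

    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(8, 2)), rng.normal(size=(8, 8)),
               rng.normal(size=(1, 8))]
    biases = [rng.normal(size=8), rng.normal(size=8), rng.normal(size=1)]
    x = np.array([0.3, -0.7])
    W, b = local_affine_map(weights, biases, x)
    # The affine piece reproduces the network output at x:
    relu = lambda v: np.maximum(v, 0.0)
    f = weights[2] @ relu(weights[1] @ relu(weights[0] @ x + biases[0])
                          + biases[1]) + biases[2]
    print(np.allclose(W @ x + b, f))  # True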

Updated: 2025-03-29 21:58:50

Categories: cs.LG,cs.AI,cs.RO,cs.SY,eess.SY

Download: http://arxiv.org/abs/2210.08339v4

Monge-Kantorovich Fitting With Sobolev Budgets

Given $m < n$, we consider the problem of "best" approximating an $n\text{-d}$ probability measure $\rho$ via an $m\text{-d}$ measure $\nu$ such that $\mathrm{supp}\ \nu$ has bounded total "complexity." When $\rho$ is concentrated near an $m\text{-d}$ set we may interpret this as a manifold learning problem with noisy data. However, we do not restrict our analysis to this case, as the more general formulation has broader applications. We quantify $\nu$'s performance in approximating $\rho$ via the Monge-Kantorovich (also called Wasserstein) $p$-cost $\mathbb{W}_p^p(\rho, \nu)$, and constrain the complexity by requiring $\mathrm{supp}\ \nu$ to be coverable by an $f : \mathbb{R}^{m} \to \mathbb{R}^{n}$ whose $W^{k,q}$ Sobolev norm is bounded by $\ell \geq 0$. This allows us to reformulate the problem as minimizing a functional $\mathscr J_p(f)$ under the Sobolev "budget" $\ell$. This problem is closely related to (but distinct from) principal curves with length constraints when $m=1, k = 1$ and an unsupervised analogue of smoothing splines when $k > 1$. New challenges arise from the higher-order differentiability condition. We study the "gradient" of $\mathscr J_p$, which is given by a certain vector field that we call the barycenter field, and use it to prove a nontrivial (almost) strict monotonicity result. We also provide a natural discretization scheme and establish its consistency. We use this scheme as a toy model for a generative learning task, and by analogy, propose novel interpretations for the role regularization plays in improving training.
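
Spelled out, the constrained problem described above reads roughly as follows (a paraphrase assembled from the abstract's notation; the paper's precise domain and constraint set may differ):

    \min_{f \,:\, \|f\|_{W^{k,q}} \le \ell} \ \mathscr{J}_p(f),
    \qquad
    \mathscr{J}_p(f) \;:=\; \inf_{\nu \,:\, \mathrm{supp}\,\nu \subseteq \overline{f(\mathbb{R}^m)}}
    \mathbb{W}_p^p(\rho, \nu)

so the Sobolev budget $\ell$ plays the role that a length constraint plays for principal curves in the case $m = 1$, $k = 1$.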

Updated: 2025-03-29 21:57:44

Categories: cs.LG,math.AP,49Q10 (Primary), 49Q20, 49Q22, 65D10, 68T01 (Secondary)

Download: http://arxiv.org/abs/2409.16541v2

Towards Symmetric Low-Rank Adapters

In this paper, we introduce Symmetric Low-Rank Adapters, an optimized variant of LoRA with even fewer weights. This method utilizes low-rank symmetric weight matrices to learn downstream tasks more efficiently. Traditional LoRA accumulates fine-tuning weights with the original pre-trained weights via a Singular Value Decomposition (SVD)-like approach, i.e., model weights are fine-tuned via updates of the form $BA$ (where $B \in \mathbb{R}^{n\times r}$, $A \in \mathbb{R}^{r\times n}$, and $r$ is the rank of the merged weight matrix). In contrast, our approach, named SymLoRA, represents fine-tuning weights as a spectral decomposition, i.e., $Q \, diag(\Lambda)\, Q^T$, where $Q \in \mathbb{R}^{n\times r}$ and $\Lambda \in \mathbb{R}^r$. SymLoRA requires approximately half of the fine-tuning weights. Here, we show that this approach has negligible losses in downstream efficacy.
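
The parameter saving is easy to see in a few lines of PyTorch (sizes are arbitrary; the symmetric update applies to square weight matrices):

    import torch

    n, r = 1024, 8

    # LoRA: delta_W = B @ A, with B (n x r) initialized to zero and A (r x n)
    B = torch.zeros(n, r)
    A = torch.randn(r, n) * 0.01
    lora_params = B.numel() + A.numel()        # 2 * n * r

    # SymLoRA: delta_W = Q @ diag(lam) @ Q.T, with Q (n x r) and lam (r,)
    Q = torch.randn(n, r) * 0.01
    lam = torch.zeros(r)
    symlora_params = Q.numel() + lam.numel()   # n * r + r

    delta_W = Q @ torch.diag(lam) @ Q.T        # symmetric rank-<=r update
    print(lora_params, symlora_params)         # 16384 vs 8200, roughly half
    print(torch.allclose(delta_W, delta_W.T))  # True: the update is symmetric

    # Applied to a frozen pre-trained square weight W0 (n x n), the adapted
    # forward pass is y = (W0 + delta_W) @ x, with only Q and lam trained.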

Updated: 2025-03-29 21:52:17

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2504.03719v1

CCCI: Code Completion with Contextual Information for Complex Data Transfer Tasks Using Large Language Models

Unlike code generation, which involves creating code from scratch, code completion focuses on integrating new lines or blocks of code into an existing codebase. This process requires a deep understanding of the surrounding context, such as variable scope, object models, API calls, and database relations, to produce accurate results. These complex contextual dependencies make code completion a particularly challenging problem. Current models and approaches often fail to effectively incorporate such context, leading to inaccurate completions with low acceptance rates (around 30\%). For tasks like data transfer, which rely heavily on specific relationships and data structures, acceptance rates drop even further. This study introduces CCCI, a novel method for generating context-aware code completions specifically designed to address data transfer tasks. By integrating contextual information, such as database table relationships, object models, and library details into Large Language Models (LLMs), CCCI improves the accuracy of code completions. We evaluate CCCI using 289 Java snippets, extracted from over 819 operational scripts in an industrial setting. The results demonstrate that CCCI achieved a 49.1\% Build Pass rate and a 41.0\% CodeBLEU score, comparable to state-of-the-art methods that often struggle with complex task completion.

Updated: 2025-03-29 21:31:19

Categories: cs.SE,cs.AI,I.2; D.2

Download: http://arxiv.org/abs/2503.23231v1

Citegeist: Automated Generation of Related Work Analysis on the arXiv Corpus

Large Language Models provide significant new opportunities for the generation of high-quality written works. However, their employment in the research community is inhibited by their tendency to hallucinate invalid sources and by their lack of direct access to a knowledge base of relevant scientific articles. In this work, we present Citegeist: an application pipeline using dynamic Retrieval Augmented Generation (RAG) on the arXiv corpus to generate a related work section and other citation-backed outputs. For this purpose, we employ a mixture of embedding-based similarity matching, summarization, and multi-stage filtering. To adapt to the continuous growth of the document base, we also present an optimized way of incorporating new and modified papers. To enable easy utilization in the scientific community, we release both a website (https://citegeist.org) and an implementation harness that works with several different LLM implementations.
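
The retrieval core reduces to embedding-similarity matching, sketched below with hypothetical names; the deployed pipeline layers summarization and multi-stage filtering on top of this.

    import numpy as np

    def top_k_related(query_emb, paper_embs, titles, k=5, min_sim=0.3):
        """Embedding-similarity matching over an arXiv snapshot -- the first
        stage of a Citegeist-style pipeline (illustrative only)."""
        q = query_emb / np.linalg.norm(query_emb)
        P = paper_embs / np.linalg.norm(paper_embs, axis=1, keepdims=True)
        sims = P @ q                      # cosine similarity to every paper
        order = np.argsort(-sims)[:k]
        return [(titles[i], float(sims[i])) for i in order if sims[i] >= min_sim]

    # With embeddings from any sentence encoder, the selected abstracts would
    # then be summarized and stitched into citation-backed related-work text.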

Updated: 2025-03-29 21:19:43

Categories: cs.LG

Download: http://arxiv.org/abs/2503.23229v1

Synthetic Art Generation and DeepFake Detection A Study on Jamini Roy Inspired Dataset

The intersection of generative AI and art is a fascinating area that brings both exciting opportunities and significant challenges, especially when it comes to identifying synthetic artworks. This study takes a unique approach by examining diffusion-based generative models in the context of Indian art, specifically focusing on the distinctive style of Jamini Roy. To explore this, we fine-tuned Stable Diffusion 3 and used techniques like ControlNet and IPAdapter to generate realistic images. This allowed us to create a new dataset that includes both real and AI-generated artworks, which is essential for a detailed analysis of what these models can produce. We employed various qualitative and quantitative methods, such as Fourier domain assessments and autocorrelation metrics, to uncover subtle differences between synthetic images and authentic pieces. A key takeaway is that existing deepfake detection methods face considerable challenges when deepfakes are of high quality and tailored to specific cultural contexts, exposing a critical gap in current detection technologies. This work not only sheds light on the increasing complexity of generative models but also sets a crucial foundation for future research aimed at the effective detection of synthetic art.
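
Two of the named statistics are easy to sketch. The following computes a radially averaged log-power spectrum and a non-trivial autocorrelation peak ratio for a grayscale image; the exact definitions and any decision thresholds are assumptions rather than the study's published settings.

    import numpy as np

    def spectral_profile(img: np.ndarray, nbins: int = 32) -> np.ndarray:
        """Radially averaged log-power spectrum, a simple Fourier-domain
        statistic for separating synthetic from authentic images."""
        f = np.fft.fftshift(np.fft.fft2(img))
        power = np.log1p(np.abs(f) ** 2)
        h, w = img.shape
        yy, xx = np.indices((h, w))
        radius = np.hypot(yy - h / 2, xx - w / 2)
        bins = np.linspace(0, radius.max(), nbins + 1)
        which = np.digitize(radius, bins) - 1
        return np.array([power[which == b].mean() for b in range(nbins)])

    def autocorr_peak_ratio(img: np.ndarray) -> float:
        # Generative up-sampling often leaves periodic autocorrelation peaks.
        x = img - img.mean()
        ac = np.fft.ifft2(np.abs(np.fft.fft2(x)) ** 2).real
        ac = np.fft.fftshift(ac) / ac.max()
        centre = tuple(s // 2 for s in ac.shape)
        ac[centre] = 0.0               # ignore the trivial zero-lag peak
        return float(ac.max())

    img = np.random.rand(128, 128)     # stand-in for a grayscale artwork
    print(spectral_profile(img)[:5], autocorr_peak_ratio(img))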

Updated: 2025-03-29 21:12:16

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2503.23226v1

Accelerated Distributed Optimization with Compression and Error Feedback

Modern machine learning tasks often involve massive datasets and models, necessitating distributed optimization algorithms with reduced communication overhead. Communication compression, where clients transmit compressed updates to a central server, has emerged as a key technique to mitigate communication bottlenecks. However, the theoretical understanding of stochastic distributed optimization with contractive compression remains limited, particularly in conjunction with Nesterov acceleration -- a cornerstone for achieving faster convergence in optimization. In this paper, we propose a novel algorithm, ADEF (Accelerated Distributed Error Feedback), which integrates Nesterov acceleration, contractive compression, error feedback, and gradient difference compression. We prove that ADEF achieves the first accelerated convergence rate for stochastic distributed optimization with contractive compression in the general convex regime. Numerical experiments validate our theoretical findings and demonstrate the practical efficacy of ADEF in reducing communication costs while maintaining fast convergence.
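
The error-feedback ingredient can be sketched in a few lines: each client compresses the sum of its gradient and its accumulated residual, then stores what the compressor discarded so that compression error does not accumulate. This shows only the EF loop; ADEF's Nesterov acceleration and gradient-difference compression are omitted.

    import numpy as np

    def top_k_compress(v: np.ndarray, k: int) -> np.ndarray:
        # Contractive top-k compressor: keep the k largest-magnitude entries.
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
        return out

    def client_step(grad, error, k=10):
        message = top_k_compress(grad + error, k)  # transmit only k entries
        error = (grad + error) - message           # remember the residual
        return message, error

    rng = np.random.default_rng(0)
    error = np.zeros(100)
    for _ in range(3):
        grad = rng.normal(size=100)
        msg, error = client_step(grad, error)
    print(np.count_nonzero(msg), float(np.linalg.norm(error)))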

Updated: 2025-03-29 20:52:06

Categories: math.OC,cs.LG

Download: http://arxiv.org/abs/2503.08427v2

Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs

Recent advancements in reasoning optimization have greatly enhanced the performance of large language models (LLMs). However, existing work fails to address the complexities of audio-visual scenarios, underscoring the need for further research. In this paper, we introduce AURELIA, a novel actor-critic based audio-visual (AV) reasoning framework that distills structured, step-by-step reasoning into AVLLMs at test time, improving their ability to process complex multi-modal inputs without additional training or fine-tuning. To further advance AVLLM reasoning skills, we present AVReasonBench, a challenging benchmark comprising 4500 audio-visual questions, each paired with detailed step-by-step reasoning. Our benchmark spans six distinct tasks, including AV-GeoIQ, which evaluates AV reasoning combined with geographical and cultural knowledge. Evaluating 18 AVLLMs on AVReasonBench reveals significant limitations in their multi-modal reasoning capabilities. Using AURELIA, we achieve up to a 100% relative improvement, demonstrating its effectiveness. This performance gain highlights the potential of reasoning-enhanced data generation for advancing AVLLMs in real-world applications. Our code and data will be publicly released at: https://github.com/schowdhury671/aurelia.

Updated: 2025-03-29 20:42:29

Categories: eess.AS,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2503.23219v1

Unsupervised Learning: Comparative Analysis of Clustering Techniques on High-Dimensional Data

This paper presents a comprehensive comparative analysis of prominent clustering algorithms, K-means, DBSCAN, and Spectral Clustering, on high-dimensional datasets. We introduce a novel evaluation framework that assesses clustering performance across multiple dimensionality reduction techniques (PCA, t-SNE, and UMAP) using diverse quantitative metrics. Experiments conducted on the MNIST, Fashion-MNIST, and UCI HAR datasets reveal that preprocessing with UMAP consistently improves clustering quality across all algorithms, with Spectral Clustering demonstrating superior performance on complex manifold structures. Our findings show that algorithm selection should be guided by data characteristics, with K-means excelling in computational efficiency, DBSCAN in handling irregular clusters, and Spectral Clustering in capturing complex relationships. This research contributes a systematic approach for evaluating and selecting clustering techniques for high-dimensional data applications.
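
The evaluation loop reduces to a small scikit-learn script, sketched here on the bundled digits data as a stand-in for MNIST; swapping the PCA step for t-SNE or umap-learn's UMAP reproduces the other pipelines. Parameter values below are illustrative, not the paper's settings.

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans, DBSCAN, SpectralClustering
    from sklearn.metrics import adjusted_rand_score, silhouette_score

    X, y = load_digits(return_X_y=True)
    X2 = PCA(n_components=16, random_state=0).fit_transform(X)

    algos = {
        "kmeans": KMeans(n_clusters=10, n_init=10, random_state=0),
        "dbscan": DBSCAN(eps=6.0, min_samples=5),
        "spectral": SpectralClustering(n_clusters=10, random_state=0,
                                       affinity="nearest_neighbors"),
    }
    for name, algo in algos.items():
        labels = algo.fit_predict(X2)
        n_found = len(set(labels) - {-1})   # DBSCAN marks noise as -1
        sil = silhouette_score(X2, labels) if n_found > 1 else float("nan")
        print(f"{name:9s} ARI={adjusted_rand_score(y, labels):.3f} "
              f"silhouette={sil:.3f} clusters={n_found}")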

Updated: 2025-03-29 20:38:04

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2503.23215v1

Action Recognition in Real-World Ambient Assisted Living Environment

The growing ageing population and their preference to maintain independence by living in their own homes require proactive strategies to ensure safety and support. Ambient Assisted Living (AAL) technologies have emerged to facilitate ageing in place by offering continuous monitoring and assistance within the home. Within AAL technologies, action recognition plays a crucial role in interpreting human activities and detecting incidents like falls, mobility decline, or unusual behaviours that may signal worsening health conditions. However, action recognition in practical AAL applications presents challenges, including occlusions, noisy data, and the need for real-time performance. While advancements have been made in accuracy, robustness to noise, and computation efficiency, achieving a balance among them all remains a challenge. To address this challenge, this paper introduces the Robust and Efficient Temporal Convolution network (RE-TCN), which comprises three main elements: Adaptive Temporal Weighting (ATW), Depthwise Separable Convolutions (DSC), and data augmentation techniques. These elements aim to enhance the model's accuracy, robustness against noise and occlusion, and computational efficiency within real-world AAL contexts. RE-TCN outperforms existing models in terms of accuracy, noise and occlusion robustness, and has been validated on four benchmark datasets: NTU RGB+D 60, Northwestern-UCLA, SHREC'17, and DHG-14/28. The code is publicly available at: https://github.com/Gbouna/RE-TCN
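
Of the three elements, the DSC block is the most self-contained; a generic PyTorch version is sketched below (not the authors' exact layer). Splitting a temporal convolution into depthwise and pointwise parts cuts parameters roughly by a factor of the kernel size.

    import torch
    import torch.nn as nn

    class DepthwiseSeparableTemporalConv(nn.Module):
        """Depthwise separable temporal convolution of the kind the
        abstract credits for RE-TCN's efficiency."""

        def __init__(self, channels: int, kernel_size: int = 9):
            super().__init__()
            pad = kernel_size // 2
            # One temporal filter per channel, then a 1x1 channel mix:
            self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                       padding=pad, groups=channels)
            self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)

        def forward(self, x):  # x: (batch, channels, frames)
            return self.pointwise(self.depthwise(x))

    x = torch.randn(4, 64, 300)          # e.g. 300 frames of 64-d pose features
    block = DepthwiseSeparableTemporalConv(64)
    print(block(x).shape)                # torch.Size([4, 64, 300])
    dense = nn.Conv1d(64, 64, 9, padding=4)
    print(sum(p.numel() for p in block.parameters()),   # 4800 params
          sum(p.numel() for p in dense.parameters()))   # 36928 params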

Updated: 2025-03-29 20:32:22

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2503.23214v1

RECALL-MM: A Multimodal Dataset of Consumer Product Recalls for Risk Analysis using Computational Methods and Large Language Models

Product recalls provide valuable insights into potential risks and hazards within the engineering design process, yet their full potential remains underutilized. In this study, we curate data from the United States Consumer Product Safety Commission (CPSC) recalls database to develop a multimodal dataset, RECALL-MM, that informs data-driven risk assessment using historical information, and augment it using generative methods. Patterns in the dataset highlight specific areas where improved safety measures could have significant impact. We extend our analysis by demonstrating interactive clustering maps that embed all recalls into a shared latent space based on recall descriptions and product names. Leveraging these data-driven tools, we explore three case studies to demonstrate the dataset's utility in identifying product risks and guiding safer design decisions. The first two case studies illustrate how designers can visualize patterns across recalled products and situate new product ideas within the broader recall landscape to proactively anticipate hazards. In the third case study, we extend our approach by employing a large language model (LLM) to predict potential hazards based solely on product images. This demonstrates the model's ability to leverage visual context to identify risk factors, revealing strong alignment with historical recall data across many hazard categories. However, the analysis also highlights areas where hazard prediction remains challenging, underscoring the importance of risk awareness throughout the design process. Collectively, this work aims to bridge the gap between historical recall data and future product safety, presenting a scalable, data-driven approach to safer engineering design.

Updated: 2025-03-29 20:27:28

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2503.23213v1

Convolutional Neural Networks Can (Meta-)Learn the Same-Different Relation

While convolutional neural networks (CNNs) have come to match and exceed human performance in many settings, the tasks these models optimize for are largely constrained to the level of individual objects, such as classification and captioning. Humans remain vastly superior to CNNs in visual tasks involving relations, including the ability to identify two objects as 'same' or 'different'. A number of studies have shown that while CNNs can be coaxed into learning the same-different relation in some settings, they tend to generalize poorly to other instances of this relation. In this work we show that the same CNN architectures that fail to generalize the same-different relation with conventional training are able to succeed when trained via meta-learning, which explicitly encourages abstraction and generalization across tasks.

Updated: 2025-03-29 20:24:23

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2503.23212v1

A QUBO Framework for Team Formation

The team formation problem assumes a set of experts and a task, where each expert has a set of skills and the task requires some skills. The objective is to find a set of experts that maximizes coverage of the required skills while simultaneously minimizing the costs associated with the experts. Different definitions of cost have traditionally led to distinct problem formulations and algorithmic solutions. We introduce the unified TeamFormation formulation that captures all cost definitions for team formation problems that balance task coverage and expert cost. Specifically, we formulate three TeamFormation variants with different cost functions using quadratic unconstrained binary optimization (QUBO), and we evaluate two distinct general-purpose solution methods. We show that solutions based on the QUBO formulations of TeamFormation problems are at least as good as those produced by established baselines. Furthermore, we show that QUBO-based solutions leveraging graph neural networks can effectively learn representations of experts and skills to enable transfer learning, allowing node embeddings from one problem instance to be efficiently applied to another.
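
A toy instance makes the formulation concrete. The sketch below builds a QUBO matrix for one cost variant, rewarding covered task skills on the diagonal and penalizing duplicated coverage off-diagonal, then brute-forces the minimum. The weights and the pairwise coverage relaxation are illustrative assumptions, not the paper's exact formulations.

    import itertools
    import numpy as np

    # Binary variable x_i = expert i is hired.
    experts = {0: {"ml", "nlp"}, 1: {"ml", "db"}, 2: {"nlp"}, 3: {"db", "viz"}}
    cost = {0: 3.0, 1: 2.0, 2: 1.0, 3: 2.5}
    task = {"ml", "nlp", "db"}

    n, A, B = len(experts), 4.0, 1.0
    Q = np.zeros((n, n))
    for i in range(n):
        # Diagonal: expert cost minus reward for covered task skills.
        Q[i, i] += B * cost[i] - A * len(experts[i] & task)
        for j in range(i + 1, n):
            # Off-diagonal: penalize duplicated coverage of the same skill.
            Q[i, j] += A * len(experts[i] & experts[j] & task)

    def qubo_energy(x):
        x = np.array(x)
        return float(x @ Q @ x)

    best = min(itertools.product([0, 1], repeat=n), key=qubo_energy)
    print(best, qubo_energy(best))
    # -> (0, 1, 1, 0): experts 1 and 2 cover all skills at minimal cost.
    # Brute force for illustration; QUBO solvers scale this up.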

Updated: 2025-03-29 20:18:46

Categories: cs.LG,cs.DM,cs.SI

Download: http://arxiv.org/abs/2503.23209v1

Enhancing Knowledge Graph Completion with Entity Neighborhood and Relation Context

Knowledge Graph Completion (KGC) aims to infer missing information in Knowledge Graphs (KGs) to address their inherent incompleteness. Traditional structure-based KGC methods, while effective, face significant computational demands and scalability challenges due to the need for dense embedding learning and scoring all entities in the KG for each prediction. Recent text-based approaches using language models like T5 and BERT have mitigated these issues by converting KG triples into text for reasoning. However, they often fail to fully utilize contextual information, focusing mainly on the neighborhood of the entity and neglecting the context of the relation. To address this issue, we propose KGC-ERC, a framework that integrates both types of context to enrich the input of generative language models and enhance their reasoning capabilities. Additionally, we introduce a sampling strategy to effectively select relevant context within input token constraints, which optimizes the utilization of contextual information and potentially improves model performance. Experiments on the Wikidata5M, Wiki27K, and FB15K-237-N datasets show that KGC-ERC outperforms or matches state-of-the-art baselines in predictive performance and scalability.

Updated: 2025-03-29 20:04:50

Categories: cs.CL,cs.AI,cs.DB

Download: http://arxiv.org/abs/2503.23205v1

The Challenge of Achieving Attributability in Multilingual Table-to-Text Generation with Question-Answer Blueprints

Multilingual Natural Language Generation (NLG) is challenging due to the lack of training data for low-resource languages. However, some low-resource languages have up to tens of millions of speakers globally, making it important to improve NLG tools for them. Table-to-Text NLG is an excellent measure of models' reasoning abilities but is very challenging in the multilingual setting. System outputs are often not attributable, or faithful, to the data in the source table. Intermediate planning techniques like Question-Answer (QA) blueprints have been shown to improve attributability on summarisation tasks. This work explores whether QA blueprints make multilingual Table-to-Text outputs more attributable to the input tables. This paper extends the challenging multilingual Table-to-Text dataset, TaTA, which includes African languages, with QA blueprints. Sequence-to-sequence language models are then finetuned on this dataset, with and without blueprints. Results show that QA blueprints improve performance for models finetuned and evaluated only on English examples, but do not demonstrate gains in the multilingual setting. This is due to inaccuracies in machine translating the blueprints from English into target languages when generating the training data, and models failing to rely closely on the blueprints they generate. An in-depth analysis is conducted on why this is challenging.

Updated: 2025-03-29 20:04:00

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2503.23204v1

Incorporating GNSS Information with LIDAR-Inertial Odometry for Accurate Land-Vehicle Localization

Currently, visual odometry and LIDAR odometry perform well in pose estimation in some typical environments, but they still cannot recover the localization state at high speed or reduce accumulated drift. In order to solve these problems, we propose a novel LIDAR-based localization framework, which achieves high accuracy and provides robust localization in 3D point-cloud maps using multi-sensor information. The system integrates global information with LIDAR-based odometry to optimize the localization state. To improve robustness and enable fast resumption of localization, this paper uses offline point-cloud maps as prior knowledge and presents a novel registration method to speed up the convergence rate. The algorithm is tested on various maps from different data sets and shows higher robustness and accuracy than other localization algorithms.

Updated: 2025-03-29 19:41:31

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2503.23199v1

Simulation-based Bayesian Inference from Privacy Protected Data

Many modern statistical analysis and machine learning applications require training models on sensitive user data. Under a formal definition of privacy protection, differentially private algorithms inject calibrated noise into the confidential data or during the data analysis process to produce privacy-protected datasets or queries. However, restricting access to only privatized data during statistical analysis makes it computationally challenging to make valid statistical inferences. In this work, we propose simulation-based inference methods from privacy-protected datasets. In addition to sequential Monte Carlo approximate Bayesian computation, we adopt neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.
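
The key move, simulating the privacy mechanism inside the inference loop, fits in a short sketch. Rejection ABC stands in here for brevity; the paper uses sequential Monte Carlo ABC and neural conditional density estimators, and all parameter values below are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    # Observed side: a Laplace-privatized mean of sensitive binary data.
    true_theta, n, epsilon = 0.3, 200, 1.0
    data = rng.binomial(1, true_theta, size=n)
    sensitivity = 1.0 / n  # of the mean, under a one-record change
    private_obs = data.mean() + rng.laplace(scale=sensitivity / epsilon)

    # Inference side: simulate both the data AND the privacy noise, so the
    # posterior correctly accounts for the injected perturbation.
    accepted = []
    for _ in range(100_000):
        theta = rng.uniform()                              # prior draw
        sim = rng.binomial(1, theta, size=n).mean()        # simulate data
        sim += rng.laplace(scale=sensitivity / epsilon)    # simulate DP noise
        if abs(sim - private_obs) < 0.01:                  # ABC tolerance
            accepted.append(theta)

    print(f"posterior mean ~ {np.mean(accepted):.3f} "
          f"from {len(accepted)} accepted draws")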

Updated: 2025-03-29 19:39:41

Categories: stat.ML,cs.LG,stat.CO

Download: http://arxiv.org/abs/2310.12781v4

Can Multi-modal (reasoning) LLMs work as deepfake detectors?

Deepfake detection remains a critical challenge in the era of advanced generative models, particularly as synthetic media becomes more sophisticated. In this study, we explore the potential of state-of-the-art multi-modal (reasoning) large language models (LLMs) for deepfake image detection, such as OpenAI O1/4o, Gemini Thinking Flash 2, Deepseek Janus, Grok 3, llama 3.2, Qwen 2/2.5 VL, Mistral Pixtral, and Claude 3.5/3.7 Sonnet. We benchmark 12 of the latest multi-modal LLMs against traditional deepfake detection methods across multiple datasets, including recently published real-world deepfake imagery. To enhance performance, we employ prompt tuning and conduct an in-depth analysis of the models' reasoning pathways to identify key contributing factors in their decision-making process. Our findings indicate that the best multi-modal LLMs achieve competitive zero-shot performance with promising generalization ability, even surpassing traditional deepfake detection pipelines on out-of-distribution datasets, while the rest of the LLM families perform extremely disappointingly, some worse than random guessing. Furthermore, we found that newer model versions and reasoning capabilities do not contribute to performance in such niche tasks as deepfake detection, while model size does help in some cases. This study highlights the potential of integrating multi-modal reasoning in future deepfake detection frameworks and provides insights into model interpretability for robustness in real-world scenarios.

Updated: 2025-03-29 19:19:14

标题: 多模态(推理)LLMs能够作为深度伪造检测器吗?

摘要: 在先进生成模型时代,Deepfake检测仍然是一个关键挑战,特别是合成媒体变得越来越复杂的情况下。在这项研究中,我们探讨了最新的多模态(推理)大型语言模型(LLMs)在Deepfake图像检测中的潜力,例如(OpenAI O1/4o,Gemini thinking Flash 2,Deepseek Janus,Grok 3,llama 3.2,Qwen 2/2.5 VL,Mistral Pixtral,Claude 3.5/3.7 sonnet)。我们对12个最新的多模态LLMs进行了基准测试,与传统的Deepfake检测方法在多个数据集上进行比较,包括最近发布的真实世界Deepfake图像。为了提高性能,我们采用了提示调整,并对模型的推理路径进行了深入分析,以识别在其决策过程中的关键贡献因素。我们的研究结果表明,最佳的多模态LLMs在零样本情况下实现了有竞争力的性能,并且在超出分布数据集中超越了传统的Deepfake检测管道,而其余的LLM系列表现极为令人失望,有些甚至比随机猜测还要差。此外,我们发现新模型版本和推理能力并不对Deepfake检测这种特殊任务的性能有贡献,而模型大小在某些情况下确实有帮助。这项研究突出了将多模态推理整合到未来Deepfake检测框架中的潜力,并为在现实场景中的鲁棒性提供了模型可解释性的见解。

更新时间: 2025-03-29 19:19:14

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.20084v2

The Scene Language: Representing Scenes with Programs, Words, and Embeddings

We introduce the Scene Language, a visual scene representation that concisely and precisely describes the structure, semantics, and identity of visual scenes. It represents a scene with three key components: a program that specifies the hierarchical and relational structure of entities in the scene, words in natural language that summarize the semantic class of each entity, and embeddings that capture the visual identity of each entity. This representation can be inferred from pre-trained language models via a training-free inference technique, given text or image inputs. The resulting scene can be rendered into images using traditional, neural, or hybrid graphics renderers. Together, this forms a robust, automated system for high-quality 3D and 4D scene generation. Compared with existing representations like scene graphs, our proposed Scene Language generates complex scenes with higher fidelity, while explicitly modeling the scene structures to enable precise control and editing.

Updated: 2025-03-29 19:17:13

标题: 场景语言:用程序、单词和嵌入来表示场景

摘要: 我们引入了场景语言(Scene Language),这是一种简洁而准确地描述视觉场景结构、语义和身份的视觉场景表示。它用三个关键组件表示一个场景:一个指定场景中实体的层次结构和关系结构的程序,自然语言中总结每个实体语义类别的词,以及捕捉每个实体视觉身份的嵌入。这种表示可以通过无需训练的推理技术从预训练的语言模型中推断出,给定文本或图像输入。通过传统、神经或混合图形渲染器,可以将结果场景呈现为图像。总的来说,这构成了一个强大的自动化系统,用于高质量的3D和4D场景生成。与现有的表示形式如场景图相比,我们提出的场景语言生成具有更高保真度的复杂场景,同时明确地建模场景结构以实现精确控制和编辑。

更新时间: 2025-03-29 19:17:13

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.16770v2

Barking Up The Syntactic Tree: Enhancing VLM Training with Syntactic Losses

Vision-Language Models (VLMs) implicitly learn to associate image regions with words from large-scale training data, demonstrating an emergent capability for grounding concepts without dense annotations[14,18,51]. However, the coarse-grained supervision from image-caption pairs is often insufficient to resolve ambiguities in object-concept correspondence, even with enormous data volume. Rich semantic and syntactic structures within the text modality have been overlooked as sources of supervision. Starting from contrastive architectures (BLIP and ALBEF) that show strong intrinsic grounding abilities, we propose HIerarchically STructured Learning (HIST). HIST enhances spatial vision-language alignment without using additional human annotations, by hierarchically decomposing captions into the constituent Subjects, Phrases, and Composite Phrases, and enforcing entailment relation between a parent and its children in the hierarchy. Specifically, we introduce two novel loss functions: (1) Subject Loss, which aligns image content with the subject of the corresponding phrase, acting as an entailment of standard contrastive/matching losses at the Phrase level; (2) Composition Loss, to balance attention across multiple objects. HIST is general, and can be applied to any VLM for which attention between vision and language can be computed. Compared to baseline VLMs, HIST achieves up to +9.8% improvement in visual grounding and +6.3% in multi-object referring segmentation. Surprisingly, the improved spatial grounding leads to improvements in other downstream VLM tasks: +1.1% in image-text retrieval, and +0.2% in visual question answering.

Updated: 2025-03-29 19:13:09

标题: 沿着句法树吠叫:利用句法损失增强VLM训练

摘要: 视觉语言模型(VLMs)隐式学习将图像区域与大规模训练数据中的单词相关联,展示了一种在没有密集注释的情况下根植概念的新兴能力[14,18,51]。然而,来自图像标题对的粗粒度监督通常不足以解决对象概念对应中的歧义,即使数据量很大。文本模态中丰富的语义和句法结构被忽视作为监督来源。从显示出强大内在根植能力的对比结构(BLIP和ALBEF)开始,我们提出了分级结构学习(HIST)。HIST通过分级将标题分解为构成主题、短语和复合短语,并在层次结构中强制实体关系,增强了视觉语言对齐,而无需使用额外的人类标注。具体来说,我们引入了两种新颖的损失函数:(1)主题损失,将图像内容与相应短语的主题对齐,充当标准对比/匹配损失在短语级别的实体;(2)组合损失,以平衡跨多个对象的关注。HIST是通用的,可以应用于任何可以计算视觉和语言之间注意力的VLM。与基线VLM相比,HIST在视觉根植和多对象引用分割上实现了高达+9.8%的改进。令人惊讶的是,改进的空间根植导致其他下游VLM任务的改进:图像文本检索增加了+1.1%,视觉问题回答增加了+0.2%。

更新时间: 2025-03-29 19:13:09

领域: cs.CV,cs.CL,cs.LG

下载: http://arxiv.org/abs/2412.08110v2

Ethereum Price Prediction Employing Large Language Models for Short-term and Few-shot Forecasting

Cryptocurrencies have transformed financial markets with their innovative blockchain technology and volatile price movements, presenting both challenges and opportunities for predictive analytics. Ethereum, being one of the leading cryptocurrencies, has experienced significant market fluctuations, making its price prediction an attractive yet complex problem. This paper presents a comprehensive study on the effectiveness of Large Language Models (LLMs) in predicting Ethereum prices for short-term and few-shot forecasting scenarios. The main challenge in training models for time series analysis is the lack of data. We address this with a novel approach that adapts existing LLMs, pre-trained on billions of tokens of natural language or images, to the unique characteristics of Ethereum price time series data. Through thorough experimentation and comparison with traditional and contemporary models, our results demonstrate that selectively freezing certain layers of pre-trained LLMs achieves state-of-the-art performance in this domain. This approach consistently surpasses benchmarks across multiple metrics, including Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE), demonstrating its effectiveness and robustness. Our research not only contributes to the existing body of knowledge on LLMs but also provides practical insights in the cryptocurrency prediction domain. The adaptability of pre-trained LLMs to handle the nature of Ethereum prices suggests a promising direction for future research, potentially including the integration of sentiment analysis to further refine forecasting accuracy.

Updated: 2025-03-29 19:04:28

标题: 以大型语言模型为基础的以太坊价格预测:短期和少样本预测

摘要: 加密货币通过其创新的区块链技术和波动的价格变动改变了金融市场,为预测分析提供了挑战和机遇。作为领先的加密货币之一,以太坊经历了显著的市场波动,使其价格预测成为一个吸引人但复杂的问题。本文对大型语言模型(LLMs)在短期和少样本预测场景中预测以太坊价格的有效性进行了综合研究。对于时间序列分析的模型训练的主要挑战是缺乏数据。我们通过利用一种新颖的方法,将现有的预训练LLMs适应以太坊价格时间序列数据的独特特征,从数十亿个标记的自然语言或图像中进行调整来解决这一问题。通过深入实验和与传统和现代模型的比较,我们的结果表明,有选择地冻结预训练LLMs的某些层在该领域实现了最先进的性能。这种方法始终在多个指标上超越基准,包括均方误差(MSE)、平均绝对误差(MAE)和均方根误差(RMSE),展示了其有效性和鲁棒性。我们的研究不仅为现有的LLMs知识体系做出了贡献,还在加密货币预测领域提供了实用见解。预训练LLMs适应处理以太坊价格的性质的能力暗示了未来研究的一个有前途的方向,可能包括整合情感分析以进一步提高预测准确性。

更新时间: 2025-03-29 19:04:28

领域: cs.AI,cs.CE

下载: http://arxiv.org/abs/2503.23190v1

Nepotistically Trained Generative-AI Models Collapse

Trained on massive amounts of human-generated content, AI-generated image synthesis is capable of reproducing semantically coherent images that match the visual appearance of its training data. We show that when retrained on even small amounts of their own creation, these generative-AI models produce highly distorted images. We also show that this distortion extends beyond the text prompts used in retraining, and that once affected, the models struggle to fully heal even after retraining on only real images.

Updated: 2025-03-29 18:40:09

标题: 亲属训练的生成式人工智能模型崩溃

摘要: 经过大量人类生成内容的训练,AI生成的图像合成能够产生与其训练数据视觉外观相匹配的语义一致的图像。我们展示了,当重新训练自己创造的少量内容时,这些生成型AI模型会产生高度扭曲的图像。我们还表明,这种扭曲不仅限于重新训练中使用的文本提示,一旦受到影响,这些模型即使在重新训练时只使用真实图像也难以完全恢复。

更新时间: 2025-03-29 18:40:09

领域: cs.AI,cs.CV

下载: http://arxiv.org/abs/2311.12202v2

TEMPLE:Temporal Preference Learning of Video LLMs via Difficulty Scheduling and Pre-SFT Alignment

Video Large Language Models (Video LLMs) have achieved significant success by leveraging a two-stage paradigm: pretraining on large-scale video-text data for vision-language alignment, followed by supervised fine-tuning (SFT) for task-specific capabilities. However, existing approaches struggle with temporal reasoning due to weak temporal correspondence in the data and reliance on the next-token prediction paradigm during training. To address these limitations, we propose TEMPLE (TEMporal Preference Learning), a systematic framework that enhances Video LLMs' temporal reasoning capabilities through Direct Preference Optimization (DPO). To facilitate this, we introduce an automated preference data generation pipeline that systematically constructs preference pairs by selecting videos that are rich in temporal information, designing video-specific perturbation strategies, and finally evaluating model responses on clean and perturbed video inputs. Our temporal alignment features two key innovations: curriculum learning, which progressively increases perturbation difficulty to improve model robustness and adaptability; and "Pre-SFT Alignment", applying preference optimization before instruction tuning to prioritize fine-grained temporal comprehension. Extensive experiments demonstrate that our approach consistently improves Video LLM performance across multiple benchmarks with a relatively small set of self-generated DPO data. We further analyze the transferability of DPO data across architectures and the role of difficulty scheduling in optimization. Our findings highlight our TEMPLE as a scalable and efficient complement to SFT-based methods, paving the way for developing reliable Video LLMs. Code is available at https://github.com/lscpku/TEMPLE.
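
A minimal sketch of the two ingredients, under assumed interfaces (the paper's exact perturbation strategies and hyperparameters are not reproduced here): a standard DPO loss over clean-versus-perturbed response pairs, plus a toy difficulty schedule.

import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO objective on (clean, perturbed) preference pairs: the
    response grounded in the clean video is 'chosen', the response under
    the temporal perturbation is 'rejected'."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -F.logsigmoid(margin).mean()

def perturbation_strength(step, total_steps, easy=0.8, hard=0.1):
    """Toy curriculum: begin with strong, easy-to-detect perturbations
    (e.g. shuffling large spans of frames) and anneal toward subtle ones."""
    t = step / max(1, total_steps)
    return easy + t * (hard - easy)

# hypothetical per-sequence log-probs from the policy and frozen reference
logp_c = torch.tensor([-12.0, -9.5]); ref_c = torch.tensor([-12.5, -10.0])
logp_r = torch.tensor([-11.0, -9.8]); ref_r = torch.tensor([-11.2, -9.9])
print(dpo_loss(logp_c, logp_r, ref_c, ref_r))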

Updated: 2025-03-29 18:15:51

标题: TEMPLE:通过难度调度和预SFT对齐学习视频LLMs的时间偏好

摘要: 视频大型语言模型(Video LLMs)通过利用两阶段范式取得了显著的成功:在大规模视频文本数据上进行预训练以实现视觉语言对齐,然后进行监督微调(SFT)以实现任务特定能力。然而,由于数据中弱的时间对应关系和在训练过程中依赖下一个标记预测范式,现有方法在时间推理方面存在困难。为了解决这些限制,我们提出了TEMPLE(TEMporal Preference Learning),这是一个通过直接偏好优化(DPO)增强视频LLMs时间推理能力的系统框架。为了实现这一目标,我们引入了一个自动化的偏好数据生成流水线,通过选择富含时间信息的视频、设计视频特定的扰动策略,并最终在干净和扰动的视频输入上评估模型响应,系统地构造偏好对。我们的时间对齐具有两个关键创新:课程学习逐渐增加扰动难度以提高模型的鲁棒性和适应性;以及“Pre-SFT Alignment”,在指导微调之前应用偏好优化以优先考虑细粒度的时间理解。广泛的实验证明,我们的方法在多个基准测试中始终提高了视频LLM的性能,而只需相对较小数量的自动生成的DPO数据。我们进一步分析了DPO数据在不同架构之间的可迁移性以及优化中困难调度的作用。我们的研究结果突出了我们的TEMPLE作为SFT方法的可扩展和高效补充,为开发可靠的视频LLMs铺平了道路。代码可在https://github.com/lscpku/TEMPLE找到。

更新时间: 2025-03-29 18:15:51

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.16929v2

Large Language Models are Unreliable for Cyber Threat Intelligence

Several recent works have argued that Large Language Models (LLMs) can be used to tame the data deluge in the cybersecurity field, by improving the automation of Cyber Threat Intelligence (CTI) tasks. This work presents an evaluation methodology that other than allowing to test LLMs on CTI tasks when using zero-shot learning, few-shot learning and fine-tuning, also allows to quantify their consistency and their confidence level. We run experiments with three state-of-the-art LLMs and a dataset of 350 threat intelligence reports and present new evidence of potential security risks in relying on LLMs for CTI. We show how LLMs cannot guarantee sufficient performance on real-size reports while also being inconsistent and overconfident. Few-shot learning and fine-tuning only partially improve the results, thus posing doubts about the possibility of using LLMs for CTI scenarios, where labelled datasets are lacking and where confidence is a fundamental factor.

Updated: 2025-03-29 18:09:36

标题: 大型语言模型在网络威胁情报中不可靠

摘要: 最近的一些研究认为,大型语言模型(LLMs)可以用来控制网络安全领域的数据泛滥问题,通过改进网络威胁情报(CTI)任务的自动化。本文提出了一种评估方法,除了允许在使用零次学习、少次学习和微调时测试LLMs在CTI任务上的表现外,还允许量化它们的一致性和信心水平。我们对三种最先进的LLMs和一组包含350份威胁情报报告的数据集进行实验,并提供了依赖LLMs进行CTI可能存在安全风险的新证据。我们展示了LLMs在真实规模报告上无法保证足够的性能,同时存在不一致和过度自信的问题。少次学习和微调只能部分改善结果,因此对于使用LLMs进行CTI场景的可能性提出了疑问,其中标记数据集缺乏且信心是一个基本因素。

更新时间: 2025-03-29 18:09:36

领域: cs.CR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2503.23175v1

Who Owns the Output? Bridging Law and Technology in LLMs Attribution

Since the introduction of ChatGPT in 2022, Large Language Models (LLMs) and Large Multimodal Models (LMMs) have transformed content creation, enabling the generation of human-quality content spanning every medium: text, images, videos, and audio. The opportunities offered by generative AI models are endless, drastically reducing the time required to generate content and usually raising the quality of the generation. However, considering the complexity and poor traceability of the generated content, the use of these tools makes it challenging to attribute AI-generated content. Attribution is difficult for a variety of reasons, starting from the lack of systematic fingerprinting of the generated content and ending with the enormous amount of data on which LLMs and LMMs are trained, which makes it hard to connect generated content to the training data. This scenario raises concerns about intellectual property and ethical responsibilities. To address these concerns, in this paper we bridge the technological, ethical, and legislative aspects by reviewing the legislative and technological instruments available today and proposing a legal framework to ensure accountability. In the end, we propose three use cases showing how these can be combined to guarantee that attribution is respected. However, even though the techniques available today can guarantee attribution to a greater extent, strong limitations still apply that can only be resolved by the development of new attribution techniques to be applied to LLMs and LMMs.

Updated: 2025-03-29 18:08:04

标题: 谁拥有产出物?在LLMs归属中桥接法律和技术

摘要: 自2022年引入ChatGPT以来,大型语言模型(LLMs)和大型多模型模型(LMM)已经改变了内容的创作,实现了人类质量的内容生成,涵盖了文本、图像、视频和音频等各种媒介。生成式人工智能模型所提供的机会是无限的,大大缩短了生成内容所需的时间,并通常提高了生成的质量。然而,考虑到生成内容的复杂性和难以追溯性,使用这些工具在归因AI生成内容方面存在挑战。困难的归因存在多种原因,从缺乏对生成内容的系统指纹到LLMs和LMMs受训练的大量数据,这使得难以将生成内容与训练数据连接起来。这种情况引发了对知识产权和道德责任的担忧。为了解决这些问题,在本文中,我们通过提出对今天可用的立法和技术工具进行审查,并提出一个法律框架来确保问责制,从而在技术、道德和立法方面建立桥梁。最后,我们提出了三种用例,说明如何结合这些方法来确保归因得到尊重。然而,即使今天可用的技术手段可以更大程度地保证归因,仍然存在严格的限制,这些限制只能通过开发新的归因技术才能得以解决,并应用于LLMs和LMMs。

更新时间: 2025-03-29 18:08:04

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2504.01032v1

Towards AI-Augmented Data Quality Management: From Data Quality for AI to AI for Data Quality Management

In the contemporary data-driven landscape, ensuring data quality (DQ) is crucial for deriving actionable insights from vast data repositories. The objective of this study is to explore the potential for automating data quality management within data warehouses, a type of data repository commonly used by large organizations. By conducting a systematic review of the DQ tools available on the market and in the academic literature, the study assesses their capability to automatically detect and enforce data quality rules. The review encompassed 151 tools from various sources, revealing that most current tools focus on data cleansing and fixing in domain-specific databases rather than data warehouses. Only a limited number of tools, ten specifically, demonstrated the capability to detect DQ rules, let alone implement this in data warehouses. The findings underscore a significant gap in the market and academic research regarding AI-augmented DQ rule detection in data warehouses. This paper advocates for further development in this area to enhance the efficiency of DQ management processes, reduce human workload, and lower costs. The study highlights the necessity of advanced tools for automated DQ rule detection, paving the way for improved data quality management practices tailored to data warehouse environments. The study can guide organizations in selecting the data quality tool that best meets their requirements.

Updated: 2025-03-29 18:06:34

标题: 朝向AI增强数据质量管理:从用于AI的数据质量到用于数据质量管理的AI

摘要: 在当今数据驱动的环境中,确保数据质量(DQ)对于从庞大的数据存储库中获得可操作洞见至关重要。本研究的目标是探索在大型组织常用的数据仓库中自动化数据质量管理的潜力。通过对市场和学术文献中现有的DQ工具进行系统性审查,本研究评估它们自动检测和执行数据质量规则的能力。审查涵盖了来自各种来源的151个工具,揭示了大多数当前工具侧重于领域特定数据库中的数据清洗和修复,而不是数据仓库。仅有少数工具,具体来说是十个,表现出检测DQ规则的能力,更不用说在数据仓库中实施了。研究结果突显了市场和学术研究中关于在数据仓库中利用AI增强的DQ规则检测存在重大差距。本文呼吁在这一领域进一步发展,以提高DQ管理流程的效率,减少人力工作量和降低成本。研究强调了自动化DQ规则检测的先进工具的必要性,为数据仓库环境量身定制数据质量管理实践铺平了道路。本研究可指导组织选择最符合其需求的数据质量工具。

更新时间: 2025-03-29 18:06:34

领域: cs.DB,cs.AI,cs.CE,cs.ET

下载: http://arxiv.org/abs/2406.10940v2

TRA: Better Length Generalisation with Threshold Relative Attention

Transformers struggle with length generalisation, displaying poor performance even on basic tasks. We test whether these limitations can be explained through two key failures of the self-attention mechanism. The first is the inability to fully remove irrelevant information. The second is tied to position: even if the dot product between a key and query is highly negative (i.e. an irrelevant key), learned positional biases may unintentionally up-weight such information, which is dangerous when distances fall out of distribution. Put together, these two failure cases lead to compounding generalisation difficulties. We test whether they can be mitigated through the combination of a) selective sparsity, which completely removes irrelevant keys from the attention softmax, and b) contextualised relative distance, where distance is only considered between the query and the keys that matter, as sketched below. We show how refactoring the attention mechanism with these two mitigations in place can substantially improve the generalisation capabilities of decoder-only transformers.
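
A single-query sketch of how the two mitigations might compose; the paper's exact formulation may differ, and tau, the bias table, and all shapes here are illustrative assumptions.

import torch

def threshold_relative_attention(q, k, v, rel_bias, tau=0.0):
    """q: (d,), k: (n, d), v: (n, dv), rel_bias: (n,) learned bias table.
    Assumes at least one key survives thresholding."""
    scores = k @ q / k.shape[-1] ** 0.5        # raw attention logits, (n,)
    keep = scores > tau                        # selective sparsity: irrelevant
    scores = scores[keep]                      # keys are removed entirely
    # contextualised relative distance: position is the rank among the
    # keys that SURVIVED thresholding, not the raw sequence offset
    ctx_dist = torch.arange(scores.shape[0] - 1, -1, -1)
    scores = scores + rel_bias[ctx_dist]
    attn = torch.softmax(scores, dim=-1)
    return attn @ v[keep]

q = torch.randn(8); k = torch.randn(6, 8); v = torch.randn(6, 4)
out = threshold_relative_attention(q, k, v, rel_bias=torch.zeros(6))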

Updated: 2025-03-29 18:06:28

标题: TRA: 使用阈值相对注意力进行更好的长度泛化

摘要: Transformer模型在长度概括方面存在困难,即使在基本任务上也表现不佳。我们测试这些限制是否可以通过自注意机制的两个关键失败来解释。第一个是无法完全消除无关信息。第二个与位置有关,即使关键和查询之间的点积非常负面(即无关键),学习的位置偏差可能会无意中增加该信息的权重 - 当距离不在分布范围内时会产生危险。综合这两种失败情况会导致概括困难的复合。我们测试它们是否可以通过以下两种减轻方法得到缓解:a)选择性稀疏性 - 从注意力softmax中完全删除无关键;b)情境化相对距离 - 距离仅被视为查询和重要关键之间的距离。我们展示如何重构注意机制,通过这两种减轻方法可以显著提高仅解码器transformer的概括能力。

更新时间: 2025-03-29 18:06:28

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2503.23174v1

AstroAgents: A Multi-Agent AI for Hypothesis Generation from Mass Spectrometry Data

With upcoming sample return missions across the solar system and the increasing availability of mass spectrometry data, there is an urgent need for methods that analyze such data within the context of existing astrobiology literature and generate plausible hypotheses regarding the emergence of life on Earth. Hypothesis generation from mass spectrometry data is challenging due to factors such as environmental contaminants, the complexity of spectral peaks, and difficulties in cross-matching these peaks with prior studies. To address these challenges, we introduce AstroAgents, a large language model-based, multi-agent AI system for hypothesis generation from mass spectrometry data. AstroAgents is structured around eight collaborative agents: a data analyst, a planner, three domain scientists, an accumulator, a literature reviewer, and a critic. The system processes mass spectrometry data alongside user-provided research papers. The data analyst interprets the data, and the planner delegates specific segments to the scientist agents for in-depth exploration. The accumulator then collects and deduplicates the generated hypotheses, and the literature reviewer identifies relevant literature using Semantic Scholar. The critic evaluates the hypotheses, offering rigorous suggestions for improvement. To assess AstroAgents, an astrobiology expert evaluated the novelty and plausibility of more than a hundred hypotheses generated from data obtained from eight meteorites and ten soil samples. Of these hypotheses, 36% were identified as plausible, and among those, 66% were novel. Project website: https://astroagents.github.io/

Updated: 2025-03-29 17:58:52

标题: AstroAgents:一种用于从质谱数据中生成假设的多智能体人工智能

摘要: 随着整个太阳系即将进行的样品返回任务以及质谱数据的逐渐增加,有一个迫切的需要,即在现有的天体生物学文献背景下分析这些数据并提出关于地球上生命起源的合理假设。由于环境污染物、光谱峰的复杂性以及难以将这些峰与先前研究进行交叉匹配等因素,从质谱数据中生成假设是具有挑战性的。为了解决这些挑战,我们引入了 AstroAgents,这是一个基于大型语言模型的多智能体人工智能系统,用于从质谱数据中生成假设。AstroAgents 的结构围绕八个协作智能体展开:数据分析员、规划者、三名领域科学家、累加器、文献评论员和评论员。该系统处理质谱数据以及用户提供的研究论文。数据分析员解释数据,规划者将特定部分委派给科学家智能体进行深入探索。累加器然后收集并去重生成的假设,文献评论员使用语义学者识别相关文献。评论员评估假设,为改进提供严格建议。为了评估 AstroAgents,一个天体生物学专家评估了从八块陨石和十个土壤样本中获取的数据生成的一百多个假设的新颖性和合理性。在这些假设中,36% 被确定为合理,其中66% 是新颖的。项目网站:https://astroagents.github.io/

更新时间: 2025-03-29 17:58:52

领域: cs.AI

下载: http://arxiv.org/abs/2503.23170v1

Graph ODEs and Beyond: A Comprehensive Survey on Integrating Differential Equations with Graph Neural Networks

Graph Neural Networks (GNNs) and differential equations (DEs) are two rapidly advancing areas of research that have shown remarkable synergy in recent years. GNNs have emerged as powerful tools for learning on graph-structured data, while differential equations provide a principled framework for modeling continuous dynamics across time and space. The intersection of these fields has led to innovative approaches that leverage the strengths of both, enabling applications in physics-informed learning, spatiotemporal modeling, and scientific computing. This survey aims to provide a comprehensive overview of the burgeoning research at the intersection of GNNs and DEs. We will categorize existing methods, discuss their underlying principles, and highlight their applications across domains such as molecular modeling, traffic prediction, and epidemic spreading. Furthermore, we identify open challenges and outline future research directions to advance this interdisciplinary field. A comprehensive paper list is provided at https://github.com/Emory-Melody/Awesome-Graph-NDEs. This survey serves as a resource for researchers and practitioners seeking to understand and contribute to the fusion of GNNs and DEs
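
As a concrete instance of the family being surveyed, here is a minimal graph neural ODE integrated with explicit Euler; the vector field is one common GCN-style choice among many, not a canonical definition.

import numpy as np

def graph_ode_step(H, A_hat, W, dt=0.1):
    """One explicit-Euler step of a graph neural ODE with vector field
    dH/dt = tanh(A_hat @ H @ W) - H  (a common GCN-style choice)."""
    return H + dt * (np.tanh(A_hat @ H @ W) - H)

# toy 4-node cycle graph with symmetric normalized adjacency
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
A = A + np.eye(4)                          # add self-loops
d = A.sum(1)
A_hat = A / np.sqrt(np.outer(d, d))        # D^-1/2 (A + I) D^-1/2
H = np.random.default_rng(0).normal(size=(4, 8))    # node features
W = np.random.default_rng(1).normal(size=(8, 8)) * 0.1
for _ in range(10):                        # integrate over t in [0, 1]
    H = graph_ode_step(H, A_hat, W)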

Updated: 2025-03-29 17:49:34

标题: 图ODE及其拓展:关于将微分方程与图神经网络集成的综合调查

摘要: 图神经网络(GNNs)和微分方程(DEs)是两个近年来迅速发展的研究领域,它们在最近几年展现出了显著的协同效应。GNNs已经成为在图结构化数据上学习的强大工具,而微分方程则提供了一个在时间和空间上建模连续动态的原则性框架。这两个领域的交叉点导致了创新方法的产生,利用了两者的优势,实现了物理知识驱动学习、时空建模和科学计算等应用。本调查旨在提供对GNNs和DEs交叉研究领域日益增长的研究的全面概述。我们将对现有方法进行分类,讨论它们的基本原理,并突出它们在分子建模、交通预测和疫情传播等领域的应用。此外,我们还确定了存在的挑战,并概述了推进这一跨学科领域的未来研究方向。在 https://github.com/Emory-Melody/Awesome-Graph-NDEs 提供了一份详尽的文献列表。本调查为寻求理解和贡献于GNNs和DEs融合的研究者和实践者提供了资源。

更新时间: 2025-03-29 17:49:34

领域: cs.LG

下载: http://arxiv.org/abs/2503.23167v1

Revisiting End-To-End Sparse Autoencoder Training: A Short Finetune Is All You Need

Sparse autoencoders (SAEs) are widely used for interpreting language model activations. A key evaluation metric is the increase in cross-entropy loss between the original model logits and the reconstructed model logits when replacing model activations with SAE reconstructions. Typically, SAEs are trained solely on mean squared error (MSE) when reconstructing precomputed, shuffled activations. Recent work introduced training SAEs directly with a combination of KL divergence and MSE ("end-to-end" SAEs), significantly improving reconstruction accuracy at the cost of substantially increased computation, which has limited their widespread adoption. We propose a brief KL+MSE fine-tuning step applied only to the final 25M training tokens (just a few percent of typical training budgets) that achieves comparable improvements, reducing the cross-entropy loss gap by 20-50%, while incurring minimal additional computational cost. We further find that multiple fine-tuning methods (KL fine-tuning, LoRA adapters, linear adapters) yield similar, non-additive cross-entropy improvements, suggesting a common, easily correctable error source in MSE-trained SAEs. We demonstrate a straightforward method for effectively transferring hyperparameters and sparsity penalties between training phases despite scale differences between KL and MSE losses. While both ReLU and TopK SAEs see significant cross-entropy loss improvements, evaluations on supervised SAEBench metrics yield mixed results, with improvements on some metrics and decreases on others, depending on both the SAE architecture and downstream task. Nonetheless, our method may offer meaningful improvements in interpretability applications such as circuit analysis with minor additional cost.
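
A sketch of what the brief fine-tuning objective could look like; the weighting and the exact splice point are assumptions rather than the paper's reported settings.

import torch
import torch.nn.functional as F

def kl_mse_finetune_loss(orig_logits, recon_logits, acts, recon_acts,
                         kl_weight=1.0, mse_weight=1.0):
    """Combined objective for the short fine-tuning phase: keep the usual
    MSE on activations, but add the KL between the original model's
    next-token distribution and the distribution obtained when the SAE
    reconstruction is spliced into the forward pass."""
    mse = F.mse_loss(recon_acts, acts)
    kl = F.kl_div(F.log_softmax(recon_logits, dim=-1),
                  F.log_softmax(orig_logits, dim=-1),
                  log_target=True, reduction="batchmean")
    return kl_weight * kl + mse_weight * mse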

Updated: 2025-03-29 17:42:21

标题: 重新审视端到端稀疏自动编码器训练:短时间微调就足够

摘要: 稀疏自动编码器(SAEs)广泛用于解释语言模型的激活。一个关键的评估指标是将模型激活替换为SAE重构时,原始模型logits和重构模型logits之间的交叉熵损失增加。通常情况下,SAEs在重构预先计算的、洗牌过的激活时仅仅通过均方误差(MSE)进行训练。最近的研究直接引入了KL散度和MSE的组合来训练SAEs("端到端"SAEs),显著提高了重构精度,但代价是大幅增加了计算量,这限制了它们的广泛应用。我们提出了一个简短的KL+MSE微调步骤,仅应用于最后25M训练标记(仅占典型训练预算的几个百分比),实现了可比的改进,将交叉熵损失差距减少了20-50%,同时增加了极小的额外计算成本。我们进一步发现多种微调方法(KL微调、LoRA适配器、线性适配器)产生类似的、非累加的交叉熵改进,表明在MSE训练的SAEs中存在一个常见、易纠正的错误源。我们展示了一种简单有效地在训练阶段之间传递超参数和稀疏惩罚的方法,尽管KL和MSE损失之间存在规模差异。虽然ReLU和TopK SAEs都看到了显著的交叉熵损失改进,在监督SAEBench指标的评估中产生了不同的结果,取决于SAE的架构和下游任务。尽管如此,我们的方法可能在电路分析等可解释性应用中提供有意义的改进,而增加的成本很小。

更新时间: 2025-03-29 17:42:21

领域: cs.LG

下载: http://arxiv.org/abs/2503.17272v2

ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising

Contextual advertising serves ads that are aligned to the content that the user is viewing. The rapid growth of video content on social platforms and streaming services, along with privacy concerns, has increased the need for contextual advertising. Placing the right ad in the right context creates a seamless and pleasant ad viewing experience, resulting in higher audience engagement and, ultimately, better ad monetization. From a technology standpoint, effective contextual advertising requires a video retrieval system capable of understanding complex video content at a very granular level. Current text-to-video retrieval models based on joint multimodal training demand large datasets and computational resources, limiting their practicality and lacking the key functionalities required for ad ecosystem integration. We introduce ContextIQ, a multimodal expert-based video retrieval system designed specifically for contextual advertising. ContextIQ utilizes modality-specific experts (video, audio, transcript/captions, and metadata such as objects, actions, emotion, etc.) to create semantically rich video representations. We show that our system, without joint training, achieves better or comparable results to state-of-the-art models and commercial solutions on multiple text-to-video retrieval benchmarks. Our ablation studies highlight the benefits of leveraging multiple modalities for enhanced video retrieval accuracy instead of using a vision-language model alone. Furthermore, we show how video retrieval systems such as ContextIQ can be used for contextual advertising in an ad ecosystem while also addressing concerns related to brand safety and filtering inappropriate content.

Updated: 2025-03-29 17:42:02

标题: ContextIQ:一种用于上下文广告的多模态基于专家的视频检索系统

摘要: 上下文广告是指根据用户正在查看的内容展示相关的广告。社交平台和流媒体服务上视频内容的快速增长,以及隐私问题的增加,增加了对上下文广告的需求。在正确的上下文中放置正确的广告可以创造出无缝且愉快的广告查看体验,从而提高观众参与度,最终实现更好的广告变现。从技术角度来看,有效的上下文广告需要一个能够在非常精细的层面理解复杂视频内容的视频检索系统。当前基于联合多模态训练的文本到视频检索模型需要大量数据集和计算资源,限制了它们的实用性,并且缺乏广告生态系统集成所需的关键功能。我们引入了ContextIQ,这是一个专门为上下文广告设计的多模态专家视频检索系统。ContextIQ利用专家视频、音频、字幕(标题)和元数据(如对象、动作、情感等)等模态来创建语义丰富的视频表示。我们展示了我们的系统在没有联合训练的情况下,在多个文本到视频检索基准上取得了比最先进模型和商业解决方案更好或可比的结果。我们的消融研究突出了利用多模态来提高视频检索准确性的好处,而不仅仅是使用视觉语言模型。此外,我们展示了像ContextIQ这样的视频检索系统如何在广告生态系统中用于上下文广告,同时解决与品牌安全和过滤不当内容相关的问题。

更新时间: 2025-03-29 17:42:02

领域: cs.CV,cs.AI,cs.IR

下载: http://arxiv.org/abs/2410.22233v3

Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Text-to-SQL is a challenging task involving multiple reasoning-intensive subtasks, including natural language understanding, database schema comprehension, and precise SQL query formulation. Existing approaches often rely on handcrafted reasoning paths with inductive biases that can limit their overall effectiveness. Motivated by the recent success of reasoning-enhanced models such as DeepSeek R1 and OpenAI o1, which effectively leverage reward-driven self-exploration to enhance reasoning capabilities and generalization, we propose a novel set of partial rewards tailored specifically for the Text-to-SQL task. Our reward set includes schema-linking, AI feedback, n-gram similarity, and syntax check, explicitly designed to address the reward sparsity issue prevalent in reinforcement learning (RL). Leveraging group relative policy optimization (GRPO), our approach explicitly encourages large language models (LLMs) to develop intrinsic reasoning skills necessary for accurate SQL query generation. With models of different sizes, we demonstrate that RL-only training with our proposed rewards consistently achieves higher accuracy and superior generalization compared to supervised fine-tuning (SFT). Remarkably, our RL-trained 14B-parameter model significantly outperforms larger proprietary models, e.g. o3-mini by 4% and Gemini-1.5-Pro-002 by 3% on the BIRD benchmark. These highlight the efficacy of our proposed RL-training framework with partial rewards for enhancing both accuracy and reasoning capabilities in Text-to-SQL tasks.
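
A toy sketch of dense partial rewards in this spirit; the component weights and the exec_match signal are placeholders, and the real reward set also includes AI feedback and a proper syntax check.

import re

def ngram_similarity(pred, gold, n=2):
    """Jaccard overlap of token n-grams: partial credit for near-miss SQL."""
    def grams(s):
        toks = s.lower().split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    p, g = grams(pred), grams(gold)
    return len(p & g) / max(1, len(p | g))

def schema_linking_reward(pred, gold_columns):
    """Fraction of gold tables/columns mentioned by the prediction."""
    toks = set(re.findall(r"\w+", pred.lower()))
    return sum(c.lower() in toks for c in gold_columns) / max(1, len(gold_columns))

def partial_reward(pred, gold, gold_columns, exec_match):
    """Shaped reward keeping the RL signal non-sparse (weights hypothetical)."""
    return (2.0 * float(exec_match)
            + 0.5 * schema_linking_reward(pred, gold_columns)
            + 0.5 * ngram_similarity(pred, gold))

print(partial_reward("SELECT name FROM users WHERE age > 30",
                     "SELECT name FROM users WHERE age >= 30",
                     ["users", "name", "age"], exec_match=False))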

Updated: 2025-03-29 17:29:30

标题: Reasoning-SQL:使用SQL定制部分奖励的强化学习,以增强文本到SQL的推理能力

摘要: 文本到SQL是一个具有挑战性的任务,涉及多种需要推理的子任务,包括自然语言理解、数据库架构理解和精确的SQL查询制定。现有的方法通常依赖于手工推理路径,并具有归纳偏见,这可能限制它们的整体有效性。受到最近推理增强模型(如DeepSeek R1和OpenAI o1)的成功启发,这些模型有效利用奖励驱动的自我探索来增强推理能力和泛化能力,我们提出了一套新颖的专门针对文本到SQL任务的部分奖励。我们的奖励集包括架构链接、人工智能反馈、n-gram相似性和语法检查,明确设计用于解决在强化学习(RL)中普遍存在的奖励稀疏问题。利用群体相对策略优化(GRPO),我们的方法明确鼓励大型语言模型(LLMs)发展必要的内在推理技能,以便生成准确的SQL查询。通过不同大小的模型,我们证明了只使用我们提出的奖励进行RL训练始终比监督微调(SFT)实现更高的准确性和更优越的泛化能力。值得注意的是,我们经过RL训练的140亿参数模型在BIRD基准测试中显着优于更大的专有模型,例如o3-mini超过4%,Gemini-1.5-Pro-002超过3%。这些结果突显了我们提出的部分奖励RL训练框架在提高文本到SQL任务中的准确性和推理能力方面的有效性。

更新时间: 2025-03-29 17:29:30

领域: cs.LG,cs.AI,cs.DB,cs.PL

下载: http://arxiv.org/abs/2503.23157v1

LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs

Large language models (LLMs) demonstrate impressive capabilities in mathematical reasoning. However, despite these achievements, current evaluations are mostly limited to specific mathematical topics, and it remains unclear whether LLMs are genuinely engaging in reasoning. To address these gaps, we present the Mathematical Topics Tree (MaTT) benchmark, a challenging and structured benchmark that offers 1,958 questions across a wide array of mathematical subjects, each paired with a detailed hierarchical chain of topics. Upon assessing different LLMs using the MaTT benchmark, we find that the most advanced model, GPT-4, achieved a mere 54% accuracy in a multiple-choice scenario. Interestingly, even when employing Chain-of-Thought prompting, we observe mostly no notable improvement. Moreover, LLMs' accuracy dropped dramatically, by up to 24.2 percentage points, when the questions were presented without choices. Further detailed analysis of the LLMs' performance across a range of topics showed significant discrepancies even for closely related subtopics within the same general mathematical area. In an effort to pinpoint the reasons behind the LLMs' performance, we conducted a manual evaluation of the completeness and correctness of the explanations generated by GPT-4 when choices were available. Surprisingly, we find that in only 53.3% of the instances where the model provided a correct answer were the accompanying explanations deemed complete and accurate, i.e., the model engaged in genuine reasoning.

Updated: 2025-03-29 17:29:24

标题: LLM不是智能思考者:引入数学主题树基准用于全面评估LLMs

摘要: 大型语言模型(LLMs)展示了在数学推理方面令人印象深刻的能力。然而,尽管取得了这些成就,当前的评估大多局限于特定的数学主题,仍然不清楚LLMs是否真正参与了推理。为了解决这些问题,我们提出了数学主题树(MaTT)基准,这是一个具有挑战性和结构化的基准,提供了1,958个涵盖各种数学主题的问题,每个问题都配有详细的层次结构链。通过使用MaTT基准评估不同的LLMs,我们发现最先进的模型GPT-4在多项选择情景中仅达到了54%的准确度。有趣的是,即使采用了“思维链”提示,我们观察到几乎没有明显的改进。此外,当问题没有提供选项时,LLMs的准确度急剧下降了高达24.2个百分点。对LLMs在一系列主题上的表现进行进一步详细分析显示,即使是在同一数学领域内紧密相关的子主题中,也存在显著的差异。为了找出LLMs表现背后的原因,当选项可用时,我们对GPT-4生成的解释的完整性和正确性进行了手动评估。令人惊讶的是,在模型提供正确答案的情况下,仅有53.3%的情况下,伴随的解释被认为是完整和准确的,即模型进行了真正的推理。

更新时间: 2025-03-29 17:29:24

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.05194v2

Conversational Agents for Older Adults' Health: A Systematic Literature Review

There is a vast literature studying Conversational Agents (CAs) in facilitating older adults' health. These vast and diverse studies warrant a comprehensive review that consolidates the main findings and proposes research directions for future studies, yet few literature reviews have done so from a human-computer interaction (HCI) perspective. In this study, we present a survey of existing studies on CAs for older adults' health. Through a systematic review of 72 papers, this work reviewed previously studied older adults' characteristics and analyzed participants' experiences and expectations of CAs for health. We found that (1) Past research has shown increasing interest in chatbots and voice assistants and has applied CAs in multiple roles in older adults' health. (2) Older adults mainly showed low acceptance of CAs for health for various reasons, such as unstable effects, harm to independence, and privacy concerns. (3) Older adults expect CAs to be able to support multiple functions, to communicate using natural language, to be personalized, and to allow users full control. We also discuss the implications based on the findings.

Updated: 2025-03-29 17:19:09

标题: 与老年人健康相关的对话代理:一项系统文献综述

摘要: 有大量文献研究了对话代理(CAs)在促进老年人健康方面的作用。这些广泛和多样的研究需要一项全面的回顾,总结主要发现并为未来的研究提出研究方向,而很少有文献评论是从人机交互(HCI)的角度进行的。在这项研究中,我们对现有的有关老年人健康的CAs的研究进行了调查。通过对72篇论文的系统回顾,本研究回顾了以前研究过的老年人特征,并分析了参与者对CAs在健康方面的体验和期望。我们发现:(1)过去的研究对聊天机器人和语音助手越来越感兴趣,并将CA应用为老年人健康中的多个角色。 (2)老年人主要表现出对健康CAs的低接受度,原因有多种,例如效果不稳定,损害独立性和隐私问题。 (3)老年人期望CAs能够支持多个功能,使用自然语言进行沟通,个性化,并允许用户完全控制。我们还根据研究结果讨论了其影响。

更新时间: 2025-03-29 17:19:09

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2503.23153v1

The interplay between domain specialization and model size

Scaling laws for language models have often focused on finding the optimal model size and token count for training from scratch. However, achieving this optimal balance requires significant compute resources due to the extensive data demands when training models from randomly-initialized weights. Continued pretraining offers a cost-effective alternative, leveraging the compute investment from pretrained models to incorporate new knowledge without requiring extensive new data. Recent findings suggest that data quality influences constants in scaling laws, thereby altering the optimal parameter-token allocation ratio. Building on this insight, we investigate the interplay between domain specialization and model size during continued pretraining under compute-constrained scenarios. Our goal is to identify an optimal training regime for this scenario and detect patterns in this interplay that can be generalized across different model sizes and domains. To compare general and specialized training, we filtered a web-based dataset to extract data from three domains: legal, medical, and accounting. We pretrained models with 1.5B, 3B, 7B, and 14B parameters on both the unfiltered and filtered datasets, then evaluated their performance on domain-specific exams. Results show that as model size increases, specialized models outperform general models while requiring less training compute. Additionally, their growing compute efficiency leads to reduced forgetting of previously learned knowledge.

Updated: 2025-03-29 17:18:43

标题: 领域专业化与模型规模之间的相互作用

摘要: 语言模型的规模定律通常集中在寻找从头开始训练的最佳模型大小和标记计数。然而,实现这种最佳平衡需要大量的计算资源,因为在从随机初始化的权重训练模型时存在广泛的数据需求。持续的预训练提供了一种经济高效的替代方案,利用预训练模型的计算投资,以整合新知识,而无需大量新数据。最近的研究发现表明,数据质量影响了规模定律中的常数,从而改变了最佳参数-标记分配比例。基于这一发现,我们研究了在计算受限情况下继续预训练期间领域专业化和模型大小之间的相互作用。我们的目标是确定这种情况下的最佳训练方案,并检测这种相互作用中的模式,这些模式可以推广到不同的模型大小和领域。为了比较通用和专业化的训练,我们从一个基于网络的数据集中提取了来自三个领域(法律、医学和会计)的数据。我们在未经筛选和经过筛选的数据集上预训练具有1.5B、3B、7B和14B个参数的模型,然后评估它们在领域特定考试上的表现。结果表明,随着模型大小的增加,专业化模型在不需要更多训练计算资源的情况下胜过通用模型。此外,它们不断增长的计算效率导致了对先前学到的知识的减少遗忘。

更新时间: 2025-03-29 17:18:43

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2501.02068v3

Agent-Based Modeling and Deep Neural Networks for Establishing Digital Twins of Secure Facilities under Sensing Restrictions

Digital twin technologies help practitioners simulate, monitor, and predict undesirable outcomes in-silico, while avoiding the cost and risks of conducting live simulation exercises. Virtual reality (VR) based digital twin technologies are especially useful when monitoring human Patterns of Life (POL) in secure nuclear facilities, where live simulation exercises are too dangerous and costly to ever perform. However, the high-security status of such facilities may restrict modelers from deploying human activity sensors for data collection. This problem was encountered when deploying MetaPOL, a digital twin system to prevent insider threat or sabotage of secure facilities, at a secure nuclear reactor facility at Oak Ridge National Laboratory (ORNL). This challenge was addressed using an agent-based model (ABM), driven by anecdotal evidence of facility personnel POL, to generate synthetic movement trajectories. These synthetic trajectories were then used to train deep neural network surrogates for next location and stay duration prediction to drive NPCs in the VR environment. In this study, we evaluate the efficacy of this technique for establishing NPC movement within MetaPOL and the ability to distinguish NPC movement during normal operations from that during a simulated emergency response. Our results demonstrate the success of using a multi-layer perceptron for next location prediction and mixture density network for stay duration prediction to predict the ABM generated trajectories. We also find that NPC movement in the VR environment driven by the deep neural networks under normal operations remain significantly different to that seen when simulating responses to a simulated emergency scenario.
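
For the stay-duration surrogate, a mixture density network is trained by minimizing the negative log-likelihood of a Gaussian mixture; a minimal 1-D version (shapes and the number of components K are assumptions) looks like the following.

import math
import torch
import torch.nn.functional as F

def mdn_nll(pi_logits, mu, log_sigma, y):
    """NLL of a 1-D Gaussian mixture: pi_logits/mu/log_sigma are (batch, K)
    network outputs; y is the observed stay duration, shape (batch,)."""
    log_pi = F.log_softmax(pi_logits, dim=-1)
    z = (y[:, None] - mu) / log_sigma.exp()
    log_comp = -0.5 * z ** 2 - log_sigma - 0.5 * math.log(2 * math.pi)
    return -torch.logsumexp(log_pi + log_comp, dim=-1).mean()

# toy shapes: batch of 4, K = 3 mixture components
loss = mdn_nll(torch.randn(4, 3), torch.randn(4, 3),
               torch.zeros(4, 3), torch.rand(4))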

Updated: 2025-03-29 17:01:43

标题: 基于代理模型和深度神经网络建立受感知限制的安全设施数字孪生模型

摘要: 数字孪生技术帮助从业者在模拟中监测和预测不良结果,同时避免进行实时模拟练习的成本和风险。基于虚拟现实(VR)的数字孪生技术在监测安全核设施中人类生活模式(POL)时特别有用,这些设施中进行实时模拟练习过于危险和昂贵。然而,这些设施的高安全级别可能会限制模型师部署人类活动传感器进行数据收集。在部署数字孪生系统MetaPOL以防止安全设施内部威胁或破坏时,在奥克岭国家实验室(ORNL)的安全核反应堆设施遇到了这个问题。通过使用受设施人员POL传闻驱动的基于代理的模型(ABM),生成合成运动轨迹来解决这一挑战。然后使用这些合成轨迹来训练深度神经网络替代物,用于下一个位置和停留持续时间预测,以驱动VR环境中的NPC。在这项研究中,我们评估了这种技术在建立NPC运动方面的有效性,并且能够区分正常运行时NPC运动与模拟应急响应时的NPC运动。我们的结果表明,使用多层感知器进行下一个位置预测和混合密度网络进行停留持续时间预测,能够预测ABM生成的轨迹。我们还发现,在深度神经网络驱动的VR环境中,正常运行时的NPC运动与模拟应急情况时的NPC运动明显不同。

更新时间: 2025-03-29 17:01:43

领域: cs.LG,cs.AI,cs.HC

下载: http://arxiv.org/abs/2503.23147v1

CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis

Inductive program synthesis, or programming by example, requires synthesizing functions from input-output examples that generalize to unseen inputs. While large language model agents have shown promise in programming tasks guided by natural language, their ability to perform inductive program synthesis is underexplored. Existing evaluation protocols rely on static sets of examples and held-out tests, offering no feedback when synthesized functions are incorrect and failing to reflect real-world scenarios such as reverse engineering. We propose CodeARC, the Code Abstraction and Reasoning Challenge, a new evaluation framework where agents interact with a hidden target function by querying it with new inputs, synthesizing candidate functions, and iteratively refining their solutions using a differential testing oracle. This interactive setting encourages agents to perform function calls and self-correction based on feedback. We construct the first large-scale benchmark for general-purpose inductive program synthesis, featuring 1114 functions. Among 18 models evaluated, o3-mini performs best with a success rate of 52.7%, highlighting the difficulty of this task. Fine-tuning LLaMA-3.1-8B-Instruct on curated synthesis traces yields up to a 31% relative performance gain. CodeARC provides a more realistic and challenging testbed for evaluating LLM-based program synthesis and inductive reasoning.
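
The interactive protocol can be pictured as the loop below, with synthesize standing in for the LLM agent and hidden_fn for the target function; both, and the input domain probed by the oracle, are hypothetical simplifications.

import random

def interactive_synthesis(hidden_fn, synthesize, budget=10):
    """Query-synthesize-refine loop with a differential-testing oracle."""
    examples = [(x, hidden_fn(x)) for x in (0, 1, 2)]   # initial queries
    for _ in range(budget):
        candidate = synthesize(examples)
        # differential testing: search for an input where the candidate
        # and the hidden target disagree
        cex = next((x for x in random.sample(range(-100, 100), 50)
                    if candidate(x) != hidden_fn(x)), None)
        if cex is None:
            return candidate                    # oracle found no mismatch
        examples.append((cex, hidden_fn(cex)))  # refine with the feedback
    return None

# hypothetical usage: synthesize maps the example list to a callable
# candidate; the loop returns a candidate the oracle could not
# distinguish from the hidden target within the query budget.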

Updated: 2025-03-29 16:50:39

标题: CodeARC:对归纳程序合成LLM代理的推理能力进行基准测试

摘要: 归纳程序合成,或者说通过示例编程,需要从输入输出示例中综合生成能够推广到未见输入的函数。虽然大型语言模型代理在自然语言引导的编程任务中表现出潜力,但它们在归纳程序合成方面的能力尚未得到充分探讨。现有的评估协议依赖于静态的示例集和保留的测试,当综合出的函数不正确时提供不了反馈,并且未能反映出诸如逆向工程等真实世界场景。我们提出了CodeARC,即代码抽象和推理挑战,这是一个新的评估框架,其中代理通过向隐藏的目标函数查询新输入,综合候选函数,并通过差分测试神谕迭代地优化其解决方案。这种交互设置鼓励代理进行函数调用和基于反馈进行自我纠正。我们构建了第一个大规模通用归纳程序合成基准,其中包含1114个函数。在评估的18个模型中,o3-mini的成功率最高,达到52.7%,突显了这一任务的难度。在精心策划的综合跟踪上对LLaMA-3.1-8B-Instruct进行微调,可实现高达31%的相对性能增益。CodeARC提供了一个更现实和具有挑战性的测试平台,用于评估基于LLM的程序综合和归纳推理。

更新时间: 2025-03-29 16:50:39

领域: cs.PL,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2503.23145v1

RankMerging: A supervised learning-to-rank framework to predict links in large social network

Uncovering unknown or missing links in social networks is a difficult task because of their sparsity and because links may represent different types of relationships, characterized by different structural patterns. In this paper, we define a simple yet efficient supervised learning-to-rank framework, called RankMerging, which aims at combining information provided by various unsupervised rankings. We illustrate our method on three different kinds of social networks and show that it substantially improves the performances of unsupervised metrics of ranking. We also compare it to other combination strategies based on standard methods. Finally, we explore various aspects of RankMerging, such as feature selection and parameter estimation and discuss its area of relevance: the prediction of an adjustable number of links on large networks.

Updated: 2025-03-29 16:50:10

标题: RankMerging:一种监督学习排序框架,用于预测大型社交网络中的链接

摘要: 在社交网络中揭示未知或缺失的链接是一项困难的任务,因为它们的稀疏性以及链接可能代表不同类型的关系,具有不同的结构模式。在本文中,我们定义了一个简单而高效的监督学习排序框架,称为RankMerging,旨在结合各种无监督排序提供的信息。我们在三种不同类型的社交网络上说明了我们的方法,并展示它显著提高了无监督排序指标的性能。我们还将其与基于标准方法的其他组合策略进行了比较。最后,我们探讨了RankMerging的各个方面,如特征选择和参数估计,并讨论其相关领域:在大型网络上预测可调整数量的链接。

更新时间: 2025-03-29 16:50:10

领域: cs.SI,cs.IR,cs.LG,physics.soc-ph

下载: http://arxiv.org/abs/1407.2515v5

The Forest Behind the Tree: Revealing Hidden Smart Home Communication Patterns

The widespread use of Smart Home devices has attracted significant research interest in understanding their behavior within home networks. Unlike general-purpose computers, these devices exhibit relatively simple and predictable network activity patterns. However, previous studies have primarily focused on normal network conditions, overlooking potential hidden patterns that emerge under challenging conditions. Discovering these hidden flows is crucial for assessing device robustness. This paper addresses this gap by presenting a framework that systematically and automatically reveals these hidden communication patterns. By actively disturbing communication and blocking observed traffic, the framework generates comprehensive profiles structured as behavior trees, uncovering flows that are missed by more shallow methods. This approach was applied to ten real-world devices, identifying 254 unique flows, with over 27% only discovered through this new method. These insights enhance our understanding of device robustness and can be leveraged to improve the accuracy of network security measures.

Updated: 2025-03-29 16:49:25

标题: 树后的森林:揭示隐藏的智能家居通信模式

摘要: 智能家居设备的广泛使用吸引了人们对其在家庭网络中的行为进行深入研究。与通用计算机不同,这些设备表现出相对简单和可预测的网络活动模式。然而,先前的研究主要集中在正常网络条件下,忽略了在挑战条件下出现的潜在隐藏模式。发现这些隐藏流对评估设备的稳健性至关重要。本文通过提出一个系统化和自动化的框架来揭示这些隐藏通信模式的方法来填补这一空白。通过积极干扰通信和阻止观察到的流量,该框架生成结构化为行为树的综合概要文件,揭示了更浅层方法所遗漏的流。这种方法已应用于十个真实设备,识别出254个独特的流,其中超过27%仅通过这种新方法发现。这些见解增强了我们对设备稳健性的理解,并可以用来提高网络安全措施的准确性。

更新时间: 2025-03-29 16:49:25

领域: cs.NI,cs.CR,68M15,C.2.3

下载: http://arxiv.org/abs/2502.08535v2

APTx: better activation function than MISH, SWISH, and ReLU's variants used in deep learning

Activation functions introduce non-linearity in deep neural networks. This non-linearity helps the neural networks learn faster and more efficiently from the dataset. In deep learning, many activation functions are developed and used based on the type of problem statement. ReLU's variants, SWISH, and MISH are go-to activation functions. The MISH function is considered to have similar or even better performance than SWISH, and much better than ReLU. In this paper, we propose an activation function named APTx which behaves similarly to MISH, but requires fewer mathematical operations to compute. The lower computational requirements of APTx speed up model training and thus also reduce the hardware requirements for the deep learning model. Source code: https://github.com/mr-ravin/aptx_activation
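
For reference, a NumPy sketch of APTx alongside MISH; the parameterization (alpha + tanh(beta*x)) * gamma*x with alpha=1, beta=1, gamma=0.5 follows the paper's reported form, but treat the defaults as assumptions here.

import numpy as np

def aptx(x, alpha=1.0, beta=1.0, gamma=0.5):
    """APTx: (alpha + tanh(beta*x)) * gamma*x. With these defaults it
    tracks MISH while needing only one tanh, instead of MISH's
    tanh(softplus(x)) composition."""
    return (alpha + np.tanh(beta * x)) * gamma * x

def mish(x):
    return x * np.tanh(np.log1p(np.exp(x)))   # x * tanh(softplus(x))

x = np.linspace(-4, 4, 9)
print(np.max(np.abs(aptx(x) - mish(x))))      # the curves roughly track
                                              # each other on this range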

Updated: 2025-03-29 16:47:51

标题: APTx:在深度学习中比MISH、SWISH和ReLU变种更好的激活函数

摘要: 激活函数在深度神经网络中引入了非线性。这种非线性有助于神经网络从数据集中更快、更有效地学习。在深度学习中,根据问题的类型,开发和使用了许多激活函数。ReLU的变体、SWISH和MISH是常用的激活函数。MISH函数被认为具有与SWISH类似甚至更好的性能,比ReLU好得多。本文提出了一种名为APTx的激活函数,其行为类似于MISH,但需要更少的数学运算来计算。APTx的计算要求较低,加快了模型训练速度,同时也减少了深度学习模型的硬件要求。源代码:https://github.com/mr-ravin/aptx_activation.

更新时间: 2025-03-29 16:47:51

领域: cs.LG,cs.AI,cs.CV,cs.NE

下载: http://arxiv.org/abs/2209.06119v5

Uncertainty propagation in feed-forward neural network models

We develop new uncertainty propagation methods for feed-forward neural network architectures with leaky ReLU activation functions subject to random perturbations in the input vectors. In particular, we derive analytical expressions for the probability density function (PDF) of the neural network output and its statistical moments as a function of the input uncertainty and the parameters of the network, i.e., weights and biases. A key finding is that an appropriate linearization of the leaky ReLU activation function yields accurate statistical results even for large perturbations in the input vectors. This can be attributed to the way information propagates through the network. We also propose new analytically tractable Gaussian copula surrogate models to approximate the full joint PDF of the neural network output. To validate our theoretical results, we conduct Monte Carlo simulations and a thorough error analysis on a multi-layer neural network representing a nonlinear integro-differential operator between two polynomial function spaces. Our findings demonstrate excellent agreement between the theoretical predictions and Monte Carlo simulations.
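
The paper derives analytical PDFs and moments; the sketch below only illustrates the setup numerically, comparing brute-force Monte Carlo against the delta-method moments implied by linearizing the leaky ReLU at the nominal input. All sizes and the 0.1 slope are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, slope=0.1):
    return np.where(x > 0, x, slope * x)

# a small one-hidden-layer network y = W2 @ lrelu(W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(16, 4)), rng.normal(size=16)
W2, b2 = rng.normal(size=(1, 16)), rng.normal(size=1)

def net(x):
    return W2 @ leaky_relu(W1 @ x + b1) + b2

# brute-force Monte Carlo propagation of Gaussian input uncertainty
x0, sigma = np.ones(4), 0.3
samples = np.array([net(x0 + sigma * rng.normal(size=4)) for _ in range(50000)])
print("MC mean/std:", samples.mean(), samples.std())

# delta-method moments from linearizing leaky ReLU at the nominal input
z = W1 @ x0 + b1
J = W2 @ (np.where(z > 0, 1.0, 0.1)[:, None] * W1)   # output Jacobian, (1, 4)
print("linearized mean/std:", float(net(x0)),
      float(np.sqrt(J @ (sigma**2 * np.eye(4)) @ J.T)))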

Updated: 2025-03-29 16:30:59

标题: 前馈神经网络模型中的不确定性传播

摘要: 我们为具有泄漏ReLU激活函数的前馈神经网络架构开发了新的不确定性传播方法,这些网络受到输入向量的随机扰动的影响。特别是,我们推导出神经网络输出的概率密度函数(PDF)及其统计矩的解析表达式,作为输入不确定性和网络参数(即权重和偏置)的函数。一个关键发现是,泄漏ReLU激活函数的适当线性化即使在输入向量中存在大的扰动时也可以产生准确的统计结果。这可以归因于信息如何在网络中传播。我们还提出了新的解析可处理的高斯Copula替代模型,以近似神经网络输出的完整联合PDF。为了验证我们的理论结果,我们对代表两个多项式函数空间之间的非线性积分微分算子的多层神经网络进行了蒙特卡洛模拟和彻底的误差分析。我们的研究结果表明理论预测与蒙特卡洛模拟之间有着良好的一致性。

更新时间: 2025-03-29 16:30:59

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2503.21059v2

EncGPT: A Multi-Agent Workflow for Dynamic Encryption Algorithms

Communication encryption is crucial in computer technology, but existing algorithms struggle with balancing cost and security. We propose EncGPT, a multi-agent framework using large language models (LLM). It includes rule, encryption, and decryption agents that generate encryption rules and apply them dynamically. This approach addresses gaps in LLM-based multi-agent systems for communication security. We tested GPT-4o's rule generation and implemented a substitution encryption workflow with homomorphism preservation, achieving an average execution time of 15.99 seconds.

Updated: 2025-03-29 16:13:30

标题: EncGPT:一种用于动态加密算法的多代理工作流程

摘要: 通信加密在计算机技术中至关重要,但现有算法在平衡成本和安全性方面存在困难。我们提出了EncGPT,这是一个使用大型语言模型(LLM)的多代理框架。它包括生成加密规则并动态应用它们的规则、加密和解密代理。这种方法解决了基于LLM的多代理系统在通信安全方面的缺陷。我们测试了GPT-4o的规则生成,并实现了具有同态保留的替换加密工作流程,实现了平均执行时间为15.99秒。

更新时间: 2025-03-29 16:13:30

领域: cs.CR,cs.MA

下载: http://arxiv.org/abs/2503.23138v1

CrossMuSim: A Cross-Modal Framework for Music Similarity Retrieval with LLM-Powered Text Description Sourcing and Mining

Music similarity retrieval is fundamental for managing and exploring relevant content from large collections in streaming platforms. This paper presents a novel cross-modal contrastive learning framework that leverages the open-ended nature of text descriptions to guide music similarity modeling, addressing the limitations of traditional uni-modal approaches in capturing complex musical relationships. To overcome the scarcity of high-quality text-music paired data, this paper introduces a dual-source data acquisition approach combining online scraping and LLM-based prompting, where carefully designed prompts leverage LLMs' comprehensive music knowledge to generate contextually rich descriptions. Extensive experiments demonstrate that the proposed framework achieves significant performance improvements over existing benchmarks through objective metrics, subjective evaluations, and real-world A/B testing on the Huawei Music streaming platform.
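
At its core, such a cross-modal contrastive framework typically optimizes a symmetric InfoNCE objective over paired music and text-description embeddings; a generic sketch follows (the paper's exact loss and temperature are not specified here).

import torch
import torch.nn.functional as F

def cross_modal_infonce(music_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (music, LLM-sourced text
    description) embeddings: matched pairs are positives, all other
    in-batch pairings serve as negatives."""
    m = F.normalize(music_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = m @ t.T / temperature
    labels = torch.arange(len(m))
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))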

Updated: 2025-03-29 15:43:09

标题: CrossMuSim:一种基于LLM的跨模态音乐相似性检索框架,包括文本描述的获取和挖掘

摘要: 音乐相似性检索对于在流媒体平台中管理和探索相关内容是基本的。本文提出了一种新颖的跨模态对比学习框架,利用文本描述的开放性来引导音乐相似性建模,解决了传统单模态方法在捕捉复杂音乐关系方面的局限性。为了克服高质量文本-音乐配对数据的稀缺性,本文引入了一种双源数据获取方法,结合在线抓取和基于LLM的提示,精心设计的提示利用LLM的全面音乐知识生成具有丰富语境的描述。广泛的实验证明,提出的框架通过客观指标、主观评估和在华为音乐流媒体平台上的真实A/B测试,实现了明显的性能改进。

更新时间: 2025-03-29 15:43:09

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2503.23128v1

Evaluating Compositional Scene Understanding in Multimodal Generative Models

The visual world is fundamentally compositional. Visual scenes are defined by the composition of objects and their relations. Hence, it is essential for computer vision systems to reflect and exploit this compositionality to achieve robust and generalizable scene understanding. While major strides have been made toward the development of general-purpose, multimodal generative models, including both text-to-image models and multimodal vision-language models, it remains unclear whether these systems are capable of accurately generating and interpreting scenes involving the composition of multiple objects and relations. In this work, we present an evaluation of the compositional visual processing capabilities in the current generation of text-to-image (DALL-E 3) and multimodal vision-language models (GPT-4V, GPT-4o, Claude Sonnet 3.5, QWEN2-VL-72B, and InternVL2.5-38B), and compare the performance of these systems to human participants. The results suggest that these systems display some ability to solve compositional and relational tasks, showing notable improvements over the previous generation of multimodal models, but with performance nevertheless well below the level of human participants, particularly for more complex scenes involving many (>5) objects and multiple relations. These results highlight the need for further progress toward compositional understanding of visual scenes.

Updated: 2025-03-29 15:34:43

标题: 评估多模态生成模型中的组合场景理解

摘要: 视觉世界在根本上是构成性的。视觉场景由对象及其关系的组成定义。因此,计算机视觉系统必须反映并利用这种构成性,以实现稳健且可泛化的场景理解。虽然在通用性、多模态生成模型的发展方面已经取得重大进展,包括文本到图像模型和多模态视觉语言模型,但目前尚不清楚这些系统是否能够准确生成和解释涉及多个对象和关系组成的场景。在这项工作中,我们评估了当前一代文本到图像模型(DALL-E 3)和多模态视觉语言模型(GPT-4V、GPT-4o、Claude Sonnet 3.5、QWEN2-VL-72B和InternVL2.5-38B)的构成视觉处理能力,并将这些系统的表现与人类参与者进行比较。结果表明,这些系统显示了一定的解决构成和关系任务的能力,相较于上一代多模态模型有显著改进,但表现仍然远低于人类参与者的水平,特别是对于涉及许多(>5)对象和多个关系的更复杂场景。这些结果凸显了对视觉场景构成性理解的进一步进展的必要性。

更新时间: 2025-03-29 15:34:43

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.23125v1

On the dimension of pullback attractors in recurrent neural networks

Recurrent Neural Networks (RNNs) are high-dimensional state space models capable of learning functions on sequence data. Recently, it has been conjectured that reservoir computers, a particular class of RNNs, trained on observations of a dynamical system can be interpreted as embeddings. This result has been established for the case of linear reservoir systems. In this work, we use a nonautonomous dynamical systems approach to establish an upper bound for the fractal dimension of the subset of reservoir state space approximated during the training and prediction phases. We prove that when the input sequence comes from an Nin-dimensional invertible dynamical system, the fractal dimension of this set is bounded above by Nin. The results obtained here are useful for dimensionality reduction of computation in RNNs as well as for estimating fractal dimensions of dynamical systems from limited observations of their time series. It is also a step towards understanding the embedding properties of reservoir computers.

Updated: 2025-03-29 15:24:12

标题: 关于递归神经网络中拉回吸引子维度的研究

摘要: 循环神经网络(RNNs)是能够学习序列数据函数的高维状态空间模型。最近有人推测,受训于动态系统观测的一类RNNs——储水池计算机,可以被解释为嵌入。这一结果已经在线性储水池系统的情况下得到证实。在本文中,我们使用了一个非自治动态系统方法,建立了在训练和预测阶段近似的储水池状态空间子集的分形维度的上限。我们证明了当输入序列来自一个Nin维可逆动态系统时,该集合的分形维度上限为Nin。这里得到的结果对于RNNs中的计算降维以及从有限观测中估计动态系统的分形维度是有用的。这也是了解储水池计算机嵌入属性的一步。

更新时间: 2025-03-29 15:24:12

领域: math.DS,cs.AI,cs.LG

下载: http://arxiv.org/abs/2501.11357v2

How to safely discard features based on aggregate SHAP values

SHAP is one of the most popular local feature-attribution methods. Given a function f and an input x, it quantifies each feature's contribution to f(x). Recently, SHAP has been increasingly used for global insights: practitioners average the absolute SHAP values over many data points to compute global feature importance scores, which are then used to discard unimportant features. In this work, we investigate the soundness of this practice by asking whether small aggregate SHAP values necessarily imply that the corresponding feature does not affect the function. Unfortunately, the answer is no: even if the i-th SHAP value is 0 on the entire data support, there exist functions that clearly depend on Feature i. The issue is that computing SHAP values involves evaluating f on points outside of the data support, where f can be strategically designed to mask its dependence on Feature i. To address this, we propose to aggregate SHAP values over the extended support, which is the product of the marginals of the underlying distribution. With this modification, we show that a small aggregate SHAP value implies that we can safely discard the corresponding feature. We then extend our results to KernelSHAP, the most popular method to approximate SHAP values in practice. We show that if KernelSHAP is computed over the extended distribution, a small aggregate value justifies feature removal. This result holds independently of whether KernelSHAP accurately approximates true SHAP values, making it one of the first theoretical results to characterize the KernelSHAP algorithm itself. Our findings have both theoretical and practical implications. We introduce the Shapley Lie algebra, which offers algebraic insights that may enable a deeper investigation of SHAP and we show that randomly permuting each column of the data matrix enables safely discarding features based on aggregate SHAP and KernelSHAP values.
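
The practical recipe suggested at the end has a very small footprint; a sketch follows, with shap_values standing in for any SHAP implementation (hypothetical interface here).

import numpy as np

def extended_support_sample(X, seed=0):
    """Sample from the product of marginals by independently permuting
    each column of the data matrix, as proposed in the paper."""
    rng = np.random.default_rng(seed)
    Xp = X.copy()
    for j in range(X.shape[1]):
        Xp[:, j] = rng.permutation(Xp[:, j])
    return Xp

# hypothetical usage with any SHAP backend shap_values(f, X) -> (n, d):
# scores = np.abs(shap_values(f, extended_support_sample(X))).mean(axis=0)
# features whose aggregate score is near zero can then be safely discarded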

Updated: 2025-03-29 15:07:30

标题: 如何安全地丢弃基于聚合SHAP值的特征

摘要: SHAP是最流行的局部特征归因方法之一。给定一个函数f和一个输入x,它量化每个特征对f(x)的贡献。最近,SHAP越来越多地用于全局洞察:实践者平均绝对SHAP值,以计算全局特征重要性分数,然后用于丢弃不重要的特征。在这项工作中,我们通过询问小的聚合SHAP值是否必然意味着相应的特征不会影响函数,来调查这种做法的合理性。不幸的是,答案是否定的:即使第i个SHAP值在整个数据支持上为0,仍然存在明显依赖特征i的函数。问题在于计算SHAP值涉及在数据支持之外的点上评估f,其中f可以被策略性地设计以掩盖其对特征i的依赖性。为了解决这个问题,我们建议在扩展支持上聚合SHAP值,这是基础分布的边缘的乘积。通过这种修改,我们表明小的聚合SHAP值意味着我们可以安全地丢弃相应的特征。然后,我们将我们的结果扩展到KernelSHAP,这是实践中用于近似SHAP值的最流行方法。我们显示,如果KernelSHAP是在扩展分布上计算的,小的聚合值证明了特征移除的正当性。这个结果独立于KernelSHAP是否准确近似真实SHAP值,使其成为第一个表征KernelSHAP算法本身的理论结果之一。我们的发现既有理论意义,也有实际意义。我们引入Shapley Lie代数,它提供了代数洞察,可能使对SHAP进行更深入的研究,并且我们表明随机排列数据矩阵的每一列可以安全地根据聚合SHAP和KernelSHAP值丢弃特征。

更新时间: 2025-03-29 15:07:30

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2503.23111v1

SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System

We present a novel text-to-speech (TTS) system, namely SupertonicTTS, for improved scalability and efficiency in speech synthesis. SupertonicTTS is comprised of three components: a speech autoencoder for continuous latent representation, a text-to-latent module leveraging flow-matching for text-to-latent mapping, and an utterance-level duration predictor. To enable a lightweight architecture, we employ a low-dimensional latent space, temporal compression of latents, and ConvNeXt blocks. We further simplify the TTS pipeline by operating directly on raw character-level text and employing cross-attention for text-speech alignment, thus eliminating the need for grapheme-to-phoneme (G2P) modules and external aligners. In addition, we introduce context-sharing batch expansion that accelerates loss convergence and stabilizes text-speech alignment. Experimental results demonstrate that SupertonicTTS achieves competitive performance while significantly reducing architectural complexity and computational overhead compared to contemporary TTS models. Audio samples demonstrating the capabilities of SupertonicTTS are available at: https://supertonictts.github.io/.
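
The text-to-latent module's flow-matching objective is, in its generic conditional form, a simple regression onto a straight-line path from noise to the latent; a sketch with an assumed v_theta(x_t, t, cond) signature:

import torch

def flow_matching_loss(v_theta, x1, cond):
    """Regress the velocity field onto the straight-line path from noise
    x0 to the speech latent x1. Assumes x1: (batch, frames, dim)."""
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], 1, 1)        # per-example time in [0, 1)
    xt = (1 - t) * x0 + t * x1               # linear interpolant
    target = x1 - x0                         # its constant velocity
    return ((v_theta(xt, t, cond) - target) ** 2).mean()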

Updated: 2025-03-29 14:59:32

标题: SupertonicTTS:朝着高度可扩展和高效的文本转语音系统前进

摘要: 我们提出了一种新颖的文本到语音(TTS)系统,即SupertonicTTS,用于改进语音合成的可扩展性和效率。SupertonicTTS由三个组件组成:用于连续潜在表示的语音自编码器,利用流匹配进行文本到潜在映射的文本到潜在模块,以及基于语句级持续时间的预测器。为了实现轻量级架构,我们采用低维潜在空间,潜在的时间压缩和ConvNeXt块。我们进一步简化了TTS流水线,直接在原始字符级文本上操作,并利用交叉注意力进行文本-语音对齐,从而消除了对字素到音素(G2P)模块和外部对齐器的需求。此外,我们引入了共享上下文批扩展,加速损失收敛并稳定文本-语音对齐。实验结果表明,与当代TTS模型相比,SupertonicTTS实现了具有竞争力的性能,同时显著减少了架构复杂性和计算开销。展示SupertonicTTS能力的音频样本可在以下链接找到:https://supertonictts.github.io/。

更新时间: 2025-03-29 14:59:32

领域: eess.AS,cs.LG,cs.SD

下载: http://arxiv.org/abs/2503.23108v1

COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models

Leveraging the powerful reasoning capabilities of large language models (LLMs), recent LLM-based robot task planning methods yield promising results. However, they mainly focus on single or multiple homogeneous robots on simple tasks. Practically, complex long-horizon tasks always require collaboration among multiple heterogeneous robots especially with more complex action spaces, which makes these tasks more challenging. To this end, we propose COHERENT, a novel LLM-based task planning framework for collaboration of heterogeneous multi-robot systems including quadrotors, robotic dogs, and robotic arms. Specifically, a Proposal-Execution-Feedback-Adjustment (PEFA) mechanism is designed to decompose and assign actions for individual robots, where a centralized task assigner makes a task planning proposal to decompose the complex task into subtasks, and then assigns subtasks to robot executors. Each robot executor selects a feasible action to implement the assigned subtask and reports self-reflection feedback to the task assigner for plan adjustment. The PEFA loops until the task is completed. Moreover, we create a challenging heterogeneous multi-robot task planning benchmark encompassing 100 complex long-horizon tasks. The experimental results show that our work surpasses the previous methods by a large margin in terms of success rate and execution efficiency. The experimental videos, code, and benchmark are released at https://github.com/MrKeee/COHERENT.
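
A schematic of the PEFA loop described above; the assigner and robot interfaces are hypothetical LLM-backed objects, not the released API:

def pefa(task, robots, assigner, max_rounds=10):
    # Proposal: decompose the complex task into per-robot subtasks.
    plan = assigner.propose(task, robots)
    for _ in range(max_rounds):
        feedback = []
        for robot, subtask in plan.assignments():
            result = robot.execute(subtask)          # feasible action selection
            feedback.append(robot.reflect(result))   # self-reflection report
        if all(f.success for f in feedback):         # task completed
            return plan
        plan = assigner.adjust(plan, feedback)       # plan adjustment
    return plan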

Updated: 2025-03-29 14:57:20

标题: COHERENT:利用大型语言模型协同合作的异构多机器人系统

摘要: 利用大型语言模型(LLMs)强大的推理能力,最近基于LLMs的机器人任务规划方法取得了令人鼓舞的成果。然而,它们主要集中在简单任务上的单个或多个同质机器人。实际上,复杂的长期任务总是需要多个异构机器人之间的协作,特别是在更复杂的动作空间中,这使得这些任务更具挑战性。为此,我们提出了COHERENT,一个新颖的基于LLMs的异构多机器人系统协作任务规划框架,包括四旋翼飞行器、机器狗和机械臂。具体来说,设计了一个提案-执行-反馈-调整(PEFA)机制,用于分解和分配个体机器人的动作,其中一个集中式任务分配者提出一个任务规划提案,将复杂任务分解为子任务,然后将子任务分配给机器人执行者。每个机器人执行者选择一种可行的动作来执行分配的子任务,并向任务分配者报告自我反思反馈以进行计划调整。PEFA循环直到任务完成。此外,我们创建了一个包含100个复杂长期任务的具有挑战性的异构多机器人任务规划基准。实验结果表明,我们的工作在成功率和执行效率方面大大超过了先前的方法。实验视频、代码和基准已发布在https://github.com/MrKeee/COHERENT。

更新时间: 2025-03-29 14:57:20

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2409.15146v3

Fast Training of Recurrent Neural Networks with Stationary State Feedbacks

Recurrent neural networks (RNNs) have recently demonstrated strong performance and faster inference than Transformers at comparable parameter budgets. However, the recursive gradient computation with the backpropagation through time (or BPTT) algorithm remains the major computational bottleneck. In this work, we propose a novel method that replaces BPTT with a fixed gradient feedback mechanism, yielding an efficient approximation of the exact gradient propagation based on the assumption of time stationarity. Our approach leverages state-space model (SSM) principles to define a structured feedback matrix that directly propagates gradients from future time steps. This formulation bypasses the need for recursive gradient backpropagation, significantly reducing training overhead while preserving the network's ability to capture long-term dependencies. The experiments on language modeling benchmarks exhibit competitive perplexity scores, while significantly reducing the training costs. These promising results suggest that designing a feedback method like an SSM can fully exploit the efficiency advantages of RNNs for many practical applications.
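
A deliberately simplified sketch of the idea: the exact BPTT recursion is sequential in time, whereas a fixed feedback matrix F (justified by the time-stationarity assumption) lets gradients from future steps be injected in one batched operation. The one-step form below is an illustration, not the paper's full SSM-structured propagation:

import torch

def bptt_hidden_grads(A, local_grads):
    # Exact recursion g_t = local_t + A^T g_{t+1}: inherently sequential.
    g = list(local_grads)
    for t in range(len(g) - 2, -1, -1):
        g[t] = g[t] + A.T @ g[t + 1]
    return g

def stationary_feedback_grads(F, local_grads):
    # Approximate, parallel alternative: the same structured operator F
    # feeds back gradients from the next step for all t at once.
    L = torch.stack(local_grads)                   # (T, D)
    fut = torch.zeros_like(L)
    fut[:-1] = L[1:]                               # next-step gradients
    return L + fut @ F.T                           # one batched operation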

Updated: 2025-03-29 14:45:52

标题: 使用稳态状态反馈快速训练循环神经网络

摘要: 最近,循环神经网络(RNNs)表现出较强的性能和比Transformer更快的推断速度,且参数预算相当。然而,时间反向传播(BPTT)算法的递归梯度计算仍然是主要的计算瓶颈。在这项工作中,我们提出了一种新颖的方法,将BPTT替换为固定梯度反馈机制,从而基于时间稳态的假设提供了对精确梯度传播的高效近似。我们的方法利用状态空间模型(SSM)原则来定义一个结构化反馈矩阵,直接从未来的时间步传播梯度。这种公式化绕过了递归梯度反向传播的需求,显著减少了训练开销,同时保持了网络捕捉长期依赖性的能力。在语言建模基准测试上的实验展示了竞争性的困惑度分数,同时显著降低了训练成本。这些令人鼓舞的结果表明,设计像SSM这样的反馈方法可以充分利用RNNs的效率优势,适用于许多实际应用。

更新时间: 2025-03-29 14:45:52

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.23104v1

Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems

Reinforcement learning (RL) recommender systems often rely on static datasets that fail to capture the fluid, ever changing nature of user preferences in real-world scenarios. Meanwhile, generative AI techniques have emerged as powerful tools for creating synthetic data, including user profiles and behaviors. Recognizing this potential, we introduce Lusifer, an LLM-based simulation environment designed to generate dynamic, realistic user feedback for RL-based recommender training. In Lusifer, user profiles are incrementally updated at each interaction step, with Large Language Models (LLMs) providing transparent explanations of how and why preferences evolve. We focus on the MovieLens dataset, extracting only the last 40 interactions for each user, to emphasize recent behavior. By processing textual metadata (such as movie overviews and tags) Lusifer creates more context aware user states and simulates feedback on new items, including those with limited or no prior ratings. This approach reduces reliance on extensive historical data and facilitates cold start scenario handling and adaptation to out of distribution cases. Our experiments compare Lusifer with traditional collaborative filtering models, revealing that while Lusifer can be comparable in predictive accuracy, it excels at capturing dynamic user responses and yielding explainable results at every step. These qualities highlight its potential as a scalable, ethically sound alternative to live user experiments, supporting iterative and user-centric evaluations of RL-based recommender strategies. Looking ahead, we envision Lusifer serving as a foundational tool for exploring generative AI-driven user simulations, enabling more adaptive and personalized recommendation pipelines under real world constraints.

Updated: 2025-03-29 14:45:21

标题: Lusifer:基于LLM的在线推荐系统用户模拟反馈环境

摘要: 强化学习(RL)推荐系统通常依赖于静态数据集,无法捕捉真实世界场景中用户偏好的流动性和不断变化的特性。与此同时,生成式人工智能技术已经成为创建合成数据的强大工具,包括用户档案和行为。认识到这一潜力,我们介绍了Lusifer,一个基于LLM的仿真环境,旨在为基于RL的推荐器训练生成动态、逼真的用户反馈。在Lusifer中,用户档案在每个交互步骤中都会逐步更新,大型语言模型(LLMs)提供了关于偏好如何以及为什么发展的透明解释。我们专注于MovieLens数据集,仅提取每个用户的最后40次交互,以强调最近的行为。通过处理文本元数据(如电影概述和标签),Lusifer创建了更具上下文意识的用户状态,并模拟对新项目的反馈,包括那些具有有限或没有先前评级的项目。这种方法减少了对大量历史数据的依赖,促进了冷启动场景处理和对分布外情况的适应。我们的实验将Lusifer与传统协同过滤模型进行比较,结果显示,虽然Lusifer在预测准确性方面具有可比性,但在捕捉动态用户响应和产生可解释结果方面表现出色。这些特质突显了其作为可扩展、道德合理的替代实时用户实验的潜力,支持基于RL的推荐策略的迭代和用户中心评估。展望未来,我们设想Lusifer将成为探索生成式人工智能驱动的用户仿真的基础工具,使得在真实世界约束下更具适应性和个性化的推荐管道成为可能。

更新时间: 2025-03-29 14:45:21

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2405.13362v4

The geomagnetic storm and Kp prediction using Wasserstein transformer

The accurate forecasting of geomagnetic activity is important. In this work, we present a novel multimodal Transformer-based framework for predicting the 3-day and 5-day planetary Kp index by integrating heterogeneous data sources, including satellite measurements, solar images, and Kp time series. A key innovation is the incorporation of the Wasserstein distance into the transformer and the loss function to align the probability distributions across modalities. Comparative experiments with the NOAA model demonstrate strong performance, accurately capturing both the quiet and storm phases of geomagnetic activity. This study underscores the potential of integrating machine learning techniques with traditional models for improved real-time forecasting.
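
For a 1-D quantity such as the Kp index, the p = 1 Wasserstein distance between equal-size empirical distributions reduces to a mean absolute difference of sorted samples, which makes a distribution-alignment loss term straightforward to sketch; the weighting below is an assumption:

import torch
import torch.nn.functional as F

def wasserstein_1d(a, b):
    # W1 between two equal-size 1-D empirical distributions.
    return (torch.sort(a).values - torch.sort(b).values).abs().mean()

def kp_loss(pred, target, lam=0.1):
    # Pointwise error plus a distributional alignment term, in the spirit
    # of the Wasserstein-augmented loss above (lam is illustrative).
    return F.mse_loss(pred, target) + lam * wasserstein_1d(pred.flatten(), target.flatten())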

Updated: 2025-03-29 14:39:42

标题: 使用Wasserstein变换器预测地磁风暴和Kp值

摘要: 地磁活动的准确预测至关重要。在这项工作中,我们提出了一种新颖的基于多模态Transformer框架的方法,通过整合卫星测量、太阳图像和Kp时间序列等异质数据源,来预测3天和5天的行星Kp指数。一个关键的创新是将Wasserstein距离融入到Transformer和损失函数中,以对齐跨模态的概率分布。与NOAA模型的比较实验表明,该方法在准确捕捉地磁活动的平静期和磁暴期方面表现出色。这项研究强调了将机器学习技术与传统模型相结合,以改进实时预测的潜力。

更新时间: 2025-03-29 14:39:42

领域: cs.LG,eess.IV,math-ph,math.MP

下载: http://arxiv.org/abs/2503.23102v1

RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations

Reinforcement learning (RL) can transform power grid operations by providing adaptive and scalable controllers essential for grid decarbonization. However, existing methods struggle with the complex dynamics, aleatoric uncertainty, long-horizon goals, and hard physical constraints that occur in real-world systems. This paper presents RL2Grid, a benchmark designed in collaboration with power system operators to accelerate progress in grid control and foster RL maturity. Built on a power simulation framework developed by RTE France, RL2Grid standardizes tasks, state and action spaces, and reward structures within a unified interface for a systematic evaluation and comparison of RL approaches. Moreover, we integrate real control heuristics and safety constraints informed by the operators' expertise to ensure RL2Grid aligns with grid operation requirements. We benchmark popular RL baselines on the grid control tasks represented within RL2Grid, establishing reference performance metrics. Our results and discussion highlight the challenges that power grids pose for RL methods, emphasizing the need for novel algorithms capable of handling real-world physical systems.
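
Since the abstract describes a unified interface over tasks, spaces, and rewards, usage presumably follows the familiar Gym-style loop; the sketch below uses a hypothetical environment id, and the actual RL2Grid API may differ:

import gymnasium as gym

env = gym.make("rl2grid/GridControl-v0")        # hypothetical task id
obs, info = env.reset(seed=0)
done = False
while not done:
    action = env.action_space.sample()           # placeholder policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()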

Updated: 2025-03-29 14:39:17

标题: RL2Grid:电网运营中强化学习的基准测试

摘要: 强化学习(RL)可以通过提供适应性和可扩展的控制器来改变电网运营,这对于电网脱碳至关重要。然而,现有方法在现实系统中复杂动态、随机不确定性、长期目标和严格的物理约束方面存在困难。本文介绍了RL2Grid,这是一个与电力系统运营商合作设计的基准,旨在加速电网控制的进展并促进RL的成熟。RL2Grid建立在由法国RTE开发的电力仿真框架上,标准化了任务、状态和动作空间以及奖励结构,在统一接口中对RL方法进行系统评估和比较。此外,我们整合了运营商专业知识提供的实际控制启发和安全约束,以确保RL2Grid符合电网运营要求。我们在RL2Grid中代表的电网控制任务上对流行的RL基准进行基准测试,建立参考性能指标。我们的结果和讨论突出了电网对RL方法的挑战,强调了需要处理现实物理系统的新算法的必要性。

更新时间: 2025-03-29 14:39:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.23101v1

Beyond Standard MoE: Mixture of Latent Experts for Resource-Efficient Language Models

Mixture of Experts (MoE) has emerged as a pivotal architectural paradigm for efficient scaling of Large Language Models (LLMs), operating through selective activation of parameter subsets for each input token. Nevertheless, conventional MoE architectures encounter substantial challenges, including excessive memory utilization and communication overhead during training and inference, primarily attributable to the proliferation of expert modules. In this paper, we introduce Mixture of Latent Experts (MoLE), a novel parameterization methodology that facilitates the mapping of specific experts into a shared latent space. Specifically, all expert operations are systematically decomposed into two principal components: a shared projection into a lower-dimensional latent space, followed by expert-specific transformations with significantly reduced parametric complexity. This factorized approach substantially diminishes parameter count and computational requirements. Beyond the pretraining implementation of the MoLE architecture, we also establish a rigorous mathematical framework for transforming pre-trained MoE models into the MoLE architecture, characterizing the sufficient conditions for optimal factorization and developing a systematic two-phase algorithm for this conversion process. Our comprehensive theoretical analysis demonstrates that MoLE significantly enhances computational efficiency across multiple dimensions while preserving model representational capacity. Empirical evaluations corroborate our theoretical findings, confirming that MoLE achieves performance comparable to standard MoE implementations while substantially reducing resource requirements.
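
A minimal sketch of the factorization described above: one projection into the latent space is shared across experts, and each expert owns only a small latent-space transform. The router, top-k dispatch, and shared up-projection are simplifying assumptions:

import torch
import torch.nn as nn

class MoLELayer(nn.Module):
    def __init__(self, d_model, d_latent, n_experts, top_k=2):
        super().__init__()
        # Shared projection into the low-dimensional latent space.
        self.down = nn.Linear(d_model, d_latent, bias=False)
        # Expert-specific transforms with far fewer parameters each.
        self.experts = nn.ModuleList(
            nn.Linear(d_latent, d_latent, bias=False) for _ in range(n_experts))
        self.up = nn.Linear(d_latent, d_model, bias=False)
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                          # x: (n_tokens, d_model)
        z = self.down(x)                           # shared latent projection
        weights, idx = self.router(x).softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(z)
        for k in range(self.top_k):                # sparse expert dispatch
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(z[mask])
        return self.up(out)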

Updated: 2025-03-29 14:35:34

标题: 超越标准MoE:混合潜在专家用于资源高效的语言模型

摘要: 混合专家(MoE)已经成为大型语言模型(LLMs)有效扩展的关键架构范式,通过选择性激活每个输入标记的参数子集进行操作。然而,传统的MoE架构遇到了重大挑战,包括在训练和推断过程中过度使用内存和通信开销,主要是由于专家模块的激增。在本文中,我们介绍了混合潜在专家(MoLE),这是一种新颖的参数化方法,有助于将特定专家映射到共享的潜在空间中。具体而言,所有专家操作都被系统地分解为两个主要组成部分:投影到较低维度潜在空间,然后是具有显著降低参数复杂性的专家特定转换。这种分解方法大大减少了参数数量和计算要求。除了MoLE架构的预训练实现外,我们还建立了一个严密的数学框架,将预训练的MoE模型转换为MoLE架构,描述了最佳分解的充分条件,并为这一转换过程开发了一个系统化的两阶段算法。我们的全面理论分析表明,MoLE显著提高了跨多个维度的计算效率,同时保持了模型的表示能力。实证评估证实了我们的理论发现,确认MoLE在大大减少资源需求的同时实现了与标准MoE实现可比的性能。

更新时间: 2025-03-29 14:35:34

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2503.23100v1

Accelerated Training through Iterative Gradient Propagation Along the Residual Path

Despite being the cornerstone of deep learning, backpropagation is criticized for its inherent sequentiality, which can limit the scalability of very deep models. Such models faced convergence issues due to vanishing gradients, later resolved using residual connections; variants of these are now widely used in modern architectures. However, the computational cost of backpropagation remains a major burden, accounting for most of the training time. Taking advantage of residual-like architectural designs, we introduce Highway backpropagation, a parallelizable iterative algorithm that approximates backpropagation by alternately i) accumulating the gradient estimates along the residual path, and ii) backpropagating them through every layer in parallel. This algorithm is naturally derived from a decomposition of the gradient as the sum of gradients flowing through all paths and is adaptable to a diverse set of common architectures, ranging from ResNets and Transformers to recurrent neural networks. Through an extensive empirical study on a large selection of tasks and models, we evaluate Highway-BP and show that major speedups can be achieved with minimal performance degradation.
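
For a residual stack x_{l+1} = x_l + f_l(x_l), the exact gradient obeys g_l = g_out + sum_{j >= l} J_j^T g_{j+1}, so it can be approached by fixed-point iteration: compute all per-block vector-Jacobian products independently (parallelizable), then accumulate them along the residual path. A sketch under these assumptions, using torch's vjp, for blocks of uniform width:

import torch
from torch.autograd.functional import vjp

def highway_bp(blocks, xs, g_out, n_iters=3):
    # blocks[l] computes f_l; xs[l] is the input to block l (saved during
    # the forward pass); g_out is dLoss/d(output of the last block).
    L = len(blocks)
    g = [g_out.clone() for _ in range(L + 1)]   # iter 0: identity path only
    for _ in range(n_iters):
        # (ii) per-block VJPs J_l^T g_{l+1}, independent across layers.
        v = [vjp(blocks[l], xs[l], g[l + 1])[1] for l in range(L)]
        # (i) accumulate the estimates along the residual path.
        acc = torch.zeros_like(g_out)
        for l in range(L - 1, -1, -1):
            acc = acc + v[l]
            g[l] = g_out + acc
    return g[0]          # exact after L iterations; approximate earlier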

Updated: 2025-03-29 14:22:35

标题: 通过残差路径沿着迭代梯度传播加速训练

摘要: 尽管反向传播是深度学习的基石,但由于其固有的序列性而受到批评,这可能限制非常深的模型的可扩展性。这些模型面临由于梯度消失而导致的收敛问题,后来通过使用残差连接得到解决。这些变体现在现代架构中被广泛使用。然而,反向传播的计算成本仍然是一个主要负担,占据大部分训练时间。利用类似残差的架构设计,我们介绍了高速反向传播,这是一种可并行化的迭代算法,通过交替累积梯度估计沿着残差路径,并将其同时反向传播到每一层。该算法自然地从梯度分解中导出,作为通过所有路径流动的梯度的总和,并且适用于各种常见的架构,从ResNets和Transformers到循环神经网络。通过对大量任务和模型的广泛实证研究,我们评估了高速反向传播并展示了可以在最小性能降低的情况下实现主要加速。

更新时间: 2025-03-29 14:22:35

领域: cs.LG

下载: http://arxiv.org/abs/2501.17086v2

UNITYAI-GUARD: Pioneering Toxicity Detection Across Low-Resource Indian Languages

This work introduces UnityAI-Guard, a framework for binary toxicity classification targeting low-resource Indian languages. While existing systems predominantly cater to high-resource languages, UnityAI-Guard addresses this critical gap by developing state-of-the-art models for identifying toxic content across diverse Brahmic/Indic scripts. Our approach achieves an impressive average F1-score of 84.23% across seven languages, leveraging a dataset of 888k training instances and 35k manually verified test instances. By advancing multilingual content moderation for linguistically diverse regions, UnityAI-Guard also provides public API access to foster broader adoption and application.

Updated: 2025-03-29 14:20:13

标题: UNITYAI-GUARD:在资源匮乏的印度语言中开拓毒性检测

摘要: 这项工作介绍了UnityAI-Guard,一个针对低资源印度语言的二元毒性分类框架。虽然现有系统主要面向高资源语言,但UnityAI-Guard通过开发最先进的模型来识别跨不同婆罗门/印度文本中的有毒内容,弥补了这一关键差距。我们的方法在七种语言中取得了令人印象深刻的平均F1分数为84.23%,利用了888k个训练实例和35k个手动验证的测试实例的数据集。通过推动跨语言多样性地区的多语言内容管理,UnityAI-Guard还提供公共API访问,以促进更广泛的采用和应用。

更新时间: 2025-03-29 14:20:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.23088v1

Can Neural Decompilation Assist Vulnerability Prediction on Binary Code?

Vulnerability prediction is valuable in identifying security issues efficiently, even though it requires the source code of the target software system, which is a restrictive hypothesis. This paper presents an experimental study to predict vulnerabilities in binary code without source code or complex representations of the binary, leveraging the pivotal idea of decompiling the binary file through neural decompilation and predicting vulnerabilities through deep learning on the decompiled source code. The results outperform the state-of-the-art in both neural decompilation and vulnerability prediction, showing that it is possible to identify vulnerable programs with this approach concerning bi-class (vulnerable/non-vulnerable) and multi-class (type of vulnerability) analysis.

Updated: 2025-03-29 14:19:09

标题: 神经反编译能辅助二进制代码的漏洞预测吗?

摘要: 漏洞预测在高效识别安全问题方面具有价值,尽管需要目标软件系统的源代码,这是一个限制性假设。本文提出了一项实验研究,用于在没有源代码或复杂二进制表示的情况下预测二进制代码中的漏洞,利用通过神经反编译来反编译二进制文件,并通过深度学习在反编译的源代码上预测漏洞的关键思想。结果在神经反编译和漏洞预测方面表现优异,表明可以通过这种方法识别存在漏洞的程序,涉及双类(存在漏洞/不存在漏洞)和多类(漏洞类型)分析。

更新时间: 2025-03-29 14:19:09

领域: cs.CR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2412.07538v2

Graph Representation Learning via Causal Diffusion for Out-of-Distribution Recommendation

Graph Neural Networks (GNNs)-based recommendation algorithms typically assume that training and testing data are drawn from independent and identically distributed (IID) spaces. However, this assumption often fails in the presence of out-of-distribution (OOD) data, resulting in significant performance degradation. In this study, we construct a Structural Causal Model (SCM) to analyze interaction data, revealing that environmental confounders (e.g., the COVID-19 pandemic) lead to unstable correlations in GNN-based models, thus impairing their generalization to OOD data. To address this issue, we propose a novel approach, graph representation learning via causal diffusion (CausalDiffRec) for OOD recommendation. This method enhances the model's generalization on OOD data by eliminating environmental confounding factors and learning invariant graph representations. Specifically, we use backdoor adjustment and variational inference to infer the real environmental distribution, thereby eliminating the impact of environmental confounders. This inferred distribution is then used as prior knowledge to guide the representation learning in the reverse phase of the diffusion process to learn the invariant representation. In addition, we provide a theoretical derivation that proves optimizing the objective function of CausalDiffRec can encourage the model to learn environment-invariant graph representations, thereby achieving excellent generalization performance in recommendations under distribution shifts. Our extensive experiments validate the effectiveness of CausalDiffRec in improving the generalization of OOD data, and the average improvement is up to 10.69% on Food, 18.83% on KuaiRec, 22.41% on Yelp2018, and 11.65% on Douban datasets.

Updated: 2025-03-29 14:13:14

标题: 通过因果扩散进行图表示学习,用于超出分布的推荐

摘要: 基于图神经网络(GNNs)的推荐算法通常假设训练和测试数据来自独立且相同分布(IID)的空间。然而,在存在超出分布(OOD)数据的情况下,这种假设经常失败,导致性能显著降低。在本研究中,我们构建了一个结构因果模型(SCM)来分析交互数据,揭示了环境混杂因素(例如COVID-19大流行)导致GNN模型中不稳定的相关性,从而损害了它们对OOD数据的泛化能力。为了解决这个问题,我们提出了一种新方法,即通过因果扩散进行图表示学习(CausalDiffRec)来进行OOD推荐。该方法通过消除环境混杂因素和学习不变的图表示来增强模型对OOD数据的泛化能力。具体来说,我们使用反向调整和变分推断来推断真实的环境分布,从而消除环境混杂因素的影响。然后,这种推断的分布被用作先验知识来引导扩散过程的逆相位中的表示学习,以学习不变的表示。此外,我们提供了一个理论推导,证明优化CausalDiffRec的目标函数可以鼓励模型学习环境不变的图表示,从而在分布转移下实现出色的泛化性能。我们的大量实验证实了CausalDiffRec在改进OOD数据的泛化能力方面的有效性,平均改进率在食品上高达10.69%,在KuaiRec上高达18.83%,在Yelp2018上高达22.41%,在豆瓣数据集上高达11.65%。

更新时间: 2025-03-29 14:13:14

领域: cs.LG,cs.AI,cs.IR,cs.SI

下载: http://arxiv.org/abs/2408.00490v3

The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction

Large language models (LLMs) excel on a variety of reasoning benchmarks, but previous studies suggest they sometimes struggle to generalize to unseen questions, potentially due to over-reliance on memorized training examples. However, the precise conditions under which LLMs switch between reasoning and memorization during text generation remain unclear. In this work, we provide a mechanistic understanding of LLMs' reasoning-memorization dynamics by identifying a set of linear features in the model's residual stream that govern the balance between genuine reasoning and memory recall. These features not only distinguish reasoning tasks from memory-intensive ones but can also be manipulated to causally influence model performance on reasoning tasks. Additionally, we show that intervening in these reasoning features helps the model more accurately activate the most relevant problem-solving capabilities during answer generation. Our findings offer new insights into the underlying mechanisms of reasoning and memory in LLMs and pave the way for the development of more robust and interpretable generative AI systems.

Updated: 2025-03-29 14:00:44

标题: 语言模型中的推理-记忆相互作用由单一方向中介

摘要: 大型语言模型在各种推理基准测试中表现优秀,但先前的研究表明它们有时在泛化到未见问题时可能会遇到困难,这可能是因为过度依赖记忆训练示例。然而,LLMs在文本生成过程中何时在推理和记忆之间切换的确切条件仍不清楚。在这项工作中,我们通过识别模型残差流中的一组线性特征,提供了对LLMs推理-记忆动态的机械理解,这些特征控制真正推理和记忆召回之间的平衡。这些特征不仅区分推理任务和记忆密集任务,还可以被操纵以因果影响模型在推理任务上的性能。此外,我们展示干预这些推理特征有助于模型更准确地激活最相关的解决问题能力,在生成答案时。我们的发现提供了关于LLMs推理和记忆基础机制的新见解,并为更健壮和可解释的生成式人工智能系统的发展铺平了道路。

更新时间: 2025-03-29 14:00:44

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.23084v1

Incremental Few-Shot Adaptation for Non-Prehensile Object Manipulation using Parallelizable Physics Simulators

Few-shot adaptation is an important capability for intelligent robots that perform tasks in open-world settings such as everyday environments or flexible production. In this paper, we propose a novel approach for non-prehensile manipulation which incrementally adapts a physics-based dynamics model for model-predictive control (MPC). The model prediction is aligned with a few examples of robot-object interactions collected with the MPC. This is achieved by using a parallelizable rigid-body physics simulation as dynamic world model and sampling-based optimization of the model parameters. In turn, the optimized dynamics model can be used for MPC using efficient sampling-based optimization. We evaluate our few-shot adaptation approach in object pushing experiments in simulation and with a real robot.

Updated: 2025-03-29 13:58:22

标题: 使用可并行化的物理模拟器进行非抓取式物体操作的增量式少样本适应

摘要: Few-shot adaptation是智能机器人在开放世界环境中执行任务的重要能力,例如日常环境或灵活生产。在本文中,我们提出了一种用于非抓取操作的新颖方法,该方法逐步调整基于物理的动力学模型,用于模型预测控制(MPC)。模型预测与使用MPC收集的少量机器人-物体交互示例保持一致。这是通过使用可并行化的刚体物理模拟作为动态世界模型和基于采样的模型参数优化来实现的。转而,优化的动力学模型可以用于使用高效基于采样的优化的MPC。我们在模拟中和实际机器人中的物体推动实验中评估了我们的few-shot adaptation方法。

更新时间: 2025-03-29 13:58:22

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2409.13228v2

Efficient Adaptation For Remote Sensing Visual Grounding

Foundation models have revolutionized artificial intelligence (AI), offering remarkable capabilities across multi-modal domains. Their ability to precisely locate objects in complex aerial and satellite images, using rich contextual information and detailed object descriptions, is essential for remote sensing (RS). These models can associate textual descriptions with object positions through the Visual Grounding (VG) task, but due to domain-specific challenges, their direct application to RS produces sub-optimal results. To address this, we applied Parameter Efficient Fine Tuning (PEFT) techniques to adapt these models for RS-specific VG tasks. Specifically, we evaluated LoRA placement across different modules in Grounding DINO and used BitFit and adapters to fine-tune the OFA foundation model pre-trained on general-purpose VG datasets. This approach achieved performance comparable to or surpassing current State Of The Art (SOTA) models while significantly reducing computational costs. This study highlights the potential of PEFT techniques to advance efficient and precise multi-modal analysis in RS, offering a practical and cost-effective alternative to full model training.
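
For reference, a generic LoRA adapter of the kind evaluated here; this is the standard formulation, not the paper's specific placement inside Grounding DINO:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen base weight plus a trainable low-rank update (alpha/r) * B A,
    # so only r * (d_in + d_out) parameters are tuned per adapted layer.
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T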

Updated: 2025-03-29 13:49:11

标题: 遥感视觉定位的高效适应

摘要: 基础模型已经彻底改变了人工智能领域,为多模态领域提供了卓越的能力。它们通过使用丰富的上下文信息和详细的对象描述,在复杂的航空和卫星图像中精确定位对象,在遥感领域至关重要。这些模型可以通过视觉定位(VG)任务将文本描述与对象位置相关联,但由于特定领域的挑战,直接应用于遥感领域会产生次优结果。为了解决这个问题,我们应用了参数高效微调(PEFT)技术,以适应这些模型用于遥感特定VG任务。具体来说,我们评估了在Grounding DINO中不同模块中的LoRA放置,并使用BitFit和适配器来微调在通用VG数据集上预训练的OFA基础模型。这种方法实现了与当前最先进模型相媲美或超越其性能,同时显著降低了计算成本。这项研究突出了PEFT技术在推进遥感领域高效和精确的多模态分析中的潜力,为全模型训练提供了实用且具有成本效益的替代方案。

更新时间: 2025-03-29 13:49:11

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2503.23083v1

InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding

Tablets and styluses are increasingly popular for taking notes. To optimize this experience and ensure a smooth and efficient workflow, it's important to develop methods for accurately interpreting and understanding the content of handwritten digital notes. We introduce a foundational model called InkFM for analyzing full pages of handwritten content. Trained on a diverse mixture of tasks, this model offers a unique combination of capabilities: recognizing text in 28 different scripts, recognizing mathematical expressions, and segmenting pages into distinct elements like text and drawings. Our results demonstrate that these tasks can be effectively unified within a single model, achieving state-of-the-art text line segmentation quality out of the box, surpassing public baselines like docTR. Fine-tuning or LoRA-tuning our base model on public datasets further improves the quality of page segmentation, achieves state-of-the-art text recognition (on the DeepWriting, CASIA, SCUT, and MathWriting datasets) and sketch classification (QuickDraw). This adaptability of InkFM provides a powerful starting point for developing applications with handwritten input.

Updated: 2025-03-29 13:45:24

标题: InkFM:一种用于全页在线手写笔记理解的基础模型

摘要: 平板电脑和触控笔在记笔记方面越来越受欢迎。为了优化这种体验并确保流畅高效的工作流程,重要的是发展准确解释和理解手写数字笔记内容的方法。我们介绍了一个名为InkFM的基础模型,用于分析完整的手写内容页面。该模型在各种任务上进行了训练,具有独特的能力组合:识别28种不同脚本的文本,数学表达式识别,并将页面分割成文本和绘图等不同元素。我们的结果表明,这些任务可以有效地统一在一个模型中,实现了超越公共基线(如docTR)的文本行分割的顶尖质量。在公共数据集上对我们的基础模型进行Fine-或LoRA调整进一步提高了页面分割的质量,实现了最先进的文本识别(DeepWriting,CASIA,SCUT和Mathwriting数据集)和素描分类(QuickDraw)。InkFM的这种适应性为开发具有手写输入的应用程序提供了强大的起点。

更新时间: 2025-03-29 13:45:24

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2503.23081v1

Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion

Cycle-level simulators such as gem5 are widely used in microarchitecture design, but they are prohibitively slow for large-scale design space explorations. We present Concorde, a new methodology for learning fast and accurate performance models of microarchitectures. Unlike existing simulators and learning approaches that emulate each instruction, Concorde predicts the behavior of a program based on compact performance distributions that capture the impact of different microarchitectural components. It derives these performance distributions using simple analytical models that estimate bounds on performance induced by each microarchitectural component, providing a simple yet rich representation of a program's performance characteristics across a large space of microarchitectural parameters. Experiments show that Concorde is more than five orders of magnitude faster than a reference cycle-level simulator, with about 2% average Cycles-Per-Instruction (CPI) prediction error across a range of SPEC, open-source, and proprietary benchmarks. This enables rapid design-space exploration and performance sensitivity analyses that are currently infeasible, e.g., in about an hour, we conducted a first-of-its-kind fine-grained performance attribution to different microarchitectural components across a diverse set of programs, requiring nearly 150 million CPI evaluations.

Updated: 2025-03-29 13:25:20

标题: 协和:使用组合分析-机器学习融合实现快速准确的CPU性能建模

摘要: 循环级仿真器如gem5在微架构设计中被广泛使用,但对于大规模设计空间的探索来说速度过慢。我们提出了Concorde,一种学习微架构快速准确性能模型的新方法。与现有的仿真器和学习方法不同,Concorde根据捕捉不同微架构组件影响的紧凑性能分布来预测程序的行为。它使用简单的分析模型推导这些性能分布,估计每个微架构组件引起的性能上限,提供了一个简单而丰富的程序性能特征表示,涵盖了大量微架构参数空间。实验表明,Concorde比参考的循环级仿真器快五个数量级以上,SPEC、开源和专有基准测试中的平均指令周期(CPI)预测误差约为2%。这使得快速设计空间探索和性能敏感性分析成为可能,例如,我们在约一个小时内进行了首次对不同微架构组件进行细粒度性能归因的分析,涉及近1.5亿次CPI评估。

更新时间: 2025-03-29 13:25:20

领域: cs.AR,cs.LG,cs.PF

下载: http://arxiv.org/abs/2503.23076v1

TRACE: Intra-visit Clinical Event Nowcasting via Effective Patient Trajectory Encoding

Electronic Health Records (EHR) have become a valuable resource for a wide range of predictive tasks in healthcare. However, existing approaches have largely focused on inter-visit event predictions, overlooking the importance of intra-visit nowcasting, which provides prompt clinical insights during an ongoing patient visit. To address this gap, we introduce the task of laboratory measurement prediction within a hospital visit. We focus on laboratory data, which has remained underexplored in previous work. We propose TRACE, a Transformer-based model designed for clinical event nowcasting by encoding patient trajectories. TRACE effectively handles long sequences and captures temporal dependencies through a novel timestamp embedding that integrates decay properties and periodic patterns of data. Additionally, we introduce a smoothed mask for denoising, improving the robustness of the model. Experiments on two large-scale electronic health record datasets demonstrate that the proposed model significantly outperforms previous methods, highlighting its potential for improving patient care through more accurate laboratory measurement nowcasting. The code is available at https://github.com/Amehi/TRACE.
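
The timestamp embedding integrates decay and periodic components; one plausible form, sketched below with the caveat that the exact parameterization in the paper may differ:

import torch
import torch.nn as nn

class TimestampEmbedding(nn.Module):
    # Embeds elapsed time with learnable decay and periodic components,
    # in the spirit of the abstract; the exact form is an assumption.
    def __init__(self, dim):
        super().__init__()
        half = dim // 2
        self.decay_rates = nn.Parameter(torch.rand(half))      # lambda_k
        self.freqs = nn.Parameter(torch.rand(dim - half))      # omega_k
        self.phases = nn.Parameter(torch.zeros(dim - half))

    def forward(self, dt):                        # dt: (B,) elapsed time
        dt = dt.unsqueeze(-1)
        decay = torch.exp(-torch.abs(self.decay_rates) * dt)   # decay part
        periodic = torch.sin(self.freqs * dt + self.phases)    # periodic part
        return torch.cat([decay, periodic], dim=-1)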

Updated: 2025-03-29 13:08:59

标题: TRACE:通过有效的患者轨迹编码进行诊疗事件预测

摘要: 电子健康记录(EHR)已成为医疗保健中广泛的预测任务的宝贵资源。然而,现有方法主要集中在访问间事件预测上,忽视了访问内的即时预测的重要性,这在患者就诊过程中提供了及时的临床见解。为了解决这一差距,我们引入了医院就诊中实验室测量预测的任务。我们研究了实验室数据,这在先前的工作中仍然未被充分探索。我们提出了TRACE,这是一种基于Transformer的模型,通过对患者轨迹进行编码来设计临床事件的即时预测。TRACE有效地处理长序列,并通过一个整合了数据衰减特性和周期模式的时间戳嵌入来捕捉时间依赖性。此外,我们引入了一个平滑掩码进行去噪,提高了模型的鲁棒性。在两个大规模电子健康记录数据集上的实验证明,所提出的模型明显优于先前的方法,突显了通过更准确的实验室测量即时预测来改善患者护理的潜力。代码可在https://github.com/Amehi/TRACE 上找到。

更新时间: 2025-03-29 13:08:59

领域: cs.LG

下载: http://arxiv.org/abs/2503.23072v1

Advanced Deep Learning Methods for Protein Structure Prediction and Design

After AlphaFold won the Nobel Prize, protein prediction with deep learning once again became a hot topic. We comprehensively explore advanced deep learning methods applied to protein structure prediction and design. The text begins by examining recent innovations in prediction architectures, with detailed discussions on improvements such as diffusion-based frameworks and novel pairwise attention modules. It analyses key components including structure generation, evaluation metrics, multiple sequence alignment processing, and network architecture, thereby illustrating the current state of the art in computational protein modelling. Subsequent chapters focus on practical applications, presenting case studies that range from individual protein predictions to complex biomolecular interactions. Strategies for enhancing prediction accuracy and integrating deep learning techniques with experimental validation are thoroughly explored. The later sections review the industry landscape of protein design, highlighting the transformative role of artificial intelligence in biotechnology and discussing emerging market trends and future challenges. Supplementary appendices provide essential resources such as databases and open source tools, making this volume a valuable reference for researchers and students.

Updated: 2025-03-29 13:08:27

标题: 蛋白质结构预测和设计的先进深度学习方法

摘要: 在AlphaFold赢得诺贝尔奖之后,利用深度学习进行蛋白质预测再次成为热门话题。我们全面探索了应用于蛋白质结构预测和设计的先进深度学习方法。文章从审视预测架构的最新创新开始,详细讨论了基于扩散的框架和新颖的成对注意模块等改进。文本分析了包括结构生成、评估指标、多序列比对处理和网络架构在内的关键组成部分,从而展示了计算蛋白质建模的最新技术水平。随后的章节聚焦于实际应用,呈现了从单个蛋白预测到复杂生物分子相互作用的案例研究。深入探讨了提高预测准确性和将深度学习技术与实验验证相结合的策略。后续章节回顾了蛋白设计的行业格局,突出了人工智能在生物技术中的变革作用,并讨论了新兴市场趋势和未来挑战。附录提供了数据库和开源工具等必要资源,使本卷成为研究人员和学生的宝贵参考资料。

更新时间: 2025-03-29 13:08:27

领域: q-bio.BM,cs.AI,cs.LG

下载: http://arxiv.org/abs/2503.13522v3

Weighted Graph Structure Learning with Attention Denoising for Node Classification

Node classification in graphs aims to predict the categories of unlabeled nodes by utilizing a small set of labeled nodes. However, weighted graphs often contain noisy edges and anomalous edge weights, which can distort fine-grained relationships between nodes and hinder accurate classification. We propose the Edge Weight-aware Graph Structure Learning (EWGSL) method, which combines weight learning and graph structure learning to address these issues. EWGSL improves node classification by redefining attention coefficients in graph attention networks to incorporate node features and edge weights. It also applies graph structure learning to sparsify attention coefficients and uses a modified InfoNCE loss function to enhance performance by adapting to denoised graph weights. Extensive experimental results show that EWGSL has an average Micro-F1 improvement of 17.8% compared with the best baseline.
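
A schematic of the redefined attention score, sketched as a GAT-style coefficient extended with the edge weight; the concatenation-based fusion is an assumption, and scores are subsequently softmax-normalized over each node's incident edges:

import torch
import torch.nn.functional as F

def edge_weight_attention_scores(h_i, h_j, w_ij, a, slope=0.2):
    # e_ij = LeakyReLU(a^T [h_i || h_j || w_ij])
    # h_i, h_j: (E, D) endpoint features; w_ij: (E,) edge weights;
    # a: (2D + 1,) learnable attention vector.
    feats = torch.cat([h_i, h_j, w_ij.unsqueeze(-1)], dim=-1)
    return F.leaky_relu(feats @ a, negative_slope=slope)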

Updated: 2025-03-29 13:07:31

标题: 使用关注去噪的加权图结构学习进行节点分类

摘要: 图中的节点分类旨在通过利用一小组已标记的节点来预测未标记节点的类别。然而,加权图通常包含嘈杂的边和异常的边权重,这可能会扭曲节点之间的细粒度关系并阻碍准确分类。我们提出了Edge Weight-aware Graph Structure Learning(EWGSL)方法,该方法结合了权重学习和图结构学习来解决这些问题。EWGSL通过重新定义图注意力网络中的注意力系数来整合节点特征和边权重,从而改善节点分类。它还将图结构学习应用于稀疏化注意力系数,并使用修改后的InfoNCE损失函数来通过适应去噪图权重来提高性能。大量实验结果显示,与最佳基线相比,EWGSL的平均Micro-F1改进率为17.8%。

更新时间: 2025-03-29 13:07:31

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.12157v2

Modeling Caption Diversity in Contrastive Vision-Language Pretraining

There are a thousand ways to caption an image. Contrastive Language-Image Pretraining (CLIP), on the other hand, works by mapping an image and its caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's vision encoder outputs a set of visual features that are mixed into a final representation by conditioning on information derived from the text. We show that Llip outperforms non-contextualized baselines like CLIP and SigLIP on a variety of tasks even with large-scale encoders. Llip improves zero-shot classification by an average of 2.9% across zero-shot classification benchmarks with a ViT-G/14 encoder. Specifically, Llip attains a zero-shot top-1 accuracy of 83.5% on ImageNet, outperforming a similarly sized CLIP by 1.4%. We also demonstrate an improvement of 6.0% on zero-shot retrieval on MS-COCO. We provide a comprehensive analysis of the components introduced by the method and demonstrate that Llip leads to richer visual representations.
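
A schematic of the conditioning step described above: the caption representation queries the encoder's set of visual features via cross-attention to produce a caption-dependent image embedding (details such as the pooling are assumptions):

import torch
import torch.nn as nn

class LatentVisualMixer(nn.Module):
    def __init__(self, dim, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, visual_tokens, text_emb):
        # visual_tokens: (B, K, D) set of visual features from the encoder
        # text_emb:      (B, D)   pooled caption representation
        q = text_emb.unsqueeze(1)                 # one query per caption
        mixed, _ = self.attn(q, visual_tokens, visual_tokens)
        return torch.nn.functional.normalize(mixed.squeeze(1), dim=-1)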

Updated: 2025-03-29 12:57:07

标题: 在对比视觉语言预训练中建模标题多样性

摘要: 有一千种方式来为一幅图像加上标题。另一方面,对比语言预训练(CLIP)通过将图像和其标题映射到单个向量来工作,限制了类似于CLIP的模型能够表示图像描述的多样性的能力。在这项研究中,我们介绍了Llip,即潜在语言图像预训练,该模型可以模拟与图像匹配的多样性标题。Llip的视觉编码器输出一组视觉特征,通过根据从文本中获得的信息进行条件化,将其混合成最终表示。我们展示了Llip在各种任务上优于类似于CLIP和SigLIP这样的非上下文化基线,即使使用大规模编码器也是如此。Llip通过ViT-G/14编码器平均提高了2.9%的零样本分类基准。具体来说,Llip在ImageNet上实现了83.5%的零样本top-1准确率,比同等大小的CLIP高出1.4%。我们还展示了在MS-COCO上零样本检索的改进达到了6.0%。我们对该方法引入的组件进行了全面分析,并证明Llip可以产生更丰富的视觉表示。

更新时间: 2025-03-29 12:57:07

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.00740v4

Unsupervised Anomaly Detection in Multivariate Time Series across Heterogeneous Domains

The widespread adoption of digital services, along with the scale and complexity at which they operate, has made incidents in IT operations increasingly likely, diverse, and impactful. This has led to the rapid development of a central aspect of "Artificial Intelligence for IT Operations" (AIOps), focusing on detecting anomalies in vast amounts of multivariate time series data generated by service entities. In this paper, we begin by introducing a unifying framework for benchmarking unsupervised anomaly detection (AD) methods, and highlight the problem of shifts in normal behaviors that can occur in practical AIOps scenarios. To tackle anomaly detection under domain shift, we then cast the problem in the framework of domain generalization and propose a novel approach, Domain-Invariant VAE for Anomaly Detection (DIVAD), to learn domain-invariant representations for unsupervised anomaly detection. Our evaluation results using the Exathlon benchmark show that the two main DIVAD variants significantly outperform the best unsupervised AD method in maximum performance, with 20% and 15% improvements in maximum peak F1-scores, respectively. Evaluation using the Application Server Dataset further demonstrates the broader applicability of our domain generalization methods.

Updated: 2025-03-29 12:38:28

标题: 跨异构领域的多变量时间序列中的无监督异常检测

摘要: 数字服务的广泛采用,以及它们运作的规模和复杂性,使得IT运营中的事件越来越可能、多样化和具有影响力。这导致了“IT运营的人工智能”(AIOps)的一个核心方面的迅速发展,重点是检测由服务实体生成的大量多变时间序列数据中的异常。在本文中,我们首先引入了一个统一的框架,用于对无监督异常检测(AD)方法进行基准测试,并强调了在实际AIOps场景中可能发生的正常行为转变问题。为了解决域转移下的异常检测问题,我们将问题置于域泛化框架中,并提出了一种新颖方法,即用于无监督异常检测的域不变VAE(DIVAD),以学习用于无监督异常检测的域不变表示。我们使用Exathlon基准测试的评估结果显示,这两种主要的DIVAD变体在最大性能方面明显优于最佳无监督AD方法,分别在最大峰值F1分数上分别提高了20%和15%。使用应用程序服务器数据集进行评估进一步证明了我们域泛化方法的更广泛适用性。

更新时间: 2025-03-29 12:38:28

领域: cs.LG

下载: http://arxiv.org/abs/2503.23060v1

MathWriting: A Dataset For Handwritten Mathematical Expression Recognition

Recognition of handwritten mathematical expressions makes it possible to transfer scientific notes into digital form. It facilitates the sharing, searching, and preservation of scientific information. We introduce MathWriting, the largest online handwritten mathematical expression dataset to date. It consists of 230k human-written samples and an additional 400k synthetic ones. This dataset can also be used in its rendered form for offline HME recognition. One MathWriting sample consists of a formula written on a touch screen and a corresponding LaTeX expression. We also provide a normalized version of the LaTeX expressions to simplify the recognition task and enhance the result quality. We provide baseline performance of standard models like OCR and CTC Transformer as well as Vision-Language Models like PaLI on the dataset. The dataset together with an example colab is accessible on GitHub.

Updated: 2025-03-29 12:18:26

标题: MathWriting:一个手写数学表达识别的数据集

摘要: 手写数学表达式的识别可以将科学笔记转换为数字形式,有助于科学信息的共享、搜索和保存。我们介绍了MathWriting,迄今为止最大的在线手写数学表达式数据集。它由230,000个人工书写样本和额外的40万个合成样本组成。该数据集也可以以其渲染形式用于离线HME识别。一个MathWriting样本包括在触摸屏上书写的公式和相应的LaTeX表达式。我们还提供了LaTeX表达式的标准化版本,以简化识别任务并增强结果质量。我们在数据集上提供了OCR和CTC Transformer等标准模型以及PaLI等视觉语言模型的基准性能。数据集以及示例colab可在Github上访问。

更新时间: 2025-03-29 12:18:26

领域: cs.CV,cs.HC,cs.LG

下载: http://arxiv.org/abs/2404.10690v2

GenFusion: Closing the Loop between Reconstruction and Generation via Videos

Recently, 3D reconstruction and generation have demonstrated impressive novel view synthesis results, achieving high fidelity and efficiency. However, a notable conditioning gap can be observed between these two fields, e.g., scalable 3D scene reconstruction often requires densely captured views, whereas 3D generation typically relies on a single or no input view, which significantly limits their applications. We found that the source of this phenomenon lies in the misalignment between 3D constraints and generative priors. To address this problem, we propose a reconstruction-driven video diffusion model that learns to condition video frames on artifact-prone RGB-D renderings. Moreover, we propose a cyclical fusion pipeline that iteratively adds restoration frames from the generative model to the training set, enabling progressive expansion and addressing the viewpoint saturation limitations seen in previous reconstruction and generation pipelines. Our evaluation, including view synthesis from sparse view and masked input, validates the effectiveness of our approach. More details at https://genfusion.sibowu.com.

Updated: 2025-03-29 12:18:02

标题: GenFusion: 通过视频将重建和生成之间的闭环闭合

摘要: 最近,3D重建和生成已经展示出令人印象深刻的新颖视角合成结果,实现了高保真度和效率。然而,这两个领域之间存在明显的条件差距,例如,可扩展的3D场景重建通常需要密集捕获的视角,而3D生成通常依赖于单个或无输入视角,这显著限制了它们的应用。我们发现,这一现象的根源在于3D约束和生成先验之间的不一致。为了解决这个问题,我们提出了一个重建驱动的视频扩散模型,该模型学习将视频帧条件化为易出现问题的RGB-D渲染。此外,我们提出了一个循环融合管道,通过迭代地将生成模型的恢复帧添加到训练集中,实现渐进式扩展,并解决之前重建和生成管道中出现的视角饱和限制。我们的评估,包括从稀疏视角和遮罩输入合成视角,验证了我们方法的有效性。更多详情请访问https://genfusion.sibowu.com。

更新时间: 2025-03-29 12:18:02

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.21219v2

Enhanced Smart Contract Reputability Analysis using Multimodal Data Fusion on Ethereum

The evaluation of smart contract reputability is essential to foster trust in decentralized ecosystems. However, existing methods that rely solely on code analysis or transactional data, offer limited insight into evolving trustworthiness. We propose a multimodal data fusion framework that integrates code features with transactional data to enhance reputability prediction. Our framework initially focuses on AI-based code analysis, utilizing GAN-augmented opcode embeddings to address class imbalance, achieving 97.67% accuracy and a recall of 0.942 in detecting illicit contracts, surpassing traditional oversampling methods. This forms the crux of a reputability-centric fusion strategy, where combining code and transactional data improves recall by 7.25% over single-source models, demonstrating robust performance across validation sets. By providing a holistic view of smart contract behaviour, our approach enhances the model's ability to assess reputability, identify fraudulent activities, and predict anomalous patterns. These capabilities contribute to more accurate reputability assessments, proactive risk mitigation, and enhanced blockchain security.

Updated: 2025-03-29 12:07:37

标题: 在以太坊上使用多模态数据融合增强智能合约可信度分析

摘要: 智能合约信誉评估对于促进去中心化生态系统中的信任至关重要。然而,仅依赖代码分析或交易数据的现有方法对于不断发展的可信度提供有限的洞察力。我们提出了一个多模态数据融合框架,将代码特征与交易数据相结合,以增强信誉预测能力。我们的框架最初专注于基于人工智能的代码分析,利用增强了GAN的操作码嵌入来解决类别不平衡问题,实现了97.67%的准确度和0.942的召回率,超越了传统的过采样方法。这构成了一个以信誉为中心的融合策略,将代码和交易数据结合起来,使召回率比单一数据源模型提高了7.25%,在验证集中展现出强大的性能。通过提供对智能合约行为的全面视图,我们的方法增强了模型评估信誉、识别欺诈活动和预测异常模式的能力。这些能力有助于更准确地评估信誉、积极应对风险和增强区块链安全性。

更新时间: 2025-03-29 12:07:37

领域: cs.LG,cs.AI,cs.CR,cs.ET

下载: http://arxiv.org/abs/2503.17426v2

Fréchet regression with implicit denoising and multicollinearity reduction

Fréchet regression extends linear regression to model complex responses in metric spaces, making it particularly relevant for multi-label regression, where each instance can have multiple associated labels. However, addressing noise and dependencies among predictors within this framework remains underexplored. In this paper, we present an extension of the Global Fréchet regression model that enables explicit modeling of relationships between input variables and multiple responses. To address challenges arising from noise and multicollinearity, we propose a novel framework based on implicit regularization, which preserves the intrinsic structure of the data while effectively capturing complex dependencies. Our approach ensures accurate and efficient modeling without the biases introduced by traditional explicit regularization methods. Theoretical guarantees are provided, and the performance of the proposed method is demonstrated through numerical experiments.
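
In its standard form, which the extension above builds on, global Fréchet regression replaces the linear predictor by a weighted Fréchet mean in the metric space (\Omega, d):

\hat{m}(x) = \arg\min_{\omega \in \Omega} \frac{1}{n} \sum_{i=1}^{n} s_i(x)\, d^2(\omega, Y_i),
\qquad s_i(x) = 1 + (x - \bar{X})^\top \hat{\Sigma}^{-1} (X_i - \bar{X}),

so each response Y_i is weighted according to how the query point x relates to the empirical distribution of the predictors.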

Updated: 2025-03-29 12:06:41

标题: Fréchet回归中的隐式去噪和多重共线性减少

摘要: Fr\'echet回归将线性回归扩展到模拟度量空间中的复杂响应,特别适用于多标签回归,其中每个实例可以有多个关联标签。然而,在这个框架内解决噪声和预测变量之间的依赖关系仍未被充分探讨。在本文中,我们提出了全局Fr\'echet回归模型的扩展,该模型使得能够明确建模输入变量和多个响应之间的关系。为了解决噪声和多重共线性带来的挑战,我们提出了一种基于隐式正则化的新框架,该框架保留了数据的内在结构,同时有效地捕捉复杂的依赖关系。我们的方法确保了准确和高效的建模,避免了传统显式正则化方法引入的偏见。我们提供了理论保证,并通过数值实验展示了所提方法的性能。

更新时间: 2025-03-29 12:06:41

领域: stat.ML,cs.AI,cs.LG

下载: http://arxiv.org/abs/2412.18247v2

Prediction of 30-day hospital readmission with clinical notes and EHR information

High hospital readmission rates are associated with significant costs and health risks for patients. Therefore, it is critical to develop predictive models that can support clinicians in determining whether or not a patient will return to the hospital in a relatively short period of time (e.g., 30 days). Nowadays, it is possible to collect both structured (electronic health records, EHR) and unstructured information (clinical notes) about a patient's hospital event, all potentially containing relevant information for a predictive model. However, their integration is challenging. In this work we explore the combination of clinical notes and EHRs to predict 30-day hospital readmissions. We address the representation of the various types of information available in the EHR data, as well as exploring LLMs to characterize the clinical notes. We combine both information sources as the nodes of a graph neural network (GNN). Our model achieves an AUROC of 0.72 and a balanced accuracy of 66.7%, highlighting the importance of combining the multimodal information.

Updated: 2025-03-29 11:54:18

标题: 使用临床笔记和电子病历信息预测30天内的住院再入院情况

摘要: 高医院再入院率与患者的显著成本和健康风险相关。因此,开发能够支持临床医生确定患者是否会在相对短的时间内(例如30天)返回医院的预测模型至关重要。现在,可以收集关于患者住院事件的结构化(电子健康记录-EHR)和非结构化信息(临床笔记),所有这些信息都可能包含对预测模型有关的信息。然而,它们的整合是具有挑战性的。在这项工作中,我们探讨了临床笔记和EHR的组合,以预测30天内的医院再入院。我们处理了EHR数据中可用信息的各种类型的表示,以及探索了LLM来表征临床笔记。我们将这两种信息源收集为图神经网络(GNN)的节点。我们的模型实现了0.72的AUROC和66.7%的平衡准确性,突出了结合多模态信息的重要性。

更新时间: 2025-03-29 11:54:18

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2503.23050v1

Dynamic spillovers and investment strategies across artificial intelligence ETFs, artificial intelligence tokens, and green markets

This paper investigates the risk spillovers among AI ETFs, AI tokens, and green markets using the R2 decomposition method. We reveal several key insights. First, the overall transmission connectedness index (TCI) closely aligns with the contemporaneous TCI, while the lagged TCI is significantly lower. Second, AI ETFs and clean energy act as risk transmitters, whereas AI tokens and green bonds function as risk receivers. Third, AI tokens are difficult to hedge and provide limited hedging ability compared to AI ETFs and green assets. However, multivariate portfolios effectively reduce AI token investment risk. Among them, the minimum correlation portfolio outperforms the minimum variance and minimum connectedness portfolios.

Updated: 2025-03-29 11:40:52

标题: 动态溢出和投资策略跨越人工智能ETF、人工智能代币和绿色市场

摘要: 本文利用R2分解方法研究了人工智能( AI ) ETFs 、 AI 代币和绿色市场之间的风险溢出。我们揭示了几个关键见解。首先,整体传输关联指数 ( TCI ) 与同期 TCI 密切相关,而滞后 TCI 明显较低。其次,AI ETFs 和清洁能源作为风险传递者,而 AI 代币和绿色债券则作为风险接收者。第三,AI 代币难以对冲,并且相比于 AI ETFs 和绿色资产提供有限的对冲能力。然而,多变量投资组合有效降低了AI代币的投资风险。其中,最小相关性投资组合表现优于最小方差和最小关联性投资组合。

更新时间: 2025-03-29 11:40:52

领域: q-fin.RM,cs.AI

下载: http://arxiv.org/abs/2503.01148v2

VLM-C4L: Continual Core Dataset Learning with Corner Case Optimization via Vision-Language Models for Autonomous Driving

With the widespread adoption and deployment of autonomous driving, handling complex environments has become an unavoidable challenge. Due to the scarcity and diversity of extreme scenario datasets, current autonomous driving models struggle to effectively manage corner cases. This limitation poses a significant safety risk: according to the National Highway Traffic Safety Administration (NHTSA), autonomous vehicle systems have been involved in hundreds of reported crashes annually in the United States, some occurring in corner cases such as sun glare and fog and leading to fatal accidents. Furthermore, in order to consistently maintain a robust and reliable autonomous driving system, it is essential for models not only to perform well on routine scenarios but also to adapt to newly emerging scenarios, especially those corner cases that deviate from the norm. This requires a learning mechanism that incrementally integrates new knowledge without degrading previously acquired capabilities. However, to the best of our knowledge, no existing continual learning methods have been proposed to ensure consistent and scalable corner case learning in autonomous driving. To address these limitations, we propose VLM-C4L, a continual learning framework that introduces Vision-Language Models (VLMs) to dynamically optimize and enhance corner case datasets. VLM-C4L combines VLM-guided high-quality data extraction with a core data replay strategy, enabling the model to incrementally learn from diverse corner cases while preserving performance on previously encountered routine scenarios, thus ensuring long-term stability and adaptability in real-world autonomous driving. We evaluate VLM-C4L on large-scale real-world autonomous driving datasets, including Waymo and the corner case dataset CODA.

Updated: 2025-03-29 11:40:34

标题: VLM-C4L:通过视觉-语言模型进行自动驾驶的持续核心数据集学习及极端情况优化

摘要: 随着自动驾驶技术的广泛采用和部署,处理复杂环境已经成为一个不可避免的挑战。由于极端场景数据集的稀缺性和多样性,当前的自动驾驶模型难以有效处理边缘情况。根据美国国家公路交通安全管理局(NHTSA)的数据,自动驾驶车辆系统每年在美国发生数百起报告事故,其中一些发生在像阳光刺眼和雾霭等边缘情况,导致了一些致命事故。此外,为了始终保持强大可靠的自动驾驶系统,模型不仅需要在常规场景中表现良好,还需要适应新出现的场景,尤其是那些偏离正常的边缘情况。这需要一个学习机制,可以逐步整合新知识而不降低先前获得的能力。然而,据我们所知,目前没有现有的持续学习方法被提出来确保自动驾驶中的一致且可扩展的边缘情况学习。为了解决这些限制,我们提出了VLM-C4L,这是一个持续学习框架,引入了视觉-语言模型(VLMs)来动态优化和增强边缘情况数据集,VLM-C4L结合了VLM引导的高质量数据提取和核心数据重放策略,使模型能够在逐渐学习多样化的边缘情况的同时保持在先前常规情景中的表现,从而确保在现实世界的自动驾驶中具有长期的稳定性和适应性。我们在包括Waymo和边缘情况数据集CODA在内的大规模真实世界自动驾驶数据集上评估了VLM-C4L。

更新时间: 2025-03-29 11:40:34

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2503.23046v1

Estimating Unbounded Density Ratios: Applications in Error Control under Covariate Shift

The density ratio is an important metric for evaluating the relative likelihood of two probability distributions, with extensive applications in statistics and machine learning. However, existing estimation theories for density ratios often depend on stringent regularity conditions, mainly focusing on density ratio functions with bounded domains and ranges. In this paper, we study density ratio estimators using loss functions based on least squares and logistic regression. We establish upper bounds on estimation errors with standard minimax optimal rates, up to logarithmic factors. Our results accommodate density ratio functions with unbounded domains and ranges. We apply our results to nonparametric regression and conditional flow models under covariate shift and identify the tail properties of the density ratio as crucial for error control across domains affected by covariate shift. We provide sufficient conditions under which loss correction is unnecessary and demonstrate effective generalization capabilities of a source estimator to any suitable target domain. Our simulation experiments support these theoretical findings, indicating that the source estimator can outperform those derived from loss correction methods, even when the true density ratio is known.
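
The two estimators referenced above take standard forms: writing r(x) \approx p(x)/q(x), the least-squares criterion and the logistic-regression criterion are

L_{\mathrm{LS}}(r) = \tfrac{1}{2}\, \mathbb{E}_{x \sim q}[r(x)^2] - \mathbb{E}_{x \sim p}[r(x)],
\qquad
L_{\mathrm{LR}}(f) = \mathbb{E}_{x \sim p}[\log(1 + e^{-f(x)})] + \mathbb{E}_{x \sim q}[\log(1 + e^{f(x)})], \quad r(x) = e^{f(x)},

where the least-squares criterion is minimized at r = p/q and the logistic criterion at f = \log(p/q); whether the paper adopts these exact normalizations is an assumption.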

Updated: 2025-03-29 11:35:39

标题: 估计无界密度比:在协变量转移下的误差控制应用

摘要: 密度比是评估两个概率分布相对可能性的重要指标,在统计学和机器学习中有广泛的应用。然而,现有的密度比估计理论往往依赖于严格的正则条件,主要关注具有有界定义域和值域的密度比函数。在本文中,我们研究了使用基于最小二乘和逻辑回归的损失函数的密度比估计器。我们建立了关于估计误差的上界,具有标准的极小极优速率,直到对数因子。我们的结果适用于具有无界定义域和值域的密度比函数。我们将我们的结果应用于非参数回归和条件流模型,在协变量转移下识别密度比的尾部特性对受协变量转移影响的域的错误控制至关重要。我们提供了损失校正是不必要的充分条件,并展示了源估计器对任何适合的目标域的有效泛化能力。我们的模拟实验支持这些理论发现,表明即使已知真实密度比,源估计器也可以胜过从损失校正方法得出的估计器。

更新时间: 2025-03-29 11:35:39

领域: stat.ML,cs.LG,62G05, 62G08, 68T07

下载: http://arxiv.org/abs/2504.01031v1

DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL

Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in multi-task reinforcement learning (RL). However, learning policies that efficiently satisfy arbitrary specifications not observed during training remains a challenging problem. Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments of LTL, are restricted to suboptimal solutions, and do not adequately handle safety constraints. In this work, we propose a novel learning approach to address these concerns. Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae. Experiments in a variety of discrete and continuous domains demonstrate that our approach is able to zero-shot satisfy a wide range of finite- and infinite-horizon specifications, and outperforms existing methods in terms of both satisfaction probability and efficiency. Code available at: https://deep-ltl.github.io/

Updated: 2025-03-29 11:33:49

标题: DeepLTL:学习有效满足多任务强化学习的复杂LTL规范

摘要: 线性时序逻辑(LTL)最近被广泛采用作为多任务强化学习(RL)中用于指定复杂、时间延长任务的强大形式主义。然而,学习有效满足在训练期间未观察到的任意规范仍然是一个具有挑战性的问题。现有方法存在一些缺点:它们通常仅适用于LTL的有限视野片段,受限于次优解,并且不能充分处理安全约束。在这项工作中,我们提出了一种新颖的学习方法来解决这些问题。我们的方法利用布氏自动机的结构,显式地表示LTL规范的语义,来学习以满足所需公式的真值分配序列为条件的策略。在各种离散和连续领域的实验表明,我们的方法能够零-shot满足各种有限和无限视野的规范,且在满足概率和效率方面优于现有方法。代码可在以下链接找到:https://deep-ltl.github.io/

更新时间: 2025-03-29 11:33:49

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.04631v2

Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models

Advancements in Natural Language Processing (NLP) have led to the emergence of Large Language Models (LLMs) such as GPT, Llama, Claude, and Gemini, which excel across a range of tasks but require extensive fine-tuning to align their outputs with human expectations. A widely used method for achieving this alignment is Reinforcement Learning from Human Feedback (RLHF), which, despite its success, faces challenges in accurately modelling human preferences. In this paper, we introduce GazeReward, a novel framework that integrates implicit feedback -- and specifically eye-tracking (ET) data -- into the Reward Model (RM). In addition, we explore how ET-based features can provide insights into user preferences. Through ablation studies we test our framework with different integration methods, LLMs, and ET generator models, demonstrating that our approach significantly improves the accuracy of the RM on established human preference datasets. This work advances the ongoing discussion on optimizing AI alignment with human values, exploring the potential of cognitive data for shaping future NLP research.

Updated: 2025-03-29 11:32:39

标题: 透过眼睛看AI:通过基于凝视的反馈奖励实现人类与大型语言模型的对齐

摘要: 自然语言处理(NLP)的进展导致了大型语言模型(LLMs)的出现,如GPT、Llama、Claude和Gemini,这些模型在各种任务上表现出色,但需要进行大量的微调以使其输出与人类期望保持一致。实现这种对齐的一种广泛使用的方法是从人类反馈中进行强化学习(RLHF),尽管取得了成功,但仍面临准确建模人类偏好的挑战。在本文中,我们介绍了GazeReward,这是一个将隐式反馈(特别是眼动追踪(ET)数据)整合到奖励模型(RM)中的新框架。此外,我们探讨了基于ET的特征如何提供对用户偏好的见解。通过消融研究,我们测试了不同整合方法、LLMs和ET生成模型的框架,证明了我们的方法显著提高了RM在已建立的人类偏好数据集上的准确性。这项工作推动了关于优化人工智能与人类价值观对齐的持续讨论,探索了认知数据对塑造未来NLP研究的潜力。

更新时间: 2025-03-29 11:32:39

领域: cs.CL,cs.AI,cs.CV,cs.HC

下载: http://arxiv.org/abs/2410.01532v3

ADAGE: Active Defenses Against GNN Extraction

Graph Neural Networks (GNNs) achieve high performance in various real-world applications, such as drug discovery, traffic states prediction, and recommendation systems. The fact that building powerful GNNs requires a large amount of training data, powerful computing resources, and human expertise turns the models into lucrative targets for model stealing attacks. Prior work has revealed that the threat vector of stealing attacks against GNNs is large and diverse, as an attacker can leverage various heterogeneous signals ranging from node labels to high-dimensional node embeddings to create a local copy of the target GNN at a fraction of the original training costs. This diversity in the threat vector renders the design of effective and general defenses challenging and existing defenses usually focus on one particular stealing setup. Additionally, they solely provide means to identify stolen model copies rather than preventing the attack. To close this gap, we propose the first and general Active Defense Against GNN Extraction (ADAGE). By analyzing the queries to the GNN, tracking their diversity in terms of proximity to different communities identified in the underlying graph, and increasing the defense strength with the growing fraction of communities that have been queried, ADAGE can prevent stealing in all common attack setups. Our extensive experimental evaluation using six benchmark datasets, four GNN models, and three types of adaptive attackers shows that ADAGE penalizes attackers to the degree of rendering stealing impossible, whilst not harming predictive performance for legitimate users. ADAGE, thereby, contributes towards securely sharing valuable GNNs in the future.
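
A schematic of the defense logic described above; the specific perturbation mechanism is an assumption, the key point being that the defense strength scales with the fraction of graph communities a client's queries have covered:

import numpy as np

class AdageDefense:
    def __init__(self, n_communities, max_noise=1.0):
        self.n_communities = n_communities
        self.seen = set()                         # communities queried so far
        self.max_noise = max_noise

    def defend(self, logits, community_id, rng):
        self.seen.add(community_id)               # track query diversity
        coverage = len(self.seen) / self.n_communities
        # Perturb outputs more aggressively as community coverage grows,
        # penalizing broad extraction-style querying.
        return logits + rng.normal(0.0, self.max_noise * coverage, logits.shape)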

Updated: 2025-03-29 11:32:39

标题: ADAGE:防御GNN抽取的主动策略

摘要: 图神经网络(GNNs)在各种实际应用中取得了高性能,如药物发现、交通状态预测和推荐系统。构建强大的GNNs需要大量的训练数据、强大的计算资源和人类专业知识,这使得这些模型成为模型窃取攻击的利润丰厚的目标。先前的研究表明,针对GNNs的窃取攻击的威胁向量是庞大而多样的,因为攻击者可以利用各种异构信号,从节点标签到高维节点嵌入,以很小的原始训练成本创建目标GNN的本地副本。这种威胁向量的多样性使得设计有效和通用的防御措施具有挑战性,现有的防御措施通常只专注于特定的窃取设置。此外,它们仅提供了识别被窃取模型副本的手段,而不是防止攻击。为了填补这一差距,我们提出了第一个通用的反对GNN提取的主动防御(ADAGE)。通过分析对GNN的查询,跟踪它们在底层图中与不同社区的接近程度的多样性,并随着已查询社区比例的增长而增强防御力,ADAGE可以防止在所有常见的攻击设置中进行窃取。我们使用六个基准数据集、四个GNN模型和三种类型的自适应攻击者进行了广泛的实验评估,结果显示ADAGE惩罚攻击者到一定程度,使其无法进行窃取,同时不会损害合法用户的预测性能。因此,ADAGE有助于在未来安全地共享有价值的GNNs。

更新时间: 2025-03-29 11:32:39

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2503.00065v2

Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective

Large language models (LLMs) often exhibit abrupt emergent behavior, whereby new abilities arise at certain points during their training. This phenomenon, commonly referred to as a "phase transition", remains poorly understood. In this study, we conduct an integrative analysis of such phase transitions by examining three interconnected perspectives: the similarity between LLMs and the human brain, the internal states of LLMs, and downstream task performance. We propose a novel interpretation for the learning dynamics of LLMs that vary in both training data and architecture, revealing that three phase transitions commonly emerge across these models during training: (1) alignment with the entire brain surges as LLMs begin adhering to task instructions (Brain Alignment and Instruction Following), (2) unexpectedly, LLMs diverge from the brain during a period in which downstream task accuracy temporarily stagnates (Brain Detachment and Stagnation), and (3) alignment with the brain reoccurs as LLMs become capable of solving the downstream tasks (Brain Realignment and Consolidation). These findings illuminate the underlying mechanisms of phase transitions in LLMs, while opening new avenues for interdisciplinary research bridging AI and neuroscience.

Updated: 2025-03-29 11:08:30

标题: 三重相变:从神经科学的角度理解大型语言模型的学习动态

摘要: 大型语言模型(LLMs)经常表现出突然出现的紧急行为,即在它们的训练过程中某些时刻出现新的能力。这种现象通常被称为“相变”,但仍然不为人所理解。在这项研究中,我们通过检视三个相互关联的视角进行了对这些相变的综合分析:LLMs与人脑之间的相似性,LLMs的内部状态以及下游任务表现。我们提出了一种新颖的解释,说明了LLMs的学习动态在训练数据和架构上的变化,揭示出这些模型在训练过程中通常出现的三个相变:(1)随着LLMs开始遵循任务指令“大脑对齐和指令遵循”,与整个大脑的一致性激增;(2)出人意料地,在下游任务准确性暂时停滞期间,LLMs与大脑分歧“大脑分离和停滞”;(3)随着LLMs能够解决下游任务,“大脑重新对齐和巩固”时,与大脑重新对齐。这些发现揭示了LLMs相变的潜在机制,同时为跨学科研究开辟了新的途径,搭起了人工智能和神经科学之间的桥梁。

更新时间: 2025-03-29 11:08:30

领域: cs.CL,cs.AI,cs.LG,q-bio.NC

下载: http://arxiv.org/abs/2502.20779v2

STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing

Existing audio-driven visual dubbing methods have achieved great success. Despite this, we observe that the semantic ambiguity between spatial and temporal domains significantly degrades the synthesis stability of dynamic faces. We argue that aligning the semantic features from spatial and temporal domains is a promising approach to stabilizing facial motion. To achieve this, we propose a Spatial-Temporal Semantic Alignment (STSA) method, which introduces a dual-path alignment mechanism and a differentiable semantic representation. The former leverages a Consistent Information Learning (CIL) module to maximize the mutual information at multiple scales, thereby reducing the manifold differences between spatial and temporal domains. The latter utilizes probabilistic heatmaps as ambiguity-tolerant guidance to avoid the abnormal dynamics of the synthesized faces caused by slight semantic jittering. Extensive experimental results demonstrate the superiority of the proposed STSA, especially in terms of image quality and synthesis stability. Pre-trained weights and inference code are available at https://github.com/SCAILab-USTC/STSA.

Updated: 2025-03-29 11:04:10

标题: STSA:视觉配音的时空语义对齐

摘要: 现有的音频驱动的视觉配音方法取得了巨大成功。尽管如此,我们观察到空间和时间领域之间的语义歧义显著降低了动态面部合成的稳定性。我们认为,将空间和时间领域的语义特征对齐是稳定面部运动的一种有前途的方法。为了实现这一目标,我们提出了一种空间-时间语义对齐(STSA)方法,该方法引入了双路径对齐机制和可微分的语义表示。前者利用一致信息学习(CIL)模块,在多个尺度上最大化互信息,从而减少空间和时间领域之间的流形差异。后者利用概率热图作为容忍歧义的指导,避免合成面部的异常动态由于轻微的语义抖动而引起。大量实验结果表明了所提出的STSA方法在图像质量和合成稳定性方面的优越性。预训练权重和推理代码可在https://github.com/SCAILab-USTC/STSA获得。

更新时间: 2025-03-29 11:04:10

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.23039v1

Function Fitting Based on Kolmogorov-Arnold Theorem and Kernel Functions

This paper proposes a unified theoretical framework based on the Kolmogorov-Arnold representation theorem and kernel methods. By analyzing the mathematical relationship among kernels, B-spline basis functions in Kolmogorov-Arnold Networks (KANs), and the inner product operation in self-attention mechanisms, we establish a kernel-based feature fitting framework that unifies the two models as linear combinations of kernel functions. Under this framework, we propose a low-rank Pseudo-Multi-Head Self-Attention module (Pseudo-MHSA), which reduces the parameter count of traditional MHSA by nearly 50%. Furthermore, we design a Gaussian kernel multi-head self-attention variant (Gaussian-MHSA) to validate the effectiveness of nonlinear kernel functions in feature extraction. Experiments on the CIFAR-10 dataset demonstrate that the Pseudo-MHSA model achieves performance comparable to a ViT model of the same dimensionality under the MAE framework, and visualization analysis reveals the similarity of their multi-head distribution patterns. Our code is publicly available.
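
To make the kernel view concrete, here is a toy single-head Gaussian-kernel attention in PyTorch (shapes and the bandwidth sigma are our assumptions; the paper's Gaussian-MHSA is the multi-head, trained version of this idea):

    import torch

    def gaussian_attention(q, k, v, sigma=1.0):
        # q, k, v: (batch, seq, dim). Replace the softmax(qk^T) inner product
        # with a Gaussian kernel on pairwise Euclidean distances.
        d2 = torch.cdist(q, k, p=2).pow(2)      # (batch, seq, seq)
        w = torch.exp(-d2 / (2 * sigma ** 2))
        w = w / w.sum(dim=-1, keepdim=True)     # row-normalize, like softmax
        return w @ v                            # weighted combination of values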

Updated: 2025-03-29 11:03:28

标题: 基于科尔莫哥洛夫-阿诺德定理和核函数的函数拟合

摘要: 本文提出了一个统一的理论框架,基于Kolmogorov-Arnold表示定理和核方法。通过分析核函数、Kolmogorov-Arnold网络(KANs)中的B样条基函数以及自注意机制中的内积操作之间的数学关系,我们建立了一个基于核函数的特征拟合框架,将这两个模型统一为核函数的线性组合。在这个框架下,我们提出了一个低秩伪多头自注意模块(Pseudo-MHSA),将传统MHSA的参数数量减少了近50%。此外,我们设计了一个高斯核多头自注意变种(Gaussian-MHSA),以验证非线性核函数在特征提取中的有效性。在CIFAR-10数据集上的实验表明,Pseudo-MHSA模型在MAE框架下的性能与相同维度的ViT模型相当,并且可视化分析揭示了它们多头分布模式的相似性。我们的代码已公开发布。

更新时间: 2025-03-29 11:03:28

领域: cs.LG

下载: http://arxiv.org/abs/2503.23038v1

Agentic Large Language Models, a survey

There is great interest in agentic LLMs, large language models that act as agents. We review the growing body of work in this area and provide a research agenda. Agentic LLMs are LLMs that (1) reason, (2) act, and (3) interact. We organize the literature according to these three categories. The research in the first category focuses on reasoning, reflection, and retrieval, aiming to improve decision making; the second category focuses on action models, robots, and tools, aiming for agents that act as useful assistants; the third category focuses on multi-agent systems, aiming for collaborative task solving and simulating interaction to study emergent social behavior. We find that works mutually benefit from results in other categories: retrieval enables tool use, reflection improves multi-agent collaboration, and reasoning benefits all categories. We discuss applications of agentic LLMs and provide an agenda for further research. Important applications are in medical diagnosis, logistics and financial market analysis. Meanwhile, self-reflective agents playing roles and interacting with one another augment the process of scientific research itself. Further, agentic LLMs may provide a solution for the problem of LLMs running out of training data: inference-time behavior generates new training states, such that LLMs can keep learning without needing ever larger datasets. We note that there is risk associated with LLM assistants taking action in the real world, while agentic LLMs are also likely to benefit society.

Updated: 2025-03-29 11:02:20

标题: 机构大型语言模型,一项调查

摘要: 人们对作为代理的大语言模型(agentic LLMs)抱有极大的兴趣。我们回顾了这一领域不断增长的工作内容,并提供了一个研究议程。作为代理的LLMs是能够(1)推理、(2)行动和(3)互动的LLMs。我们根据这三个类别整理了这些文献。第一个类别的研究聚焦于推理、反思和检索,旨在改善决策制定;第二个类别聚焦于行动模型、机器人和工具,旨在实现作为有用助手的代理;第三个类别聚焦于多智能体系统,旨在协作解决任务,并模拟互动以研究涌现的社会行为。我们发现这些工作互相受益于其他类别的成果:检索使工具使用成为可能,反思改善了多智能体协作,推理使所有类别受益。我们讨论了作为代理的LLMs的应用,并提供了进一步研究的议程。重要的应用领域包括医学诊断、物流和金融市场分析。与此同时,扮演角色并相互作用的自我反思代理增强了科学研究本身的过程。此外,作为代理的LLMs可能为LLMs缺乏训练数据的问题提供解决方案:推理时行为产生新的训练状态,使LLMs能够在不需要更大数据集的情况下不断学习。我们注意到,LLM助手在现实世界中采取行动存在风险,而作为代理的LLMs也可能造福社会。

更新时间: 2025-03-29 11:02:20

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2503.23037v1

Leaking LoRa: An Evaluation of Password Leaks and Knowledge Storage in Large Language Models

To effectively deploy Large Language Models (LLMs) in application-specific settings, fine-tuning techniques are applied to enhance performance on specialized tasks. This process often involves fine-tuning on user data, which may contain sensitive information. Although not recommended, it is not uncommon for users to send passwords in messages, and fine-tuning models on this data could result in passwords being leaked. In this study, a Large Language Model is fine-tuned with customer support data and passwords from the RockYou password wordlist using Low-Rank Adaptation (LoRA). Out of the first 200 passwords from the list, 37 were successfully recovered. Further, causal tracing is used to identify that password information is largely located in a few layers. Lastly, Rank One Model Editing (ROME) is used to remove the password information from the model, reducing the number of passwords recovered from 37 to 0.
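
For orientation, a minimal LoRA fine-tuning setup of the kind the study describes, using the Hugging Face peft library (the base model, target modules, and ranks here are our assumptions; the abstract does not fix them):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in base model
    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                        target_modules=["c_attn"], task_type="CAUSAL_LM")
    model = get_peft_model(model, config)   # only the low-rank adapters train
    model.print_trainable_parameters()
    # Fine-tuning this model on chat data that contains passwords bakes those
    # strings into the adapter weights; the study shows they can then be
    # recovered by prompting, and removed again with ROME-style editing.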

Updated: 2025-03-29 10:42:58

标题: LoRa泄漏:对大型语言模型中密码泄漏和知识存储的评估

摘要: 为了有效地在特定应用环境中部署大型语言模型(LLMs),需要应用微调技术来提高在专门任务上的性能。这个过程通常涉及在用户数据上进行微调,这些数据可能包含敏感信息。虽然不建议这样做,但用户在消息中发送密码并不罕见,在这些数据上微调模型可能导致密码泄露。在这项研究中,使用低秩适应(LoRA),以客户支持数据和来自RockYou密码字典的密码对一个大型语言模型进行了微调。在列表的前200个密码中,成功恢复了37个密码。此外,因果追踪被用来识别密码信息主要位于少数几层中。最后,使用Rank One Model Editing(ROME)从模型中删除密码信息,使得成功恢复的密码数量从37个减少到0个。

更新时间: 2025-03-29 10:42:58

领域: cs.CR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2504.00031v1

Rethinking Optimization and Architecture for Tiny Language Models

The power of large language models (LLMs) has been demonstrated through numerous data and computing resources. However, the application of language models on mobile devices faces huge challenges in computation and memory costs; tiny language models with high performance are therefore urgently required. Because the training process is highly complex, many details of optimizing language models are seldom studied carefully. In this study, based on a tiny language model with 1B parameters, we carefully design a series of empirical studies to analyze the effect of each component. Three perspectives are mainly discussed, i.e., neural architecture, parameter initialization, and optimization strategy. Several design formulas are empirically proven especially effective for tiny language models, including tokenizer compression, architecture tweaking, parameter inheritance, and multiple-round training. Then we train PanGu-π-1B Pro and PanGu-π-1.5B Pro on 1.6T multilingual corpora, following the established formulas. Experimental results demonstrate that the improved optimization and architecture yield a notable average improvement of 8.87 on benchmark evaluation sets for PanGu-π-1B Pro. Besides, PanGu-π-1.5B Pro surpasses a range of SOTA models with larger model sizes, validating its superior performance. The code is available at https://github.com/YuchuanTian/RethinkTinyLM.

Updated: 2025-03-29 10:38:01

标题: 重新思考微型语言模型的优化和架构

摘要: 大型语言模型(LLMs)的强大能力已通过大量数据和计算资源得到证明。然而,在移动设备上应用语言模型面临着计算和内存成本巨大的挑战,即迫切需要具有高性能的微型语言模型。受到高度复杂的训练过程的限制,优化语言模型的许多细节很少被仔细研究。在本研究中,基于一个具有10亿参数的微型语言模型,我们仔细设计了一系列经验研究,以分析每个组件的影响。主要讨论了三个视角,即神经架构、参数初始化和优化策略。几个设计公式在经验上被证明对微型语言模型特别有效,包括分词器压缩、架构微调、参数继承和多轮训练。然后我们在1.6T多语言语料库上训练了PanGu-π-1B Pro和PanGu-π-1.5B Pro,遵循已建立的公式。实验结果表明,改进的优化和架构使PanGu-π-1B Pro在基准评估集上平均改进了8.87。此外,PanGu-π-1.5B Pro超越了一系列具有更大模型大小的SOTA模型,验证了其卓越性能。代码可在https://github.com/YuchuanTian/RethinkTinyLM 上获取。

更新时间: 2025-03-29 10:38:01

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.02791v3

Fair Sufficient Representation Learning

The main objective of fair statistical modeling and machine learning is to minimize or eliminate biases that may arise from the data or the model itself, ensuring that predictions and decisions are not unjustly influenced by sensitive attributes such as race, gender, age, or other protected characteristics. In this paper, we introduce a Fair Sufficient Representation Learning (FSRL) method that balances sufficiency and fairness. Sufficiency ensures that the representation should capture all necessary information about the target variables, while fairness requires that the learned representation remains independent of sensitive attributes. FSRL is based on a convex combination of an objective function for learning a sufficient representation and an objective function that ensures fairness. Our approach manages fairness and sufficiency at the representation level, offering a novel perspective on fair representation learning. We implement this method using distance covariance, which is effective for characterizing independence between random variables. We establish the convergence properties of the learned representations. Experiments conducted on healthcare and text datasets with diverse structures demonstrate that FSRL achieves a superior trade-off between fairness and accuracy compared to existing approaches.
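
Since the fairness term hinges on distance covariance, here is a self-contained sketch of the sample statistic (a V-statistic version; the paper's exact estimator and loss weighting may differ):

    import torch

    def distance_covariance(z, s):
        # z: (n, d_z) learned representations; s: (n, d_s) sensitive attributes.
        a = torch.cdist(z, z)    # pairwise Euclidean distance matrices
        b = torch.cdist(s, s)
        # Double-center each distance matrix.
        A = a - a.mean(0, keepdim=True) - a.mean(1, keepdim=True) + a.mean()
        B = b - b.mean(0, keepdim=True) - b.mean(1, keepdim=True) + b.mean()
        # The population distance covariance is zero iff Z and S are
        # independent, so penalizing this value pushes Z away from S.
        return (A * B).mean().clamp(min=0).sqrt()

A fairness-aware training objective along the abstract's lines could then be the convex combination loss = (1 - lam) * sufficiency_loss + lam * distance_covariance(z, s) with lam in [0, 1], where sufficiency_loss is assumed to be the task-fitting term.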

Updated: 2025-03-29 10:37:49

标题: 公平充分的表征学习

摘要: 公平统计建模和机器学习的主要目标是最小化或消除可能源自数据或模型本身的偏见,确保预测和决策不受种族、性别、年龄或其他受保护特征等敏感属性的不公正影响。本文介绍了一种公平充分表示学习(FSRL)方法,该方法平衡充分性和公平性。充分性确保表示应捕捉有关目标变量的所有必要信息,而公平性要求学习到的表示与敏感属性保持独立。FSRL基于学习充分表示的目标函数和确保公平性的目标函数的凸组合。我们的方法在表示水平管理公平性和充分性,提供了公平表示学习的新视角。我们使用距离协方差实现了该方法,这对于表征随机变量之间的独立性是有效的。我们建立了学习表示的收敛性质。在具有不同结构的健康和文本数据集上进行的实验表明,与现有方法相比,FSRL在公平性和准确性之间取得了更优越的权衡。

更新时间: 2025-03-29 10:37:49

领域: stat.ML,cs.LG,62G05, 68T07

下载: http://arxiv.org/abs/2504.01030v1

Reproducibility Companion Paper: Making Users Indistinguishable: Attribute-wise Unlearning in Recommender Systems

In this paper, we reproduce the experimental results presented in our previous work titled "Making Users Indistinguishable: Attribute-wise Unlearning in Recommender Systems," which was published in the proceedings of the 31st ACM International Conference on Multimedia. This paper aims to validate the effectiveness of our proposed method and help others reproduce our experimental results. We provide detailed descriptions of our preprocessed datasets, source code structure, configuration file settings, experimental environment, and reproduced experimental results.

Updated: 2025-03-29 10:25:49

标题: 可复现性伴随论文:使用户不可区分:推荐系统中的按属性遗忘

摘要: 在本文中,我们复现了先前工作中呈现的实验结果,该工作题为“使用户不可区分:推荐系统中的按属性遗忘”,发表于第31届ACM国际多媒体会议论文集。本文旨在验证我们提出的方法的有效性,并帮助其他人复现我们的实验结果。我们详细描述了预处理数据集、源代码结构、配置文件设置、实验环境以及复现的实验结果。

更新时间: 2025-03-29 10:25:49

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2503.23032v1

Task-Aware Parameter-Efficient Fine-Tuning of Large Pre-Trained Models at the Edge

Large language models (LLMs) have achieved remarkable success in various tasks, such as decision-making, reasoning, and question answering. They have been widely used in edge devices. However, fine-tuning LLMs to specific tasks at the edge is challenging due to the high computational cost and the limited storage and energy resources at the edge. To address this issue, we propose TaskEdge, a task-aware parameter-efficient fine-tuning framework at the edge, which allocates the most effective parameters to the target task and only updates the task-specific parameters. Specifically, we first design a parameter importance calculation criterion that incorporates both weights and input activations into the computation of weight importance. Then, we propose a model-agnostic task-specific parameter allocation algorithm to ensure that task-specific parameters are distributed evenly across the model, rather than being concentrated in specific regions. In doing so, TaskEdge can significantly reduce the computational cost and memory usage while maintaining performance on the target downstream tasks by updating less than 0.1% of the parameters. In addition, TaskEdge can be easily integrated with structured sparsity to enable acceleration by NVIDIA's specialized sparse tensor cores, and it can be seamlessly integrated with LoRA to enable efficient sparse low-rank adaptation. Extensive experiments on various tasks demonstrate the effectiveness of TaskEdge.
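
A compact sketch of the two core steps, offered under stated assumptions (function names and the per-layer placement of the 0.1% budget are our reading of the description above):

    import torch

    def importance(weight, act_norm):
        # weight: (out_dim, in_dim); act_norm: (in_dim,) mean |activation| per
        # input channel. Importance combines weight magnitude with how strongly
        # the corresponding input is actually driven by the task data.
        return weight.abs() * act_norm.unsqueeze(0)

    def even_allocation_masks(layers, budget_fraction=0.001):
        # layers: dict name -> (weight, act_norm). Giving every layer the same
        # fraction spreads task-specific parameters evenly across the model
        # instead of letting them concentrate in a few regions.
        masks = {}
        for name, (w, act_norm) in layers.items():
            score = importance(w, act_norm)
            k = max(1, int(budget_fraction * score.numel()))
            thresh = score.flatten().topk(k).values.min()
            masks[name] = score >= thresh   # only these entries receive updates
        return masks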

Updated: 2025-03-29 10:23:36

标题: 在边缘进行任务感知的参数高效微调大型预训练模型

摘要: 大型语言模型(LLMs)在各种任务中取得了显著的成功,例如决策制定、推理和问题回答。它们已被广泛应用于边缘设备。然而,在边缘对LLMs进行特定任务的微调是具有挑战性的,这是由于边缘的高计算成本以及有限的存储和能源资源。为了解决这个问题,我们提出了TaskEdge,这是一个在边缘的任务感知参数高效微调框架,它将最有效的参数分配给目标任务,并只更新任务特定的参数。具体来说,我们首先设计了一个参数重要性计算标准,将权重和输入激活结合起来计算权重重要性。然后,我们提出了一个与模型无关的任务特定参数分配算法,以确保任务特定参数均匀分布在整个模型中,而不是集中在特定区域。通过这样做,TaskEdge可以显著降低计算成本和内存使用量,同时通过更新少于0.1%的参数来保持目标下游任务的性能。此外,TaskEdge可以轻松集成结构化稀疏性,以实现通过NVIDIA的专门稀疏张量核心加速,并且可以无缝集成LoRA,以实现高效的稀疏低秩调整。对各种任务的广泛实验证明了TaskEdge的有效性。

更新时间: 2025-03-29 10:23:36

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.03718v1

DP-GPL: Differentially Private Graph Prompt Learning

Graph Neural Networks (GNNs) have shown remarkable performance in various applications. Recently, graph prompt learning has emerged as a powerful GNN training paradigm, inspired by advances in language and vision foundation models. Here, a GNN is pre-trained on public data and then adapted to sensitive tasks using lightweight graph prompts. However, using prompts from sensitive data poses privacy risks. In this work, we are the first to investigate these practical risks in graph prompts by instantiating a membership inference attack that reveals significant privacy leakage. We also find that the standard privacy method, DP-SGD, fails to provide practical privacy-utility trade-offs in graph prompt learning, likely due to the small number of sensitive data points used to learn the prompts. As a solution, we propose DP-GPL for differentially private graph prompt learning based on the PATE framework, which generates a graph prompt with differential privacy guarantees. Our evaluation across various graph prompt learning methods, GNN architectures, and pre-training strategies demonstrates that our algorithm achieves high utility at strong privacy, effectively mitigating privacy concerns while preserving the capabilities of prompted GNNs as powerful foundation models in the graph domain.
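
The PATE building block the method rests on is a noisy vote over teacher predictions; a minimal sketch (the Gaussian mechanism and sigma are illustrative choices, not the paper's calibration):

    import numpy as np

    def noisy_aggregate(teacher_votes, num_classes, sigma=4.0,
                        rng=np.random.default_rng(0)):
        # teacher_votes: per-teacher predicted labels for a single query.
        counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
        counts += rng.normal(0.0, sigma, size=num_classes)  # noise buys the DP guarantee
        return int(np.argmax(counts))   # release only the noisy winner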

Updated: 2025-03-29 10:22:06

标题: DP-GPL: 差分隐私图提示学习

摘要: 图神经网络(GNNs)在各种应用中展现出卓越的性能。最近,图提示学习作为一种强大的GNN训练范式已经出现,受到了语言和视觉基础模型进展的启发。在这里,一个GNN在公共数据上进行预训练,然后使用轻量级图提示适应敏感任务。然而,使用来自敏感数据的提示会带来隐私风险。在这项工作中,我们首次调查了图提示中的这些实际风险,通过实例化一个会暴露显著隐私泄漏的会员推断攻击。我们还发现,标准的隐私方法DP-SGD在图提示学习中未能提供实际的隐私-效用权衡,可能是因为用于学习提示的敏感数据点数量较少。作为解决方案,我们提出了基于PATE框架的差分隐私图提示学习方法DP-GPL,该方法生成带有差分隐私保证的图提示。我们通过对各种图提示学习方法、GNN架构和预训练策略的评估表明,我们的算法在强隐私下实现了高效用,有效地减轻了隐私问题,同时保留了提示的GNN作为图领域强大基础模型的能力。

更新时间: 2025-03-29 10:22:06

领域: cs.LG

下载: http://arxiv.org/abs/2503.10544v2

Sustainable techniques to improve Data Quality for training image-based explanatory models for Recommender Systems

Visual explanations based on user-uploaded images are an effective and self-contained approach to provide transparency to Recommender Systems (RS), but intrinsic limitations of the data used in this explainability paradigm cause existing approaches to rely on poor-quality training data that is highly sparse and suffers from labelling noise. Popular training enrichment approaches like model enlargement or massive data gathering are expensive and environmentally unsustainable, thus we seek to provide better visual explanations to RS aligning with the principles of Responsible AI. In this work, we research the intersection of effective and sustainable training enrichment strategies for visual-based RS explainability models by developing three novel strategies that focus on training Data Quality: 1) selection of reliable negative training examples using Positive-unlabelled Learning, 2) transform-based data augmentation, and 3) text-to-image generative-based data augmentation. Integrating these strategies into three state-of-the-art explainability models improves their performance on relevant ranking metrics by 5% without penalizing their practical long-term sustainability, as tested on multiple real-world restaurant recommendation explanation datasets.

Updated: 2025-03-29 10:16:08

标题: 可持续技术改进数据质量,用于训练基于图像的解释模型,用于推荐系统

摘要: 基于用户上传的图像的可视化解释是提供透明度给推荐系统(RS)的一种有效且自包含的方法,但在这种可解释性范例中使用的数据的固有限制导致现有方法使用质量较差、高度稀疏并且存在标签噪声的训练数据。流行的训练增强方法,如模型扩展或大规模数据收集,成本高昂且环境不可持续,因此我们寻求为RS提供更好的可视化解释,与负责任AI的原则相一致。在这项工作中,我们研究了针对基于视觉的RS可解释性模型的有效和可持续训练增强策略的交集,通过开发三种侧重于训练数据质量的新策略:1)使用正-未标记学习选择可靠的负训练样本,2)基于变换的数据增强,以及3)基于文本到图像生成的数据增强。这些策略在三种最先进的可解释性模型中的整合,可将这些基于视觉的RS可解释性模型的相关排名指标的性能提高5%,而不会损害它们的实际长期可持续性,如在多个真实世界的餐厅推荐解释数据集中进行测试。

更新时间: 2025-03-29 10:16:08

领域: cs.LG,cs.AI,cs.CV,cs.IR

下载: http://arxiv.org/abs/2407.06740v2

Dark patterns in e-commerce: a dataset and its baseline evaluations

Dark patterns are user interface designs in online services that induce users to take unintended actions. Recently, dark patterns have been raised as an issue of privacy and fairness, so a wide range of research on detecting them is eagerly awaited. In this work, we constructed a dataset for dark pattern detection and established its baseline detection performance with state-of-the-art machine learning methods. The original data was obtained from Mathur et al.'s 2019 study, which consists of 1,818 dark pattern texts from shopping sites. We then added negative samples, i.e., non-dark-pattern texts, by retrieving texts from the same websites as Mathur et al.'s dataset. We also applied state-of-the-art machine learning methods, including BERT, RoBERTa, ALBERT, and XLNet, to establish baseline automatic detection accuracy. As a result of 5-fold cross-validation, we achieved the highest accuracy of 0.975 with RoBERTa. The dataset and baseline source codes are available at https://github.com/yamanalab/ec-darkpattern.
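
A minimal sketch of the strongest reported baseline, fine-tuning RoBERTa as a binary classifier with Hugging Face transformers (the training arguments are our assumptions, and train_ds/eval_ds stand for tokenized splits of the released dataset, not variables the paper defines):

    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=2)   # dark pattern vs. not

    def encode(batch):   # applied to the dataset, e.g. via dataset.map(encode)
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=64)

    args = TrainingArguments(output_dir="out", num_train_epochs=3,
                             per_device_train_batch_size=16)
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_ds, eval_dataset=eval_ds)
    trainer.train()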

Updated: 2025-03-29 09:57:32

标题: 电子商务中的阴暗模式:数据集及其基准评估

摘要: Dark patterns是在线服务中诱使用户采取非预期行动的用户界面设计。最近,Dark patterns被提出作为隐私和公平问题,因此人们热切期待对Dark pattern检测的广泛研究。在这项工作中,我们构建了一个用于Dark pattern检测的数据集,并使用最先进的机器学习方法建立了基线检测性能。原始数据集来自2019年Mathur等人的研究,其中包括来自购物网站的1,818个Dark pattern文本。然后,我们通过从与Mathur等人数据集相同的网站检索文本,添加了负样本,即非Dark pattern文本。我们还应用了包括BERT、RoBERTa、ALBERT和XLNet在内的最先进机器学习方法,建立自动检测准确性的基线。通过5折交叉验证,我们使用RoBERTa实现了0.975的最高准确率。数据集和基线源代码可在https://github.com/yamanalab/ec-darkpattern 上获得。

更新时间: 2025-03-29 09:57:32

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2211.06543v2

A limited technical background is sufficient for attack-defense tree acceptability

Attack-defense trees (ADTs) are a prominent graphical threat modeling method that is highly recommended for analyzing and communicating security-related information. Despite this, existing empirical studies of attack trees have established their acceptability only for users with highly technical (computer science) backgrounds while raising questions about their suitability for threat modeling stakeholders with a limited technical background. Our research addresses this gap by investigating the impact of the users' technical background on ADT acceptability in an empirical study. Our Method Evaluation Model-based study consisted of n = 102 participants (53 with a strong computer science background and 49 with a limited computer science background) who were asked to complete a series of ADT-related tasks. By analyzing their responses and comparing the results, we reveal that a very limited technical background is sufficient for ADT acceptability. This finding underscores attack trees' viability as a threat modeling method.

Updated: 2025-03-29 09:55:50

标题: 攻击防御树可接受性所需的技术背景有限

摘要: 攻击防御树(ADTs)是一种著名的图形威胁建模方法,强烈推荐用于分析和传达与安全相关的信息。尽管如此,现有的攻击树的实证研究仅确定了对具有高度技术(计算机科学)背景的用户的可接受性,同时对具有有限技术背景的威胁建模利益相关者的适用性提出了疑问。我们的研究通过实证研究调查用户的技术背景对ADT可接受性的影响来填补这一空白。 我们的方法评估模型研究由n = 102名参与者(53名具有较强的计算机科学背景和49名具有有限计算机科学背景)组成,他们被要求完成一系列与ADT相关的任务。通过分析他们的回答并比较结果,我们发现,即使具有非常有限的技术背景,也足以使ADT可接受。这一发现强调了攻击树作为一种威胁建模方法的可行性。

更新时间: 2025-03-29 09:55:50

领域: cs.CR

下载: http://arxiv.org/abs/2502.11920v2

The Complexity of Algebraic Algorithms for LWE

Arora & Ge introduced a noise-free polynomial system to compute the secret of a Learning With Errors (LWE) instance via linearization. Albrecht et al. later utilized the Arora-Ge polynomial model to study the complexity of Gröbner basis computations on LWE polynomial systems under the assumption of semi-regularity. In this paper we revisit the Arora-Ge polynomial system and prove that it satisfies a genericity condition recently introduced by Caminata & Gorla, called being in generic coordinates. For polynomial systems in generic coordinates one can always estimate the complexity of DRL Gröbner basis computations in terms of the Castelnuovo-Mumford regularity and hence also via the Macaulay bound. Moreover, we generalize the Gröbner basis algorithm of Semaev & Tenti to arbitrary polynomial systems with a finite degree of regularity. In particular, the existence of this algorithm yields another approach to estimating the complexity of DRL Gröbner basis computations in terms of the degree of regularity. In practice, the degree of regularity of LWE polynomial systems is not known, though one can always estimate the lowest achievable degree of regularity. Consequently, from a designer's worst-case perspective this approach yields sub-exponential complexity estimates for general, binary secret and binary error LWE. In recent works by Dachman-Soled et al. the hardness of LWE in the presence of side information was analyzed. Utilizing their framework we discuss how hints can be incorporated into LWE polynomial systems and how they affect the complexity of Gröbner basis computations.
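
For orientation, a commonly stated form of the bounds the abstract invokes (hypotheses simplified here, so treat this as a hedged sketch rather than the paper's precise statement): for a system f_1, ..., f_m in n variables in generic coordinates, the solving degree d of a DRL Gröbner basis computation is controlled by the Castelnuovo-Mumford regularity, which the Macaulay bound caps as

    d \leq \sum_{i=1}^{m} \left( \deg(f_i) - 1 \right) + 1,
    \qquad
    \text{cost} \in O\left( \binom{n + d}{d}^{\omega} \right),

where 2 \leq \omega \leq 3 is the exponent of the linear algebra performed on the Macaulay matrix.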

Updated: 2025-03-29 09:44:21

标题: LWE问题的代数算法的复杂性

摘要: Arora & Ge引入了一个无噪声的多项式系统,通过线性化计算学习带误差(LWE)实例的秘密。Albrecht等人后来利用Arora-Ge多项式模型,在半正则性假设下研究了LWE多项式系统上Gröbner基计算的复杂性。本文重新审视Arora-Ge多项式,并证明它满足Caminata & Gorla最近引入的一种称为处于通用坐标的通用性条件。对于处于通用坐标的多项式系统,总可以通过Castelnuovo-Mumford正则性来估计DRL Gröbner基计算的复杂性,因此也可通过Macaulay界估计。此外,我们将Semaev & Tenti的Gröbner基算法推广到具有有限正则度的任意多项式系统。特别地,这种算法的存在提供了另一种通过正则度估计DRL Gröbner基计算复杂性的方法。在实践中,LWE多项式系统的正则度是未知的,尽管总可以估计最低可达到的正则度。因此,从设计者的最坏情况角度来看,这种方法为一般的、二进制秘密和二进制错误的LWE提供了亚指数复杂性估计。在Dachman-Soled等人最近的研究中,分析了存在侧信息情况下LWE的困难性。利用他们的框架,我们讨论了如何将提示信息整合到LWE多项式系统中,以及它们如何影响Gröbner基计算的复杂性。

更新时间: 2025-03-29 09:44:21

领域: cs.CR

下载: http://arxiv.org/abs/2402.07852v3

Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning

Multimodal large language models (MLLMs) act as essential interfaces, connecting humans with AI technologies in multimodal applications. However, current MLLMs face challenges in accurately interpreting object orientation in images due to inconsistent orientation annotations in training data, hindering the development of a coherent orientation understanding. To overcome this, we propose egocentric instruction tuning, which aligns MLLMs' orientation understanding with the user's perspective, based on a consistent annotation standard derived from the user's egocentric viewpoint. We first generate egocentric instruction data that leverages MLLMs' ability to recognize object details and applies prior knowledge for orientation understanding. Using this data, we perform instruction tuning to enhance the model's capability for accurate orientation interpretation. In addition, we introduce EgoOrientBench, a benchmark that evaluates MLLMs' orientation understanding across three tasks using images collected from diverse domains. Experimental results on this benchmark show that egocentric instruction tuning significantly improves orientation understanding without compromising overall MLLM performance. The instruction data and benchmark dataset are available on our project page at https://github.com/jhCOR/EgoOrientBench.

Updated: 2025-03-29 09:24:00

标题: “‘对的’是否正确?通过以自我为中心的指导调整,增强多模态大语言模型中的对象导向理解”

摘要: 多模态大型语言模型(MLLMs)充当重要的接口,将人类与多模态应用中的人工智能技术连接起来。然而,当前的MLLMs在准确解释图像中物体方向方面面临挑战,这是由于训练数据中方向标注不一致,阻碍了统一方向理解的发展。为了克服这一问题,我们提出了以自我为中心的指导调整,根据用户自我为中心的视角导出的一致标注标准,将MLLMs的方向理解与用户的视角对齐。我们首先生成利用MLLMs识别物体细节的能力并应用先验知识进行方向理解的自我为中心指导数据。利用这些数据,我们执行指导调整来增强模型对准确方向解释的能力。此外,我们引入了EgoOrientBench,一个基准,用于评估MLLMs在从不同领域收集的图像上的方向理解能力。在这个基准上的实验结果显示,自我为中心的指导调整显著提高了方向理解能力,而不会影响整体MLLM性能。指导数据和基准数据集可在我们的项目页面上找到:https://github.com/jhCOR/EgoOrientBench。

更新时间: 2025-03-29 09:24:00

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.16761v2

CAMP in the Odyssey: Provably Robust Reinforcement Learning with Certified Radius Maximization

Deep reinforcement learning (DRL) has gained widespread adoption in control and decision-making tasks due to its strong performance in dynamic environments. However, DRL agents are vulnerable to noisy observations and adversarial attacks, and concerns about the adversarial robustness of DRL systems have emerged. Recent efforts have focused on addressing these robustness issues by establishing rigorous theoretical guarantees for the returns achieved by DRL agents in adversarial settings. Among these approaches, policy smoothing has proven to be an effective and scalable method for certifying the robustness of DRL agents. Nevertheless, existing certifiably robust DRL relies on policies trained with simple Gaussian augmentations, resulting in a suboptimal trade-off between certified robustness and certified return. To address this issue, we introduce a novel paradigm dubbed Certified-rAdius-Maximizing Policy (CAMP) training. CAMP is designed to enhance DRL policies, achieving better utility without compromising provable robustness. By leveraging the insight that the global certified radius can be derived from local certified radii based on training-time statistics, CAMP formulates a surrogate loss related to the local certified radius and optimizes the policy guided by this surrogate loss. We also introduce policy imitation as a novel technique to stabilize CAMP training. Experimental results demonstrate that CAMP significantly improves the robustness-return trade-off across various tasks. Based on the results, CAMP can achieve up to twice the certified expected return compared to that of baselines. Our code is available at https://github.com/NeuralSec/camp-robust-rl.

Updated: 2025-03-29 09:11:42

标题: 《奥德赛中的CAMP:具有认证半径最大化的可证明鲁棒强化学习》

摘要: 深度强化学习(DRL)在控制和决策任务中得到了广泛应用,因为它在动态环境中表现出色。然而,DRL代理容易受到嘈杂观测和对抗性攻击的影响,对DRL系统的对抗鲁棒性引起了关注。最近的努力集中在通过为DRL代理在对抗环境中实现的回报建立严格的理论保证来解决这些鲁棒性问题。在这些方法中,策略平滑已被证明是一种有效且可扩展的方法,用于证明DRL代理的鲁棒性。然而,现有的可证明鲁棒DRL依赖于使用简单的高斯增强训练的策略,导致在可证明鲁棒性和可证明回报之间存在次优的权衡。为了解决这个问题,我们引入了一种称为Certified-rAdius-Maximizing Policy(CAMP)训练的新范式。CAMP旨在增强DRL策略,实现更好的效用,而不损害可证明的鲁棒性。通过利用全局认证半径可以基于训练时统计数据由局部认证半径推导而来的见解,CAMP制定了一个与局部认证半径相关的替代损失,并根据这个替代损失来优化策略。我们还引入了“策略模仿”作为一种稳定CAMP训练的新技术。实验结果表明,CAMP显著改善了各种任务中的鲁棒性-回报权衡。根据结果,CAMP可以实现高达基线两倍的可证明预期回报。我们的代码可在https://github.com/NeuralSec/camp-robust-rl找到。

更新时间: 2025-03-29 09:11:42

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2501.17667v2

Minerva: A File-Based Ransomware Detector

Ransomware attacks have caused billions of dollars in damages in recent years, and are expected to cause billions more in the future. Consequently, significant effort has been devoted to ransomware detection and mitigation. Behavioral-based ransomware detection approaches have garnered considerable attention recently. These behavioral detectors typically rely on process-based behavioral profiles to identify malicious behaviors. However, with an increasing body of literature highlighting the vulnerability of such approaches to evasion attacks, a comprehensive solution to the ransomware problem remains elusive. This paper presents Minerva, a novel, robust approach to ransomware detection. Minerva is engineered to be robust by design against evasion attacks, with architectural and feature selection choices informed by their resilience to adversarial manipulation. We conduct a comprehensive analysis of Minerva across a diverse spectrum of ransomware types, encompassing unseen ransomware as well as variants designed specifically to evade Minerva. Our evaluation showcases the ability of Minerva to accurately identify ransomware, generalize to unseen threats, and withstand evasion attacks. Furthermore, over 99% of detected ransomware are identified within 0.52 seconds of activity, enabling the adoption of data loss prevention techniques with near-zero overhead.

Updated: 2025-03-29 09:07:43

标题: 弥涅瓦:基于文件的勒索软件检测器

摘要: 勒索软件攻击近年来已经造成数十亿美元的损失,并预计未来还将造成数十亿美元的损失。因此,人们已经付出了大量努力来进行勒索软件的检测和缓解。基于行为的勒索软件检测方法最近引起了相当大的关注。这些行为检测器通常依赖于基于进程的行为配置文件来识别恶意行为。然而,随着越来越多的文献强调这些方法对规避攻击的脆弱性,勒索软件问题的全面解决方案仍然难以实现。本文介绍了Minerva,一种新颖、稳健的勒索软件检测方法。Minerva的设计旨在天生抵御规避攻击,其架构和特征选择依据它们对对抗性操纵的韧性而确定。我们对Minerva在各种类型的勒索软件上进行了全面分析,包括未见过的勒索软件以及专门设计用于规避Minerva的变种。我们的评估展示了Minerva准确识别勒索软件、泛化到未见威胁以及抵御规避攻击的能力。此外,超过99%的被检测到的勒索软件在活动开始后0.52秒内被识别出来,使得可以以接近零的开销采用数据丢失预防技术。

更新时间: 2025-03-29 09:07:43

领域: cs.CR,cs.CY,cs.LG

下载: http://arxiv.org/abs/2301.11050v3

Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation

We introduce Presto, a novel video diffusion model designed to generate 15-second videos with long-range coherence and rich content. Extending video generation methods to maintain scenario diversity over long durations presents significant challenges. To address this, we propose a Segmented Cross-Attention (SCA) strategy, which splits hidden states into segments along the temporal dimension, allowing each segment to cross-attend to a corresponding sub-caption. SCA requires no additional parameters, enabling seamless incorporation into current DiT-based architectures. To facilitate high-quality long video generation, we build the LongTake-HD dataset, consisting of 261k content-rich videos with scenario coherence, annotated with an overall video caption and five progressive sub-captions. Experiments show that our Presto achieves 78.5% on the VBench Semantic Score and 100% on the Dynamic Degree, outperforming existing state-of-the-art video generation methods. This demonstrates that our proposed Presto significantly enhances content richness, maintains long-range coherence, and captures intricate textual details. More details are displayed on our project page: https://presto-video.github.io/.
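
To make the SCA mechanism concrete, a parameter-free sketch (tensor shapes are our assumptions; in the actual DiT the queries, keys, and values would come from the usual learned projections):

    import torch
    import torch.nn.functional as F

    def segmented_cross_attention(hidden, subcaption_embs):
        # hidden: (batch, T, dim); subcaption_embs: list of tensors, one per
        # sub-caption, each (batch, L_i, dim). Split the hidden states along
        # time so each temporal segment attends only to its own sub-caption.
        segments = hidden.chunk(len(subcaption_embs), dim=1)
        outs = []
        for seg, cap in zip(segments, subcaption_embs):
            scores = seg @ cap.transpose(1, 2) / seg.shape[-1] ** 0.5
            outs.append(F.softmax(scores, dim=-1) @ cap)
        return torch.cat(outs, dim=1)   # same shape as hidden; no extra parameters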

Updated: 2025-03-29 08:56:56

标题: 长视频扩散生成与分段交叉注意力和内容丰富的视频数据策划

摘要: 我们介绍了Presto,一种新颖的视频扩散模型,旨在生成具有长程连贯性和丰富内容的15秒视频。将视频生成方法扩展到在长时间内保持情景多样性存在重大挑战。为了解决这个问题,我们提出了一种分段交叉注意力(SCA)策略,将隐藏状态沿着时间维度分割成段,使每个段可以交叉关注相应的子标题。SCA不需要额外的参数,可以无缝地整合到当前的DiT-based架构中。为了促进高质量的长视频生成,我们构建了LongTake-HD数据集,包括261k个内容丰富的视频,具有情景连贯性,并附有整体视频标题和五个逐步的子标题。实验表明,我们的Presto在VBench语义评分上达到了78.5%,在动态度上达到了100%,胜过了现有的最先进视频生成方法。这表明我们提出的Presto显著增强了内容丰富性,保持了长程连贯性,并捕捉了复杂的文本细节。更多详情请访问我们的项目页面:https://presto-video.github.io/。

更新时间: 2025-03-29 08:56:56

领域: cs.CV,cs.AI,cs.MM

下载: http://arxiv.org/abs/2412.01316v2

Estimating LLM Uncertainty with Logits

Over the past few years, Large Language Models (LLMs) have developed rapidly and are widely applied in various domains. However, LLMs face the issue of hallucinations, generating responses that may be unreliable when the models lack relevant knowledge. To be aware of potential hallucinations, uncertainty estimation methods have been introduced, and most of them have confirmed that reliability lies in critical tokens. However, probability-based methods perform poorly in identifying token reliability, limiting their practical utility. In this paper, we reveal that the probability-based method fails to estimate token reliability due to the loss of evidence strength information which is accumulated in the training stage. Therefore, we present Logits-induced token uncertainty (LogTokU), a framework for estimating decoupled token uncertainty in LLMs, enabling real-time uncertainty estimation without requiring multiple sampling processes. We employ evidence modeling to implement LogTokU and use the estimated uncertainty to guide downstream tasks. The experimental results demonstrate that LogTokU has significant effectiveness and promise.
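
One standard way to realize "evidence modeling" on raw logits, offered here as an illustrative reading rather than the paper's exact estimator, is the Dirichlet construction from evidential deep learning:

    import torch

    def dirichlet_token_uncertainty(logits):
        # logits: (vocab_size,) raw logits for one decoding step.
        evidence = torch.relu(logits)   # evidence must be non-negative
        alpha = evidence + 1.0          # Dirichlet concentration parameters
        K = logits.numel()
        # Total evidence strength is present in the logits but destroyed by
        # softmax normalization; uncertainty is high when little accumulated.
        return K / alpha.sum()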

Updated: 2025-03-29 08:51:52

标题: 利用Logits估计LLM不确定性

摘要: 在过去几年中,大型语言模型(LLMs)迅速发展,并广泛应用于各个领域。然而,LLMs面临幻觉的问题,当模型缺乏相关知识时,生成的响应可能不可靠。为了意识到潜在的幻觉,不确定性估计方法已被引入,大部分方法已证实可靠性存在于关键标记中。然而,基于概率的方法在识别标记可靠性方面表现不佳,限制了它们的实际效用。在本文中,我们揭示了基于概率的方法由于在训练阶段积累的证据强度信息的丢失而无法估计标记可靠性。因此,我们提出了Logits诱导的标记不确定性(LogTokU),这是一个用于估计LLMs中解耦标记不确定性的框架,实现了实时不确定性估计,而无需多个采样过程。我们采用证据建模来实现LogTokU,并使用估计的不确定性来引导下游任务。实验结果表明,LogTokU具有显著的效果和前景。

更新时间: 2025-03-29 08:51:52

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.00290v3

Pricing Strategies for Different Accuracy Models from the Same Dataset Based on Generalized Hotelling's Law

We consider a scenario where a seller possesses a dataset $D$ and trains it into models of varying accuracies for sale in the market. Due to the reproducibility of data, the dataset can be reused to train models with different accuracies, and the training cost is independent of the sales volume. These two characteristics lead to fundamental differences between the data trading market and traditional trading markets. The introduction of different models into the market inevitably gives rise to competition. However, because these models differ in accuracy, traditional multi-oligopoly games are not applicable. We consider a generalized Hotelling's law, in which the accuracy of the models is abstracted as distance. Buyers choose which model to purchase based on a trade-off between accuracy and price, while sellers determine their pricing strategies based on the market's demand. We present two pricing strategies, a static one and a dynamic one, and focus on the static pricing strategy. We propose static pricing mechanisms based on various market conditions and provide an example. Finally, we demonstrate that our pricing strategy remains robust in the context of incomplete-information games.
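
As a hedged illustration of how accuracy plays the role of Hotelling's distance (the symbols below are our own, not necessarily the paper's notation): a buyer with ideal accuracy a^{*}, facing models with accuracies a_i at prices p_i, obtains utility

    u_i = v - p_i - t \, \lvert a^{*} - a_i \rvert, \qquad i^{*} = \arg\max_i u_i,

where v is the buyer's base valuation and t the "transport" cost of accepting an accuracy away from the ideal; each seller then sets p_i to maximize revenue under the resulting demand.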

Updated: 2025-03-29 08:49:42

标题: 根据广义霍特林定律,基于相同数据集的不同准确性模型的定价策略

摘要: 我们考虑了一个卖家拥有数据集$D$并将其训练成不同准确度的模型用于市场销售的情景。由于数据的可再生性,数据集可以被重新使用来训练不同准确度的模型,而训练成本与销售量无关。这两个特征导致数据交易市场与传统交易市场之间存在根本差异。引入不同模型到市场必然带来竞争。然而,由于这些模型的准确度不同,传统的多重寡头垄断游戏并不适用。我们考虑了一个广义的Hotelling定律,其中模型的准确度被抽象为距离。买家根据准确度和价格之间的权衡选择购买模型,而卖家根据市场需求确定其定价策略。我们提出了两种定价策略:静态定价策略和动态定价策略,重点关注静态定价策略。我们基于不同的市场条件提出了静态定价机制,并提供了一个例子。最后,我们证明了我们的定价策略在不完全信息博弈环境中仍然稳健。

更新时间: 2025-03-29 08:49:42

领域: cs.AI

下载: http://arxiv.org/abs/2404.05272v2

Towards Understanding the Optimization Mechanisms in Deep Learning

In this paper, we adopt a probability distribution estimation perspective to explore the optimization mechanisms of supervised classification using deep neural networks. We demonstrate that, when employing the Fenchel-Young loss, despite the non-convex nature of the fitting error with respect to the model's parameters, global optimal solutions can be approximated by simultaneously minimizing both the gradient norm and the structural error. The former can be controlled through gradient descent algorithms. For the latter, we prove that it can be managed by increasing the number of parameters and ensuring parameter independence, thereby providing theoretical insights into mechanisms such as over-parameterization and random initialization. Ultimately, the paper validates the key conclusions of the proposed method through empirical results, illustrating its practical effectiveness.

Updated: 2025-03-29 08:46:13

标题: 朝向理解深度学习中的优化机制

摘要: 在这篇论文中,我们采用了概率分布估计的视角来探讨使用深度神经网络进行监督分类的优化机制。我们证明,当采用Fenchel-Young损失时,尽管模型参数的拟合误差具有非凸性,全局最优解可以通过同时最小化梯度范数和结构误差来近似。前者可以通过梯度下降算法控制。对于后者,我们证明可以通过增加参数数量并确保参数独立性来管理,从而为过参数化和随机初始化等机制提供理论洞见。最终,该论文通过实证结果验证了提出方法的关键结论,展示了其实际有效性。

更新时间: 2025-03-29 08:46:13

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.23016v1

Engineering Microbial Symbiosis for Mars Habitability

The colonization of Mars presents extraordinary challenges, including radiation exposure, low atmospheric pressure, and toxic regolith. Recent advancements in synthetic biology and genetic engineering offer unprecedented opportunities to address these obstacles by utilizing terrestrial extremophiles and engineered organisms. This paper examines the potential for creating symbiotic relationships between terrestrial microbes and hypothetical Martian life forms, should they exist, to support a sustainable human presence on Mars. Inspired by natural examples of endosymbiosis, such as mitochondria and chloroplasts, we propose methods to engineer life forms capable of enduring Martian conditions. Key components include experimental designs, laboratory simulations, and bioengineering approaches essential to this endeavor. The ethical, political, and technological challenges of introducing engineered life to Mars are critically evaluated, with an emphasis on international collaboration and robust planetary protection policies. This research underscores engineered symbiosis as a transformative strategy for enabling life to adapt and thrive on Mars while advancing humanity's aspirations for interplanetary habitation and exploration. By addressing these challenges, this work highlights a path toward sustainable life on Mars, reflecting both scientific ingenuity and ethical stewardship.

Updated: 2025-03-29 08:44:42

标题: 工程微生物共生以实现火星生存条件

摘要: 火星殖民面临着巨大的挑战,包括辐射暴露、低大气压和有毒的岩屑。近年来,合成生物学和基因工程的进展为利用地球极端环境中的生物和经过改良的生物克服这些障碍提供了前所未有的机会。本文探讨了在火星上创建地球微生物与假设存在的火星生命形式之间的共生关系的潜力,以支持可持续的人类在火星上的存在。受线粒体和叶绿体等内共生的自然例子的启发,我们提出了工程生物形式以适应火星条件的方法。关键组成部分包括实验设计、实验室模拟和对此努力至关重要的生物工程方法。引入工程生命到火星所面临的道德、政治和技术挑战得到了批判性评估,重点放在国际合作和强有力的行星保护政策上。这项研究强调了工程共生作为一种变革性策略,可以使生命适应并在火星上茁壮成长,同时推进人类对星际居住和探索的愿望。通过解决这些挑战,这项工作突显了通往火星可持续生命的道路,既体现了科学的创造力,也体现了道德的管理。

更新时间: 2025-03-29 08:44:42

领域: astro-ph.EP,cs.LG

下载: http://arxiv.org/abs/2503.23015v1

TimeCMA: Towards LLM-Empowered Multivariate Time Series Forecasting via Cross-Modality Alignment

Multivariate time series forecasting (MTSF) aims to learn temporal dynamics among variables to forecast future time series. Existing statistical and deep learning-based methods suffer from limited learnable parameters and small-scale training data. Recently, large language models (LLMs) combining time series with textual prompts have achieved promising performance in MTSF. However, we discovered that current LLM-based solutions fall short in learning disentangled embeddings. We introduce TimeCMA, an intuitive yet effective framework for MTSF via cross-modality alignment. Specifically, we present a dual-modality encoding with two branches: the time series encoding branch extracts disentangled yet weak time series embeddings, and the LLM-empowered encoding branch wraps the same time series with text as prompts to obtain entangled yet robust prompt embeddings. As a result, such a cross-modality alignment retrieves both disentangled and robust time series embeddings, "the best of two worlds", from the prompt embeddings based on time series and prompt modality similarities. As another key design, to reduce the computational costs from time series with their lengthy textual prompts, we design an effective prompt to encourage the most essential temporal information to be encapsulated in the last token: only the last token is passed to downstream prediction. We further store the last token embeddings to accelerate inference speed. Extensive experiments on eight real datasets demonstrate that TimeCMA outperforms the state of the art.

Updated: 2025-03-29 08:44:30

标题: TimeCMA: 通过跨模态对齐实现LLM增强的多变量时间序列预测

摘要: 多变量时间序列预测(MTSF)旨在学习变量之间的时间动态,以预测未来的时间序列。现有的统计和基于深度学习的方法受限于可学习参数有限和规模较小的训练数据。最近,将大型语言模型(LLMs)与文本提示结合起来,在MTSF中取得了令人期待的性能。然而,我们发现当前基于LLM的解决方案在学习解耦嵌入方面存在不足。我们引入了TimeCMA,这是一个直观而有效的MTSF框架,通过跨模态对齐实现。具体来说,我们提出了一个双模态编码,包括两个分支:时间序列编码分支提取解耦但较弱的时间序列嵌入,LLM增强的编码分支将相同的时间序列与文本作为提示封装在一起,以获得纠缠但稳健的提示嵌入。因此,这种跨模态对齐根据时间序列和提示模态的相似性,从提示嵌入中检索出解耦和稳健的时间序列嵌入,“两全其美”。另一个关键设计是,为了减少来自时间序列和长度文本提示的计算成本,我们设计了一个有效的提示,鼓励最关键的时间信息被封装在最后一个令牌中:只有最后一个令牌传递给下游预测。我们进一步存储最后一个令牌的嵌入,以加快推理速度。对八个真实数据集的广泛实验表明,TimeCMA优于现有技术水平。

更新时间: 2025-03-29 08:44:30

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.01638v5

MSNGO: multi-species protein function annotation based on 3D protein structure and network propagation

Motivation: In recent years, protein function prediction has broken through the bottleneck of sequence features, significantly improving prediction accuracy using high-precision protein structures predicted by AlphaFold2. While single-species protein function prediction methods have achieved remarkable success, multi-species protein function prediction methods are still in the stage of using PPI networks and sequence features. Providing effective cross-species label propagation for species with sparse protein annotations remains a challenging issue. To address this problem, we propose the MSNGO model, which integrates structural features and network propagation methods. Our validation shows that using structural features can significantly improve the accuracy of multi-species protein function prediction. Results: We employ graph representation learning techniques to extract amino acid representations from protein structure contact maps and train a structural model using a graph convolution pooling module to derive protein-level structural features. After incorporating the sequence features from ESM-2, we apply a network propagation algorithm to aggregate information and update node representations within a heterogeneous network. The results demonstrate that MSNGO outperforms previous multi-species protein function prediction methods that rely on sequence features and PPI networks. Availability: https://github.com/blingbell/MSNGO.

Updated: 2025-03-29 08:35:45

标题: MSNGO:基于3D蛋白质结构和网络传播的多物种蛋白功能注释

摘要: 动机:近年来,蛋白质功能预测已经突破了序列特征的瓶颈,通过AlphaFold2预测的高精度蛋白质结构显著提高了预测准确性。虽然单一物种蛋白质功能预测方法取得了显著成功,但多物种蛋白质功能预测方法仍在使用PPI网络和序列特征的阶段。为具有稀疏蛋白质注释的物种提供有效的跨物种标签传播仍然是一个具有挑战性的问题。为了解决这个问题,我们提出了MSNGO模型,该模型集成了结构特征和网络传播方法。我们的验证结果显示,使用结构特征可以显著提高多物种蛋白质功能预测的准确性。结果:我们采用图表示学习技术从蛋白质结构接触图中提取氨基酸表示,并使用图卷积池化模块训练结构模型以推导蛋白质级别的结构特征。在整合ESM-2的序列特征后,我们应用网络传播算法在异质网络中聚合信息并更新节点表示。结果表明,MSNGO优于先前依赖序列特征和PPI网络的多物种蛋白质功能预测方法。可用性:https://github.com/blingbell/MSNGO。

更新时间: 2025-03-29 08:35:45

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.23014v1

On Geometrical Properties of Text Token Embeddings for Strong Semantic Binding in Text-to-Image Generation

Text-to-Image (T2I) models often suffer from text-image misalignment in complex scenes involving multiple objects and attributes. Semantic binding aims to mitigate this issue by accurately associating the generated attributes and objects with their corresponding noun phrases (NPs). Existing methods rely on text or latent optimizations, yet the factors influencing semantic binding remain underexplored. Here we investigate the geometrical properties of text token embeddings and their cross-attention (CA) maps. We empirically and theoretically analyze that the geometrical properties of token embeddings, specifically both angular distances and norms, play a crucial role in CA map differentiation. Then, we propose TeeMo, a training-free text embedding-aware T2I framework with strong semantic binding. TeeMo consists of Causality-Aware Projection-Out (CAPO) for distinct inter-NP CA maps and Adaptive Token Mixing (ATM) with our loss to enhance inter-NP separation while maintaining intra-NP cohesion in CA maps. Extensive experiments confirm TeeMo consistently outperforms prior arts across diverse baselines and datasets.

Updated: 2025-03-29 08:31:30

标题: 关于文本标记嵌入的几何属性对文本到图像生成中强语义绑定的影响

摘要: 文本到图像(T2I)模型在涉及多个对象和属性的复杂场景中经常出现文本-图像不对齐的问题。语义绑定旨在通过准确地将生成的属性和对象与它们对应的名词短语(NP)关联起来来缓解这一问题。现有方法依赖于文本或潜在的优化,然而影响语义绑定的因素仍未得到充分探讨。在这里,我们调查了文本标记嵌入和它们的交叉注意力(CA)映射的几何属性。我们从经验和理论上分析了标记嵌入的几何属性,特别是角度距离和范数,在CA映射的差异化中发挥关键作用。然后,我们提出了TeeMo,这是一个无需训练的文本嵌入感知T2I框架,具有强大的语义绑定能力。TeeMo包括用于不同NP之间CA映射的因果感知投影输出(CAPO)和自适应标记混合(ATM)与我们的损失一起增强NP之间的分离,同时保持CA映射中NP内的凝聚力。广泛的实验证实TeeMo在各种基准和数据集上始终优于先前的方法。

更新时间: 2025-03-29 08:31:30

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.23011v1

Learning Structure-enhanced Temporal Point Processes with Gromov-Wasserstein Regularization

Real-world event sequences are often generated by different temporal point processes (TPPs) and thus have clustering structures. Nonetheless, in the modeling and prediction of event sequences, most existing TPPs ignore the inherent clustering structures of the event sequences, leading to models with unsatisfactory interpretability. In this study, we learn structure-enhanced TPPs with the help of Gromov-Wasserstein (GW) regularization, which imposes clustering structures on the sequence-level embeddings of the TPPs in the maximum likelihood estimation framework. In the training phase, the proposed method leverages a nonparametric TPP kernel to regularize the similarity matrix derived based on the sequence embeddings. In large-scale applications, we sample the kernel matrix and implement the regularization as a Gromov-Wasserstein (GW) discrepancy term, which achieves a trade-off between regularity and computational efficiency. The TPPs learned through this method result in clustered sequence embeddings and demonstrate competitive predictive and clustering performance, significantly improving the model interpretability without compromising prediction accuracy.
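
For reference, the GW discrepancy itself can be computed with the POT library; a small sketch (converting similarities to distances via 1 - K is our simplification of the paper's nonparametric TPP kernel construction):

    import numpy as np
    import ot   # POT: Python Optimal Transport

    def gw_regularizer(emb, kernel):
        # emb: (n, d) sequence-level embeddings; kernel: (n, n) similarity
        # matrix sampled from the TPP kernel.
        C1 = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
        C2 = 1.0 - kernel                        # turn similarities into distances
        p = np.full(len(emb), 1.0 / len(emb))    # uniform marginals
        return ot.gromov.gromov_wasserstein2(C1, C2, p, p, loss_fun="square_loss")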

Updated: 2025-03-29 07:47:21

标题: 学习具有Gromov-Wasserstein正则化的结构增强型时间点过程

摘要: 实际世界中的事件序列通常由不同的时间点过程(TPPs)生成,因此具有聚类结构。然而,在事件序列的建模和预测中,大多数现有的TPPs忽略了事件序列的固有聚类结构,导致模型的可解释性不佳。在这项研究中,我们借助Gromov-Wasserstein(GW)正则化来学习增强结构的TPPs,该正则化在最大似然估计框架中对TPPs的序列级嵌入施加聚类结构。在训练阶段,所提出的方法利用非参数TPP核来正则化基于序列嵌入导出的相似性矩阵。在大规模应用中,我们对核矩阵进行采样并将正则化作为Gromov-Wasserstein(GW)差异项实施,从而在规则性和计算效率之间达到平衡。通过这种方法学习的TPPs产生了聚类的序列嵌入,并展示出竞争性的预测和聚类性能,显著提高了模型的可解释性,而不会损害预测准确性。

更新时间: 2025-03-29 07:47:21

领域: cs.LG,cs.AI,60G55, 62M10

下载: http://arxiv.org/abs/2503.23002v1

Buyer-Initiated Auction Mechanism for Data Redemption in Machine Unlearning

The rapid growth of artificial intelligence (AI) has raised privacy concerns over user data, leading to regulations like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). With the essential toolbox provided by machine unlearning, AI service providers are now able to remove user data from their trained models as well as the training datasets, so as to comply with such regulations. However, extensive data redemption can be costly and degrade model accuracy. To balance the cost of unlearning and the privacy protection, we propose a buyer-initiated auction mechanism for data redemption, enabling the service provider to purchase data from willing users with appropriate compensation. This approach does not require the server to have any a priori knowledge about the users' privacy preference, and provides an efficient solution for maximizing the social welfare in the investigated problem.

Updated: 2025-03-29 07:44:34

标题: 买家发起的拍卖机制用于机器遗忘中的数据赎回

摘要: 人工智能(AI)的快速发展引起了对用户数据隐私的担忧,导致了《通用数据保护条例》(GDPR)和《加利福尼亚消费者隐私权法》(CCPA)等法规的出台。借助机器遗忘提供的基本工具,AI服务提供商现在可以从其训练模型以及训练数据集中删除用户数据,以便遵守这些法规。然而,大量的数据赎回可能成本高昂并降低模型准确性。为了平衡遗忘成本和隐私保护,我们提出了一个由买方发起的数据赎回拍卖机制,使服务提供商能够以适当的补偿从愿意的用户那里购买数据。这种方法不需要服务器事先了解用户的隐私偏好,并为解决所研究的问题提供了最大化社会福利的高效解决方案。

更新时间: 2025-03-29 07:44:34

领域: cs.LG,cs.GT

下载: http://arxiv.org/abs/2503.23001v1

Entropy-Reinforced Planning with Large Language Models for Drug Discovery

The objective of drug discovery is to identify chemical compounds that possess specific pharmaceutical properties toward a binding target. Existing large language models (LLMs) can achieve high token-matching scores in terms of likelihood for molecule generation. However, relying solely on LLM decoding often results in the generation of molecules that are either invalid due to a single misused token, or suboptimal due to unbalanced exploration and exploitation as a consequence of the LLM's prior experience. Here we propose ERP, Entropy-Reinforced Planning for Transformer Decoding, which employs an entropy-reinforced planning algorithm to enhance the Transformer decoding process and strike a balance between exploitation and exploration. ERP aims to achieve improvements in multiple properties compared to direct sampling from the Transformer. We evaluated ERP on the SARS-CoV-2 virus (3CLPro) and human cancer cell target protein (RTCB) benchmarks and demonstrated that, in both benchmarks, ERP consistently outperforms the current state-of-the-art algorithm by 1-5 percent, and baselines by 5-10 percent, respectively. Moreover, such improvement is robust across Transformer models trained with different objectives. Finally, to further illustrate the capabilities of ERP, we tested our algorithm on three code generation benchmarks and outperformed the current state-of-the-art approach as well. Our code is publicly available at: https://github.com/xuefeng-cs/ERP.

Updated: 2025-03-29 07:27:37

标题: 基于大型语言模型的熵增强计划用于药物发现

摘要: 药物发现的目标是识别具有特定药用性质的化合物,以对特定的结合靶点产生作用。现有的大型语言模型(LLMs)可以在分子生成的可能性方面达到高的标记匹配分数。然而,仅依赖LLM解码通常会导致生成的分子要么无效,由于一个误用的标记,要么是次优的,由于LLM先前经验的不平衡探索和开发的结果。在这里,我们提出了ERP,Entropy-Reinforced Planning for Transformer Decoding,它采用熵强化规划算法来增强Transformer解码过程,并在开发和探索之间取得平衡。ERP旨在比直接从Transformer中采样实现多种性能的改进。我们在SARS-CoV-2病毒(3CLPro)和人类癌细胞靶蛋白(RTCB)基准测试中评估了ERP,并证明,在两个基准测试中,ERP始终优于当前最先进的算法1-5%,分别优于基线5-10%。此外,这种改进在经过不同目标训练的Transformer模型之间是稳健的。最后,为了进一步说明ERP的能力,我们在三个代码生成基准测试中测试了我们的算法,并且也胜过了当前最先进的方法。我们的代码可以在以下网址公开获取:https://github.com/xuefeng-cs/ERP。

更新时间: 2025-03-29 07:27:37

领域: cs.LG,cs.AI,q-bio.QM,stat.ML

下载: http://arxiv.org/abs/2406.07025v2

AuditVotes: A Framework Towards More Deployable Certified Robustness for Graph Neural Networks

Despite advancements in Graph Neural Networks (GNNs), adaptive attacks continue to challenge their robustness. Certified robustness based on randomized smoothing has emerged as a promising solution, offering provable guarantees that a model's predictions remain stable under adversarial perturbations within a specified range. However, existing methods face a critical trade-off between accuracy and robustness, as achieving stronger robustness requires introducing greater noise into the input graph. This excessive randomization degrades data quality and disrupts prediction consistency, limiting the practical deployment of certifiably robust GNNs in real-world scenarios where both accuracy and robustness are essential. To address this challenge, we propose AuditVotes, the first framework to achieve both high clean accuracy and certifiably robust accuracy for GNNs. It integrates randomized smoothing with two key components, augmentation and conditional smoothing, aiming to improve data quality and prediction consistency. The augmentation, acting as a pre-processing step, de-noises the randomized graph, significantly improving data quality and clean accuracy. The conditional smoothing, serving as a post-processing step, employs a filtering function to selectively count votes, thereby filtering low-quality predictions and improving voting consistency. Extensive experimental results demonstrate that AuditVotes significantly enhances clean accuracy, certified robustness, and empirical robustness while maintaining high computational efficiency. Notably, compared to baseline randomized smoothing, AuditVotes improves clean accuracy by 437.1% and certified accuracy by 409.3% when the attacker can arbitrarily insert 20 edges on the Cora-ML dataset, representing a substantial step toward deploying certifiably robust GNNs in real-world applications.
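
A schematic of the voting stage with both components in place (perturb, denoise, and keep stand for the smoothing noise, the augmentation module, and the learned filtering function; all names are our placeholders):

    import numpy as np

    def audited_smoothed_predict(model, perturb, denoise, keep, graph, node,
                                 num_classes, samples=100,
                                 rng=np.random.default_rng(0)):
        votes = np.zeros(num_classes)
        for _ in range(samples):
            g = perturb(graph, rng)    # random edge flips, as in smoothing
            g = denoise(g)             # augmentation: pre-processing de-noising
            probs = model(g, node)
            if keep(probs):            # conditional smoothing: drop weak votes
                votes[np.argmax(probs)] += 1
        return int(np.argmax(votes))   # certified prediction from filtered votes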

Updated: 2025-03-29 07:27:32

标题: 审计投票:一个旨在为图神经网络提供更可部署认证鲁棒性的框架

摘要: 尽管图神经网络(GNNs)取得了进展,但自适应攻击仍然挑战着它们的稳健性。基于随机平滑的认证稳健性已经成为一种有前途的解决方案,提供了可证明的保证,即模型的预测在指定范围内对抗性扰动下保持稳定。然而,现有方法在准确性和稳健性之间面临着一个关键的权衡,因为实现更强的稳健性需要向输入图中引入更大的噪音。这种过度的随机化降低了数据质量,破坏了预测的一致性,限制了在实际场景中部署具有认证稳健性的GNNs时的实用性,其中准确性和稳健性都是必不可少的。为了解决这一挑战,我们提出AuditVotes,这是第一个为GNNs实现高清洁准确性和可证明的稳健准确性的框架。它将随机平滑与两个关键组件,增强和条件平滑结合起来,旨在提高数据质量和预测一致性。增强作为一个预处理步骤,去噪随机图,显著提高数据质量和清洁准确性。条件平滑作为后处理步骤,利用过滤函数选择性计数投票,从而过滤低质量的预测,提高投票一致性。大量实验结果表明,AuditVotes显著提高了清洁准确性,认证稳健性和经验稳健性,同时保持了高计算效率。值得注意的是,与基准随机平滑相比,当攻击者可以在Cora-ML数据集上任意插入20条边时,AuditVotes将清洁准确性提高了437.1%,认证准确性提高了409.3%,这代表了在实际应用中部署具有认证稳健性的GNNs的重要进展。

更新时间: 2025-03-29 07:27:32

领域: cs.LG,cs.AI,cs.CR

下载: http://arxiv.org/abs/2503.22998v1

ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation

Protein backbone generation plays a central role in de novo protein design and is significant for many biological and medical applications. Although diffusion and flow-based generative models provide potential solutions to this challenging task, they often generate proteins with undesired designability and suffer computational inefficiency. In this study, we propose a novel rectified quaternion flow (ReQFlow) matching method for fast and high-quality protein backbone generation. In particular, our method generates a local translation and a 3D rotation from random noise for each residue in a protein chain, which represents each 3D rotation as a unit quaternion and constructs its flow by spherical linear interpolation (SLERP) in an exponential format. We train the model by quaternion flow (QFlow) matching with guaranteed numerical stability and rectify the QFlow model to accelerate its inference and improve the designability of generated protein backbones, leading to the proposed ReQFlow model. Experiments show that ReQFlow achieves state-of-the-art performance in protein backbone generation while requiring much fewer sampling steps and significantly less inference time (e.g., being 37x faster than RFDiffusion and 62x faster than Genie2 when generating a backbone of length 300), demonstrating its effectiveness and efficiency. The code is available at https://github.com/AngxiaoYue/ReQFlow.
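
Since the construction reduces to spherical linear interpolation between unit quaternions, here is a self-contained SLERP for reference (numerically guarded; the paper's exponential-format flow is built on this geodesic):

    import numpy as np

    def slerp(q0, q1, t):
        # q0, q1: unit quaternions of shape (4,); t in [0, 1].
        dot = np.dot(q0, q1)
        if dot < 0.0:              # take the shorter arc on the 3-sphere
            q1, dot = -q1, -dot
        dot = min(dot, 1.0)
        theta = np.arccos(dot)
        if theta < 1e-6:           # nearly parallel: linear interpolation is safe
            q = (1 - t) * q0 + t * q1
        else:
            q = (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)
        return q / np.linalg.norm(q)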

Updated: 2025-03-29 07:16:54

标题: ReQFlow:用于高效和高质量蛋白质主链生成的矫正四元数流

摘要: 蛋白质主干的生成在全新蛋白质设计中起着核心作用,并且对许多生物和医学应用具有重要意义。尽管扩散和基于流的生成模型为这一具有挑战性的任务提供了潜在解决方案,但它们经常生成具有不良设计能力的蛋白质,并且计算效率低下。在本研究中,我们提出了一种新颖的矫正四元数流(ReQFlow)匹配方法,用于快速高质量的蛋白质主干生成。具体而言,我们的方法为蛋白质链中的每个残基从随机噪声生成局部平移和3D旋转,将每个3D旋转表示为单位四元数,并通过球形线性插值(SLERP)以指数格式构建其流。我们通过四元数流(QFlow)匹配训练模型,保证数值稳定性,并矫正QFlow模型以加速推断并提高生成的蛋白质主干的设计能力,从而导致提出的ReQFlow模型。实验证明,ReQFlow在蛋白质主干生成中实现了最先进的性能,同时需要更少的采样步骤和显著较少的推断时间(例如,在生成长度为300的主干时比RFDiffusion快37倍,比Genie2快62倍),展示了其有效性和效率。代码可在https://github.com/AngxiaoYue/ReQFlow获得。

更新时间: 2025-03-29 07:16:54

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.14637v2

Semantic Contextualization of Face Forgery: A New Definition, Dataset, and Detection Method

In recent years, deep learning has greatly streamlined the process of manipulating photographic face images. Aware of the potential dangers, researchers have developed various tools to spot these counterfeits. Yet, none asks the fundamental question: "What digital manipulations make a real photographic face image fake, while others do not?" In this paper, we put face forgery in a semantic context and define that computational methods that alter semantic face attributes to exceed human discrimination thresholds are sources of face forgery. Following our definition, we construct a large face forgery image dataset, where each image is associated with a set of labels organized in a hierarchical graph. Our dataset enables two new testing protocols to probe the generalizability of face forgery detectors. Moreover, we propose a semantics-oriented face forgery detection method that captures label relations and prioritizes the primary task (i.e., real or fake face detection). We show that the proposed dataset successfully exposes the weaknesses of current detectors as the test set and consistently improves their generalizability as the training set. Additionally, we demonstrate the superiority of our semantics-oriented method over traditional binary and multi-class classification-based detectors.

Updated: 2025-03-29 07:00:42

标题: 面部伪造的语义上下文化:一个新的定义、数据集和检测方法

摘要: 近年来,深度学习极大地简化了操纵面部照片图像的过程。研究人员意识到潜在的危险,开发了各种工具来发现这些伪造品。然而,没有人提出一个基本问题:“哪些数字操作使真实的面部照片图像变为伪造品,而其他操作则不会?”在本文中,我们将面部伪造置于语义上下文中,并定义“将语义面部属性改变到超过人类辨别阈值的计算方法是面部伪造的来源”。根据我们的定义,我们构建了一个大型面部伪造图像数据集,其中每个图像都与一个组织在层次图中的标签集相关联。我们的数据集使得可以使用两种新的测试协议来探究面部伪造检测器的泛化能力。此外,我们提出了一种以语义为导向的面部伪造检测方法,该方法捕捉标签关系并优先考虑主要任务(即,真实或伪造面部检测)。我们展示了所提出的数据集作为测试集成功地暴露了当前检测器的弱点,并作为训练集持续提高了它们的泛化能力。此外,我们展示了我们的以语义为导向的方法相对于传统的基于二进制和多类分类的检测器的优越性。

更新时间: 2025-03-29 07:00:42

领域: cs.CV,cs.CR

下载: http://arxiv.org/abs/2405.08487v2

Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions

Despite the widespread adoption of Vision-Language Understanding (VLU) benchmarks such as VQA v2, OKVQA, A-OKVQA, GQA, VCR, SWAG, and VisualCOMET, our analysis reveals a pervasive issue affecting their integrity: these benchmarks contain samples where answers rely on assumptions unsupported by the provided context. Training models on such data foster biased learning and hallucinations as models tend to make similar unwarranted assumptions. To address this issue, we collect contextual data for each sample whenever available and train a context selection module to facilitate evidence-based model predictions. Strong improvements across multiple benchmarks demonstrate the effectiveness of our approach. Further, we develop a general-purpose Context-AwaRe Abstention (CARA) detector to identify samples lacking sufficient context and enhance model accuracy by abstaining from responding if the required context is absent. CARA exhibits generalization to new benchmarks it wasn't trained on, underscoring its utility for future VLU benchmarks in detecting or cleaning samples with inadequate context. Finally, we curate a Context Ambiguity and Sufficiency Evaluation (CASE) set to benchmark the performance of insufficient context detectors. Overall, our work represents a significant advancement in ensuring that vision-language models generate trustworthy and evidence-based outputs in complex real-world scenarios.

Updated: 2025-03-29 07:00:30

标题: 检测上下文不足的多模态情况并避免无根据的预测

摘要: 尽管Vision-Language Understanding(VLU)基准,如VQA v2、OKVQA、A-OKVQA、GQA、VCR、SWAG和VisualCOMET被广泛采用,但我们的分析揭示了一个影响它们完整性的普遍问题:这些基准包含一些样本,其中答案依赖于提供的上下文不支持的假设。在这样的数据上训练模型会促进有偏见的学习和幻觉,因为模型倾向于做出类似的不合理假设。为了解决这个问题,我们在每个样本中收集上下文数据(在可用时),并训练一个上下文选择模块来促进基于证据的模型预测。在多个基准测试中取得显著改进,证明了我们方法的有效性。此外,我们开发了一个通用的Context-AwaRe Abstention(CARA)检测器,用于识别缺乏足够上下文的样本,并通过在没有所需上下文的情况下弃权来增强模型的准确性。CARA能够推广到未经训练的新基准测试,强调了它在未来VLU基准测试中检测或清理缺乏足够上下文的样本的实用性。最后,我们创建了一个Context Ambiguity and Sufficiency Evaluation(CASE)集来评估上下文不足检测器的性能。总的来说,我们的工作代表了在确保视觉语言模型在复杂现实场景中生成可信赖和基于证据的输出方面的重大进步。

更新时间: 2025-03-29 07:00:30

领域: cs.CV,cs.AI,cs.MM

下载: http://arxiv.org/abs/2405.11145v4

FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research

As AI models tackle increasingly complex problems, ensuring reliable human oversight becomes more challenging due to the difficulty of verifying solutions. Approaches to scaling AI supervision include debate, in which two agents engage in structured dialogue to help a judge evaluate claims; critique, in which models identify potential flaws in proposed solutions; and prover-verifier games, in which a capable 'prover' model generates solutions that must be verifiable by a less capable 'verifier'. Evaluations of the scalability of these and similar approaches to difficult problems benefit from datasets that include (1) long-form expert-verified correct solutions and (2) long-form flawed solutions with annotations highlighting specific errors, but few are available. To address this gap, we present FindTheFlaws, a group of five diverse datasets spanning medicine, mathematics, science, coding, and the Lojban language. Each dataset contains questions and long-form solutions with expert annotations validating their correctness or identifying specific error(s) in the reasoning. We evaluate frontier models' critiquing capabilities and observe a range of performance that can be leveraged for scalable oversight experiments: models performing more poorly on particular datasets can serve as judges/verifiers for more capable models. Additionally, for some task/dataset combinations, expert baselines exceed even top model performance, making them more beneficial for scalable oversight experiments.

Updated: 2025-03-29 06:38:30

标题: 发现缺陷:带注释的错误,用于检测错误推理和可扩展的监督研究

摘要: 随着人工智能模型处理越来越复杂的问题,确保可靠的人类监督变得更具挑战性,因为验证解决方案的难度增加。扩展人工智能监督的方法包括辩论,其中两个代理人进行结构化对话以帮助评判者评估主张;批判,即模型识别所提解决方案中的潜在缺陷;以及证明者-验证者游戏,其中能力强的“证明者”模型生成必须由能力较弱的“验证者”验证的解决方案。对这些以及类似方法在困难问题上的可扩展性的评估受益于包含(1)长篇专家验证的正确解决方案和(2)长篇有缺陷的解决方案的数据集,其中包含突出特定错误的注释,但可用数据集很少。 为了弥补这一差距,我们提出了FindTheFlaws,一组涵盖医学、数学、科学、编码和Lojban语的五个多样化数据集。每个数据集都包含问题和长篇解决方案,专家注释验证其正确性或识别推理中的具体错误。我们评估了前沿模型的批评能力,并观察到一系列性能,可用于可扩展监督实验:在特定数据集上表现较差的模型可以作为更有能力模型的评判者/验证者。此外,对于一些任务/数据集组合,专家基线甚至超过顶级模型的性能,使它们对可扩展监督实验更有益。

更新时间: 2025-03-29 06:38:30

领域: cs.AI,cs.CL,I.2

下载: http://arxiv.org/abs/2503.22989v1

DC-SGD: Differentially Private SGD with Dynamic Clipping through Gradient Norm Distribution Estimation

Differentially Private Stochastic Gradient Descent (DP-SGD) is a widely adopted technique for privacy-preserving deep learning. A critical challenge in DP-SGD is selecting the optimal clipping threshold C, which involves balancing the trade-off between clipping bias and noise magnitude, incurring substantial privacy and computing overhead during hyperparameter tuning. In this paper, we propose Dynamic Clipping DP-SGD (DC-SGD), a framework that leverages differentially private histograms to estimate gradient norm distributions and dynamically adjust the clipping threshold C. Our framework includes two novel mechanisms: DC-SGD-P and DC-SGD-E. DC-SGD-P adjusts the clipping threshold based on a percentile of gradient norms, while DC-SGD-E minimizes the expected squared error of gradients to optimize C. These dynamic adjustments significantly reduce the burden of hyperparameter tuning C. The extensive experiments on various deep learning tasks, including image classification and natural language processing, show that our proposed dynamic algorithms achieve up to 9 times acceleration on hyperparameter tuning than DP-SGD. And DC-SGD-E can achieve an accuracy improvement of 10.62% on CIFAR10 than DP-SGD under the same privacy budget of hyperparameter tuning. We conduct rigorous theoretical privacy and convergence analyses, showing that our methods seamlessly integrate with the Adam optimizer. Our results highlight the robust performance and efficiency of DC-SGD, offering a practical solution for differentially private deep learning with reduced computational overhead and enhanced privacy guarantees.
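
To make the DC-SGD-P idea concrete, here is a hedged numpy sketch of picking the clipping threshold C as a percentile of a differentially private gradient-norm histogram. The bin count, the Gaussian noise mechanism, and all function names are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def dp_percentile_threshold(grad_norms, p=0.9, n_bins=32, max_norm=10.0,
                            noise_scale=1.0, rng=None):
    """Estimate the p-th percentile of per-sample gradient norms from a
    noisy histogram and return it as the dynamic clipping threshold C."""
    if rng is None:
        rng = np.random.default_rng(0)
    edges = np.linspace(0.0, max_norm, n_bins + 1)
    hist, _ = np.histogram(np.clip(grad_norms, 0, max_norm), bins=edges)
    noisy = hist + rng.normal(scale=noise_scale, size=n_bins)   # DP noise
    noisy = np.clip(noisy, 0, None)
    cdf = np.cumsum(noisy) / max(noisy.sum(), 1e-12)
    idx = int(np.searchsorted(cdf, p))
    return edges[min(idx + 1, n_bins)]   # right edge of the percentile bin

norms = np.abs(np.random.default_rng(1).normal(3.0, 1.0, size=512))
C = dp_percentile_threshold(norms, p=0.9)
print("dynamic clipping threshold C ~", round(C, 2))
```

Re-estimating C from the current gradient-norm distribution each round is what removes C from the hyperparameter search, which is where the reported tuning speedup comes from.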

Updated: 2025-03-29 06:27:22

标题: DC-SGD:通过梯度范数分布估计的动态裁剪实现差分私有SGD

摘要: 差分私有随机梯度下降(DP-SGD)是一种广泛采用的隐私保护深度学习技术。DP-SGD中的一个关键挑战是选择最佳的剪切阈值C,这涉及平衡剪切偏差和噪声幅度之间的权衡,在超参数调整过程中会带来相当大的隐私和计算开销。 在本文中,我们提出了动态剪切DP-SGD(DC-SGD),这是一个利用差分私有直方图来估计梯度范数分布并动态调整剪切阈值C的框架。我们的框架包括两个新颖的机制:DC-SGD-P和DC-SGD-E。DC-SGD-P根据梯度范数的百分位调整剪切阈值,而DC-SGD-E通过最小化梯度的预期平方误差来优化C。这些动态调整显著减少了超参数调整C的负担。在各种深度学习任务上进行了大量实验,包括图像分类和自然语言处理,结果显示我们提出的动态算法比DP-SGD在超参数调整上加速了最多9倍。在相同的超参数调整隐私预算下,DC-SGD-E比DP-SGD在CIFAR10上能够实现10.62%的准确度提升。我们进行了严格的理论隐私和收敛分析,结果显示我们的方法与Adam优化器无缝集成。我们的结果突显了DC-SGD的稳健性能和效率,为差分私有深度学习提供了一个实用的解决方案,减少了计算开销并增强了隐私保证。

更新时间: 2025-03-29 06:27:22

领域: cs.LG,cs.AI,cs.CR

下载: http://arxiv.org/abs/2503.22988v1

PartialLoading: User Scheduling and Bandwidth Allocation for Parameter-sharing Edge Inference

By provisioning inference offloading services, edge inference drives the rapid growth of AI applications at the network edge. However, achieving high task throughput with stringent latency requirements remains a significant challenge. To address this issue, we develop a parameter-sharing AI model loading (PartialLoading) framework for multi-user edge inference, which exploits two key insights: 1) the majority of latency arises from loading AI models into server GPU memory, and 2) different AI models can share a significant number of parameters, for which redundant loading should be avoided. Towards this end, we formulate a joint multi-user scheduling and spectrum bandwidth allocation problem to maximize task throughput by exploiting shared parameter blocks across models. The intuition is to judiciously schedule user requests to reuse the shared parameter blocks between consecutively loaded models, thereby reducing model loading time substantially. To facilitate solution finding, we decouple the problem into two sub-problems, i.e., user scheduling and bandwidth allocation, showing that solving them sequentially is equivalent to solving the original problem. Due to the NP-hardness of the problem, we first study an important special case called the "bottom-layer-sharing" case, where AI models share some bottom layers within clusters, and design a dynamic programming-based algorithm to obtain the optimal solution in polynomial time. For the general case, where shared parameter blocks appear at arbitrary positions within AI models, we propose a greedy heuristic to obtain the sub-optimal solution efficiently. Simulation results demonstrate that the proposed framework significantly improves task throughput under deadline constraints compared with user scheduling without exploiting parameter sharing.
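
For the general case, the greedy heuristic the abstract mentions can be caricatured as ordering requests so consecutively loaded models share as many parameter-block bytes as possible. The sketch below mirrors that intuition only in spirit: the `shared_bytes` table and all names are assumptions, and bandwidth allocation is omitted entirely.

```python
def greedy_schedule(requests, shared_bytes):
    """Order user requests so that each newly loaded model reuses as much
    of the previously loaded model's parameters as possible."""
    remaining = list(requests)
    order = [remaining.pop(0)]                  # arbitrary seed request
    while remaining:
        last = order[-1]["model"]
        nxt = max(remaining,
                  key=lambda r: shared_bytes.get((last, r["model"]), 0))
        remaining.remove(nxt)
        order.append(nxt)
    return order

reqs = [{"user": u, "model": m}
        for u, m in [(1, "A"), (2, "B"), (3, "A'"), (4, "C")]]
shared = {("A", "A'"): 900, ("A'", "A"): 900, ("A", "B"): 100,
          ("B", "A"): 100, ("B", "C"): 300, ("C", "B"): 300}
print([r["model"] for r in greedy_schedule(reqs, shared)])
```

Scheduling "A" directly before "A'" means only the non-shared blocks must be loaded into GPU memory, which is exactly the loading time the framework targets.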

Updated: 2025-03-29 05:58:07

标题: 部分加载:用户调度和带宽分配用于参数共享的边缘推断

摘要: 通过提供推断卸载服务,边缘推断推动了网络边缘的人工智能应用的快速增长。然而,实现在严格的延迟要求下高任务吞吐量仍然是一个重要挑战。为了解决这个问题,我们为多用户边缘推断开发了一个参数共享人工智能模型加载(PartialLoading)框架,利用了两个关键见解:1)大部分延迟来自将人工智能模型加载到服务器GPU内存中,2)不同的人工智能模型可以共享大量参数,应该避免冗余加载。为此,我们制定了一个联合多用户调度和频谱带宽分配问题,通过利用模型之间共享的参数块来最大化任务吞吐量。直觉是明智地安排用户请求,以在连续加载的模型之间重复使用共享的参数块,从而大幅降低模型加载时间。为了促进解决方案的找到,我们将问题分解为两个子问题,即用户调度和带宽分配,证明了按顺序解决它们等同于解决原始问题。由于问题的NP-难度,我们首先研究了一个重要特例,称为“底层共享”情况,其中人工智能模型在集群内共享一些底层,并设计了一种基于动态规划的算法,以在多项式时间内获得最优解。对于一般情况,即共享参数块出现在人工智能模型中的任意位置,我们提出了一种贪婪启发式方法,以有效地获得次优解。模拟结果表明,与不利用参数共享的用户调度相比,所提出的框架在截止时间约束下显著提高了任务吞吐量。

更新时间: 2025-03-29 05:58:07

领域: cs.NI,cs.AI

下载: http://arxiv.org/abs/2503.22982v1

Fast Direct: Query-Efficient Online Black-box Guidance for Diffusion-model Target Generation

Guided diffusion-model generation is a promising direction for customizing the generation process of a pre-trained diffusion model to address specific downstream tasks. Existing guided diffusion models either rely on training the guidance model with pre-collected datasets or require the objective functions to be differentiable. However, for most real-world tasks, offline datasets are often unavailable, and their objective functions are often not differentiable, such as image generation with human preferences, molecular generation for drug discovery, and material design. Thus, we need an online algorithm capable of collecting data during runtime and supporting a black-box objective function. Moreover, the query efficiency of the algorithm is also critical because the objective evaluation of the query is often expensive in real-world scenarios. In this work, we propose a novel and simple algorithm, Fast Direct, for query-efficient online black-box target generation. Our Fast Direct builds a pseudo-target on the data manifold to update the noise sequence of the diffusion model with a universal direction, which is promising to perform query-efficient guided generation. Extensive experiments on twelve high-resolution (1024 x 1024) image target generation tasks and six 3D-molecule target generation tasks show 6x up to 10x query efficiency improvement and 11x up to 44x query efficiency improvement, respectively. Our implementation is publicly available at: https://github.com/kimyong95/guide-stable-diffusion/tree/fast-direct

Updated: 2025-03-29 05:45:56

标题: 快速直达:用于扩散模型目标生成的查询高效在线黑盒引导

摘要: 引导扩散模型生成是定制预训练扩散模型生成过程的一个有前途的方向,以解决特定的下游任务。现有的引导扩散模型要么依赖于使用预先收集的数据集来训练引导模型,要么要求目标函数具有可微性。然而,对于大多数现实世界的任务而言,离线数据集通常不可用,它们的目标函数通常是不可微的,比如基于人类偏好的图像生成、用于药物发现的分子生成和材料设计。因此,我们需要一种能够在运行时收集数据并支持黑盒目标函数的在线算法。此外,算法的查询效率也至关重要,因为在现实世界的场景中,查询的目标评估通常是昂贵的。在本文中,我们提出了一种新颖且简单的算法Fast Direct,用于支持查询效率高的在线黑盒目标生成。我们的Fast Direct在数据流形上构建一个伪目标,用一个通用方向更新扩散模型的噪声序列,这有望实现查询效率高的引导生成。在十二个高分辨率(1024×1024)图像目标生成任务和六个3D分子目标生成任务上的大量实验显示,分别实现了6倍至10倍和11倍至44倍的查询效率改进。我们的实现可以在以下链接找到:https://github.com/kimyong95/guide-stable-diffusion/tree/fast-direct

更新时间: 2025-03-29 05:45:56

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.01692v5

RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm

Post-training Quantization (PTQ) has become a widely used technique for improving inference efficiency of large language models (LLMs). However, existing PTQ methods generally suffer from crucial limitations such as heavy calibration data requirements and inflexible choice of target number of bits. In this paper, we propose RaanA, a unified PTQ framework that overcomes these challenges by introducing two novel components: 1) RaBitQ-H, a variant of a randomized vector quantization method RaBitQ, designed for fast, accurate, and highly efficient quantization; and 2) AllocateBits, an algorithm that optimally allocates bit-widths across layers based on their quantization sensitivity. RaanA achieves competitive performance with state-of-the-art quantization methods while being extremely fast, requiring minimal calibration data, and enabling flexible bit allocation. Extensive experiments demonstrate RaanA's efficacy in balancing efficiency and accuracy. The code is publicly available at https://github.com/FFTYYY/RaanA .
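
The AllocateBits idea lends itself to a small sketch: spend a bit budget greedily where it buys the largest reduction in sensitivity-weighted quantization error. The error model below (error proportional to sensitivity times 2^(-2b), the variance scaling of a uniform quantizer) and the bit-width choices are our assumptions for illustration, not the paper's formulation.

```python
import heapq

def allocate_bits(sensitivity, total_bits, choices=(2, 3, 4, 8)):
    """Greedy bit-width allocation across layers: start every layer at the
    lowest width, then repeatedly upgrade the layer with the best
    error-reduction per extra bit until the budget runs out."""
    err = lambda s, b: s * 2.0 ** (-2 * b)
    bits = [choices[0]] * len(sensitivity)
    budget = total_bits - sum(bits)
    heap = []
    for i, s in enumerate(sensitivity):
        nb = choices[1]
        gain = (err(s, bits[i]) - err(s, nb)) / (nb - bits[i])
        heapq.heappush(heap, (-gain, i, 1))     # 1 = index into `choices`
    while heap and budget > 0:
        neg_gain, i, ci = heapq.heappop(heap)
        cost = choices[ci] - bits[i]
        if cost > budget:                        # too expensive: drop it
            continue
        budget -= cost
        bits[i] = choices[ci]
        if ci + 1 < len(choices):                # queue the next upgrade
            nb = choices[ci + 1]
            gain = (err(sensitivity[i], bits[i]) - err(sensitivity[i], nb)) / (nb - bits[i])
            heapq.heappush(heap, (-gain, i, ci + 1))
    return bits

print(allocate_bits([5.0, 1.0, 0.2, 3.0], total_bits=16))
```

Sensitive layers are upgraded first, so under a tight budget the allocation naturally becomes mixed-precision, which is the flexibility the framework advertises.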

Updated: 2025-03-29 05:03:12

标题: RaanA:一种快速、灵活和数据高效的训练后量化算法

摘要: 后训练量化(PTQ)已成为提高大型语言模型(LLMs)推理效率的广泛使用技术。然而,现有的PTQ方法通常受到关键限制,如需要大量校准数据和目标比特数选择不灵活。在本文中,我们提出了RaanA,一个统一的PTQ框架,通过引入两个新颖的组件克服了这些挑战:1)RaBitQ-H,一个随机向量量化方法RaBitQ的变体,旨在快速、准确和高效地量化;2)AllocateBits,一种根据其量化敏感性优化地分配层之间位宽的算法。RaanA在保持极快速度、需要最少校准数据和实现灵活比特分配的同时,实现了与最先进的量化方法相竞争的性能。大量实验证明了RaanA在平衡效率和准确性方面的功效。代码可在https://github.com/FFTYYY/RaanA上公开获取。

更新时间: 2025-03-29 05:03:12

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.03717v1

Learning Multi-Agent Loco-Manipulation for Long-Horizon Quadrupedal Pushing

Recently, quadrupedal locomotion has achieved significant success, but their manipulation capabilities, particularly in handling large objects, remain limited, restricting their usefulness in demanding real-world applications such as search and rescue, construction, industrial automation, and room organization. This paper tackles the task of obstacle-aware, long-horizon pushing by multiple quadrupedal robots. We propose a hierarchical multi-agent reinforcement learning framework with three levels of control. The high-level controller integrates an RRT planner and a centralized adaptive policy to generate subgoals, while the mid-level controller uses a decentralized goal-conditioned policy to guide the robots toward these sub-goals. A pre-trained low-level locomotion policy executes the movement commands. We evaluate our method against several baselines in simulation, demonstrating significant improvements over baseline approaches, with 36.0% higher success rates and 24.5% reduction in completion time than the best baseline. Our framework successfully enables long-horizon, obstacle-aware manipulation tasks like Push-Cuboid and Push-T on Go1 robots in the real world.

Updated: 2025-03-29 04:50:27

标题: 学习面向长时程四足推动的多智能体运动操纵

摘要: 最近,四足运动已经取得了显著的成功,但其操纵能力,特别是处理大型物体的能力仍然有限,限制了其在搜索和救援、建筑、工业自动化和房间组织等要求严格的现实应用中的有效性。本文解决了由多个四足机器人执行的障碍感知、长时程推动任务。我们提出了一个具有三个控制级别的分层多智能体强化学习框架。高层控制器集成了RRT规划器和一个集中自适应策略,生成子目标,而中层控制器使用了分散的目标条件策略来引导机器人朝向这些子目标。一个预先训练的低级运动策略执行运动命令。我们在模拟中对我们的方法进行了评估,表明与基线方法相比,取得了显著的改进,成功率提高了36.0%,完成时间减少了24.5%,超过了最佳基线方法。我们的框架成功地实现了现实世界中对长期、障碍感知的操纵任务,如在Go1机器人上进行Push-Cuboid和Push-T等任务。

更新时间: 2025-03-29 04:50:27

领域: cs.RO,cs.AI,cs.LG,cs.MA

下载: http://arxiv.org/abs/2411.07104v4

Ethical AI on the Waitlist: Group Fairness Evaluation of LLM-Aided Organ Allocation

Large Language Models (LLMs) are becoming ubiquitous, promising automation even in high-stakes scenarios. However, existing evaluation methods often fall short -- benchmarks saturate, accuracy-based metrics are overly simplistic, and many inherently ambiguous problems lack a clear ground truth. Given these limitations, evaluating fairness becomes complex. To address this, we reframe fairness evaluation using Borda scores, a method from voting theory, as a nuanced yet interpretable metric for measuring fairness. Using organ allocation as a case study, we introduce two tasks: (1) Choose-One and (2) Rank-All. In Choose-One, LLMs select a single candidate for a kidney, and we assess fairness across demographics using proportional parity. In Rank-All, LLMs rank all candidates for a kidney, reflecting real-world allocation processes. Since traditional fairness metrics do not account for ranking, we propose a novel application of Borda scoring to capture biases. Our findings highlight the potential of voting-based metrics to provide a richer, more multifaceted evaluation of LLM fairness.
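
Borda scoring itself is classical and small enough to show directly. The sketch below computes Borda scores from a set of LLM-produced candidate rankings and compares group means; the toy candidates, group labels, and the parity comparison are illustrative, not the paper's data.

```python
from collections import defaultdict

def borda_scores(rankings, n_candidates):
    """Classic Borda count: a candidate ranked r-th (0-indexed) in a list
    of n candidates earns n - 1 - r points; points sum over rankings."""
    scores = defaultdict(float)
    for ranking in rankings:
        for r, cand in enumerate(ranking):
            scores[cand] += n_candidates - 1 - r
    return dict(scores)

# Toy example: three LLM-generated rankings of four kidney candidates.
rankings = [["a", "b", "c", "d"], ["b", "a", "d", "c"], ["a", "c", "b", "d"]]
scores = borda_scores(rankings, 4)
group = {"a": "G1", "b": "G1", "c": "G2", "d": "G2"}
by_group = defaultdict(list)
for cand, s in scores.items():
    by_group[group[cand]].append(s)
for g, ss in sorted(by_group.items()):
    print(g, "mean Borda score:", sum(ss) / len(ss))
```

Unlike accuracy against a single "correct" choice, a gap between group-level mean Borda scores exposes systematic rank-order bias even when no ground-truth allocation exists.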

Updated: 2025-03-29 04:36:25

标题: 候补名单上的伦理AI:LLM辅助器官分配的群体公平性评估

摘要: 大型语言模型(LLMs)正在变得无处不在,承诺即使在高风险情景中也能实现自动化。然而,现有的评估方法往往存在不足之处--基准测试达到饱和,基于准确性的指标过于简单化,许多固有的模糊问题缺乏明确的基准事实。鉴于这些限制,评估公平性变得复杂。为了解决这个问题,我们利用投票理论中的Borda得分方法,重新构建了公平性评估,作为一个细致但可解释的衡量公平性的指标。以器官分配为案例研究,我们引入了两个任务:(1)选择一个和(2)全部排名。在选择一个任务中,LLMs选择一个肾脏候选人,我们使用比例平等来评估不同人口群体之间的公平性。在全部排名任务中,LLMs对所有肾脏候选人进行排名,反映了现实世界的分配过程。由于传统的公平性指标不考虑排名,我们提出了一种新颖的Borda得分应用来捕捉偏见。我们的研究结果突显了基于投票的指标潜力,可以提供更丰富、更多方面地评估LLM的公平性。

更新时间: 2025-03-29 04:36:25

领域: cs.LG,cs.AI,cs.CY

下载: http://arxiv.org/abs/2504.03716v1

XL-Instruct: Synthetic Data for Cross-Lingual Open-Ended Generation

Cross-lingual open-ended generation -- i.e. generating responses in a desired language different from that of the user's query -- is an important yet understudied problem. We introduce XL-AlpacaEval, a new benchmark for evaluating cross-lingual generation capabilities in Large Language Models (LLMs), and propose XL-Instruct, a high-quality synthetic data generation method. Fine-tuning with just 8K XL-Instruct-generated instructions significantly improves model performance, increasing the win rate against GPT-4o-Mini from 7.4% to 21.5%, and improving on several fine-grained quality metrics. Additionally, models fine-tuned on XL-Instruct exhibit strong zero-shot transfer to both English-only and multilingual generation tasks. Given its consistent gains across the board, we strongly recommend incorporating XL-Instruct in the post-training pipeline of future multilingual LLMs. To facilitate further research, we will publicly and freely release the XL-Instruct and XL-AlpacaEval datasets, which constitute two of the few cross-lingual resources currently available in the literature.

Updated: 2025-03-29 04:34:03

标题: XL-Instruct: 用于跨语言开放式生成的合成数据

摘要: 跨语言开放式生成——即以不同于用户查询语言的目标语言生成响应——是一个重要但研究不足的问题。我们介绍了XL-AlpacaEval,这是一个用于评估大型语言模型(LLMs)跨语言生成能力的新基准,并提出了高质量的合成数据生成方法XL-Instruct。仅使用8K个由XL-Instruct生成的指令进行微调就显著提高了模型性能,将对GPT-4o-Mini的胜率从7.4%提高到21.5%,并在多个细粒度质量指标上得到改进。此外,经过XL-Instruct微调的模型在仅英语和多语言生成任务中均表现出强大的零样本迁移能力。鉴于其在各方面的持续增益,我们强烈建议将XL-Instruct纳入未来多语言LLMs的后训练流程。为促进进一步研究,我们将公开免费发布XL-Instruct和XL-AlpacaEval数据集,这两个数据集是目前文献中少数几个跨语言资源之一。

更新时间: 2025-03-29 04:34:03

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2503.22973v1

Enhancing Federated Learning Through Secure Cluster-Weighted Client Aggregation

Federated learning (FL) has emerged as a promising paradigm in machine learning, enabling collaborative model training across decentralized devices without the need for raw data sharing. In FL, a global model is trained iteratively on local datasets residing on individual devices, each contributing to the model's improvement. However, the heterogeneous nature of these local datasets, stemming from diverse user behaviours, device capabilities, and data distributions, poses a significant challenge. The inherent heterogeneity in federated learning gives rise to various issues, including model performance discrepancies, convergence challenges, and potential privacy concerns. As the global model progresses through rounds of training, the disparities in local data quality and quantity can impede the overall effectiveness of federated learning systems. Moreover, maintaining fairness and privacy across diverse user groups becomes a paramount concern. To address this issue, this paper introduces a novel FL framework, ClusterGuardFL, that employs dissimilarity scores, k-means clustering, and reconciliation confidence scores to dynamically assign weights to client updates. The dissimilarity scores between global and local models guide the formation of clusters, with cluster size influencing the weight allocation. Within each cluster, a reconciliation confidence score is calculated for individual data points, and a softmax layer generates customized weights for clients. These weights are utilized in the aggregation process, enhancing the model's robustness and privacy. Experimental results demonstrate the efficacy of the proposed approach in achieving improved model performance in diverse datasets.
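
A hedged sketch of the aggregation pipeline described above: cosine dissimilarity to the global model drives k-means clustering, and a softmax over negative dissimilarities yields per-client weights scaled by cluster size. The reconciliation confidence scores over individual data points are simplified away here, and the temperature and cluster count are our choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def clusterguard_aggregate(global_w, client_ws, k=2, temp=1.0):
    """Cluster-weighted aggregation of client updates (illustrative)."""
    G = global_w.ravel()
    diss = np.array([
        1 - np.dot(w.ravel(), G) / (np.linalg.norm(w) * np.linalg.norm(G) + 1e-12)
        for w in client_ws])
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
        diss.reshape(-1, 1))
    weights = np.zeros(len(client_ws))
    for c in range(k):
        idx = np.where(labels == c)[0]
        logits = -diss[idx] / temp                 # similar clients weigh more
        soft = np.exp(logits - logits.max()); soft /= soft.sum()
        weights[idx] = soft * (len(idx) / len(client_ws))  # cluster-size factor
    weights /= weights.sum()
    return sum(w * cw for w, cw in zip(weights, client_ws))

rng = np.random.default_rng(0)
g = rng.normal(size=8)
clients = [g + 0.05 * rng.normal(size=8) for _ in range(4)] + [-g]  # one outlier
print(np.round(clusterguard_aggregate(g, clients), 2))
```

The outlier client lands in its own small cluster and receives a correspondingly small weight, which is the robustness behavior the framework is after.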

Updated: 2025-03-29 04:29:24

标题: 通过安全的聚类加权客户端聚合增强联邦学习

摘要: 联邦学习(FL)已经成为机器学习中一种有前途的范式,实现了跨分散设备的协作模型训练,而无需共享原始数据。在FL中,一个全局模型在各个个体设备上的本地数据集上进行迭代训练,每个设备都对模型的改进做出贡献。然而,这些本地数据集的异质性,源自不同的用户行为、设备能力和数据分布,构成了一个重大挑战。联邦学习的固有异质性导致了各种问题,包括模型性能差异、收敛挑战和潜在的隐私问题。随着全局模型经过一轮又一轮的训练,本地数据质量和数量的差异可能会妨碍联邦学习系统的整体有效性。此外,在各种用户群体之间维持公平性和隐私性成为一个关键问题。为了解决这个问题,本文引入了一个新颖的FL框架ClusterGuardFL,利用不相似度分数、k-means聚类和协调置信度分数来动态分配客户端更新的权重。全局模型和本地模型之间的不相似度分数指导集群的形成,集群大小影响权重分配。在每个集群内,为每个数据点计算一个协调置信度分数,并由一个softmax层生成客户端的自定义权重。这些权重在聚合过程中被利用,增强了模型的鲁棒性和隐私性。实验结果表明,所提出的方法在各种数据集中实现了改进的模型性能。

更新时间: 2025-03-29 04:29:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.22971v1

HRET: A Self-Evolving LLM Evaluation Toolkit for Korean

Recent advancements in Korean large language models (LLMs) have spurred numerous benchmarks and evaluation methodologies, yet the lack of a standardized evaluation framework has led to inconsistent results and limited comparability. To address this, we introduce HRET Haerae Evaluation Toolkit, an open-source, self-evolving evaluation framework tailored specifically for Korean LLMs. HRET unifies diverse evaluation methods, including logit-based scoring, exact-match, language-inconsistency penalization, and LLM-as-a-Judge assessments. Its modular, registry-based architecture integrates major benchmarks (HAE-RAE Bench, KMMLU, KUDGE, HRM8K) and multiple inference backends (vLLM, HuggingFace, OpenAI-compatible endpoints). With automated pipelines for continuous evolution, HRET provides a robust foundation for reproducible, fair, and transparent Korean NLP research.
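
The "modular, registry-based architecture" is a familiar software pattern, sketched below with a decorator-based registry. The evaluator names and the toy Hangul heuristic are illustrative assumptions, not HRET's actual API.

```python
from typing import Callable, Dict

EVALUATORS: Dict[str, Callable] = {}

def register(name: str):
    """Register an evaluation method under a string key, so benchmarks and
    backends can be wired together by name in a config."""
    def wrap(fn: Callable) -> Callable:
        EVALUATORS[name] = fn
        return fn
    return wrap

@register("exact_match")
def exact_match(pred: str, gold: str) -> float:
    return float(pred.strip() == gold.strip())

@register("language_penalty")
def language_penalty(pred: str, gold: str) -> float:
    # Penalize answers that drift out of Hangul (toy heuristic).
    hangul = sum("\uac00" <= ch <= "\ud7a3" for ch in pred)
    return hangul / max(len(pred), 1)

def evaluate(method: str, pred: str, gold: str) -> float:
    return EVALUATORS[method](pred, gold)

print(evaluate("exact_match", "서울", "서울"))
print(round(evaluate("language_penalty", "서울 Seoul", "서울"), 2))
```

Adding a new metric or inference backend then means registering one more function, which is what makes the framework "self-evolving" in practice.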

Updated: 2025-03-29 04:17:58

标题: HRET:一种用于韩语的自我演化LLM评估工具包

摘要: 最近韩国大型语言模型(LLMs)的进展推动了许多基准和评估方法,然而缺乏一个标准化的评估框架导致了不一致的结果和有限的可比性。为了解决这个问题,我们介绍了HRET Haerae评估工具包,这是一个针对韩国LLMs量身定制的开源、自我演化的评估框架。HRET统一了各种评估方法,包括基于logit的评分、精确匹配、语言不一致性惩罚以及LLM作为评判者的评估。其模块化、基于注册表的架构集成了主要基准(HAE-RAE Bench,KMMLU,KUDGE,HRM8K)和多个推理后端(vLLM,HuggingFace,兼容OpenAI端点)。通过连续演化的自动化流水线,HRET为可复制、公平和透明的韩国自然语言处理研究提供了坚实的基础。

更新时间: 2025-03-29 04:17:58

领域: cs.CE,cs.AI,cs.CL

下载: http://arxiv.org/abs/2503.22968v1

Student-Powered Digital Scholarship CoLab Project in the HKUST Library: Develop a Chinese Named-Entity Recognition (NER) Tool within One Semester from the Ground Up

Starting in February 2024, the HKUST Library further extended the scope of AI literacy to AI utilization, which focuses on fostering student involvement in utilizing state-of-the-art technologies in the projects initiated by the Library, named "Digital Scholarship (DS) CoLab". A key focus of the DS CoLab scheme has been on cultivating talent and enabling students to utilize advanced technologies in practical contexts. It aims to reinforce the library's role as a catalyst and hub for fostering multidisciplinary collaboration and cultivate the "can do spirit" among university members. The Library offers 1-2 projects per year for students to engage with advanced technologies in practical contexts while supporting the Library in tackling challenges and streamlining operational tasks. The tool introduced in this paper was mainly developed by two of the authors, Sherry Yip Sau Lai and Berry Han Liuruo, as part-time student helpers under our DS CoLab scheme in the 2024 Spring Semester (February to May 2024). This paper details the complete journey from ideation to implementation of developing a Chinese Named-Entity Recognition (NER) Tool from the ground up within one semester, from the initial research and planning stages to execution, culminating in a viable product. The collaborative spirit fostered by this project, with students playing a central role, exemplifies the power and potential of innovative educational models that prioritize hands-on learning with student involvement.

Updated: 2025-03-29 04:15:34

标题: 香港科技大学图书馆学生驱动的数字学术合作项目:在一个学期内从零开始开发中文命名实体识别(NER)工具

摘要: 自2024年2月开始,香港科技大学图书馆进一步将AI素养的范围扩展到AI利用,重点是促进学生参与利用由图书馆发起的项目中的最新技术,名为“数字学术(DS)合作实验室”。DS CoLab计划的一个重点是培养人才,使学生能够在实际环境中利用先进技术。它旨在加强图书馆作为促进跨学科合作和培养大学成员“能够做”精神的催化剂和中心的角色。图书馆每年为学生提供1-2个项目,让他们在实际环境中使用先进技术,同时支持图书馆解决挑战和优化运营任务。本文介绍的工具主要由两位作者Sherry Yip Sau Lai和Berry Han Liuruo开发,他们是2024年春季学期(2024年2月至5月)DS CoLab计划下的兼职学生助手之一。本文详细描述了在一个学期内从构思到实施开发中文命名实体识别(NER)工具的完整过程,从最初的研究和规划阶段到执行并提出可行产品。这个项目培养的合作精神,学生发挥核心作用,体现了注重学生参与的实践学习的创新教育模式的力量和潜力。

更新时间: 2025-03-29 04:15:34

领域: cs.DL,cs.AI,cs.CY,cs.HC

下载: http://arxiv.org/abs/2503.22967v1

Multimodal machine learning with large language embedding model for polymer property prediction

Contemporary large language models (LLMs), such as GPT-4 and Llama, have harnessed extensive computational power and diverse text corpora to achieve remarkable proficiency in interpreting and generating domain-specific content, including materials science. To leverage the domain knowledge embedded within these models, we propose a simple yet effective multimodal architecture, PolyLLMem, which integrates text embeddings generated by Llama 3 with molecular structure embeddings derived from Uni-Mol, for polymer properties prediction tasks. In our model, Low-rank adaptation (LoRA) layers were also incorporated during the property prediction tasks to refine the embeddings based on our limited polymer dataset, thereby enhancing their chemical relevance for polymer SMILES representation. This balanced fusion of fine-tuned textual and structural information enables PolyLLMem to accurately predict a variety of polymer properties despite the scarcity of training data. Its performance is comparable to, and in some cases exceeds, that of graph-based models, as well as transformer-based models that typically require pretraining on millions of polymer samples. These findings demonstrate that LLM, such as Llama, can effectively capture chemical information encoded in polymer PSMILES, and underscore the efficacy of multimodal fusion of LLM embeddings and molecular structure embeddings in overcoming data scarcity and accelerating the discovery of advanced polymeric materials.

Updated: 2025-03-29 03:48:11

标题: 使用大型语言嵌入模型进行多模态机器学习,用于聚合物性质预测

摘要: 当代大型语言模型(LLMs),例如GPT-4和Llama,利用广泛的计算能力和多样化的文本语料库,在解释和生成特定领域内容方面取得了显著的熟练度,包括材料科学。为了利用这些模型内嵌的领域知识,我们提出了一个简单而有效的多模态架构PolyLLMem,它将Llama 3生成的文本嵌入与从Uni-Mol派生的分子结构嵌入集成在一起,用于聚合物属性预测任务。在我们的模型中,在属性预测任务中还包括了低秩适应(LoRA)层,用于根据我们有限的聚合物数据集调整嵌入,从而增强它们对聚合物SMILES表示的化学相关性。这种平衡的文本和结构信息的融合使得PolyLLMem能够准确预测各种聚合物属性,尽管训练数据很稀缺。其性能与基于图的模型以及通常需要在数百万聚合物样本上进行预训练的变压器模型相当,并在某些情况下超过了它们。这些发现表明,诸如Llama之类的LLM可以有效地捕捉编码在聚合物PSMILES中的化学信息,并强调LLM嵌入和分子结构嵌入的多模态融合在克服数据稀缺性和加速先进聚合物材料的发现方面的功效。

更新时间: 2025-03-29 03:48:11

领域: cs.LG,cond-mat.mtrl-sci,physics.chem-ph

下载: http://arxiv.org/abs/2503.22962v1

GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding

Open-Vocabulary 3D object affordance grounding aims to anticipate "action possibilities" regions on 3D objects with arbitrary instructions, which is crucial for robots to generically perceive real scenarios and respond to operational changes. Existing methods focus on combining images or languages that depict interactions with 3D geometries to introduce external interaction priors. However, they are still vulnerable to a limited semantic space by failing to leverage implied invariant geometries and potential interaction intentions. Normally, humans address complex tasks through multi-step reasoning and respond to diverse situations by leveraging associative and analogical thinking. In light of this, we propose GREAT (GeometRy-intEntion collAboraTive inference) for Open-Vocabulary 3D Object Affordance Grounding, a novel framework that mines the object invariant geometry attributes and performs analogical reasoning in potential interaction scenarios to form affordance knowledge, fully combining the knowledge with both geometries and visual contents to ground 3D object affordance. Besides, we introduce the Point Image Affordance Dataset v2 (PIADv2), the largest 3D object affordance dataset at present to support the task. Extensive experiments demonstrate the effectiveness and superiority of GREAT. The code and dataset are available at https://yawen-shao.github.io/GREAT/.

Updated: 2025-03-29 03:46:58

标题: GREAT:几何意图协作推断用于开放词汇3D物体可供性基础

摘要: 开放词汇的3D物体功能接地旨在根据任意指令预测3D物体上的“动作可能性”区域,这对于机器人普遍感知真实场景并响应操作变化至关重要。现有方法侧重于结合展示与3D几何交互的图像或语言,以引入外部交互先验。然而,它们仍然容易受到有限语义空间的影响,因为未能利用暗示的不变几何和潜在的交互意图。通常,人类通过多步推理解决复杂任务,并通过利用联想和类比思维来应对各种情况。基于此,我们提出了GREAT(GeometRy-intEntion collAboraTive inference)用于开放词汇3D物体功能接地,这是一个新颖的框架,它挖掘物体不变几何属性,并在潜在的交互场景中进行类比推理,形成功能知识,充分结合几何和视觉内容的知识来接地3D物体功能。此外,我们介绍了Point Image Affordance Dataset v2(PIADv2),这是目前最大的3D物体功能数据集,用于支持该任务。大量实验证明了GREAT的有效性和优越性。代码和数据集可在https://yawen-shao.github.io/GREAT/找到。

更新时间: 2025-03-29 03:46:58

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.19626v2

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Human pose plays a crucial role in the digital age. While recent works have achieved impressive progress in understanding and generating human poses, they often support only a single modality of control signals and operate in isolation, limiting their application in real-world scenarios. This paper presents UniPose, a framework employing Large Language Models (LLMs) to comprehend, generate, and edit human poses across various modalities, including images, text, and 3D SMPL poses. Specifically, we apply a pose tokenizer to convert 3D poses into discrete pose tokens, enabling seamless integration into the LLM within a unified vocabulary. To further enhance the fine-grained pose perception capabilities, we facilitate UniPose with a mixture of visual encoders, among them a pose-specific visual encoder. Benefiting from a unified learning strategy, UniPose effectively transfers knowledge across different pose-relevant tasks, adapts to unseen tasks, and exhibits extended capabilities. This work serves as the first attempt at building a general-purpose framework for pose comprehension, generation, and editing. Extensive experiments highlight UniPose's competitive and even superior performance across various pose-relevant tasks.

Updated: 2025-03-29 03:35:20

标题: UniPose:一个统一的多模态框架,用于人体姿势的理解、生成和编辑

摘要: 人体姿势在数字时代扮演着至关重要的角色。尽管最近的研究在理解和生成人体姿势方面取得了令人瞩目的进展,但它们通常只支持单一模态的控制信号,并且在孤立环境中操作,限制了它们在现实场景中的应用。本文提出了UniPose,一个利用大型语言模型(LLMs)的框架,能够理解、生成和编辑包括图像、文本和3D SMPL姿势在内的各种模态的人体姿势。具体来说,我们应用姿势分词器将3D姿势转换为离散的姿势标记,实现了与统一词汇表中的LLM的无缝集成。为了进一步增强对细粒度姿势感知能力,我们为UniPose提供了一种混合的视觉编码器,其中包括一个姿势特定的视觉编码器。受益于统一的学习策略,UniPose能够有效地在不同的与姿势相关的任务之间转移知识,适应未知任务,并展示出扩展的能力。这项工作是建立一个通用框架来理解、生成和编辑姿势的首次尝试。大量实验证明了UniPose在各种与姿势相关的任务中具有竞争力甚至更优越的性能。

更新时间: 2025-03-29 03:35:20

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.16781v2

PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches

As large language models (LLMs) increasingly shape the AI landscape, fine-tuning pretrained models has become more popular than in the pre-LLM era for achieving optimal performance in domain-specific tasks. However, pretrained LLMs such as ChatGPT are periodically evolved (i.e., model parameters are frequently updated), making it challenging for downstream users with limited resources to keep up with fine-tuning the newest LLMs for their domain application. Even though fine-tuning costs have nowadays been reduced thanks to the innovations of parameter-efficient fine-tuning such as LoRA, not all downstream users have adequate computing for frequent personalization. Moreover, access to fine-tuning datasets, particularly in sensitive domains such as healthcare, could be time-restrictive, making it crucial to retain the knowledge encoded in earlier fine-tuned rounds for future adaptation. In this paper, we present PortLLM, a training-free framework that (i) creates an initial lightweight model update patch to capture domain-specific knowledge, and (ii) allows a subsequent seamless plugging for the continual personalization of evolved LLM at minimal cost. Our extensive experiments cover seven representative datasets, from easier question-answering tasks {BoolQ, SST2} to harder reasoning tasks {WinoGrande, GSM8K}, and models including {Mistral-7B, Llama2, Llama3.1, and Gemma2}, validating the portability of our designed model patches and showcasing the effectiveness of our proposed framework. For instance, PortLLM achieves comparable performance to LoRA fine-tuning with reductions of up to 12.2x in GPU memory usage. Finally, we provide theoretical justifications to understand the portability of our model update patches, which offers new insights into the theoretical dimension of LLMs' personalization.
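
The portable-patch idea can be caricatured as a weight delta captured against the old base and re-applied to the evolved base, training-free. The paper's patches are LoRA-style and more structured than this; the numpy sketch below, including the `alpha` scaling knob, is an assumed simplification for intuition only.

```python
import numpy as np

def make_patch(base, finetuned):
    """Capture domain knowledge as a weight delta against the old base."""
    return {k: finetuned[k] - base[k] for k in base}

def apply_patch(evolved_base, patch, alpha=1.0):
    """Plug the same patch into an updated base model, training-free.
    `alpha` is an assumed scaling knob, not from the paper."""
    return {k: evolved_base[k] + alpha * patch[k] for k in evolved_base}

rng = np.random.default_rng(0)
base      = {"w": rng.normal(size=(4, 4))}
finetuned = {"w": base["w"] + 0.1}                 # stand-in for LoRA tuning
evolved   = {"w": base["w"] + rng.normal(scale=0.01, size=(4, 4))}
patched = apply_patch(evolved, make_patch(base, finetuned))
print(np.round(patched["w"] - evolved["w"], 2))    # the carried-over knowledge
```

The interesting claim is that such a patch remains useful after the base model drifts; the paper's theoretical analysis is about when and why that transfer holds.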

Updated: 2025-03-29 03:32:53

标题: PortLLM:使用无需训练和可移植模型补丁个性化发展中的大型语言模型

摘要: 随着大型语言模型(LLMs)在AI领域中的日益重要,微调预训练模型比在LLM时代之前更受欢迎,以实现在特定领域任务中的最佳性能。然而,预训练的LLMs,如ChatGPT,会定期演进,即模型参数经常更新,这使得资源有限的终端用户难以跟上为其领域应用微调最新LLMs的步伐。尽管由于诸如LoRA之类的参数高效微调的创新,微调成本现在已经降低,但并非所有终端用户都具有足够的计算资源用于频繁个性化。此外,特别是在敏感领域如医疗保健中,获得微调数据集可能会受到时间限制,因此保留在早期微调轮中编码的知识以进行未来适应至关重要。在本文中,我们提出了PortLLM,一个无需培训的框架,它(i)创建一个初始轻量级模型更新补丁,用于捕捉特定领域的知识,(ii)允许对演进的LLM进行连续个性化的无缝插入,成本最小化。我们的广泛实验涵盖了七个代表性数据集,从更简单的问答任务(BoolQ,SST2)到更困难的推理任务(WinoGrande,GSM8K),以及包括Mistral-7B,Llama2,Llama3.1和Gemma2在内的模型,验证了我们设计的模型补丁的可移植性,并展示了我们提出的框架的有效性。例如,PortLLM实现了与LoRA微调相当的性能,同时将GPU内存使用降低了多达12.2倍。最后,我们提供理论上的论证来理解我们模型更新补丁的可移植性,这为理解LLMs个性化的理论维度提供了新的见解。

更新时间: 2025-03-29 03:32:53

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.10870v3

Late Breaking Results: Breaking Symmetry- Unconventional Placement of Analog Circuits using Multi-Level Multi-Agent Reinforcement Learning

Layout-dependent effects (LDEs) significantly impact analog circuit performance. Traditionally, designers have relied on symmetric placement of circuit components to mitigate variations caused by LDEs. However, due to the non-linear nature of these effects, conventional methods often fall short. We propose an objective-driven, multi-level, multi-agent Q-learning framework to explore the unconventional design space of analog layout, opening new avenues for optimizing analog circuit performance. Our approach achieves better variation performance than the state-of-the-art layout techniques. Notably, this is the first application of multi-agent RL in analog layout automation. The proposed approach is compared with a non-ML approach based on simulated annealing.

Updated: 2025-03-29 03:13:56

标题: 最新研究成果:打破对称性-使用多级多代理强化学习的模拟电路非传统布局

摘要: 布局相关效应(LDEs)显著影响模拟电路性能。传统上,设计师们依靠电路元件的对称布局来减轻LDEs引起的变化。然而,由于这些效应的非线性特性,常规方法经常无法达到预期效果。我们提出了一个基于目标驱动的多层次、多代理Q-learning框架,用于探索模拟布局的非传统设计空间,为优化模拟电路性能开辟了新的途径。我们的方法实现了比最先进的布局技术更好的变化性能。值得注意的是,这是多代理强化学习在模拟布局自动化中的首次应用。所提出的方法与基于模拟退火的非机器学习方法进行了比较。

更新时间: 2025-03-29 03:13:56

领域: cs.AR,cs.AI

下载: http://arxiv.org/abs/2503.22958v1

DDIL: Diversity Enhancing Diffusion Distillation With Imitation Learning

Diffusion models excel at generative modeling (e.g., text-to-image) but sampling requires multiple denoising network passes, limiting practicality. Efforts such as progressive distillation or consistency distillation have shown promise by reducing the number of passes at the expense of quality of the generated samples. In this work we identify covariate shift as one reason for the poor performance of multi-step distilled models, stemming from compounding error at inference time. To address covariate shift, we formulate diffusion distillation within an imitation learning (DDIL) framework and enhance the training distribution for distilling diffusion models on both the data distribution (forward diffusion) and student-induced distributions (backward diffusion). Training on the data distribution helps to diversify the generations by preserving the marginal data distribution, and training on the student distribution addresses compounding error by correcting covariate shift. In addition, we adopt a reflected diffusion formulation for distillation and demonstrate improved performance and stable training across different distillation methods. We show that DDIL consistently improves on the baseline algorithms of progressive distillation (PD), Latent consistency models (LCM) and Distribution Matching Distillation (DMD2).

Updated: 2025-03-29 03:05:52

标题: DDIL:使用模仿学习增强多样性的扩散蒸馏

摘要: 扩散模型在生成建模方面表现出色(例如,文本到图像),但采样需要多次去噪网络传递,限制了实用性。渐进蒸馏或一致性蒸馏等方法已展现出以牺牲生成样本质量为代价减少传递次数的潜力。在这项工作中,我们确定协变量偏移是多步蒸馏模型性能不佳的原因之一,其源于推断时的误差累积。为了解决协变量偏移问题,我们在模仿学习(DDIL)框架内制定了扩散蒸馏,并增强了训练分布,以在数据分布(前向扩散)和学生诱导分布(反向扩散)上对扩散模型进行蒸馏。在数据分布上的训练有助于通过保留边际数据分布来使生成多样化,而在学生分布上进行训练则通过纠正协变量偏移来解决误差累积。此外,我们采用了反射扩散公式进行蒸馏,并展示了改进的性能以及在不同蒸馏方法间的稳定训练。我们展示了DDIL在渐进蒸馏(PD)、潜在一致性模型(LCM)和分布匹配蒸馏(DMD2)等基线算法上的持续改进。

更新时间: 2025-03-29 03:05:52

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.11971v2

MNT-TNN: Spatiotemporal Traffic Data Imputation via Compact Multimode Nonlinear Transform-based Tensor Nuclear Norm

Imputation of random or non-random missing data is a long-standing research topic and a crucial application for Intelligent Transportation Systems (ITS). However, with the advent of modern communication technologies such as Global Navigation Satellite Systems (GNSS), traffic data collection has outpaced traditional methods, introducing new challenges in random missing value imputation and increasing demands for spatiotemporal dependency modeling. To address these issues, we propose a novel spatiotemporal traffic imputation method, Multimode Nonlinear Transformed Tensor Nuclear Norm (MNT-TNN), grounded in the Transform-based Tensor Nuclear Norm (TTNN) optimization framework which exhibits efficient mathematical representations and theoretical guarantees for the recovery of random missing values. Specifically, we strictly extend the single-mode transform in TTNN to a multimode transform with nonlinear activation, effectively capturing the intrinsic multimode spatiotemporal correlations and low-rankness of the traffic tensor, represented as location x location x time. To solve the nonconvex optimization problem, we design a proximal alternating minimization (PAM) algorithm with theoretical convergence guarantees. We suggest an Augmented Transform-based Tensor Nuclear Norm Families (ATTNNs) framework to enhance the imputation results of TTNN techniques, especially at very high miss rates. Extensive experiments on real datasets demonstrate that our proposed MNT-TNN and ATTNNs can outperform the compared state-of-the-art imputation methods, completing the benchmark of random missing traffic value imputation.
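
For background, the linear TTNN building block that MNT-TNN generalizes has a concrete proximal step: transform along the time mode, soft-threshold the singular values of each frontal slice, and invert the transform. The sketch below shows that classic step with an FFT (a common choice for TTNN); MNT-TNN replaces this single linear transform with learned multimode nonlinear transforms, which are not reproduced here.

```python
import numpy as np

def ttnn_svt(T, tau):
    """Singular value thresholding under a transform-based tensor nuclear
    norm: FFT along the third (time) mode, soft-threshold singular values
    of every frontal slice, then invert the transform."""
    Tf = np.fft.fft(T, axis=2)
    out = np.empty_like(Tf)
    for k in range(T.shape[2]):
        U, s, Vh = np.linalg.svd(Tf[:, :, k], full_matrices=False)
        out[:, :, k] = (U * np.maximum(s - tau, 0.0)) @ Vh   # soft-threshold
    return np.real(np.fft.ifft(out, axis=2))

# Toy traffic tensor: location x location x time, low-rank plus noise.
rng = np.random.default_rng(0)
L = rng.normal(size=(6, 1)) @ rng.normal(size=(1, 6))
T = np.stack([L * np.sin(0.5 * t) for t in range(8)], axis=2)
T_noisy = T + 0.1 * rng.normal(size=T.shape)
print("err before:", round(np.linalg.norm(T_noisy - T), 2),
      "after:", round(np.linalg.norm(ttnn_svt(T_noisy, tau=0.5) - T), 2))
```

Inside a PAM loop this thresholding alternates with a data-consistency step on the observed entries, which is how missing values get filled in.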

Updated: 2025-03-29 02:58:31

标题: MNT-TNN:基于紧凑多模非线性变换的张量核范数空间时间交通数据填充

摘要: 缺失数据的随机或非随机插补是一个长期存在的研究课题,也是智能交通系统(ITS)的一个关键应用。然而,随着现代通信技术如全球卫星导航系统(GNSS)的出现,交通数据收集已经超越了传统方法,引入了随机缺失值插补的新挑战,并增加了对时空依赖模型的需求。为了解决这些问题,我们提出了一种新颖的时空交通插补方法,基于基于变换的张量核范数(TTNN)优化框架的多模式非线性转换张量核范数(MNT-TNN),该框架展示了对于随机缺失值恢复的有效数学表示和理论保证。具体来说,我们严格扩展了TTNN中的单模式转换到具有非线性激活的多模式转换,有效捕捉了交通张量的固有多模式时空相关性和低秩性,表示为位置×位置×时间。为了解决非凸优化问题,我们设计了一个具有理论收敛保证的近端交替最小化(PAM)算法。我们提出了一个增强变换张量核范数族(ATTNNs)框架,以增强TTNN技术的插补结果,特别是在非常高的缺失率下。对真实数据集的广泛实验表明,我们提出的MNT-TNN和ATTNNs可以胜过比较的最先进插补方法,完成了随机缺失交通值插补的基准测试。

更新时间: 2025-03-29 02:58:31

领域: cs.LG

下载: http://arxiv.org/abs/2503.22955v1

TODO: Enhancing LLM Alignment with Ternary Preferences

Aligning large language models (LLMs) with human intent is critical for enhancing their performance across a variety of tasks. Standard alignment techniques, such as Direct Preference Optimization (DPO), often rely on the binary Bradley-Terry (BT) model, which can struggle to capture the complexities of human preferences -- particularly in the presence of noisy or inconsistent labels and frequent ties. To address these limitations, we introduce the Tie-rank Oriented Bradley-Terry model (TOBT), an extension of the BT model that explicitly incorporates ties, enabling more nuanced preference representation. Building on this, we propose Tie-rank Oriented Direct Preference Optimization (TODO), a novel alignment algorithm that leverages TOBT's ternary ranking system to improve preference alignment. In evaluations on Mistral-7B and Llama 3-8B models, TODO consistently outperforms DPO in modeling preferences across both in-distribution and out-of-distribution datasets. Additional assessments using MT Bench and benchmarks such as Piqa, ARC-c, and MMLU further demonstrate TODO's superior alignment performance. Notably, TODO also shows strong results in binary preference alignment, highlighting its versatility and potential for broader integration into LLM alignment. The implementation details can be found in https://github.com/XXares/TODO.
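
The abstract does not spell out TOBT's exact parameterization, but one classical way to extend Bradley-Terry with ties is the Davidson (1970) model, shown below purely as an illustrative stand-in for how a ternary (win/lose/tie) likelihood can look.

```python
import math

def davidson_probs(s_i: float, s_j: float, nu: float = 0.5):
    """Davidson-style tie extension of Bradley-Terry. s_i, s_j are latent
    strengths in log-space; nu controls how often ties occur. Returns
    (P(i wins), P(j wins), P(tie)). Used here only as an illustration;
    TOBT's actual formulation may differ."""
    p_i, p_j = math.exp(s_i), math.exp(s_j)
    tie = nu * math.sqrt(p_i * p_j)
    z = p_i + p_j + tie
    return p_i / z, p_j / z, tie / z

win_i, win_j, tie = davidson_probs(0.3, 0.0)
print(f"P(i>j)={win_i:.2f}  P(j>i)={win_j:.2f}  P(tie)={tie:.2f}")
```

Giving ties their own probability mass means noisy "roughly equal" annotations stop being forced into arbitrary binary wins, which is the failure mode of plain BT that the paper targets.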

Updated: 2025-03-29 02:56:45

标题: TODO:利用三元偏好增强LLM对齐

摘要: 将大型语言模型(LLMs)与人类意图对齐对于提高它们在各种任务中的表现至关重要。标准的对齐技术,如直接偏好优化(DPO),通常依赖于二元的Bradley-Terry(BT)模型,这种模型在捕捉人类偏好的复杂性方面可能存在困难,特别是在存在嘈杂或不一致的标签以及频繁的并列情况下。为了解决这些限制,我们引入了Tie-rank Oriented Bradley-Terry模型(TOBT),这是BT模型的扩展,明确地包含了并列情况,从而实现了更加细致的偏好表示。在此基础上,我们提出了Tie-rank Oriented Direct Preference Optimization(TODO),这是一种新颖的对齐算法,利用TOBT的三元排名系统来改善偏好对齐。在Mistral-7B和Llama 3-8B模型上的评估中,TODO在建模偏好方面始终优于DPO,无论是在分布内还是分布外的数据集上。使用MT Bench和Piqa、ARC-c、MMLU等基准进行的额外评估进一步展示了TODO的优越对齐性能。值得注意的是,TODO在二元偏好对齐方面也表现出色,突显了其多功能性和对更广泛整合到LLM对齐中的潜力。实施细节可在https://github.com/XXares/TODO 中找到。

更新时间: 2025-03-29 02:56:45

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2411.02442v2

Can LLMs Support Medical Knowledge Imputation? An Evaluation-Based Perspective

Medical knowledge graphs (KGs) are essential for clinical decision support and biomedical research, yet they often exhibit incompleteness due to knowledge gaps and structural limitations in medical coding systems. This issue is particularly evident in treatment mapping, where coding systems such as ICD, Mondo, and ATC lack comprehensive coverage, resulting in missing or inconsistent associations between diseases and their potential treatments. To address this issue, we have explored the use of Large Language Models (LLMs) for imputing missing treatment relationships. Although LLMs offer promising capabilities in knowledge augmentation, their application in medical knowledge imputation presents significant risks, including factual inaccuracies, hallucinated associations, and instability between and within LLMs. In this study, we systematically evaluate LLM-driven treatment mapping, assessing its reliability through benchmark comparisons. Our findings highlight critical limitations, including inconsistencies with established clinical guidelines and potential risks to patient safety. This study serves as a cautionary guide for researchers and practitioners, underscoring the importance of critical evaluation and hybrid approaches when leveraging LLMs to enhance treatment mappings on medical knowledge graphs.

Updated: 2025-03-29 02:52:17

标题: LLMs能否支持医学知识填补?一种基于评估的视角

摘要: 医学知识图谱(KGs)对临床决策支持和生物医学研究至关重要,然而,由于知识缺口和医学编码系统的结构限制,它们经常表现出不完整性。这个问题在治疗映射中特别明显,ICD、Mondo和ATC等编码系统缺乏全面覆盖,导致疾病与其潜在治疗方法之间的缺失或不一致的关联。为了解决这个问题,我们探索了使用大型语言模型(LLMs)来填补缺失的治疗关系。虽然LLMs在知识增强方面具有很大的潜力,但它们在医学知识填充中的应用存在重大风险,包括事实不准确、产生幻觉的关联以及LLMs之间和内部的不稳定性。在这项研究中,我们系统评估了LLM驱动的治疗映射,通过基准比较评估其可靠性。我们的研究结果突出了一些关键限制,包括与已建立的临床指南不一致以及对患者安全的潜在风险。这项研究作为研究人员和从业者的警示指南,强调了在利用LLMs增强医学知识图谱上的治疗映射时进行关键评估和混合方法的重要性。

更新时间: 2025-03-29 02:52:17

领域: cs.CL,cs.AI,q-bio.QM

下载: http://arxiv.org/abs/2503.22954v1

SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning

Large Language Models (LLMs) have transformed natural language processing by learning from massive datasets, yet this rapid progress has also drawn legal scrutiny, as the ability to unintentionally generate copyrighted content has already prompted several prominent lawsuits. In this work, we introduce SUV (Selective Unlearning for Verbatim data), a selective unlearning framework designed to prevent LLM from memorizing copyrighted content while preserving its overall utility. In detail, the proposed method constructs a dataset that captures instances of copyrighted infringement cases by the targeted LLM. With the dataset, we unlearn the content from the LLM by means of Direct Preference Optimization (DPO), which replaces the verbatim copyrighted content with plausible and coherent alternatives. Since DPO may hinder the LLM's performance in other unrelated tasks, we integrate gradient projection and Fisher information regularization to mitigate the degradation. We validate our approach using a large-scale dataset of 500 famous books (predominantly copyrighted works) and demonstrate that SUV significantly reduces verbatim memorization with negligible impact on the performance on unrelated tasks. Extensive experiments on both our dataset and public benchmarks confirm the scalability and efficacy of our approach, offering a promising solution for mitigating copyright risks in real-world LLM applications.
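
A hedged torch sketch of the regularized objective described above: a DPO term that prefers the plausible alternative over the verbatim copyrighted continuation, plus a Fisher-weighted quadratic penalty (EWC-style) that limits drift on parameters important elsewhere. The exact SUV objective and its gradient-projection step are simplified away; all names and the `lam` weight are our assumptions.

```python
import torch
import torch.nn.functional as F

def suv_style_loss(logp_alt, logp_verbatim, ref_alt, ref_verbatim,
                   params, params_orig, fisher, beta=0.1, lam=1.0):
    """DPO preference term plus Fisher-information regularization
    (an illustrative simplification of the SUV objective)."""
    # Prefer the alternative continuation over the verbatim one,
    # measured relative to a frozen reference model.
    margin = beta * ((logp_alt - ref_alt) - (logp_verbatim - ref_verbatim))
    dpo = -F.logsigmoid(margin).mean()
    # Penalize movement on parameters the Fisher information marks important.
    reg = sum((f * (p - p0) ** 2).sum()
              for f, p, p0 in zip(fisher, params, params_orig))
    return dpo + lam * reg

p  = [torch.randn(4, requires_grad=True)]
p0 = [p[0].detach().clone() + 0.1]
f  = [torch.ones(4)]
loss = suv_style_loss(torch.tensor([-1.0]), torch.tensor([-0.2]),
                      torch.tensor([-1.1]), torch.tensor([-0.3]),
                      p, p0, f)
loss.backward()
print(float(loss), p[0].grad is not None)
```

The regularizer is what lets the model forget specific verbatim passages without degrading the unrelated capabilities the Fisher information flags as important.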

Updated: 2025-03-29 02:33:26

标题: SUV:可扩展的大型语言模型版权合规与规范选择性遗忘

摘要: 大型语言模型(LLMs)通过学习大规模数据集改变了自然语言处理,然而这种快速进展也引起了法律审查,因为无意中生成受版权保护内容的能力已经引发了几起知名诉讼。在这项工作中,我们引入了SUV(针对逐字数据的选择性遗忘),这是一个选择性遗忘框架,旨在防止LLM记忆受版权保护内容,同时保留其整体实用性。具体来说,所提出的方法构建了一个数据集,捕捉了目标LLM侵权案例的实例。通过直接偏好优化(DPO),我们从LLM中消除了内容,将逐字的受版权保护内容替换为合理和连贯的替代内容。由于DPO可能会阻碍LLM在其他无关任务中的表现,我们集成了梯度投影和Fisher信息正则化来减轻性能下降。我们使用一个包含500本著名书籍(主要为受版权保护作品)的大规模数据集验证了我们的方法,并证明SUV显著减少了逐字记忆,对无关任务的性能影响微乎其微。在我们的数据集和公共基准测试中进行了大量实验,证实了我们方法的可扩展性和有效性,为减轻现实世界LLM应用中的版权风险提供了一个有希望的解决方案。

更新时间: 2025-03-29 02:33:26

领域: cs.CL,cs.AI,cs.CY,cs.LG

下载: http://arxiv.org/abs/2503.22948v1

DATAWEAVER: Authoring Data-Driven Narratives through the Integrated Composition of Visualization and Text

Data-driven storytelling has gained prominence in journalism and other data reporting fields. However, the process of creating these stories remains challenging, often requiring the integration of effective visualizations with compelling narratives to form a cohesive, interactive presentation. To help streamline this process, we present an integrated authoring framework and system, DataWeaver, that supports both visualization-to-text and text-to-visualization composition. DataWeaver enables users to create data narratives anchored to data facts derived from "call-out" interactions, i.e., user-initiated highlights of visualization elements that prompt relevant narrative content. In addition to this "vis-to-text" composition, DataWeaver also supports a "text-initiated" approach, generating relevant interactive visualizations from existing narratives. Key findings from an evaluation with 13 participants highlighted the utility and usability of DataWeaver and the effectiveness of its integrated authoring framework. The evaluation also revealed opportunities to enhance the framework by refining filtering mechanisms and visualization recommendations and better support authoring creativity by introducing advanced customization options.

Updated: 2025-03-29 02:33:03

标题: 数据编织者:通过可视化和文本的综合构成创作数据驱动叙事

摘要: 数据驱动的故事叙述在新闻和其他数据报告领域越来越受到重视。然而,创建这些故事的过程仍然具有挑战性,通常需要将有效的可视化与引人入胜的叙述相结合,形成一个连贯、互动的展示。为了帮助简化这个过程,我们提出了一个集成的创作框架和系统,名为DataWeaver,该系统支持可视化到文本和文本到可视化的组合。DataWeaver使用户能够创建数据叙事,这些叙事以从“呼出”交互中获得的数据事实为基础,即用户发起的对可视化元素的高亮显示,促使相关的叙述内容。除了这种“vis-to-text”组合,DataWeaver还支持一种“text-initiated”方法,从现有叙述中生成相关的交互式可视化。与13位参与者进行的评估的关键发现突显了DataWeaver的实用性和可用性,以及其集成创作框架的有效性。评估还揭示了通过优化过滤机制和可视化建议,并通过引入高级定制选项更好地支持创作创意的机会,来增强该框架。

更新时间: 2025-03-29 02:33:03

领域: cs.HC,cs.AI,H.5.2; I.3.6

下载: http://arxiv.org/abs/2503.22946v1

Adaptive Interactive Navigation of Quadruped Robots using Large Language Models

Robotic navigation in complex environments remains a critical research challenge. Traditional navigation methods focus on optimal trajectory generation within free space, struggling in environments lacking viable paths to the goal, such as disaster zones or cluttered warehouses. To address this gap, we propose an adaptive interactive navigation approach that proactively interacts with environments to create feasible paths to reach originally unavailable goals. Specifically, we present a primitive tree for task planning with large language models (LLMs), facilitating effective reasoning to determine interaction objects and sequences. To ensure robust subtask execution, we adopt reinforcement learning to pre-train a comprehensive skill library containing versatile locomotion and interaction behaviors for motion planning. Furthermore, we introduce an adaptive replanning method featuring two LLM-based modules: an advisor serving as a flexible replanning trigger and an arborist for autonomous plan adjustment. Integrated with the tree structure, the replanning mechanism allows for convenient node addition and pruning, enabling rapid plan modification in unknown environments. Comprehensive simulations and experiments have demonstrated our method's effectiveness and adaptivity in diverse scenarios. The supplementary video is available at page: https://youtu.be/W5ttPnSap2g.

Updated: 2025-03-29 02:17:52

标题: 四足机器人的自适应交互式导航利用大型语言模型

摘要: 复杂环境中的机器人导航仍然是一个关键的研究挑战。传统的导航方法侧重于在自由空间内生成最佳轨迹,在缺乏可行路径到达目标的环境中遇到困难,例如灾难区域或杂乱的仓库。为了解决这一问题,我们提出了一种自适应交互式导航方法,通过主动与环境互动来创建可行路径,以达到最初无法到达的目标。具体而言,我们提出了一种基于大型语言模型(LLMs)的任务规划原始树,促进有效推理以确定互动对象和顺序。为了确保强大的子任务执行,我们采用强化学习来预先训练一个包含多功能运动和互动行为的全面技能库,用于运动规划。此外,我们引入了一种自适应重新规划方法,包括两个基于LLM的模块:一个作为灵活重新规划触发器的顾问和一个用于自主计划调整的园艺师。与树形结构集成,重新规划机制允许方便地添加和修剪节点,在未知环境中实现快速计划修改。全面的模拟和实验已经证明了我们的方法在各种场景中的有效性和适应性。补充视频可在以下页面查看:https://youtu.be/W5ttPnSap2g。

更新时间: 2025-03-29 02:17:52

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2503.22942v1

Identifying Multi-modal Knowledge Neurons in Pretrained Transformers via Two-stage Filtering

Recent advances in large language models (LLMs) have led to the development of multimodal LLMs (MLLMs) in the fields of natural language processing (NLP) and computer vision. Although these models allow for integrated visual and language understanding, they present challenges such as opaque internal processing and the generation of hallucinations and misinformation. Therefore, there is a need for a method to clarify the location of knowledge in MLLMs. In this study, we propose a method to identify neurons associated with specific knowledge using MiniGPT-4, a Transformer-based MLLM. Specifically, we extract knowledge neurons through two stages: activation differences filtering using inpainting and gradient-based filtering using GradCAM. Experiments on the image caption generation task using the MS COCO 2017 dataset, BLEU, ROUGE, and BERTScore quantitative evaluation, and qualitative evaluation using an activation heatmap showed that our method is able to locate knowledge with higher accuracy than existing methods. This study contributes to the visualization and explainability of knowledge in MLLMs and shows the potential for future knowledge editing and control.
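
A hedged numpy sketch of the two-stage filter described above: stage 1 keeps neurons whose activation changes most once the concept region is inpainted away; stage 2 re-ranks the survivors by a gradient-times-activation attribution, standing in for the GradCAM-based filtering. The quantile, the top-k cutoff, and the attribution formula are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def two_stage_knowledge_neurons(act_orig, act_inpaint, grads, k=3,
                                diff_quantile=0.95):
    """Stage 1: large activation difference between the original and the
    inpainted input. Stage 2: top-k gradient-based attribution among the
    stage-1 survivors."""
    diff = np.abs(act_orig - act_inpaint)
    stage1 = np.where(diff > np.quantile(diff, diff_quantile))[0]
    attribution = np.abs(grads[stage1] * act_orig[stage1])
    return stage1[np.argsort(attribution)[::-1][:k]]

rng = np.random.default_rng(0)
act, grads = rng.normal(size=100), rng.normal(size=100)
act_inp = act.copy()
act_inp[:5] = 0.0        # pretend neurons 0-4 encode the probed concept
print(two_stage_knowledge_neurons(act, act_inp, grads))
```

Intersecting two independent signals (ablation response and gradient attribution) is what keeps the selection tighter than either filter alone.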

Updated: 2025-03-29 02:16:15

标题: 通过两阶段过滤识别预训练变压器中的多模态知识神经元

摘要: 最近在大型语言模型(LLMs)方面取得的进展导致了在自然语言处理(NLP)和计算机视觉领域开发了多模态LLMs(MLLMs)。尽管这些模型允许集成视觉和语言理解,但它们面临着内部处理的不透明性以及产生幻觉和错误信息的挑战。因此,有必要提出一种方法来澄清MLLMs中知识的位置。 在这项研究中,我们提出了一种使用基于Transformer的MLLM MiniGPT-4来识别与特定知识相关的神经元的方法。具体而言,我们通过两个阶段提取知识神经元:利用图像修补(inpainting)进行激活差异过滤,以及利用GradCAM进行基于梯度的过滤。在使用MS COCO 2017数据集进行图像标题生成任务的实验中,通过BLEU、ROUGE和BERTScore定量评估以及使用激活热图的定性评估,我们的方法能够比现有方法更准确地定位知识。 这项研究有助于可视化和解释MLLMs中的知识,并展示了未来知识编辑和控制的潜力。

更新时间: 2025-03-29 02:16:15

领域: cs.AI,cs.LG,cs.MM

下载: http://arxiv.org/abs/2503.22941v1

Graph Kolmogorov-Arnold Networks for Multi-Cancer Classification and Biomarker Identification, An Interpretable Multi-Omics Approach

The integration of multi-omics data presents a major challenge in precision medicine, requiring advanced computational methods for accurate disease classification and biological interpretation. This study introduces the Multi-Omics Graph Kolmogorov-Arnold Network (MOGKAN), a deep learning model that integrates messenger RNA, micro RNA sequences, and DNA methylation data with Protein-Protein Interaction (PPI) networks for accurate and interpretable cancer classification across 31 cancer types. MOGKAN employs a hybrid approach combining differential expression with DESeq2, Linear Models for Microarray (LIMMA), and Least Absolute Shrinkage and Selection Operator (LASSO) regression to reduce multi-omics data dimensionality while preserving relevant biological features. The model architecture is based on the Kolmogorov-Arnold theorem principle, using trainable univariate functions to enhance interpretability and feature analysis. MOGKAN achieves classification accuracy of 96.28 percent and demonstrates low experimental variability with a standard deviation that is reduced by 1.58 to 7.30 percent compared to Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs). The biomarkers identified by MOGKAN have been validated as cancer-related markers through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. The proposed model presents an ability to uncover molecular oncogenesis mechanisms by detecting phosphoinositide-binding substances and regulating sphingolipid cellular processes. By integrating multi-omics data with graph-based deep learning, our proposed approach demonstrates superior predictive performance and interpretability that has the potential to enhance the translation of complex multi-omics data into clinically actionable cancer diagnostics.

Updated: 2025-03-29 02:14:05

标题: 图科尔莫戈洛夫-阿诺德网络用于多癌症分类和生物标志物识别,一种可解释的多组学方法

摘要: 多组学数据的整合在精准医学中面临着重大挑战,需要先进的计算方法来进行准确的疾病分类和生物解释。本研究介绍了Multi-Omics Graph Kolmogorov-Arnold Network(MOGKAN),这是一个深度学习模型,将信使RNA、微型RNA序列和DNA甲基化数据与蛋白质相互作用(PPI)网络整合,实现了跨31种癌症类型的准确和可解释的癌症分类。MOGKAN采用了一种混合方法,结合差异表达与DESeq2、线性模型用于微阵列(LIMMA)和最小绝对值收缩和选择算子(LASSO)回归,以降低多组学数据的维度,同时保留相关的生物学特征。该模型架构基于Kolmogorov-Arnold定理原则,使用可训练的单变量函数来增强可解释性和特征分析。MOGKAN实现了96.28%的分类准确率,并表现出低实验变异性,标准偏差比卷积神经网络(CNNs)和图神经网络(GNNs)降低了1.58%至7.30%。MOGKAN鉴定出的生物标志物通过基因本体(GO)和京都基因组和基因组百科全书(KEGG)富集分析已经验证为与癌症相关的标记。所提出的模型具有发现分子肿瘤发生机制的能力,通过检测磷脂酰肌醇结合物质和调节鞘脂类细胞过程。通过将多组学数据与基于图的深度学习相结合,我们提出的方法表现出卓越的预测性能和解释性,有潜力将复杂的多组学数据转化为临床可操作的癌症诊断。

更新时间: 2025-03-29 02:14:05

领域: cs.LG,q-bio.QM,68,I.2.6

下载: http://arxiv.org/abs/2503.22939v1

Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models

Text-to-image generative models often struggle with long prompts detailing complex scenes, diverse objects with distinct visual characteristics and spatial relationships. In this work, we propose SCoPE (Scheduled interpolation of Coarse-to-fine Prompt Embeddings), a training-free method to improve text-to-image alignment by progressively refining the input prompt in a coarse-to-fine-grained manner. Given a detailed input prompt, we first decompose it into multiple sub-prompts which evolve from describing broad scene layout to highly intricate details. During inference, we interpolate between these sub-prompts and thus progressively introduce finer-grained details into the generated image. Our training-free plug-and-play approach significantly enhances prompt alignment, achieves an average improvement of up to +4% in Visual Question Answering (VQA) scores over the Stable Diffusion baselines on 85% of the prompts from the GenAI-Bench dataset.
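
The scheduling itself reduces to a simple interpolation rule, sketched below in numpy: sub-prompt embeddings are ordered coarse to fine, and at each denoising step we linearly interpolate between the two adjacent sub-prompts for the current position in the schedule. In the real method this happens on text-encoder embeddings inside the diffusion loop; the linear schedule and names here are illustrative assumptions.

```python
import numpy as np

def scheduled_prompt_embedding(sub_embs, step, total_steps):
    """Pick the prompt embedding for a given denoising step by
    interpolating along the coarse-to-fine sub-prompt sequence."""
    pos = (step / max(total_steps - 1, 1)) * (len(sub_embs) - 1)
    i = min(int(pos), len(sub_embs) - 2)   # left neighbor in the sequence
    t = pos - i                             # blend factor toward the right
    return (1 - t) * sub_embs[i] + t * sub_embs[i + 1]

embs = [np.full(4, v) for v in (0.0, 1.0, 3.0)]   # coarse, mid, fine
for s in range(5):
    print(s, scheduled_prompt_embedding(embs, s, 5))
```

Early steps therefore see the broad-layout sub-prompt while late steps see the intricate details, matching how diffusion models resolve structure before texture.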

Updated: 2025-03-29 02:03:32

标题: 渐进式提示详细化以改善文本到图像生成模型的对齐

摘要: 文本到图像生成模型通常在详细描述复杂场景、具有不同视觉特征和空间关系的多样化对象的长提示时遇到困难。在这项工作中,我们提出了SCoPE(Scheduled interpolation of Coarse-to-fine Prompt Embeddings),这是一种无需训练的方法,通过以粗到细的方式逐步改进输入提示来改善文本到图像的对齐。给定详细的输入提示,我们首先将其分解为多个子提示,这些子提示从描述广泛的场景布局演变到高度复杂的细节。在推断过程中,我们在这些子提示之间进行插值,从而逐步引入更精细的细节到生成的图像中。我们的无需训练的即插即用方法显著增强了提示对齐,相比GenAI-Bench数据集中85%的提示上的稳定扩散基线,Visual Question Answering(VQA)得分平均提高了高达+4%。

更新时间: 2025-03-29 02:03:32

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.17794v2

On the Implicit Relation Between Low-Rank Adaptation and Differential Privacy

A significant approach in natural language processing involves large-scale pre-training of models on general domain data followed by their adaptation to specific tasks or domains. As models grow in size, full fine-tuning all of their parameters becomes increasingly impractical. To address this, some methods for low-rank task adaptation of language models have been proposed, e.g., LoRA and FLoRA. These methods keep the pre-trained model weights fixed and incorporate trainable low-rank decomposition matrices into some layers of the transformer architecture, called adapters. This approach significantly reduces the number of trainable parameters required for downstream tasks compared to full fine-tuning all parameters. In this work, we look at low-rank adaptation from the lens of data privacy. We show theoretically that the low-rank adaptation used in LoRA and FLoRA leads to the injection of some random noise into the batch gradients w.r.t the adapter parameters. We quantify the variance of the injected noise and show that the smaller the adaptation rank, the larger the noise variance. By establishing a Berry-Esseen type bound on the total variation distance between distribution of the injected noise and a Gaussian distribution with the same variance, we show that the dynamics of low-rank adaptation is close to that of differentially private fine-tuning of the adapters. Finally, using Johnson-Lindenstrauss lemma, we show that when augmented with gradient scaling, low-rank adaptation is very close to performing DPSGD algorithm with a fixed noise scale to fine-tune the adapters. Suggested by our theoretical findings and approved by our experimental results, we show that low-rank adaptation, besides mitigating the space and computational complexities, implicitly provides a privacy protection w.r.t the fine-tuning data, without inducing the high space complexity of DPSGD.

Updated: 2025-03-29 01:56:56

标题: 关于低秩适应和差分隐私之间的隐含关系

摘要: 自然语言处理中的一个重要方法涉及在通用领域数据上进行大规模模型预训练,然后将其调整到特定任务或领域。随着模型规模的增长,全面微调所有参数变得越来越不切实际。为了解决这个问题,一些低秩任务适应语言模型的方法被提出,例如LoRA和FLoRA。这些方法保持预训练模型权重固定,并将可训练的低秩分解矩阵纳入变压器架构的某些层中,称为适配器。与全面微调所有参数相比,这种方法显著减少了下游任务所需的可训练参数数量。在这项工作中,我们从数据隐私的角度看待低秩适应。我们理论上表明,LoRA和FLoRA中使用的低秩适应导致一些随机噪声注入关于适配器参数的批梯度。我们量化了注入噪声的方差,并表明适应等级越小,噪声方差越大。通过在注入噪声的分布与具有相同方差的高斯分布之间的总变差距离上建立Berry-Esseen类型的界限,我们表明低秩适应的动态接近于对适配器进行差分隐私微调的动态。最后,使用Johnson-Lindenstrauss引理,我们表明在增加梯度缩放的情况下,低秩适应非常接近执行带有固定噪声尺度的DPSGD算法来微调适配器。根据我们的理论发现建议,并经我们的实验结果批准,我们表明低秩适应除了减轻空间和计算复杂性外,还隐含地提供了一定程度的隐私保护,而不会引起DPSGD的高空间复杂性。

更新时间: 2025-03-29 01:56:56

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2409.17538v4

Integrating Fairness and Model Pruning Through Bi-level Optimization

Deep neural networks have achieved exceptional results across a range of applications. As the demand for efficient and sparse deep learning models escalates, the significance of model compression, particularly pruning, is increasingly recognized. Traditional pruning methods, however, can unintentionally intensify algorithmic biases, leading to unequal prediction outcomes in critical applications and raising concerns about the tension between pruning practice and social justice. To tackle this challenge, we introduce a novel concept of fair model pruning, which involves developing a sparse model that adheres to fairness criteria. In particular, we propose a framework to jointly optimize the pruning mask and weight update processes with fairness constraints. This framework is engineered to compress models while maintaining performance and ensuring fairness in a unified process. To this end, we formulate the fair pruning problem as a novel constrained bi-level optimization task and derive efficient and effective solving strategies. We design experiments across various datasets and scenarios to validate our proposed method. Our empirical analysis contrasts our framework with several mainstream pruning strategies, emphasizing our method's superiority in maintaining model fairness, performance, and efficiency.
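
A toy numerical sketch of the bi-level structure: an inner loop updates the unmasked weights on a fairness-regularized logistic loss, and an outer step re-selects the pruning mask. The finite-difference gradients, the greedy mask-selection rule, and the group-gap penalty `lam` are illustrative stand-ins for the paper's derived solving strategies.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 20, 8                 # samples, features, weights kept after pruning
X = rng.standard_normal((n, d))
g = rng.random(n) < 0.5              # binary group membership
y = (X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n) > 0).astype(float)

def loss(w, idx):
    p = 1 / (1 + np.exp(-(X[idx] @ w)))
    return -np.mean(y[idx] * np.log(p + 1e-9) + (1 - y[idx]) * np.log(1 - p + 1e-9))

def objective(w, lam=1.0):
    # fairness-regularized loss: average loss plus the group-loss gap
    return loss(w, slice(None)) + lam * abs(loss(w, g) - loss(w, ~g))

w, mask = rng.standard_normal(d) * 0.1, np.ones(d, bool)
for outer in range(20):
    # inner loop: descend on the unmasked weights (finite-difference gradients)
    for _ in range(10):
        grad = np.zeros(d)
        for j in np.where(mask)[0]:
            e = np.zeros(d); e[j] = 1e-4
            grad[j] = (objective(w + e) - objective(w - e)) / 2e-4
        w -= 0.5 * grad
    # outer step: keep the k weights whose removal hurts the fairness-regularized
    # objective most (a greedy surrogate for the bi-level mask optimization)
    scores = np.array([objective(np.where(np.arange(d) == j, 0.0, w)) for j in range(d)])
    mask = np.zeros(d, bool); mask[np.argsort(-scores)[:k]] = True
    w = w * mask
print("kept weights:", np.where(mask)[0], "objective:", round(objective(w), 3))
```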

Updated: 2025-03-29 01:56:39

标题: 通过双层优化将公平性和模型修剪整合在一起

摘要: 深度神经网络在各种应用中取得了异常的成果。随着对高效稀疏深度学习模型的需求不断增长,模型压缩的重要性,特别是剪枝,也越来越受到重视。然而,传统的剪枝方法可能会无意中加剧算法偏见,在关键应用中导致不平等的预测结果,引发人们对剪枝实践和社会公正的困境的担忧。为了解决这一挑战,我们提出了一个新颖的公平模型剪枝概念,涉及开发符合公平性标准的稀疏模型。具体而言,我们提出了一个框架,用公平约束共同优化剪枝掩码和权重更新过程。这个框架旨在在统一过程中压缩模型,保持性能的同时确保公平性。为此,我们将公平剪枝问题形式化为一个新颖的受限双层优化任务,并推导出高效和有效的解决策略。我们设计了跨多个数据集和情景的实验来验证我们提出的方法。我们的实证分析将我们的框架与几种主流剪枝策略进行对比,强调我们的方法在保持模型公平性、性能和效率方面的优越性。

更新时间: 2025-03-29 01:56:39

领域: cs.LG,cs.AI,cs.CY

下载: http://arxiv.org/abs/2312.10181v2

Improving the Context Length and Efficiency of Code Retrieval for Tracing Security Vulnerability Fixes

In recent years, the rapid increase of security vulnerabilities has caused major challenges in managing them. One critical task in vulnerability management is tracing the patches that fix a vulnerability. By accurately tracing the patching commits, security stakeholders can precisely identify affected software components, determine vulnerable and fixed versions, assess the severity, etc., which facilitates rapid deployment of mitigations. However, previous work has shown that patch information is often missing from vulnerability databases, including both the National Vulnerability Database (NVD) and the GitHub Advisory Database, which increases the risk of delayed mitigation, incorrect vulnerability assessment, and potential exploits. Although existing work has proposed several approaches for patch tracing, they suffer from two major challenges: (1) a lack of scalability to the full-repository level, and (2) a lack of study on how to model the semantic similarity between a CVE and the full diff code. Upon identifying this gap, we propose SITPatchTracer, a scalable full-repo, full-context retrieval system for security vulnerability patch tracing. SITPatchTracer leverages ElasticSearch, learning-to-rank, and a hierarchical embedding approach based on GritLM, a top-ranked LLM for text embedding with unlimited context length and fast inference speed. Our evaluation shows that SITPatchTracer achieves high recall on both evaluated datasets. Its recall not only outperforms several existing works (PatchFinder, PatchScout, VFCFinder) but also exceeds that of Voyage, the SOTA commercial code-embedding API, by 13% and 28%.
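
A minimal sketch of the hierarchical-embedding idea for long diffs: chunk the diff, embed each chunk, and pool the chunk embeddings into one vector. The hash-based `embed` function, the chunking granularity, and the example CVE text and commit messages are placeholders (so the printed ranking is illustrative only); the actual system combines GritLM embeddings with ElasticSearch and learning-to-rank.

```python
import numpy as np

def embed(text: str, dim: int = 16) -> np.ndarray:
    """Stand-in for a GritLM-style text embedder (hash-based, for illustration)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def hierarchical_embed(diff: str, chunk_lines: int = 4) -> np.ndarray:
    """Embed a long commit diff: embed fixed-size line chunks, then mean-pool.

    Pooling chunk embeddings keeps the full diff in scope even when a single
    forward pass could not hold the whole repository-level context.
    """
    lines = diff.splitlines()
    chunks = [" ".join(lines[i:i + chunk_lines])
              for i in range(0, len(lines), chunk_lines)]
    pooled = np.mean([embed(c) for c in chunks], axis=0)
    return pooled / np.linalg.norm(pooled)

cve = embed("CVE-2024-XXXX: buffer overflow in parse_header length check")
commits = {
    "fix: bound length in parse_header": "- n = hdr.len\n+ n = min(hdr.len, MAX)",
    "docs: update README badges": "- old badge\n+ new badge",
}
# Rank candidate commits by cosine similarity to the CVE description.
ranked = sorted(commits, key=lambda m: -float(cve @ hierarchical_embed(commits[m])))
print("top candidate patch:", ranked[0])
```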

Updated: 2025-03-29 01:53:07

标题: 提高用于追踪安全漏洞修复的代码检索的上下文长度和效率

摘要: 近年来,安全漏洞的快速增加导致了管理方面的重大挑战。漏洞管理中的一个关键任务是跟踪修复漏洞的补丁。通过准确追踪修补提交,安全利益相关者可以精确确定受影响的软件组件、确定易受攻击和修复版本、评估严重性等,从而促进快速部署缓解措施。然而,先前的研究表明,在漏洞数据库中通常缺少补丁信息,包括国家漏洞数据库(NVD)和GitHub咨询数据库,这增加了延迟缓解、错误漏洞评估和潜在利用的风险。 尽管现有工作提出了几种用于补丁跟踪的方法,但它们面临两个主要挑战:(1)缺乏对整个存储库级别的可扩展性,以及(2)缺乏如何建模CVE和完整差异代码之间的语义相似性的研究。在确定了这一差距后,我们提出了SITPatchTracer,这是一个可扩展的全存储库全上下文检索系统,用于安全漏洞补丁跟踪。SITPatchTracer利用ElasticSearch、学习排名和基于GritLM的分层嵌入方法,GritLM是一种顶级的用于文本嵌入的LLM,具有无限上下文长度和快速推理速度。SITPatchTracer的评估显示,它在两个评估数据集上均实现了高召回率。SITPatchTracer的召回率不仅优于几种现有工作(PatchFinder、PatchScout、VFCFinder),还优于SOTA商业代码嵌入API Voyage的13%和28%。

更新时间: 2025-03-29 01:53:07

领域: cs.CR,cs.SE

下载: http://arxiv.org/abs/2503.22935v1

FairSAM: Fair Classification on Corrupted Data Through Sharpness-Aware Minimization

Image classification models trained on clean data often suffer from significant performance degradation when exposed to corrupted test data, such as images with impulse noise, Gaussian noise, or environmental noise. This degradation not only impacts overall performance but also disproportionately affects various demographic subgroups, raising critical algorithmic bias concerns. Although robust learning algorithms like Sharpness-Aware Minimization (SAM) have shown promise in improving overall model robustness and generalization, they fall short in addressing the biased performance degradation across demographic subgroups. Existing fairness-aware machine learning methods - such as fairness constraints and reweighing strategies - aim to reduce performance disparities but struggle to maintain robust and equitable accuracy across demographic subgroups when faced with data corruption. This reveals an inherent tension between robustness and fairness when dealing with corrupted data. To address these challenges, we introduce a novel metric specifically designed to assess performance degradation across subgroups under data corruption. Additionally, we propose FairSAM, a new framework that integrates Fairness-oriented strategies into SAM to deliver equalized performance across demographic groups under corrupted conditions. Our experiments on multiple real-world datasets and various predictive tasks show that FairSAM successfully reconciles robustness and fairness, offering a structured solution for equitable and resilient image classification in the presence of data corruption.
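
A toy sketch of how fairness-oriented reweighting can be folded into a SAM update on a linear model: the worse-off subgroup is upweighted, and the descent step is taken at a worst-case perturbation of the weights. The reweighting rule, the perturbation radius `rho`, and the least-squares setup are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
g = rng.random(100) < 0.3                       # minority-subgroup indicator
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)

def group_losses(w):
    r = X @ w - y
    return float(np.mean(r[~g]**2)), float(np.mean(r[g]**2))

def grad(w, wts):
    r = X @ w - y
    return 2 * X.T @ (wts * r) / wts.sum()      # gradient of the weighted MSE

w, rho, lr = np.zeros(5), 0.05, 0.1
for t in range(300):
    l0, l1 = group_losses(w)
    # fairness-oriented reweighting: upweight the currently worse-off subgroup
    wts = np.where(g, max(l1 / (l0 + 1e-9), 1.0), max(l0 / (l1 + 1e-9), 1.0))
    g1 = grad(w, wts)
    # SAM step: ascend to the local worst case, then descend from there
    w_adv = w + rho * g1 / (np.linalg.norm(g1) + 1e-12)
    w -= lr * grad(w_adv, wts)
print("per-group losses after training:", group_losses(w))
```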

Updated: 2025-03-29 01:51:59

标题: FairSAM:通过敏锐感知最小化在受损数据上进行公平分类

摘要: 在干净数据上训练的图像分类模型,当暴露于测试数据污染时(例如带有脉冲噪声、高斯噪声或环境噪声的图像),往往会遭受显著的性能下降。这种下降不仅影响整体性能,还会不成比例地影响各种人口统计亚组,引起关键的算法偏见担忧。尽管像Sharpness-Aware Minimization(SAM)这样的鲁棒学习算法已经显示出在提高整体模型鲁棒性和泛化能力方面具有潜力,但在解决人口统计亚组之间性能下降的偏见方面仍有不足。现有的关注公平性的机器学习方法 - 如公平性约束和重新加权策略 - 旨在减少性能差距,但在面对数据污染时几乎无法保持人口统计亚组之间的鲁棒和公平准确性。这揭示了在处理污染数据时鲁棒性和公平性之间的固有紧张关系。为了解决这些挑战,我们引入了一个专门设计用于评估数据污染下亚组性能下降的新指标。此外,我们提出了FairSAM,这是一个新框架,将公平性导向策略整合到SAM中,在受污染条件下为人口统计群体提供平等性能。我们在多个现实世界数据集和各种预测任务上的实验表明,FairSAM成功调和了鲁棒性和公平性,为在数据污染情况下提供公平和弹性的图像分类提供了结构化解决方案。

更新时间: 2025-03-29 01:51:59

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.22934v1

Bi-Level Multi-View fuzzy Clustering with Exponential Distance

In this study, we propose an extension of fuzzy c-means (FCM) clustering to multi-view environments. First, we introduce an exponential multi-view FCM (E-MVFCM). E-MVFCM is a centralized multi-view clustering (MVC) method that takes heat-kernel coefficients (H-KC) and weight factors into consideration. Second, we propose an exponential bi-level multi-view fuzzy c-means clustering (EB-MVFCM). Unlike E-MVFCM, EB-MVFCM computes the feature and weight factors automatically and simultaneously. Like E-MVFCM, EB-MVFCM presents explicit forms of the H-KC, simplifying the generation of the heat kernel $\mathcal{K}(t)$ in powers of the proper time $t$ during the clustering process. All the features used in this study, including the tools and functions of the proposed algorithms, will be made available at https://www.github.com/KristinaP09/EB-MVFCM.
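
A compact sketch of multi-view fuzzy c-means with an exponential distance, alternating membership, center, and view-weight updates. The specific distance `1 - exp(-beta * ||x - c||^2)`, the simplified center update, and the inverse-distortion view weighting are illustrative assumptions; the paper's heat-kernel-based forms may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two "views" of the same 60 points (e.g., two different feature extractions).
base = np.concatenate([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
views = [base + rng.normal(0, 0.2, base.shape),
         base @ rng.standard_normal((2, 3)) + rng.normal(0, 0.2, (60, 3))]

c, m, beta, n = 2, 2.0, 0.5, 60
wv = np.ones(len(views)) / len(views)           # view weights
idx = np.array([0, n - 1])                      # one seed point from each half
centers = [v[idx].copy() for v in views]

def exp_dist(x, cnt):
    """Illustrative exponential distance: 1 - exp(-beta * ||x - c||^2)."""
    sq = ((x[:, None, :] - cnt[None, :, :])**2).sum(-1)
    return 1.0 - np.exp(-beta * sq)

for it in range(30):
    # membership update: aggregate per-view distances with current view weights
    D = sum(w * exp_dist(v, cnt) for w, v, cnt in zip(wv, views, centers))
    U = (1.0 / np.maximum(D, 1e-12)) ** (1.0 / (m - 1))
    U /= U.sum(axis=1, keepdims=True)
    Um = U ** m
    # center update per view (standard fuzzy weighted means, simplified here)
    centers = [(Um.T @ v) / Um.sum(axis=0)[:, None] for v in views]
    # view-weight update: views with lower total distortion receive more weight
    J = np.array([(Um * exp_dist(v, cnt)).sum() for v, cnt in zip(views, centers)])
    wv = (1.0 / J) / (1.0 / J).sum()
print("view weights:", np.round(wv, 3))
print("cluster sizes:", np.bincount(U.argmax(axis=1)))
```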

Updated: 2025-03-29 01:35:40

标题: 双层多视角模糊聚类与指数距离

摘要: 在这项研究中,我们提出了在多视角环境下扩展模糊c均值(FCM)聚类的方法。首先,我们引入了指数多视角FCM(E-MVFCM)。E-MVFCM是一个考虑热核系数(H-KC)和权重因子的集中式MVC。其次,我们提出了指数双层多视角模糊c均值聚类(EB-MVFCM)。与E-MVFCM不同,EB-MVFCM同时自动计算特征和权重因子。与E-MVFCM类似,EB-MVFCM提供了H-KC的显式形式,简化了在聚类过程中生成热核$\mathcal{K}(t)$的时间$t$的幂的过程。本研究中使用的所有特征,包括所提出算法的工具和功能,将在https://www.github.com/KristinaP09/EB-MVFCM上提供。

更新时间: 2025-03-29 01:35:40

领域: cs.CV,cs.LG,math.PR,62H30

下载: http://arxiv.org/abs/2503.22932v1

Factored Agents: Decoupling In-Context Learning and Memorization for Robust Tool Use

In this paper, we propose a novel factored agent architecture designed to overcome the limitations of traditional single-agent systems in agentic AI. Our approach decomposes the agent into two specialized components: (1) a large language model (LLM) that serves as a high-level planner and in-context learner, which may use dynamically available information in user prompts, and (2) a smaller language model that acts as a memorizer of tool formats and outputs. This decoupling addresses issues prevalent in monolithic designs, including malformed, missing, and hallucinated API fields, as well as suboptimal planning in dynamic environments. Empirical evaluations demonstrate that our factored architecture significantly improves planning accuracy and error resilience, while elucidating the inherent trade-off between in-context learning and static memorization. These findings suggest that a factored approach is a promising pathway for developing more robust and adaptable agentic AI systems.
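
A minimal sketch of the factored split: a (stubbed) large planner emits abstract intents, and a (stubbed) small memorizer owns the exact tool syntax and rejects calls with missing fields rather than hallucinating them. The intents, schemas, and endpoints below are hypothetical.

```python
def planner_llm(task: str) -> list[dict]:
    """Stand-in for the large in-context planner: returns abstract steps.

    A real system would prompt an LLM with the task and the available tools.
    """
    return [{"intent": "search_weather", "args": {"city": "Berlin"}},
            {"intent": "send_summary", "args": {"channel": "#ops"}}]

def memorizer_slm(intent: str, args: dict) -> str:
    """Stand-in for the small fine-tuned memorizer: emits exact API syntax.

    Keeping format knowledge here shields the planner from malformed or
    hallucinated API fields.
    """
    schemas = {
        "search_weather": ("GET /v1/weather?city={city}", {"city"}),
        "send_summary":   ("POST /v1/messages channel={channel}", {"channel"}),
    }
    template, required = schemas[intent]
    missing = required - args.keys()
    if missing:                       # reject rather than invent a field
        raise ValueError(f"{intent}: missing fields {missing}")
    return template.format(**args)

for step in planner_llm("post today's Berlin weather to #ops"):
    print(memorizer_slm(step["intent"], step["args"]))
```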

Updated: 2025-03-29 01:27:11

标题: 分解型代理:为了稳健的工具使用,解耦上下文学习和记忆

摘要: 在本文中,我们提出了一种新颖的分解式Agent架构,旨在克服传统单Agent系统在Agent AI中的局限性。我们的方法将Agent分解为两个专门的组件:(1)作为高级规划器和上下文学习者的大型语言模型(LLM),可以利用用户提示中的动态可用信息,(2)作为工具格式和输出的记忆器的较小语言模型。这种解耦方法解决了单体设计中普遍存在的问题,包括格式错误、缺失和虚构的API字段,以及在动态环境中规划不佳的问题。实证评估表明,我们的分解式架构显著提高了规划准确性和错误韧性,同时阐明了上下文学习和静态记忆之间固有的权衡。这些发现表明,分解式方法是开发更强大和适应性更强的Agent AI系统的一个有前途的途径。

更新时间: 2025-03-29 01:27:11

领域: cs.AI

下载: http://arxiv.org/abs/2503.22931v1

PupilSense: A Novel Application for Webcam-Based Pupil Diameter Estimation

Measuring pupil diameter is vital for gaining insights into physiological and psychological states, traditionally captured by expensive, specialized equipment like Tobii eye-trackers and Pupillabs glasses. This paper presents a novel application that enables pupil diameter estimation using standard webcams, making the process accessible in everyday environments without specialized equipment. Our app estimates pupil diameters from videos and offers detailed analysis, including class activation maps, graphs of predicted left and right pupil diameters, and eye aspect ratios during blinks. This tool expands the accessibility of pupil diameter measurement, particularly in everyday settings, benefiting fields like human behavior research and healthcare. Additionally, we present a new open-source dataset for pupil diameter estimation from webcam images, containing cropped eye images and corresponding pupil diameter measurements.
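
For the blink analysis mentioned above, a commonly used eye-aspect-ratio (EAR) definition is a natural fit; the sketch below assumes that formulation and a six-landmark eye layout, which may differ from the app's exact implementation.

```python
import numpy as np

def eye_aspect_ratio(landmarks: np.ndarray) -> float:
    """Eye aspect ratio from six eye landmarks (p1..p6, shape (6, 2)).

    Uses the common definition EAR = (|p2-p6| + |p3-p5|) / (2 |p1-p4|);
    EAR drops sharply when the eye closes, which is how blinks are flagged.
    """
    p1, p2, p3, p4, p5, p6 = landmarks
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return float(vertical / (2.0 * horizontal))

open_eye = np.array([[0, 0], [2, 1.2], [4, 1.2], [6, 0], [4, -1.2], [2, -1.2]], float)
closed_eye = open_eye * np.array([1.0, 0.15])       # flatten the eye vertically
for name, eye in [("open", open_eye), ("closed", closed_eye)]:
    ear = eye_aspect_ratio(eye)
    print(name, round(ear, 3), "blink" if ear < 0.2 else "eye open")
```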

Updated: 2025-03-29 01:19:17

标题: PupilSense: 一种基于网络摄像头的瞳孔直径估计新应用

摘要: 测量瞳孔直径对于获取生理和心理状态的洞察至关重要,传统上使用昂贵的专门设备如Tobii眼动仪和Pupillabs眼镜进行捕捉。本文介绍了一种新的应用程序,可以利用标准网络摄像头进行瞳孔直径估计,使该过程在日常环境中无需专门设备即可实现。我们的应用程序可以从视频中估计瞳孔直径,并提供详细分析,包括类激活图、预测的左右瞳孔直径的图表,以及眨眼时的眼睛纵横比。这个工具扩大了瞳孔直径测量的可访问性,特别是在日常环境中,有益于人类行为研究和医疗保健领域。此外,我们还提供了一个新的开源数据集,用于利用网络摄像头图像进行瞳孔直径估计,其中包含裁剪后的眼部图像和相应的瞳孔直径测量。

更新时间: 2025-03-29 01:19:17

领域: cs.CV,cs.AI,cs.CY,cs.HC,cs.LG

下载: http://arxiv.org/abs/2407.11204v2

Cost-Saving LLM Cascades with Early Abstention

LLM cascades deploy small LLMs to answer most queries, limiting the use of large and expensive LLMs to difficult queries. This approach can significantly reduce costs without impacting performance. However, risk-sensitive domains such as finance or medicine place an additional premium on avoiding model errors. Since even the most expensive models are susceptible to making mistakes, applications in these domains benefit from allowing LLM systems to completely abstain from answering difficult queries. Introducing abstention poses a design question for LLM cascades: should abstention only be allowed at the final model or also at earlier models? Since the error patterns of small and large models are correlated, allowing earlier models to abstain may reduce inference costs and latency by anticipating abstention decisions by expensive and slow models, thus avoiding the need to run these models. We investigate the benefits of such "early abstention" in LLM cascades and find that it reduces overall test loss by 2.2% on average across six benchmarks (GSM8K, MedMCQA, MMLU, TriviaQA, TruthfulQA, and XSum). These gains result from a more effective use of abstention, trading a 4.1% average increase in the overall abstention rate for a 13.0% reduction in cost and a 5.0% reduction in error rate. Our findings demonstrate the possibility of leveraging correlations between the error patterns of different language models to drive performance improvements for LLM systems with abstention.
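
A minimal sketch of a two-stage cascade with per-stage accept and abstain thresholds; stubbed models with fixed confidences stand in for real LLM calls, and the threshold values are arbitrary assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class Stage:
    name: str
    answer: Callable[[str], Tuple[str, float]]  # returns (answer, confidence)
    accept: float     # confidence needed to answer at this stage
    abstain: float    # confidence below which the whole cascade abstains early

def run_cascade(stages, query) -> Tuple[str, Optional[str]]:
    """Answer with the cheapest confident model, or abstain.

    Because small- and large-model errors are correlated, very low confidence
    at an early stage predicts failure downstream, so the cascade can abstain
    before ever invoking the expensive model.
    """
    for i, stage in enumerate(stages):
        ans, conf = stage.answer(query)
        if conf >= stage.accept:
            return stage.name, ans            # confident enough: stop here
        if conf < stage.abstain or i == len(stages) - 1:
            return stage.name, None           # abstain (possibly early)
    return stages[-1].name, None              # unreachable; satisfies type checkers

small = Stage("small-llm", lambda q: ("42", 0.35), accept=0.8, abstain=0.2)
large = Stage("large-llm", lambda q: ("42", 0.90), accept=0.6, abstain=0.0)
print(run_cascade([small, large], "hard question"))       # escalates to large-llm
small_lost = Stage("small-llm", lambda q: ("?", 0.05), accept=0.8, abstain=0.2)
print(run_cascade([small_lost, large], "very hard one"))  # abstains early
```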

Updated: 2025-03-29 01:19:05

标题: 具有早期弃权的节省成本LLM级联

摘要: LLM级联将小型LLMs部署用于回答大多数查询,将大型和昂贵的LLMs限制在困难查询中。这种方法可以显著降低成本而不影响性能。然而,对金融或医学等风险敏感领域而言,避免模型错误会带来额外的成本。由于即使是最昂贵的模型也容易出错,这些领域的应用受益于允许LLM系统完全放弃回答困难查询。引入放弃提出了LLM级联的设计问题:放弃只应该在最终模型上允许,还是在较早的模型上也应该允许?由于小型和大型模型的错误模式是相关的,允许较早的模型放弃可能通过预期昂贵和慢速模型的放弃决策来减少推理成本和延迟,从而避免运行这些模型的需要。我们研究了LLM级联中这种“早期放弃”的好处,并发现在六个基准测试中(GSM8K,MedMCQA,MMLU,TriviaQA,TruthfulQA和XSum)平均降低了整体测试损失2.2%。这些收益来自对放弃的更有效利用,以4.1%的平均整体放弃率换取了13.0%的成本降低和5.0%的错误率降低。我们的发现表明,利用不同语言模型的错误模式之间的相关性可以推动带放弃功能的LLM系统的性能改进。

更新时间: 2025-03-29 01:19:05

领域: cs.AI

下载: http://arxiv.org/abs/2502.09054v2

MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems

Traditional retrieval-augmented generation (RAG) benchmarks evaluate systems using heuristic-based metrics, but these require human preferences as the ground-truth reference. In contrast, arena-based benchmarks, where systems compete against each other, require an expensive large language model (LLM) as a judge for reliable evaluation. We present a simple, efficient technique that combines the best of both worlds. The idea is to train a surrogate judge that takes heuristic metrics as input and outputs the LLM-as-a-judge prediction. In our work, we develop MIRAGE-Bench, a synthetic arena-based RAG benchmark for 18 diverse languages on Wikipedia, focused on multilingual answer generation evaluation. It tightly couples heuristic features with an LLM judge for evaluation. We benchmark 19 multilingual LLMs and observe a high correlation (Kendall's Tau ($\tau$) = 0.909) between rankings produced by our surrogate judge and by GPT-4o as a teacher under the Bradley-Terry framework. Our results show that proprietary and large open-source LLMs currently dominate on MIRAGE-Bench. Our code and datasets are made publicly available here: https://github.com/vectara/mirage-bench.
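
A toy sketch of the surrogate-judge idea: fit a cheap regressor from heuristic features to previously collected LLM-judge scores, then compare rankings with Kendall's Tau. The synthetic features, ridge regression, and the toy Tau implementation are illustrative assumptions; MIRAGE-Bench's actual features and training setup are described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sys, n_feat = 19, 4
# Cheap heuristic features per RAG system (e.g., lexical overlap, fluency, ...)
X = rng.random((n_sys, n_feat))
# Expensive LLM-judge scores, collected once and used as training targets.
llm_judge = X @ np.array([0.5, 1.5, 0.2, 0.8]) + 0.05 * rng.standard_normal(n_sys)

# Surrogate judge: ridge regression from cheap heuristics to judge scores.
lam = 1e-2
w = np.linalg.solve(X.T @ X + lam * np.eye(n_feat), X.T @ llm_judge)
surrogate = X @ w

def kendall_tau(a, b):
    """Rank correlation between two score vectors (assumes no ties)."""
    n = len(a)
    pairs = n * (n - 1) // 2
    concordant = sum(np.sign(a[i] - a[j]) == np.sign(b[i] - b[j])
                     for i in range(n) for j in range(i + 1, n))
    return (2 * concordant - pairs) / pairs

print("Kendall tau(surrogate, LLM judge):",
      round(float(kendall_tau(surrogate, llm_judge)), 3))
```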

Updated: 2025-03-29 01:11:30

标题: MIRAGE-Bench:用于检索增强生成系统的自动多语言基准测试竞技场

摘要: 传统的检索增强生成(RAG)基准评估系统使用基于启发式的度量,但这需要人类偏好作为参考的真相。相比之下,基于竞技场的基准测试,系统之间相互竞争,需要一个昂贵的大型语言模型(LLM)作为可靠评估的判断者。我们提出了一种简单高效的技术,结合了两种方法的优点。这个想法是训练一个代理判断者,使用启发式度量作为输入,输出LLM作为判断者的预测。在我们的工作中,我们开发了MIRAGE-Bench,一个针对维基百科上18种不同语言的综合竞技场RAG基准测试,重点评估多语言答案生成。它广泛地结合了启发式特征和LLM作为评判者进行评估。我们对19个多语言LLM进行基准测试,并观察到使用我们的代理判断者和GPT-4o作为教师的Bradley-Terry框架之间有很高的相关性(Kendall Tau(τ)= 0.909)。我们的结果显示,专有和大型开源LLM目前在MIRAGE-Bench上占主导地位。我们的代码和数据集可以在此处公开获取:https://github.com/vectara/mirage-bench。

更新时间: 2025-03-29 01:11:30

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.13716v2

Adaptive Stochastic Gradient Descents on Manifolds with an Application on Weighted Low-Rank Approximation

We prove a convergence theorem for stochastic gradient descent on manifolds with an adaptive learning rate and apply it to the weighted low-rank approximation problem.
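
One common way to instantiate such a scheme for weighted low-rank approximation is retracted gradient descent with an adaptive (backtracking) step size on the rank-r matrix manifold; the sketch below assumes that setup, which may differ from the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
m_, n_, r = 30, 20, 3
M = rng.standard_normal((m_, r)) @ rng.standard_normal((r, n_))   # rank-r target
W = rng.random((m_, n_))                                          # entry weights

def truncate(X, r):
    """Retraction onto the rank-r manifold via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

X = truncate(rng.standard_normal((m_, n_)), r)
lr = 1.0
for t in range(300):
    G = W * (X - M)                        # Euclidean gradient of the weighted loss
    # adaptive step: simple backtracking on the weighted objective
    while True:
        X_new = truncate(X - lr * G, r)
        if (W * (X_new - M)**2).sum() <= (W * (X - M)**2).sum() or lr < 1e-8:
            break
        lr *= 0.5
    X = X_new
    lr *= 1.1                              # allow the step size to grow again
print("final weighted error:", float((W * (X - M)**2).sum()))
```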

Updated: 2025-03-29 01:05:48

标题: 在流形上的自适应随机梯度下降及其在加权低秩逼近中的应用

摘要: 我们证明了关于具有自适应学习率的随机梯度下降在流形上的收敛定理,并将其应用于加权低秩逼近问题。

更新时间: 2025-03-29 01:05:48

领域: math.OC,cs.AI,cs.LG,41A60, 53Z50, 62L20, 68T05

下载: http://arxiv.org/abs/2503.11833v2

Predictive Traffic Rule Compliance using Reinforcement Learning

Autonomous vehicle path planning has reached a stage where safety and regulatory compliance are crucial. This paper presents a new approach that integrates a motion planner with a deep reinforcement learning model to predict potential traffic rule violations. In this setup, the critic's predictions directly affect the cost function of the motion planner, guiding the choice of trajectory. We incorporate key interstate rules from the German Road Traffic Regulation into a rule book and use a graph-based state representation to handle complex traffic information. Our main innovation is replacing the standard actor network in an actor-critic setup with a motion planning module, which ensures both predictable trajectory generation and the prevention of long-term rule violations. Experiments on an open German highway dataset show that the model can predict and prevent traffic rule violations beyond the planning horizon, significantly increasing safety in challenging traffic conditions.
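
A toy sketch of the planner-critic coupling: candidate trajectories are scored by a base comfort/progress cost plus a penalty from a (stubbed) critic that predicts rule-violation probability. The 1-D trajectories, the speed-threshold critic, and the weight `lam` are illustrative assumptions, not the paper's networks or rule book.

```python
import numpy as np

rng = np.random.default_rng(0)

def base_cost(traj: np.ndarray) -> float:
    """Comfort/progress cost: penalize jerky changes, reward distance covered."""
    return float(np.sum(np.diff(traj, n=2)**2) - 0.1 * (traj[-1] - traj[0]))

def critic_violation(traj: np.ndarray) -> float:
    """Stand-in for the learned critic: probability of a future rule violation.

    Here a simple speed threshold fakes the prediction; the real critic is a
    deep RL value network over a graph-based traffic state.
    """
    speed = np.max(np.abs(np.diff(traj)))
    return float(1 / (1 + np.exp(-10 * (speed - 1.0))))

def plan(candidates, lam=5.0):
    """Pick the trajectory minimizing base cost plus the critic penalty."""
    costs = [base_cost(t) + lam * critic_violation(t) for t in candidates]
    return int(np.argmin(costs)), costs

candidates = [np.cumsum(rng.uniform(0.2, 1.4, 20)) for _ in range(5)]
best, costs = plan(candidates)
print("chosen trajectory:", best, "costs:", np.round(costs, 2))
```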

Updated: 2025-03-29 01:04:08

标题: 使用强化学习预测交通规则遵守情况

摘要: 自主车辆路径规划已经达到了一个关键阶段,安全性和合规性至关重要。本文提出了一种新方法,将运动规划器与深度强化学习模型相结合,以预测潜在的交通规则违规行为。在这种设置中,评论者的预测直接影响运动规划器的成本函数,引导轨迹选择。我们将德国道路交通管理法中的关键国际规则纳入一本规则手册,并使用基于图的状态表示来处理复杂的交通信息。我们的主要创新是在演员评论设置中用运动规划模块替换标准的演员网络,这确保了可预测的轨迹生成和长期规则违规的预防。对一个开放的德国高速公路数据集进行的实验表明,该模型可以预测和防止规划范围之外的交通规则违规行为,在具有挑战性的交通条件下显著提高安全性。

更新时间: 2025-03-29 01:04:08

领域: cs.RO,cs.AI,I.2.9; I.2.6

下载: http://arxiv.org/abs/2503.22925v1

Nested Stochastic Gradient Descent for (Generalized) Sinkhorn Distance-Regularized Distributionally Robust Optimization

Distributionally robust optimization (DRO) is a powerful technique for training models that are robust to data distribution shift. This paper aims to solve regularized nonconvex DRO problems, where the uncertainty set is modeled by a so-called generalized Sinkhorn distance and the loss function is nonconvex and possibly unbounded. Such a distance makes it possible to model the uncertainty of distributions with different probability supports and divergence functions. For this class of regularized DRO problems, we derive a novel dual formulation that takes the form of a nested stochastic program, where the dual variable depends on the data sample. To solve the dual problem, we provide theoretical evidence for the design of a nested stochastic gradient descent (SGD) algorithm, which leverages stochastic approximation to estimate the nested stochastic gradients. We study the convergence rate of nested SGD and establish polynomial iteration and sample complexities that are independent of the data size and parameter dimension, indicating its potential for solving large-scale DRO problems. We conduct numerical experiments to demonstrate the efficiency and robustness of the proposed algorithm.
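
A toy sketch of the nested estimation pattern on a 1-D entropy-regularized DRO surrogate: an outer loop samples data points, and an inner loop samples perturbations around each point to estimate sample-dependent softmax (dual) weights before forming the gradient. The surrogate objective and all constants are assumptions; the paper's generalized Sinkhorn formulation is more general.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(500) + 1.0          # training samples z_i

def loss(theta, z):
    return (theta - z)**2                       # per-sample loss

def dro_grad(theta, lam=0.5, outer=8, inner=16, sigma=0.3):
    """Nested stochastic gradient for an entropy-regularized DRO surrogate.

    Outer samples draw data points; for each, the inner loop samples
    perturbed points to estimate softmax-style dual weights (the dual
    variable depends on the data sample, hence the nesting).
    """
    g = 0.0
    for z in rng.choice(data, outer):
        zs = z + sigma * rng.standard_normal(inner)   # perturbations around z
        ls = loss(theta, zs)
        wts = np.exp((ls - ls.max()) / lam)
        wts /= wts.sum()                              # inner softmax weights
        g += np.sum(wts * 2 * (theta - zs))           # weighted gradient of loss
    return g / outer

theta = 0.0
for t in range(200):
    theta -= 0.05 * dro_grad(theta)
print("robust estimate:", round(theta, 3), "vs plain mean:", round(data.mean(), 3))
```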

Updated: 2025-03-29 01:01:02

标题: 嵌套随机梯度下降用于(广义)Sinkhorn距离正则化的分布鲁棒优化

摘要: 分布鲁棒优化(DRO)是一种强大的技术,用于训练针对数据分布转移的鲁棒模型。本文旨在解决正则化非凸DRO问题,其中不确定性集由所谓的广义Sinkhorn距离建模,损失函数是非凸的,可能是无界的。这种距离允许对具有不同概率支持和散度函数的分布的不确定性进行建模。对于这类正则化DRO问题,我们推导出一种新颖的对偶形式,采用嵌套随机规划的形式,其中对偶变量取决于数据样本。为了解决对偶问题,我们提供理论证据设计了一种嵌套随机梯度下降(SGD)算法,利用随机逼近来估计嵌套随机梯度。我们研究了嵌套SGD的收敛速度,并建立了独立于数据大小和参数维度的多项式迭代和样本复杂性,表明其解决大规模DRO问题的潜力。我们进行了数值实验,以证明所提出的算法的效率和鲁棒性。

更新时间: 2025-03-29 01:01:02

领域: math.OC,cs.LG,stat.ML

下载: http://arxiv.org/abs/2503.22923v1

PaintScene4D: Consistent 4D Scene Generation from Text Prompts

Recent advances in diffusion models have revolutionized 2D and 3D content creation, yet generating photorealistic dynamic 4D scenes remains a significant challenge. Existing dynamic 4D generation methods typically rely on distilling knowledge from pre-trained 3D generative models, often fine-tuned on synthetic object datasets. Consequently, the resulting scenes tend to be object-centric and lack photorealism. While text-to-video models can generate more realistic scenes with motion, they often struggle with spatial understanding and provide limited control over camera viewpoints during rendering. To address these limitations, we present PaintScene4D, a novel text-to-4D scene generation framework that departs from conventional multi-view generative models in favor of a streamlined architecture that harnesses video generative models trained on diverse real-world datasets. Our method first generates a reference video using a video generation model, and then employs a strategic camera array selection for rendering. We apply a progressive warping and inpainting technique to ensure both spatial and temporal consistency across multiple viewpoints. Finally, we optimize multi-view images using a dynamic renderer, enabling flexible camera control based on user preferences. Adopting a training-free architecture, our PaintScene4D efficiently produces realistic 4D scenes that can be viewed from arbitrary trajectories. The code will be made publicly available. Our project page is at https://paintscene4d.github.io/
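
A schematic sketch of the progressive warp-and-inpaint loop with stand-in components: the video generator, warp, and inpainter below are trivial placeholders that only illustrate how each new viewpoint is completed from the previously completed one, so that filled-in content stays consistent across views instead of being re-hallucinated per view.

```python
import numpy as np

def generate_reference_video(prompt: str, frames: int = 8) -> np.ndarray:
    """Stand-in for a text-to-video model (random pixels for illustration)."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.random((frames, 32, 32, 3))

def warp(frame: np.ndarray, shift: int):
    """Stand-in warp: shift pixels for a small camera move; flag dis-occlusions."""
    shifted = np.roll(frame, shift, axis=1)
    holes = np.zeros(frame.shape[:2], bool)
    holes[:, :shift] = True                 # columns with no source content
    return shifted, holes

def inpaint(frame: np.ndarray, holes: np.ndarray) -> np.ndarray:
    """Stand-in inpainting: fill holes with the mean of the visible pixels."""
    out = frame.copy()
    out[holes] = frame[~holes].mean(axis=0)
    return out

# Progressive warping and inpainting across a camera array: each viewpoint is
# warped from the previously completed one, so inpainted content propagates.
video = generate_reference_video("a koi pond at dusk")
camera_shifts = [2, 2, 2]                   # three small rightward camera moves
multiview = []
for frame in video:
    views, src = [frame], frame
    for shift in camera_shifts:
        warped, holes = warp(src, shift)
        src = inpaint(warped, holes)        # becomes the next view's source
        views.append(src)
    multiview.append(views)
print("frames x views:", len(multiview), "x", len(multiview[0]))
```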

Updated: 2025-03-29 00:26:04

标题: PaintScene4D:从文本提示生成一致的4D场景

摘要: 最近在扩散模型方面取得的进展已经彻底改变了二维和三维内容的创作,然而生成逼真的动态四维场景仍然是一个重大挑战。现有的动态四维生成方法通常依赖于从预先训练的三维生成模型中提取知识,通常在合成对象数据集上进行微调。因此,生成的场景往往以对象为中心,缺乏逼真感。虽然文本到视频模型可以生成更为逼真的带有运动的场景,但它们通常在空间理解方面存在困难,并且在渲染过程中提供有限的摄像机视角控制。为了解决这些限制,我们提出了PaintScene4D,这是一个新颖的文本到四维场景生成框架,它与传统的多视角生成模型背道而驰,而是采用了在不同的真实世界数据集上训练的视频生成模型。我们的方法首先使用视频生成模型生成参考视频,然后采用策略性的摄像机阵列选择进行渲染。我们应用了一种渐进的扭曲和修补技术,以确保在多个视点上的空间和时间一致性。最后,我们使用动态渲染器优化多视角图像,以实现基于用户偏好的灵活摄像机控制。采用无需训练的架构,我们的PaintScene4D有效地生成可以从任意轨迹观看的逼真的四维场景。我们将公开发布代码。我们的项目页面在https://paintscene4d.github.io/。

更新时间: 2025-03-29 00:26:04

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2412.04471v2

By Xinhai (Sean) Zou.