Arxiv Day: Article

Reasoning with LLMs for Zero-Shot Vulnerability Detection

Automating software vulnerability detection (SVD) remains a critical challenge in an era of increasingly complex and interdependent software systems. Despite significant advances in Large Language Models (LLMs) for code analysis, prevailing evaluation methodologies often lack the context-aware robustness necessary to capture real-world intricacies and cross-component interactions. To address these limitations, we present VulnSage, a comprehensive evaluation framework and a dataset curated from diverse, large-scale open-source system software projects developed in C/C++. Unlike prior datasets, it leverages a heuristic noise pre-filtering approach combined with LLM-based reasoning to ensure a representative and minimally noisy spectrum of vulnerabilities. The framework supports multi-granular analysis across function, file, and inter-function levels and employs four diverse zero-shot prompt strategies: Baseline, Chain-of-Thought, Think, and Think & Verify. Through this evaluation, we find that structured reasoning prompts substantially improve LLM performance, with Think & Verify reducing ambiguous responses from 20.3% to 9.1% while increasing accuracy. We further demonstrate that code-specialized models consistently outperform general-purpose alternatives, with performance varying significantly across vulnerability types, revealing that no single approach universally excels across all security contexts. Link to dataset and code: https://github.com/Erroristotle/VulnSage.git

Updated: 2025-03-22 23:59:17

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2503.17885v1

Learning Optimal Filters Using Variational Inference

Filtering - the task of estimating the conditional distribution for states of a dynamical system given partial and noisy observations - is important in many areas of science and engineering, including weather and climate prediction. However, the filtering distribution is generally intractable to obtain for high-dimensional, nonlinear systems. Filters used in practice, such as the ensemble Kalman filter (EnKF), provide biased probabilistic estimates for nonlinear systems and have numerous tuning parameters. Here, we present a framework for learning a parameterized analysis map - the transformation that takes samples from a forecast distribution and combines them with an observation to update the approximate filtering distribution - using variational inference. In principle this can lead to a better approximation of the filtering distribution, and hence smaller bias. We show that this methodology can be used to learn the gain matrix, in an affine analysis map, for filtering linear and nonlinear dynamical systems; we also study the learning of inflation and localization parameters for an EnKF. The framework developed here can also be used to learn new filtering algorithms with more general forms for the analysis map.
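
As a rough illustration of what an affine analysis map looks like, the sketch below applies a scalar, EnKF-style update with perturbed observations. It assumes the state is observed directly and uses the textbook ensemble gain K = P / (P + R) as a stand-in; the paper instead learns the gain by variational inference, which is not shown here. The function names `ensemble_gain` and `affine_analysis` are hypothetical.

```python
import random
import statistics

def ensemble_gain(forecast, obs_var):
    """Scalar Kalman gain K = P / (P + R), with P the ensemble sample variance."""
    p = statistics.variance(forecast)
    return p / (p + obs_var)

def affine_analysis(forecast, y, gain, obs_var, rng):
    """Affine analysis map: x_a = x_f + K * (y + noise - x_f) for each member."""
    sd = obs_var ** 0.5
    return [x + gain * (y + rng.gauss(0.0, sd) - x) for x in forecast]

# Assumed toy setup: standard-normal forecast ensemble, observation y = 2.0.
rng = random.Random(0)
forecast = [rng.gauss(0.0, 1.0) for _ in range(500)]
gain = ensemble_gain(forecast, obs_var=0.1)
analysis = affine_analysis(forecast, y=2.0, gain=gain, obs_var=0.1, rng=rng)
```

In a learned variant, `gain` would be a trainable parameter rather than the closed-form value computed above.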

Updated: 2025-03-22 23:54:29

Categories: cs.LG,math.DS,62M20, 93E11, 60G35, 62F15

Download: http://arxiv.org/abs/2406.18066v3

Near-Polynomially Competitive Active Logistic Regression

We address the problem of active logistic regression in the realizable setting. It is well known that active learning can require exponentially fewer label queries compared to passive learning, in some cases using $\log \frac{1}{\epsilon}$ rather than $\mathrm{poly}(1/\epsilon)$ labels to get error $\epsilon$ larger than the optimum. We present the first algorithm that is polynomially competitive with the optimal algorithm on every input instance, up to factors polylogarithmic in the error and domain size. In particular, if any algorithm achieves label complexity polylogarithmic in $\epsilon$, so does ours. Our algorithm is based on efficient sampling and can be extended to learn more general classes of functions. We further support our theoretical results with experiments demonstrating performance gains for logistic regression compared to existing active learning algorithms.
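
The logarithmic-versus-polynomial gap mentioned above can be seen in the classic toy case of a noiseless one-dimensional threshold, sketched below. This is an assumed illustration only, not the paper's algorithm or its logistic setting: an active learner picks its own query points and binary-searches the threshold, so it needs about $\log_2(1/\epsilon)$ labels. The function name `active_threshold` is hypothetical.

```python
def active_threshold(label, eps):
    """Binary-search a threshold in [0, 1] to within eps.

    `label(x)` is assumed to return True exactly when x is at or above
    the unknown threshold. Returns (estimate, number_of_label_queries).
    """
    lo, hi, queries = 0.0, 1.0, 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        queries += 1
        if label(mid):
            hi = mid   # threshold is at or below mid
        else:
            lo = mid   # threshold is above mid
    return (lo + hi) / 2, queries

true_threshold = 0.3141
estimate, queries = active_threshold(lambda x: x >= true_threshold, eps=1e-3)
```

A passive learner drawing uniformly random points would need on the order of $1/\epsilon$ labels to localize the same threshold.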

Updated: 2025-03-22 23:43:20

Categories: cs.LG

Download: http://arxiv.org/abs/2503.05981v3

EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability

Sophisticated phishing attacks have emerged as a major cybersecurity threat, becoming more common and difficult to prevent. Though machine learning techniques have shown promise in detecting phishing attacks, they function mainly as "black boxes" without revealing their decision-making rationale. This lack of transparency erodes the trust of users and diminishes their effective threat response. We present EXPLICATE: a framework that enhances phishing detection through a three-component architecture: an ML-based classifier using domain-specific features, a dual-explanation layer combining LIME and SHAP for complementary feature-level insights, and an LLM enhancement using DeepSeek v3 to translate technical explanations into accessible natural language. Our experiments show that EXPLICATE attains 98.4% accuracy on all metrics, which is on par with existing deep learning techniques but has better explainability. High-quality explanations are generated by the framework with an accuracy of 94.2% as well as a consistency of 96.8% between the LLM output and model prediction. We create EXPLICATE as a fully usable GUI application and a light Chrome extension, showing its applicability in many deployment situations. The research shows that high detection performance can go hand-in-hand with meaningful explainability in security applications. Most importantly, it addresses the critical divide between automated AI and user trust in phishing detection systems.

Updated: 2025-03-22 23:37:35

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2503.20796v1

Think Before Refusal: Triggering Safety Reflection in LLMs to Mitigate False Refusal Behavior

Recent advancements in large language models (LLMs) have demonstrated that fine-tuning and human alignment can render LLMs harmless. In practice, such "harmlessness" behavior is mainly achieved by training models to reject harmful requests, such as "Explain how to burn down my neighbor's house", where the model appropriately declines to respond. However, this approach can inadvertently result in false refusal, where models reject benign queries as well, such as "Tell me how to kill a Python process". In this work, we demonstrate that prompting safety reflection before generating a response can mitigate false refusal behavior. Building on this finding, we introduce the Think-Before-Refusal (TBR) schema and conduct safety-aware instruction fine-tuning incorporating safety reflection. In an ablation study across 15 pre-trained models, we show that models fine-tuned with safety reflection significantly reduce false refusal behavior while maintaining safety and overall performance compared to those fine-tuned without safety reflection.

Updated: 2025-03-22 23:35:49

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2503.17882v1

IceBench: A Benchmark for Deep Learning based Sea Ice Type Classification

Sea ice plays a critical role in the global climate system and maritime operations, making timely and accurate classification essential. However, traditional manual methods are time-consuming, costly, and have inherent biases. Automating sea ice type classification addresses these challenges by enabling faster, more consistent, and scalable analysis. While both traditional and deep learning approaches have been explored, deep learning models offer a promising direction for improving efficiency and consistency in sea ice classification. However, the absence of a standardized benchmark and comparative study prevents a clear consensus on the best-performing models. To bridge this gap, we introduce IceBench, a comprehensive benchmarking framework for sea ice type classification. Our key contributions are threefold: First, we establish the IceBench benchmarking framework, which leverages the existing AI4Arctic Sea Ice Challenge dataset as a standardized dataset, incorporates a comprehensive set of evaluation metrics, and includes representative models from the entire spectrum of sea ice type classification methods, categorized into two distinct groups, namely, pixel-based classification methods and patch-based classification methods. IceBench is open-source and allows for convenient integration and evaluation of other sea ice type classification methods, thereby facilitating comparative evaluation of new methods and improving reproducibility in the field. Second, we conduct an in-depth comparative study on representative models to assess their strengths and limitations, providing insights for both practitioners and researchers. Third, we leverage IceBench for systematic experiments addressing key research questions on model transferability across seasons (time) and locations (space), data downscaling, and preprocessing strategies.

Updated: 2025-03-22 23:14:50

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2503.17877v1

Leveraging Multi-modal Representations to Predict Protein Melting Temperatures

Accurately predicting protein melting temperature changes ($\Delta T_m$) is fundamental for assessing protein stability and guiding protein engineering. Leveraging multi-modal protein representations has shown great promise in capturing the complex relationships among protein sequences, structures, and functions. In this study, we develop models based on powerful protein language models, including ESM-2, ESM-3 and AlphaFold, using various feature extraction methods to enhance prediction accuracy. By utilizing the ESM-3 model, we achieve a new state-of-the-art performance on the s571 test dataset, obtaining a Pearson correlation coefficient (PCC) of 0.50. Furthermore, we conduct a fair evaluation to compare the performance of different protein language models in the $\Delta T_m$ prediction task. Our results demonstrate that integrating multi-modal protein representations could advance the prediction of protein melting temperatures.

Updated: 2025-03-22 23:01:55

Categories: cs.LG,cs.CE

Download: http://arxiv.org/abs/2412.04526v3

Synthetic media and computational capitalism: towards a critical theory of artificial intelligence

This paper develops a critical theory of artificial intelligence, within a historical constellation where computational systems increasingly generate cultural content that destabilises traditional distinctions between human and machine production. Through this analysis, I introduce the concept of the algorithmic condition, a cultural moment when machine-generated work not only becomes indistinguishable from human creation but actively reshapes our understanding of authenticity. This transformation, I argue, moves beyond false consciousness towards what I call post-consciousness, where the boundaries between individual and synthetic consciousness become porous. Drawing on critical theory and extending recent work on computational ideology, I develop three key theoretical contributions: first, the concept of the Inversion to describe a new computational turn in algorithmic society; second, automimetric production as a framework for understanding emerging practices of automated value creation; and third, constellational analysis as a methodological approach for mapping the complex interplay of technical systems, cultural forms and political economic structures. Through these contributions, I argue that we need new critical methods capable of addressing both the technical specificity of AI systems and their role in restructuring forms of life under computational capitalism. The paper concludes by suggesting that critical reflexivity is needed to engage with the algorithmic condition without being subsumed by it and that it represents a growing challenge for contemporary critical theory.

Updated: 2025-03-22 22:59:28

Categories: cs.CY,cs.AI,K.4.0; K.4.1

Download: http://arxiv.org/abs/2503.18976v1

Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models

The emergence of human-like behaviors in Large Language Models (LLMs) has led to a closer connection between NLP and human psychology. Scholars have been studying the inherent personalities exhibited by LLMs and attempting to incorporate human traits and behaviors into them. However, these efforts have primarily focused on commercially-licensed LLMs, neglecting the widespread use and notable advancements seen in Open LLMs. This work aims to address this gap by employing a set of 12 LLM Agents based on the most representative Open models and subjecting them to a series of assessments concerning the Myers-Briggs Type Indicator (MBTI) test and the Big Five Inventory (BFI) test. Our approach involves evaluating the intrinsic personality traits of Open LLM agents and determining the extent to which these agents can mimic human personalities when conditioned by specific personalities and roles. Our findings reveal that (i) each Open LLM agent showcases distinct human personalities; (ii) personality-conditioned prompting produces varying effects on the agents, with only a few successfully mirroring the imposed personality, while most of them remain "closed-minded" (i.e., they retain their intrinsic traits); and (iii) combining role and personality conditioning can enhance the agents' ability to mimic human personalities. Our work represents a step forward in understanding the dense relationship between NLP and human psychology through the lens of Open LLMs.

Updated: 2025-03-22 22:45:12

Categories: cs.AI,cs.CL,cs.CY,cs.HC,physics.soc-ph

Download: http://arxiv.org/abs/2401.07115v3

A Distributed Blockchain-based Access Control for the Internet of Things

Recently, the Internet of Things (IoT) environment has become increasingly fertile ground for malicious users to break the security and privacy of IoT users. Access control is a paramount necessity to forestall illicit access. Traditional access control mechanisms are designed and managed in a centralized manner, thus rendering them unfit for decentralized IoT systems. To address the distributed IoT environment, blockchain is viewed as a promising decentralised data management technology. In this thesis, we investigate state-of-the-art works in the domain of distributed blockchain-based access control. We establish the most important requirements and assess related works against them. We propose a Distributed Blockchain and Attribute-based Access Control model for IoT, called DBC-ABAC, that merges blockchain technology with the attribute-based access control model. A proof-of-concept implementation is presented using Hyperledger Fabric. To validate performance, we experimentally evaluate and compare our work with other recent works using the Hyperledger Caliper tool. Results indicate that the proposed model surpasses other works in terms of latency and throughput with considerable efficiency.

Updated: 2025-03-22 22:36:02

Categories: cs.CR

Download: http://arxiv.org/abs/2503.17873v1

good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval

Composed image retrieval (CIR) enables users to search images using a reference image combined with textual modifications. Recent advances in vision-language models have improved CIR, but dataset limitations remain a barrier. Existing datasets often rely on simplistic, ambiguous, or insufficient manual annotations, hindering fine-grained retrieval. We introduce good4cir, a structured pipeline leveraging vision-language models to generate high-quality synthetic annotations. Our method involves: (1) extracting fine-grained object descriptions from query images, (2) generating comparable descriptions for target images, and (3) synthesizing textual instructions capturing meaningful transformations between images. This reduces hallucination, enhances modification diversity, and ensures object-level consistency. Applying our method improves existing datasets and enables creating new datasets across diverse domains. Results demonstrate improved retrieval accuracy for CIR models trained on our pipeline-generated datasets. We release our dataset construction framework to support further research in CIR and multi-modal retrieval.

Updated: 2025-03-22 22:33:56

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2503.17871v1

Accelerating and enhancing thermodynamic simulations of electrochemical interfaces

Electrochemical interfaces are crucial in catalysis, energy storage, and corrosion, where their stability and reactivity depend on complex interactions between the electrode, adsorbates, and electrolyte. Predicting stable surface structures remains challenging, as traditional surface Pourbaix diagrams tend to either rely on expert knowledge or costly ab initio sampling, and neglect thermodynamic equilibration with the environment. Machine learning (ML) potentials can accelerate static modeling but often overlook dynamic surface transformations. Here, we extend the Virtual Surface Site Relaxation-Monte Carlo (VSSR-MC) method to autonomously sample surface reconstructions modeled under aqueous electrochemical conditions. Through fine-tuning foundational ML force fields, we accurately and efficiently predict surface energetics, recovering known Pt(111) phases and revealing new LaMnO$_3$(001) surface reconstructions. By explicitly accounting for bulk-electrolyte equilibria, our framework enhances electrochemical stability predictions, offering a scalable approach to understanding and designing materials for electrochemical applications.

Updated: 2025-03-22 22:33:19

Categories: cond-mat.mtrl-sci,cond-mat.stat-mech,cs.CE,cs.LG

Download: http://arxiv.org/abs/2503.17870v1

1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities

Scaling up self-supervised learning has driven breakthroughs in language and vision, yet comparable progress has remained elusive in reinforcement learning (RL). In this paper, we study building blocks for self-supervised RL that unlock substantial improvements in scalability, with network depth serving as a critical factor. Whereas most RL papers in recent years have relied on shallow architectures (around 2 to 5 layers), we demonstrate that increasing the depth up to 1024 layers can significantly boost performance. Our experiments are conducted in an unsupervised goal-conditioned setting, where no demonstrations or rewards are provided, so an agent must explore (from scratch) and learn how to maximize the likelihood of reaching commanded goals. Evaluated on simulated locomotion and manipulation tasks, our approach increases performance by $2\times$ to $50\times$. Increasing the model depth not only increases success rates but also qualitatively changes the behaviors learned.

Updated: 2025-03-22 22:24:37

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2503.14858v2

Meta-Representational Predictive Coding: Biomimetic Self-Supervised Learning

Self-supervised learning has become an increasingly important paradigm in the domain of machine intelligence. Furthermore, evidence for self-supervised adaptation, such as contrastive formulations, has emerged in recent computational neuroscience and brain-inspired research. Nevertheless, current work on self-supervised learning relies on biologically implausible credit assignment -- in the form of backpropagation of errors -- and feedforward inference, typically a forward-locked pass. Predictive coding, in its mechanistic form, offers a biologically plausible means to sidestep these backprop-specific limitations. However, unsupervised predictive coding rests on learning a generative model of raw pixel input (akin to "generative AI" approaches), which entails predicting a potentially high dimensional input; on the other hand, supervised predictive coding, which learns a mapping from inputs to target labels, requires human annotation, and thus incurs the drawbacks of supervised learning. In this work, we present a scheme for self-supervised learning within a neurobiologically plausible framework that appeals to the free energy principle, constructing a new form of predictive coding that we call meta-representational predictive coding (MPC). MPC sidesteps the need for learning a generative model of sensory input (e.g., pixel-level features) by learning to predict representations of sensory input across parallel streams, resulting in an encoder-only learning and inference scheme. This formulation rests on active inference (in the form of sensory glimpsing) to drive the learning of representations, i.e., the representational dynamics are driven by sequences of decisions made by the model to sample informative portions of its sensorium.

Updated: 2025-03-22 22:13:14

Categories: cs.NE,cs.LG,q-bio.NC

Download: http://arxiv.org/abs/2503.21796v1

A Privacy Model for Classical & Learned Bloom Filters

The Classical Bloom Filter (CBF) is a class of Probabilistic Data Structures (PDS) for handling Approximate Membership Queries (AMQ). The Learned Bloom Filter (LBF) is a recently proposed class of PDS that combines the Classical Bloom Filter with a Learning Model while preserving the Bloom Filter's one-sided error guarantees. Bloom Filters have been used in settings where inputs are sensitive and need to be private in the presence of an adversary with access to the Bloom Filter through an API or in the presence of an adversary who has access to the internal state of the Bloom Filter. This paper conducts a rigorous differential privacy-based analysis for the Bloom Filter. We propose constructions that satisfy differential privacy and asymmetric differential privacy. This is also the first work that analyses and addresses the privacy of the Learned Bloom Filter under any rigorous model, which is an open problem.
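
For readers unfamiliar with the data structure, the following is a minimal sketch of a classical Bloom filter showing the one-sided error guarantee: an inserted item is never reported absent, while an absent item may occasionally be reported present. The class name, the salted SHA-256 hashing scheme, and the sizes are assumptions for illustration; the sketch does not implement the differentially private constructions the paper proposes.

```python
import hashlib

class BloomFilter:
    """Classical Bloom filter with `num_hashes` salted SHA-256 hash functions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive each hash function by prefixing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True only if every probed bit is set; false positives are possible.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(num_bits=1024, num_hashes=3)
for i in range(100):
    bf.add(f"user-{i}")
```

An adversary who can read `bf.bits` directly corresponds to the internal-state access setting mentioned in the abstract, while one limited to `in` queries corresponds to API access.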

Updated: 2025-03-22 22:11:55

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2501.15751v2

Detecting and Mitigating DDoS Attacks with AI: A Survey

Distributed Denial of Service attacks represent an active cybersecurity research problem. Recent research has shifted from static rule-based defenses towards AI-based detection and mitigation. This comprehensive survey covers several key topics. Preeminently, state-of-the-art AI detection methods are discussed. An in-depth taxonomy based on manual expert hierarchies and an AI-generated dendrogram are provided, thus settling DDoS categorization ambiguities. An important discussion on available datasets follows, covering data format options and their role in training AI detection methods together with adversarial training and example augmentation. Beyond detection, AI-based mitigation techniques are surveyed as well. Finally, multiple open research directions are proposed.

Updated: 2025-03-22 21:54:23

Categories: cs.CR,cs.AI,cs.LG,cs.NI

Download: http://arxiv.org/abs/2503.17867v1

The Limits of Assumption-free Tests for Algorithm Performance

Algorithm evaluation and comparison are fundamental questions in machine learning and statistics -- how well does an algorithm perform at a given modeling task, and which algorithm performs best? Many methods have been developed to assess algorithm performance, often based around cross-validation type strategies, retraining the algorithm of interest on different subsets of the data and assessing its performance on the held-out data points. Despite the broad use of such procedures, the theoretical properties of these methods are not yet fully understood. In this work, we explore some fundamental limits for answering these questions with limited amounts of data. In particular, we make a distinction between two questions: how good is an algorithm $A$ at the problem of learning from a training set of size $n$, versus, how good is a particular fitted model produced by running $A$ on a particular training data set of size $n$? Our main results prove that, for any test that treats the algorithm $A$ as a "black box" (i.e., we can only study the behavior of $A$ empirically), there is a fundamental limit on our ability to carry out inference on the performance of $A$, unless the number of available data points $N$ is many times larger than the sample size $n$ of interest. (On the other hand, evaluating the performance of a particular fitted model is easy as long as a holdout data set is available -- that is, as long as $N-n$ is not too small.) We also ask whether an assumption of algorithmic stability might be sufficient to circumvent this hardness result. Surprisingly, we find that this is not the case: the same hardness result still holds for the problem of evaluating the performance of $A$, aside from a high-stability regime where fitted models are essentially nonrandom. Finally, we also establish similar hardness results for the problem of comparing multiple algorithms.
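
The "easy" side of this distinction, evaluating one particular fitted model on held-out data, can be sketched as follows. This is an assumed illustration rather than anything taken from the paper: it computes the 0-1 error of a fixed model on $N-n$ holdout points and attaches a Hoeffding half-width $\sqrt{\log(2/\delta)/(2m)}$, which is valid when the holdout points are i.i.d. and were not used for fitting. The function name `holdout_error_interval` is hypothetical.

```python
import math

def holdout_error_interval(model, holdout, delta=0.05):
    """Return (error, half_width) for a fixed fitted model.

    `holdout` is a list of (x, y) pairs not used in training. With
    probability at least 1 - delta, the true 0-1 error lies within
    `half_width` of the returned holdout error (Hoeffding's inequality).
    """
    m = len(holdout)
    errors = sum(model(x) != y for x, y in holdout)
    half_width = math.sqrt(math.log(2 / delta) / (2 * m))
    return errors / m, half_width

# Toy fitted model and holdout set, both made up for illustration.
fitted = lambda x: x >= 0
holdout = [(-1, False), (1, True), (2, False), (-3, False)]
error, half_width = holdout_error_interval(fitted, holdout)
```

The interval shrinks as the holdout size grows, which matches the abstract's remark that this question is tractable whenever $N-n$ is not too small; no comparably simple guarantee applies to the algorithm $A$ itself.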

Updated: 2025-03-22 21:51:23

Categories: math.ST,cs.LG,stat.ML,stat.TH

Download: http://arxiv.org/abs/2402.07388v3

A Physics-informed Machine Learning-based Control Method for Nonlinear Dynamic Systems with Highly Noisy Measurements

This study presents a physics-informed machine learning-based control method for nonlinear dynamic systems with highly noisy measurements. Existing data-driven control methods that use machine learning for system identification cannot effectively cope with highly noisy measurements, resulting in unstable control performance. To address this challenge, the present study extends current physics-informed machine learning capabilities for modeling nonlinear dynamics with control and integrates them into a model predictive control framework. To demonstrate the capability of the proposed method, we test and validate it on two noisy nonlinear dynamic systems: the chaotic Lorenz 3 system and a turning machine tool. Analysis of the results illustrates that the proposed method outperforms state-of-the-art benchmarks, as measured by both modeling accuracy and control performance, for nonlinear dynamic systems under high-noise conditions.

Updated: 2025-03-22 21:47:19

Categories: eess.SY,cs.LG,cs.SY,math.DS

Download: http://arxiv.org/abs/2311.07613v2

Removing Structured Noise with Diffusion Models

Solving ill-posed inverse problems requires careful formulation of prior beliefs over the signals of interest and an accurate description of their manifestation into noisy measurements. Handcrafted signal priors based on e.g. sparsity are increasingly replaced by data-driven deep generative models, and several groups have recently shown that state-of-the-art score-based diffusion models yield particularly strong performance and flexibility. In this paper, we show that the powerful paradigm of posterior sampling with diffusion models can be extended to include rich, structured, noise models. To that end, we propose a joint conditional reverse diffusion process with learned scores for the noise and signal-generating distribution. We demonstrate strong performance gains across various inverse problems with structured noise, outperforming competitive baselines that use normalizing flows and adversarial networks. This opens up new opportunities and relevant practical applications of diffusion modeling for inverse problems in the context of non-Gaussian measurement models.

Updated: 2025-03-22 21:39:08

Categories: cs.LG,eess.IV,eess.SP

Download: http://arxiv.org/abs/2302.05290v4

Consistent Validation for Predictive Methods in Spatial Settings

Spatial prediction tasks are key to weather forecasting, studying air pollution impacts, and other scientific endeavors. Determining how much to trust predictions made by statistical or physical methods is essential for the credibility of scientific conclusions. Unfortunately, classical approaches for validation fail to handle mismatch between locations available for validation and (test) locations where we want to make predictions. This mismatch is often not an instance of covariate shift (as commonly formalized) because the validation and test locations are fixed (e.g., on a grid or at select points) rather than i.i.d. from two distributions. In the present work, we formalize a check on validation methods: that they become arbitrarily accurate as validation data becomes arbitrarily dense. We show that classical and covariate-shift methods can fail this check. We propose a method that builds from existing ideas in the covariate-shift literature, but adapts them to the validation data at hand. We prove that our proposal passes our check. And we demonstrate its advantages empirically on simulated and real data.

Updated: 2025-03-22 21:39:04

Categories: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2402.03527v3

Proactive and Reactive Constraint Programming for Stochastic Project Scheduling with Maximal Time-Lags

This study investigates scheduling strategies for the stochastic resource-constrained project scheduling problem with maximal time lags (SRCPSP/max). Recent advances in Constraint Programming (CP) and Temporal Networks have renewed interest in evaluating the advantages and drawbacks of various proactive and reactive scheduling methods. First, we present a new, CP-based fully proactive method. Second, we show how a reactive approach can be constructed using an online rescheduling procedure. A third contribution is based on partial order schedules and uses Simple Temporal Networks with Uncertainty (STNUs). Our statistical analysis shows that the STNU-based algorithm performs best in terms of solution quality, while also showing good relative offline and online computation time.

Updated: 2025-03-22 21:20:27

Categories: cs.AI

Download: http://arxiv.org/abs/2409.09107v4

Understanding Inverse Reinforcement Learning under Overparameterization: Non-Asymptotic Analysis and Global Optimality

The goal of the Inverse reinforcement learning (IRL) task is to identify the underlying reward function and the corresponding optimal policy from a set of expert demonstrations. While most IRL algorithms' theoretical guarantees rely on a linear reward structure, we aim to extend the theoretical understanding of IRL to scenarios where the reward function is parameterized by neural networks. Meanwhile, conventional IRL algorithms usually adopt a nested structure, leading to computational inefficiency, especially in high-dimensional settings. To address this problem, we propose the first two-timescale single-loop IRL algorithm under neural network parameterized reward and provide a non-asymptotic convergence analysis under overparameterization. Although prior optimality results for linear rewards do not apply, we show that our algorithm can identify the globally optimal reward and policy under certain neural network structures. This is the first IRL algorithm with a non-asymptotic convergence guarantee that provably achieves global optimality in neural network settings.

Updated: 2025-03-22 21:16:08

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2503.17865v1

LLM+KG@VLDB'24 Workshop Summary

The unification of large language models (LLMs) and knowledge graphs (KGs) has emerged as a hot topic. At the LLM+KG'24 workshop, held in conjunction with VLDB 2024 in Guangzhou, China, one of the key themes explored was important data management challenges and opportunities due to the effective interaction between LLMs and KGs. This report outlines the major directions and approaches presented by various speakers during the LLM+KG'24 workshop.

Updated: 2025-03-22 20:58:34

Categories: cs.DB,cs.AI,cs.LG

Download: http://arxiv.org/abs/2410.01978v2

Differentiable Optimization for Deep Learning-Enhanced DC Approximation of AC Optimal Power Flow

The growing scale of power systems and the increasing uncertainty introduced by renewable energy sources necessitate novel optimization techniques that are significantly faster and more accurate than existing methods. The AC Optimal Power Flow (AC-OPF) problem, a core component of power grid optimization, is often approximated using linearized DC Optimal Power Flow (DC-OPF) models for computational tractability, albeit at the cost of suboptimal and inefficient decisions. To address these limitations, we propose a novel deep learning-based framework for network equivalency that enhances DC-OPF to more closely mimic the behavior of AC-OPF. The approach utilizes recent advances in differentiable optimization, incorporating a neural network trained to predict adjusted nodal shunt conductances and branch susceptances in order to account for nonlinear power flow behavior. The model can be trained end-to-end using modern deep learning frameworks by leveraging the implicit function theorem. Results demonstrate the framework's ability to significantly improve prediction accuracy, paving the way for more reliable and efficient power systems.

Updated: 2025-03-22 20:53:53

Categories: math.OC,cs.AI,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2504.01970v1

A Causal Adjustment Module for Debiasing Scene Graph Generation

While recent debiasing methods for Scene Graph Generation (SGG) have shown impressive performance, these efforts often attribute model bias solely to the long-tail distribution of relationships, overlooking the more profound causes stemming from skewed object and object pair distributions. In this paper, we employ causal inference techniques to model the causality among these observed skewed distributions. Our insight lies in the ability of causal inference to capture the unobservable causal effects between complex distributions, which is crucial for tracing the roots of model bias. Specifically, we introduce the Mediator-based Causal Chain Model (MCCM), which, in addition to modeling causality among objects, object pairs, and relationships, incorporates mediator variables, i.e., cooccurrence distribution, for complementing the causality. Following this, we propose the Causal Adjustment Module (CAModule) to estimate the modeled causal structure, using variables from MCCM as inputs to produce a set of adjustment factors aimed at correcting biased model predictions. Moreover, our method enables the composition of zero-shot relationships, thereby enhancing the model's ability to recognize such relationships. Experiments conducted across various SGG backbones and popular benchmarks demonstrate that CAModule achieves state-of-the-art mean recall rates, with significant improvements also observed on the challenging zero-shot recall rate metric.

Updated: 2025-03-22 20:44:01

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2503.17862v1

Plurals: A System for Guiding LLMs Via Simulated Social Ensembles

Recent debates raised concerns that language models may favor certain viewpoints. But what if the solution is not to aim for a 'view from nowhere' but rather to leverage different viewpoints? We introduce Plurals, a system and Python library for pluralistic AI deliberation. Plurals consists of Agents (LLMs, optionally with personas) which deliberate within customizable Structures, with Moderators overseeing deliberation. Plurals is a generator of simulated social ensembles. Plurals integrates with government datasets to create nationally representative personas, includes deliberation templates inspired by deliberative democracy, and allows users to customize both information-sharing structures and deliberation behavior within Structures. Six case studies demonstrate fidelity to theoretical constructs and efficacy. Three randomized experiments show simulated focus groups produced output resonant with an online sample of the relevant audiences (chosen over zero-shot generation in 75% of trials). Plurals is both a paradigm and a concrete system for pluralistic AI. The Plurals library is available at https://github.com/josh-ashkinaze/plurals and will be continually updated.

Updated: 2025-03-22 20:30:18

Categories: cs.CL,cs.AI,cs.CY,cs.HC,cs.MA

Download: http://arxiv.org/abs/2409.17213v6

A novel gradient-based method for decision trees optimizing arbitrary differential loss functions

There are many approaches for training decision trees. This work introduces a novel gradient-based method for constructing decision trees that optimize arbitrary differentiable loss functions, overcoming the limitations of heuristic splitting rules. Unlike traditional approaches that rely on heuristic splitting rules, the proposed method refines predictions using the first and second derivatives of the loss function, enabling the optimization of complex tasks such as classification, regression, and survival analysis. We demonstrate the method's applicability to classification, regression, and survival analysis tasks, including those with censored data. Numerical experiments on both real and synthetic datasets compare the proposed method with traditional decision tree algorithms, such as CART, Extremely Randomized Trees, and SurvTree. The implementation of the method is publicly available, providing a practical tool for researchers and practitioners. This work advances the field of decision tree-based modeling, offering a more flexible and accurate approach for handling structured data and complex tasks. By leveraging gradient-based optimization, the proposed method bridges the gap between traditional decision trees and modern machine learning techniques, paving the way for further innovations in interpretable and high-performing models.
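One common way to use first and second derivatives of a loss when fitting a tree is a Newton-style leaf update, as found in gradient-boosting libraries. The sketch below shows that idea for a single leaf under log-loss; whether the paper uses exactly this rule is an assumption, and the names `newton_leaf_value` and `logistic_grad_hess` are hypothetical.

```python
import math


def newton_leaf_value(grads, hessians, l2=1e-6):
    """Newton step for a constant leaf prediction: -sum(g) / (sum(h) + l2)."""
    return -sum(grads) / (sum(hessians) + l2)


def logistic_grad_hess(y, score):
    """First and second derivatives of the log-loss with respect to the raw score."""
    p = 1.0 / (1.0 + math.exp(-score))
    return p - y, p * (1.0 - p)


# Labels of the samples that fall into one leaf; the raw score starts at 0.
labels = [1, 1, 1, 0]
score = 0.0
for _ in range(20):  # repeated Newton steps refine the leaf's score
    gh = [logistic_grad_hess(y, score) for y in labels]
    score += newton_leaf_value([g for g, _ in gh], [h for _, h in gh])

leaf_probability = 1.0 / (1.0 + math.exp(-score))
```

Because only the gradient and Hessian of the loss enter the update, swapping `logistic_grad_hess` for the derivatives of another twice-differentiable loss (for example a regression or survival loss) leaves the rest of the code unchanged.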

Updated: 2025-03-22 20:25:30

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2503.17855v1

Deep learning framework for action prediction reveals multi-timescale locomotor control

Modeling movement in real-world tasks is a fundamental scientific goal for motor control, biomechanics, and rehabilitation engineering. However, existing models and their simplifying assumptions such as linear and fixed timescale mappings do not generalize to real-world contexts. Here, we develop a deep learning-based framework for action prediction with architecture-dependent trial embedding, outperforming traditional models across multiple contexts (walking and running, treadmill and overground, varying terrains) and input modalities (multiple body states, gaze). We find that neural network architectures with flexible input history-dependence like GRU and Transformer perform best overall. By quantifying the model's predictions relative to an autoregressive baseline, we identify context- and modality-dependent timescales. There is greater reliance on fast-timescale predictions in complex terrain, gaze predictions precede body state predictions, and full-body state predictions precede center-of-mass-relevant predictions. This deep learning framework for action prediction provides quantifiable insights into the control of complex movements and can be extended to other actions, contexts, and populations.

Updated: 2025-03-22 20:22:39

Categories: cs.LG,cs.RO

Download: http://arxiv.org/abs/2503.16340v2

Active management of battery degradation in wireless sensor network using deep reinforcement learning for group battery replacement

Wireless sensor networks (WSNs) have become a promising solution for structural health monitoring (SHM), especially in hard-to-reach or remote locations. Battery-powered WSNs offer various advantages over wired systems; however, limited battery life has always been one of the biggest obstacles to the practical use of WSNs, regardless of energy harvesting methods. While various methods have been studied for battery health management, existing methods aim exclusively to extend the lifetime of individual batteries and lack a system-level view. A consequence of applying such methods is that batteries in a WSN tend to fail at different times, which makes it difficult to plan and schedule battery replacement trips. This study investigates a deep reinforcement learning (DRL) method for active battery degradation management that optimizes the duty cycle of WSNs at the system level. This active management strategy effectively reduces early failures of individual batteries, which enables group replacement without sacrificing WSN performance. A simulated environment based on a real-world WSN setup was developed to train a DRL agent and learn optimal duty cycle strategies. The performance of the strategy was validated in a long-term setup with various network sizes, demonstrating its efficiency and scalability.

Updated: 2025-03-22 20:21:34

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2503.15865v2

Beyond Label Attention: Transparency in Language Models for Automated Medical Coding via Dictionary Learning

Medical coding, the translation of unstructured clinical text into standardized medical codes, is a crucial but time-consuming healthcare practice. Though large language models (LLMs) could automate the coding process and improve the efficiency of such tasks, interpretability remains paramount for maintaining patient trust. Current efforts in interpretability of medical coding applications rely heavily on label attention mechanisms, which often lead to the highlighting of extraneous tokens irrelevant to the ICD code. To facilitate accurate interpretability in medical language models, this paper leverages dictionary learning that can efficiently extract sparsely activated representations from dense language model embeddings in superposition. Compared with common label attention mechanisms, our model goes beyond token-level representations by building an interpretable dictionary which enhances the mechanistic-based explanations for each ICD code prediction, even when the highlighted tokens are medically irrelevant. We show that dictionary features can steer model behavior, elucidate the hidden meanings of upwards of 90% of medically irrelevant tokens, and are human interpretable.

Updated: 2025-03-22 20:20:38

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.00173v2

ULTra: Unveiling Latent Token Interpretability in Transformer-Based Understanding and Segmentation

Transformers have revolutionized Computer Vision (CV) through self-attention mechanisms. However, their complexity makes latent token representations difficult to interpret. We introduce ULTra, a framework for interpreting Transformer embeddings and uncovering meaningful semantic patterns within them. ULTra enables unsupervised semantic segmentation using pre-trained models without requiring fine-tuning. Additionally, we propose a self-supervised training approach that refines segmentation performance by learning an external transformation matrix without modifying the underlying model. Our method achieves state-of-the-art performance in unsupervised semantic segmentation, outperforming existing segmentation methods. Furthermore, we validate ULTra for model interpretation on both synthetic and real-world scenarios, including Object Selection and interpretable text summarization using LLMs, demonstrating its broad applicability in explaining the semantic structure of latent token representations.

Updated: 2025-03-22 19:54:49

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.12589v2

On the Global Convergence of Risk-Averse Policy Gradient Methods with Expected Conditional Risk Measures

Risk-sensitive reinforcement learning (RL) has become a popular tool for controlling the risk of uncertain outcomes and ensuring reliable performance in highly stochastic sequential decision-making problems. While Policy Gradient (PG) methods have been developed for risk-sensitive RL, it remains unclear if these methods enjoy the same global convergence guarantees as in the risk-neutral case \citep{mei2020global,agarwal2021theory,cen2022fast,bhandari2024global}. In this paper, we consider a class of dynamic time-consistent risk measures, named Expected Conditional Risk Measures (ECRMs), and derive PG and Natural Policy Gradient (NPG) updates for ECRMs-based RL problems. We provide global optimality and iteration complexities of the proposed algorithms under the following four settings: (i) PG with constrained direct parameterization, (ii) PG with softmax parameterization and log barrier regularization, (iii) NPG with softmax parameterization and entropy regularization, and (iv) approximate NPG with inexact policy evaluation. Furthermore, we test a risk-averse REINFORCE algorithm \citep{williams1992simple} and a risk-averse NPG algorithm \citep{kakade2001natural} on a stochastic Cliffwalk environment to demonstrate the efficacy of our methods and the importance of risk control.

Updated: 2025-03-22 19:54:15

Categories: cs.LG,math.OC,stat.ML

Download: http://arxiv.org/abs/2301.10932v4

Linear Partial Gromov-Wasserstein Embedding

The Gromov-Wasserstein (GW) problem, a variant of the classical optimal transport (OT) problem, has attracted growing interest in the machine learning and data science communities due to its ability to quantify similarity between measures in different metric spaces. However, like the classical OT problem, GW imposes an equal mass constraint between measures, which restricts its application in many machine learning tasks. To address this limitation, the partial Gromov-Wasserstein (PGW) problem has been introduced. It relaxes the equal mass constraint, allowing the comparison of general positive Radon measures. Despite this, both GW and PGW face significant computational challenges due to their non-convex nature. To overcome these challenges, we propose the linear partial Gromov-Wasserstein (LPGW) embedding, a linearized embedding technique for the PGW problem. For $K$ different metric measure spaces, the pairwise computation of the PGW distance requires solving the PGW problem ${O}(K^2)$ times. In contrast, the proposed linearization technique reduces this to ${O}(K)$ times. Similar to the linearization technique for the classical OT problem, we prove that LPGW defines a valid metric for metric measure spaces. Finally, we demonstrate the effectiveness of LPGW in practical applications such as shape retrieval and learning with transport-based embeddings, showing that LPGW preserves the advantages of PGW in partial matching while significantly enhancing computational efficiency. The code is available at https://github.com/mint-vu/Linearized_Partial_Gromov_Wasserstein.
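The ${O}(K^2)$ versus ${O}(K)$ claim above can be made concrete by counting solver calls. The sketch below uses a placeholder solver that only counts invocations; `CountingSolver`, `pairwise_calls`, and `linearized_calls` are hypothetical names, and no actual PGW problem is solved.

```python
from itertools import combinations


class CountingSolver:
    """Placeholder for an expensive PGW solver; it only counts invocations."""

    def __init__(self):
        self.calls = 0

    def solve(self, space_a, space_b):
        self.calls += 1
        return (space_a, space_b)  # stand-in for a transport plan


def pairwise_calls(spaces):
    """Compare every pair of spaces directly: K * (K - 1) / 2 solves."""
    solver = CountingSolver()
    for a, b in combinations(spaces, 2):
        solver.solve(a, b)
    return solver.calls


def linearized_calls(spaces, reference):
    """Match each space once to a shared reference: K solves."""
    solver = CountingSolver()
    embeddings = [solver.solve(reference, s) for s in spaces]
    # Distances between spaces would then be computed from `embeddings`
    # alone, without further solver calls.
    assert len(embeddings) == len(spaces)
    return solver.calls


spaces = [f"space_{i}" for i in range(100)]
n_pairwise = pairwise_calls(spaces)                  # 4950 solves
n_linear = linearized_calls(spaces, "reference")     # 100 solves
```

For $K = 100$ spaces this is 4950 solves against 100, and the gap widens quadratically as $K$ grows.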

Updated: 2025-03-22 19:53:49

Categories: cs.LG,math.OC

Download: http://arxiv.org/abs/2410.16669v3

NVBleed: Covert and Side-Channel Attacks on NVIDIA Multi-GPU Interconnect

Multi-GPU systems are becoming increasingly important in high-performance computing (HPC) and cloud infrastructure, providing acceleration for data-intensive applications, including machine learning workloads. These systems consist of multiple GPUs interconnected through high-speed networking links such as NVIDIA's NVLink. In this work, we explore whether the interconnect on such systems can offer a novel source of leakage, enabling new forms of covert and side-channel attacks. Specifically, we reverse engineer the operations of NVLink and identify two primary sources of leakage: timing variations due to contention and accessible performance counters that disclose communication patterns. The leakage is visible remotely and even across VM instances in the cloud, enabling potentially dangerous attacks. Building on these observations, we develop two types of covert-channel attacks across two GPUs, achieving a bandwidth of over 70 Kbps with an error rate of 4.78% for the contention channel. We develop two end-to-end cross-GPU side-channel attacks: application fingerprinting (including 18 high-performance computing and deep learning applications) and 3D graphics character identification within Blender, a multi-GPU rendering application. These attacks are highly effective, achieving F1 scores of up to 97.78% and 91.56%, respectively. We also discover that leakage surprisingly occurs across Virtual Machines on the Google Cloud Platform (GCP) and demonstrate a side-channel attack on Blender, achieving F1 scores exceeding 88%. We also explore potential defenses such as managing access to counters and reducing the resolution of the clock to mitigate the two sources of leakage.

Updated: 2025-03-22 19:52:02

Categories: cs.CR

Download: http://arxiv.org/abs/2503.17847v1

Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion

Latent diffusion models have become the popular choice for scaling up diffusion models for high resolution image synthesis. Compared to pixel-space models that are trained end-to-end, latent models are perceived to be more efficient and to produce higher image quality at high resolution. Here we challenge these notions, and show that pixel-space models can be very competitive to latent models both in quality and efficiency, achieving 1.5 FID on ImageNet512 and new SOTA results on ImageNet128, ImageNet256 and Kinetics600. We present a simple recipe for scaling end-to-end pixel-space diffusion models to high resolutions. 1: Use the sigmoid loss-weighting (Kingma & Gao, 2023) with our prescribed hyper-parameters. 2: Use our simplified memory-efficient architecture with fewer skip-connections. 3: Scale the model to favor processing the image at a high resolution with fewer parameters, rather than using more parameters at a lower resolution. Combining these with guidance intervals, we obtain a family of pixel-space diffusion models we call Simpler Diffusion (SiD2).
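Step 1 of the recipe refers to sigmoid loss weighting. A minimal sketch of that weighting follows, assuming the commonly cited form $w(\lambda) = \mathrm{sigmoid}(b - \lambda)$, where $\lambda$ is the log signal-to-noise ratio of a training example. The bias value, the function names, and the use of a plain per-sample squared error are all assumptions; the paper's prescribed hyper-parameters are not given in the abstract.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_loss_weight(log_snr, bias=0.0):
    """Weight for one example: large at high noise (low log-SNR), small at low noise."""
    return sigmoid(bias - log_snr)


def weighted_loss(per_sample_error, log_snrs, bias=0.0):
    """Average of per-sample errors, each scaled by its sigmoid weight."""
    weights = [sigmoid_loss_weight(lam, bias) for lam in log_snrs]
    total = sum(w * e for w, e in zip(weights, per_sample_error))
    return total / len(per_sample_error)


# Toy values (illustrative only): three examples at different noise levels.
errors = [1.0, 1.0, 1.0]
log_snrs = [-4.0, 0.0, 4.0]
loss = weighted_loss(errors, log_snrs)
```

With equal errors, the example at log-SNR -4 contributes most and the one at +4 least, so the weighting shifts training effort toward noisier inputs. The `bias` parameter moves that emphasis along the noise schedule.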

Updated: 2025-03-22 19:42:20

Categories: cs.CV,cs.LG,stat.ML

Download: http://arxiv.org/abs/2410.19324v2

Graphical Transformation Models

Graphical Transformation Models (GTMs) are introduced as a novel approach to effectively model multivariate data with intricate marginals and complex dependency structures non-parametrically, while maintaining interpretability through the identification of varying conditional independencies. GTMs extend multivariate transformation models by replacing the Gaussian copula with a custom-designed multivariate transformation, offering two major advantages. Firstly, GTMs can capture more complex interdependencies using penalized splines, which also provide an efficient regularization scheme. Secondly, we demonstrate how to approximately regularize GTMs using a lasso penalty towards pairwise conditional independencies, akin to Gaussian graphical models. The model's robustness and effectiveness are validated through simulations, showcasing its ability to accurately learn parametric vine copulas and identify conditional independencies. Additionally, the model is applied to a benchmark astrophysics dataset, where the GTM demonstrates favorable performance compared to non-parametric vine copulas in learning complex multivariate distributions.

Updated: 2025-03-22 19:41:15

标题: 图形转换模型

摘要: 图形转换模型（GTM）被引入作为一种新颖的方法，可以有效地对具有复杂边际和复杂依赖结构的多变量数据进行非参数建模，同时通过识别不同条件独立性来保持可解释性。GTM通过用自定义设计的多变量转换替换高斯Copula，扩展了多变量转换模型，提供了两个主要优势。首先，GTM可以使用受惩罚的样条线来捕捉更复杂的相互依赖关系，这也提供了一种高效的正则化方案。其次，我们演示了如何通过向成对条件独立性施加套索惩罚来近似正则化GTM，类似于高斯图模型。通过模拟验证了模型的稳健性和有效性，展示了其能够准确学习参数藤Copula并识别条件独立性。此外，该模型被应用于一个基准天体物理数据集，其中GTM在学习复杂多变量分布方面表现出比非参数藤Copula更好的性能。

更新时间: 2025-03-22 19:41:15

领域: stat.ME,cs.LG,stat.ML

下载: http://arxiv.org/abs/2503.17845v1

Privacy-Preserving Hamming Distance Computation with Property-Preserving Hashing

We study the problem of approximating Hamming distance in sublinear time under property-preserving hashing (PPH), where only hashed representations of inputs are available. Building on the threshold evaluation framework of Fleischhacker, Larsen, and Simkin (EUROCRYPT 2022), we present a sequence of constructions with progressively improved complexity: a baseline binary search algorithm, a refined variant with constant repetition per query, and a novel hash design that enables constant-time approximation without oracle access. Our results demonstrate that approximate distance recovery is possible under strong cryptographic guarantees, bridging efficiency and security in similarity estimation.

Updated: 2025-03-22 19:35:59

标题: 隐私保护的哈明距离计算与属性保持哈希算法

摘要: 我们研究在保持属性哈希（PPH）下在亚线性时间内近似汉明距离的问题，只有输入的哈希表示可用。在Fleischhacker、Larsen和Simkin（EUROCRYPT 2022）的阈值评估框架基础上，我们提出了一系列逐渐改进复杂性的构建：基线二进制搜索算法，具有每次查询恒定重复的改进变体，以及一种新颖的哈希设计，能够在没有oracle访问的情况下实现常数时间近似。我们的结果表明，在强加密保证下可能实现近似距离恢复，将效率和安全性在相似性估计中实现了联系。

更新时间: 2025-03-22 19:35:59

领域: cs.CC,cs.CR

下载: http://arxiv.org/abs/2503.17844v1
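
The baseline binary-search idea mentioned in the abstract above can be illustrated with a small sketch. It assumes access to a threshold predicate answering "is the Hamming distance at least t?" and recovers the distance with O(log n) queries. The oracle below is a plain, non-private stand-in that reads the inputs directly; in the PPH setting it would be evaluated on hashes only and could be approximate. The names `make_threshold_oracle` and `distance_by_binary_search` are hypothetical and not taken from the paper.

```python
from typing import Callable, Sequence


def hamming(x: Sequence[int], y: Sequence[int]) -> int:
    """Exact Hamming distance between two equal-length bit sequences."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return sum(a != b for a, b in zip(x, y))


def make_threshold_oracle(x: Sequence[int], y: Sequence[int]) -> Callable[[int], bool]:
    """Stand-in oracle answering 'is d(x, y) >= t?'.

    Assumption: a real PPH scheme would answer this from hashed inputs only.
    """
    d = hamming(x, y)
    return lambda t: d >= t


def distance_by_binary_search(oracle: Callable[[int], bool], n: int) -> int:
    """Return the largest t in [0, n] for which oracle(t) is true."""
    lo, hi = 0, n  # oracle(0) always holds because distances are non-negative
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if oracle(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

With an exact oracle the search returns the exact distance; with a noisy or approximate oracle, each query would need repetition or slack, which is the kind of refinement the abstract alludes to.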

A General Approach for Determining Applicability Domain of Machine Learning Models

Knowledge of the domain of applicability of a machine learning model is essential to ensuring accurate and reliable model predictions. In this work, we develop a new and general approach of assessing model domain and demonstrate that our approach provides accurate and meaningful domain designation across multiple model types and material property data sets. Our approach assesses the distance between data in feature space using kernel density estimation, where this distance provides an effective tool for domain determination. We show that chemical groups considered unrelated based on chemical knowledge exhibit significant dissimilarities by our measure. We also show that high measures of dissimilarity are associated with poor model performance (i.e., high residual magnitudes) and poor estimates of model uncertainty (i.e., unreliable uncertainty estimation). Automated tools are provided to enable researchers to establish acceptable dissimilarity thresholds to identify whether new predictions of their own machine learning models are in-domain versus out-of-domain.

Updated: 2025-03-22 19:31:26

标题: 一个确定机器学习模型适用领域的通用方法

摘要: 机器学习模型适用领域的知识对于确保准确可靠的模型预测至关重要。在这项工作中，我们开发了一种新的通用方法来评估模型领域，并展示了我们的方法能够在多个模型类型和材料属性数据集中提供准确和有意义的领域指定。我们的方法通过核密度估计评估特征空间中数据之间的距离，这种距离提供了一种有效的领域确定工具。我们表明，根据化学知识认为不相关的化学团体在我们的度量中表现出显著差异。我们还表明，高度不相似的度量与模型表现差（即高残差大小）和模型不确定性估计差（即不可靠的不确定性估计）有关。我们提供了自动化工具，使研究人员能够建立可接受的不相似阈值，以确定他们自己的机器学习模型的新预测是在领域内还是在领域外。

更新时间: 2025-03-22 19:31:26

领域: cond-mat.mtrl-sci,cond-mat.other,cs.LG

下载: http://arxiv.org/abs/2406.05143v2
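
To make the kernel-density idea in the abstract above concrete, here is a minimal sketch of a density-based dissimilarity score in feature space. It is an illustration under stated assumptions, not the paper's measure: the Gaussian kernel, the fixed `bandwidth`, and the normalization by the highest training-point density are all choices made here for the example, and the function names are hypothetical.

```python
import numpy as np


def gaussian_kde_density(train: np.ndarray, query: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian kernel density estimate of `query` points under `train` points.

    train has shape (n, d) and query has shape (m, d).
    """
    n, d = train.shape
    diff = query[:, None, :] - train[None, :, :]   # pairwise differences, shape (m, n, d)
    sq_dist = np.sum(diff ** 2, axis=-1)           # squared distances, shape (m, n)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).sum(axis=1) / (n * norm)


def dissimilarity(train: np.ndarray, query: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    """Hypothetical score in [0, 1]: 0 at the densest training point, near 1 far from the data."""
    reference = gaussian_kde_density(train, train, bandwidth).max()
    density = gaussian_kde_density(train, query, bandwidth)
    return 1.0 - np.clip(density / reference, 0.0, 1.0)
```

A threshold on such a score could then separate in-domain from out-of-domain predictions; where to set that threshold is the kind of decision the automated tools in the abstract are meant to support.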

Adapt, Agree, Aggregate: Semi-Supervised Ensemble Labeling for Graph Convolutional Networks

In this paper, we propose a novel framework that combines ensemble learning with augmented graph structures to improve the performance and robustness of semi-supervised node classification in graphs. By creating multiple augmented views of the same graph, our approach harnesses the "wisdom of a diverse crowd", mitigating the challenges posed by noisy graph structures. Leveraging ensemble learning allows us to simultaneously achieve three key goals: adaptive confidence threshold selection based on model agreement, dynamic determination of the number of high-confidence samples for training, and robust extraction of pseudo-labels to mitigate confirmation bias. Our approach uniquely integrates adaptive ensemble consensus to flexibly guide pseudo-label extraction and sample selection, reducing the risks of error accumulation and improving robustness. Furthermore, the use of ensemble-driven consensus for pseudo-labeling captures subtle patterns that individual models often overlook, enabling the model to generalize better. Experiments on several real-world datasets demonstrate the effectiveness of our proposed method.

Updated: 2025-03-22 19:10:54

标题: 适应，一致，聚合：用于图卷积网络的半监督集成标记

摘要: 在本文中，我们提出了一种结合集成学习和增强图结构的新框架，以提高半监督图中节点分类的性能和鲁棒性。通过创建同一图的多个增强视图，我们的方法利用“多样化群体的智慧”，缓解了嘈杂图结构带来的挑战。利用集成学习使我们能够同时实现三个关键目标：基于模型一致性的自适应置信阈值选择，动态确定用于训练的高置信样本数量，以及稳健提取伪标签以减轻确认偏差。我们的方法独特地集成了自适应集成共识，灵活地指导伪标签提取和样本选择，降低错误累积风险并提高鲁棒性。此外，使用基于集成驱动的共识进行伪标记捕捉了个别模型经常忽略的微妙模式，使模型能够更好地泛化。对几个真实数据集的实验证明了我们提出的方法的有效性。

更新时间: 2025-03-22 19:10:54

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.17842v1
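
The agreement-based pseudo-labeling described in the abstract above can be sketched as a majority vote over ensemble members, keeping only nodes where enough models agree. This is a simplified illustration: the paper selects its confidence threshold adaptively, whereas `min_agreement` here is a fixed, hypothetical parameter, and the function name is likewise an assumption.

```python
import numpy as np


def consensus_pseudo_labels(votes: np.ndarray, num_classes: int, min_agreement: float):
    """Majority-vote pseudo-labels with an agreement filter.

    votes: integer array of shape (num_models, num_nodes), one predicted class
    per model and node. Returns (labels, agreement, selected), where `selected`
    marks nodes whose majority label is shared by at least `min_agreement`
    of the models.
    """
    num_models, num_nodes = votes.shape
    counts = np.zeros((num_nodes, num_classes), dtype=int)
    for model_votes in votes:
        # Each node index appears once per model, so the in-place add is safe.
        counts[np.arange(num_nodes), model_votes] += 1
    labels = counts.argmax(axis=1)                 # ties resolve to the lowest class id
    agreement = counts.max(axis=1) / num_models    # fraction of models backing the label
    selected = agreement >= min_agreement
    return labels, agreement, selected
```

Only the selected nodes would be added to the training set, so the number of pseudo-labeled samples varies with how strongly the ensemble agrees.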

A Study on the Improvement of Code Generation Quality Using Large Language Models Leveraging Product Documentation

Research on using Large Language Models (LLMs) in system development is expanding, especially in automated code and test generation. While E2E testing is vital for ensuring application quality, most test generation research has focused on unit tests, with limited work on E2E test code. This study proposes a method for automatically generating E2E test code from product documentation such as manuals, FAQs, and tutorials using LLMs with tailored prompts. The two-step process interprets documentation intent and produces executable test code. Experiments on a web app with six key features (e.g., authentication, profile, discussion) showed that tests generated from product documentation had high compilation success and functional coverage, outperforming those based on requirement specs and user stories. These findings highlight the potential of product documentation to improve E2E test quality and, by extension, software quality.

Updated: 2025-03-22 18:42:05

标题: 利用产品文档的大型语言模型提高代码生成质量的研究

摘要: 研究表明在系统开发中使用大型语言模型（LLMs）的研究正在扩展，特别是在自动化代码和测试生成方面。虽然端到端测试对于确保应用程序质量至关重要，但大多数测试生成研究都集中在单元测试上，对端到端测试代码的研究有限。本研究提出了一种方法，利用定制提示使用LLMs自动从产品文档（例如手册、常见问题解答和教程）中生成端到端测试代码。这个两步过程解释文档意图并生成可执行的测试代码。在一个具有六个关键功能的Web应用程序（例如认证、个人资料、讨论）上进行的实验显示，从产品文档生成的测试具有很高的编译成功率和功能覆盖率，优于基于需求规格和用户故事的测试。这些发现突显了产品文档改善端到端测试质量以及软件质量的潜力。

更新时间: 2025-03-22 18:42:05

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2503.17837v1

How To Think About End-To-End Encryption and AI: Training, Processing, Disclosure, and Consent

End-to-end encryption (E2EE) has become the gold standard for securing communications, bringing strong confidentiality and privacy guarantees to billions of users worldwide. However, the current push towards widespread integration of artificial intelligence (AI) models, including in E2EE systems, raises some serious security concerns. This work performs a critical examination of the (in)compatibility of AI models and E2EE applications. We explore this on two fronts: (1) the integration of AI "assistants" within E2EE applications, and (2) the use of E2EE data for training AI models. We analyze the potential security implications of each, and identify conflicts with the security guarantees of E2EE. Then, we analyze legal implications of integrating AI models in E2EE applications, given how AI integration can undermine the confidentiality that E2EE promises. Finally, we offer a list of detailed recommendations based on our technical and legal analyses, including: technical design choices that must be prioritized to uphold E2EE security; how service providers must accurately represent E2EE security; and best practices for the default behavior of AI features and for requesting user consent. We hope this paper catalyzes an informed conversation on the tensions that arise between the brisk deployment of AI and the security offered by E2EE, and guides the responsible development of new AI features.

Updated: 2025-03-22 18:40:59

标题: 如何思考端到端加密和人工智能：培训、处理、披露和同意

摘要: 端到端加密（E2EE）已成为保护通信的黄金标准，为全球数十亿用户提供了强大的保密性和隐私保障。然而，目前普遍推行人工智能（AI）模型，包括在E2EE系统中，引发了一些严重的安全担忧。本文对AI模型与E2EE应用程序的（不）兼容性进行了关键性审查。我们从两个方面进行了探讨：（1）在E2EE应用程序中集成AI“助手”，以及（2）使用E2EE数据训练AI模型。我们分析了每个方面的潜在安全影响，并确定了与E2EE安全保证相冲突的问题。然后，我们分析了在E2EE应用程序中集成AI模型的法律影响，考虑到AI集成可能破坏E2EE承诺的保密性。最后，我们根据我们的技术和法律分析提供了一系列详细建议，包括：必须优先考虑的技术设计选择以维护E2EE安全性；服务提供商必须准确代表E2EE安全性；以及AI功能的默认行为和请求用户同意的最佳实践。我们希望本文能引发关于快速部署AI与E2EE所提供的安全性之间紧张关系的明智讨论，并引导新AI功能的负责任开发。

更新时间: 2025-03-22 18:40:59

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2412.20231v2

To Google or To ChatGPT? A Comparison of CS2 Students' Information Gathering Approaches and Outcomes

LLMs such as ChatGPT have been widely adopted by students in higher education as tools for learning programming and related concepts. However, it remains unclear how effective students are and what strategies students use while learning with LLMs. Since the majority of students' experiences in online self-learning have come through using search engines such as Google, evaluating AI tools in this context can help us address these gaps. In this mixed methods research, we conducted an exploratory within-subjects study to understand how CS2 students learn programming concepts using both LLMs as well as traditional online methods such as educational websites and videos to examine how students approach learning within and across both scenarios. We discovered that students found it easier to learn a more difficult concept using traditional methods than using ChatGPT. We also found that students ask fewer follow-ups and use more keyword-based queries for search engines while their prompts to LLMs tend to explicitly ask for information.

Updated: 2025-03-22 18:17:31

标题: 使用Google还是使用ChatGPT？CS2学生信息搜集方法和结果的比较

摘要: LLMs（例如ChatGPT）被广泛应用于高等教育中，作为学习编程和相关概念的工具。然而，学生使用LLMs学习时的效果如何以及学生使用何种策略尚不清楚。由于大多数学生在在线自学中的经验来自使用Google等搜索引擎，评估在这种情境下的AI工具可以帮助我们填补这些空白。在这项混合方法研究中，我们进行了一项探索性的被试内研究，以了解CS2学生如何使用LLMs以及传统在线方法（如教育网站和视频）学习编程概念，以检查学生在这两种情境中的学习方式。我们发现，学生发现使用传统方法学习较难的概念比使用ChatGPT更容易。我们还发现，学生在使用搜索引擎时提出较少的跟进问题，更多地使用基于关键词的查询，而在向LLMs提出问题时通常明确要求信息。

更新时间: 2025-03-22 18:17:31

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2501.11935v3

FundusGAN: A Hierarchical Feature-Aware Generative Framework for High-Fidelity Fundus Image Generation

Recent advancements in ophthalmology foundation models such as RetFound have demonstrated remarkable diagnostic capabilities but require massive datasets for effective pre-training, creating significant barriers for development and deployment. To address this critical challenge, we propose FundusGAN, a novel hierarchical feature-aware generative framework specifically designed for high-fidelity fundus image synthesis. Our approach leverages a Feature Pyramid Network within its encoder to comprehensively extract multi-scale information, capturing both large anatomical structures and subtle pathological features. The framework incorporates a modified StyleGAN-based generator with dilated convolutions and strategic upsampling adjustments to preserve critical retinal structures while enhancing pathological detail representation. Comprehensive evaluations on the DDR, DRIVE, and IDRiD datasets demonstrate that FundusGAN consistently outperforms state-of-the-art methods across multiple metrics (SSIM: 0.8863, FID: 54.2, KID: 0.0436 on DDR). Furthermore, disease classification experiments reveal that augmenting training data with FundusGAN-generated images significantly improves diagnostic accuracy across multiple CNN architectures (up to 6.49% improvement with ResNet50). These results establish FundusGAN as a valuable foundation model component that effectively addresses data scarcity challenges in ophthalmological AI research, enabling more robust and generalizable diagnostic systems while reducing dependency on large-scale clinical data collection.

Updated: 2025-03-22 18:08:07

标题: FundusGAN：一种用于生成高保真度眼底图像的分层特征感知生成框架

摘要: 近年来，眼科领域基金会模型的最新进展，如RetFound，展示了出色的诊断能力，但需要大规模数据集进行有效的预训练，为开发和部署创建了重大障碍。为了解决这一关键挑战，我们提出了FundusGAN，这是一个新颖的层次特征感知生成框架，专门设计用于高保真的眼底图像合成。我们的方法利用编码器内的特征金字塔网络全面提取多尺度信息，捕捉大型解剖结构和微妙的病理特征。该框架结合了基于修改的StyleGAN的生成器，采用扩张卷积和策略性上采样调整，以保留关键的视网膜结构，同时增强病理细节表征。在DDR、DRIVE和IDRiD数据集上的全面评估表明，FundusGAN在多个指标上 consistently优于最先进的方法（DDR上的SSIM: 0.8863，FID: 54.2，KID: 0.0436）。此外，疾病分类实验表明，通过增加使用FundusGAN生成的图像的训练数据，可以显著提高多个CNN体系结构的诊断准确性（ResNet50的提高达6.49%）。这些结果将FundusGAN确立为眼科AI研究中有效解决数据稀缺挑战的宝贵基础模型组件，使诊断系统更加健壮和可泛化，同时减少对大规模临床数据收集的依赖。

更新时间: 2025-03-22 18:08:07

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2503.17831v1

Fingerprinting Implementations of Cryptographic Primitives and Protocols that Use Post-Quantum Algorithms

Fingerprinting is a technique used to create behavioral profiles of systems to identify threats and weaknesses. When applied to cryptographic primitives and network protocols, it can be exploited by attackers for denial-of-service, key recovery, or downgrade attacks. In this paper, we evaluate the feasibility of fingerprinting post-quantum (PQ) algorithms by analyzing key exchange and digital signature primitives, their integration into protocols like TLS, SSH, QUIC, OpenVPN, and OIDC, and their usage in SNARK libraries (pysnark and lattice_zksnark). PQ algorithms differ from classical ones in memory and computation demands. We examine implementations across liboqs and CIRCL libraries on Windows, Ubuntu, and MacOS. Our experiments show that we can distinguish classical from PQ key exchange and signatures with 98% and 100% accuracy, respectively; identify the specific PQ algorithm used with 97% and 86% accuracy; distinguish between liboqs and CIRCL implementations with up to 100% accuracy; and identify PQ vs. hybrid implementations within CIRCL with 97% accuracy. In protocol-level analysis, we can detect the presence and type of PQ key exchange. SNARK libraries are distinguishable with 100% accuracy. To demonstrate real-world applicability, we apply our fingerprinting methods to the Tranco dataset to detect domains using PQ TLS and integrate our methods into QUARTZ, an open-source threat analysis tool developed by Cisco.

Updated: 2025-03-22 18:00:21

标题: 使用后量子算法的密码原语和协议的指纹实现

摘要: 指纹识别是一种用于创建系统行为特征概要以识别威胁和弱点的技术。当应用于加密原语和网络协议时，攻击者可以利用它进行拒绝服务、密钥恢复或降级攻击。本文评估了通过分析密钥交换和数字签名原语以及它们在诸如TLS、SSH、QUIC、OpenVPN和OIDC等协议中的集成以及它们在SNARK库（pysnark和lattice_zksnark）中的使用的可行性。后量子（PQ）算法在内存和计算需求上与经典算法不同。我们在Windows、Ubuntu和MacOS上跨liboqs和CIRCL库的实现进行了检查。我们的实验表明，我们可以将经典的密钥交换和签名与PQ密钥交换和签名分别以98%和100%的准确度区分开；以97%和86%的准确度识别所使用的特定PQ算法；在高达100%的准确度下区分liboqs和CIRCL的实现；以97%的准确度识别CIRCL中的PQ与混合实现。在协议级别分析中，我们可以检测到PQ密钥交换的存在和类型。SNARK库可以以100%的准确度区分。为了展示现实世界的适用性，我们将我们的指纹识别方法应用于Tranco数据集，以检测使用PQ TLS的域，并将我们的方法集成到由思科开发的开源威胁分析工具QUARTZ中。

更新时间: 2025-03-22 18:00:21

领域: cs.CR

下载: http://arxiv.org/abs/2503.17830v1

On the Minimax Regret of Sequential Probability Assignment via Square-Root Entropy

We study the problem of sequential probability assignment under logarithmic loss, both with and without side information. Our objective is to analyze the minimax regret -- a notion extensively studied in the literature -- in terms of geometric quantities, such as covering numbers and scale-sensitive dimensions. We show that the minimax regret for the case of no side information (equivalently, the Shtarkov sum) can be upper bounded in terms of sequential square-root entropy, a notion closely related to Hellinger distance. For the problem of sequential probability assignment with side information, we develop both upper and lower bounds based on the aforementioned entropy. The lower bound matches the upper bound, up to log factors, for classes in the Donsker regime (according to our definition of entropy).

Updated: 2025-03-22 17:26:34

标题: 关于通过平方根熵进行顺序概率分配的最小后悔

摘要: 我们研究了在对数损失下，带有和不带有辅助信息的序贯概率分配问题。我们的目标是分析几何量，如覆盖数和尺度敏感维度，来研究文献中广泛研究的最小化后悔问题。我们证明了在没有辅助信息的情况下的最小化后悔（等效于Shtarkov和）可以用序贯平方根熵的概念上界约束，这个概念与Hellinger距离密切相关。对于带有辅助信息的序贯概率分配问题，我们基于上述熵开发了上下界。对于在我们定义的熵范围内的类别，下界与上界匹配，差异仅在于对数因子。

更新时间: 2025-03-22 17:26:34

领域: cs.LG,cs.IT,math.IT,stat.ML

下载: http://arxiv.org/abs/2503.17823v1
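
For the no-side-information case mentioned in the abstract above, the minimax regret under logarithmic loss equals the logarithm of the Shtarkov sum. The sketch below computes it for the simplest textbook class, i.i.d. Bernoulli models on binary sequences, which is chosen here purely for illustration and is not an example taken from the paper; the function names are hypothetical.

```python
from math import comb, log


def shtarkov_sum_bernoulli(n: int) -> float:
    """Shtarkov sum for the i.i.d. Bernoulli class on binary sequences of length n.

    All sequences with k ones share the maximized likelihood
    (k/n)^k * ((n-k)/n)^(n-k), and there are C(n, k) such sequences.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    total = 0.0
    for k in range(n + 1):
        p = k / n  # maximum-likelihood estimate of the bias
        total += comb(n, k) * (p ** k) * ((1.0 - p) ** (n - k))
    return total


def minimax_regret_bernoulli(n: int) -> float:
    """Minimax regret (in nats) under log loss: the log of the Shtarkov sum."""
    return log(shtarkov_sum_bernoulli(n))
```

For n = 1 the sum is 2 (each one-symbol sequence has maximized likelihood 1), so the regret is log 2 nats; the regret then grows slowly with n.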

Metacognition in Content-Centric Computational Cognitive C4 Modeling

For AI agents to emulate human behavior, they must be able to perceive, meaningfully interpret, store, and use large amounts of information about the world, themselves, and other agents. Metacognition is a necessary component of all of these processes. In this paper, we briefly a) introduce content-centric computational cognitive (C4) modeling for next-generation AI agents; b) review the long history of developing C4 agents at RPI's LEIA (Language-Endowed Intelligent Agents) Lab; c) discuss our current work on extending LEIAs' cognitive capabilities to cognitive robotic applications developed using a neuro-symbolic processing model; and d) sketch plans for future developments in this paradigm that aim to overcome underappreciated limitations of currently popular, LLM-driven methods in AI.

Updated: 2025-03-22 17:23:27

标题: 内容中心的计算认知C4建模中的元认知

摘要: 为了模拟人类行为，AI代理必须能够感知、有意义地解释、存储和使用关于世界、自身和其他代理的大量信息。元认知是所有这些过程的必要组成部分。在本文中，我们简要介绍了针对下一代AI代理的基于内容的计算认知（C4）建模；回顾了RPI LEIA（具有语言能力的智能代理）实验室开发C4代理的悠久历史；讨论了我们目前正在进行的将LEIA的认知能力扩展到使用神经符号处理模型开发的认知机器人应用的工作；并勾勒了这一范式未来发展的计划，旨在克服当前流行的以LLM为驱动方法在AI中存在的被低估的局限性。

更新时间: 2025-03-22 17:23:27

领域: cs.AI

下载: http://arxiv.org/abs/2503.17822v1

SynMorph: Generating Synthetic Face Morphing Dataset with Mated Samples

Face morphing attack detection (MAD) algorithms have become essential to overcome the vulnerability of face recognition systems. To address the lack of large-scale, publicly available datasets caused by privacy concerns and restrictions, in this work we propose a new method to generate a synthetic face morphing dataset with 2450 identities and more than 100k morphs. The proposed synthetic face morphing dataset is unique for its high-quality samples, different types of morphing algorithms, and its generalization to both single and differential morphing attack detection algorithms. For experiments, we apply face image quality assessment and vulnerability analysis to evaluate the proposed synthetic face morphing dataset from the perspective of biometric sample quality and morphing attack potential on face recognition systems. The results are benchmarked against an existing SOTA synthetic dataset and a representative non-synthetic dataset, and indicate improvement over the SOTA. Additionally, we design different protocols and study the applicability of using the proposed synthetic dataset for training morphing attack detection algorithms.

Updated: 2025-03-22 17:21:22

标题: SynMorph：使用匹配样本生成合成面部变形数据集

摘要: 面部变形攻击检测（MAD）算法已经成为克服人脸识别系统的脆弱性的必要手段。为了解决由于隐私问题和限制导致的缺乏大规模和公开可用数据集的问题，本文提出了一种新方法，生成了一个包含2450个身份和超过100k个变形的合成面部变形数据集。所提出的合成面部变形数据集在高质量样本、不同类型的变形算法以及对单个和差分变形攻击检测算法的泛化方面具有独特性。在实验中，我们应用面部图像质量评估和脆弱性分析来评估所提出的合成面部变形数据集，从生物特征样本质量和对人脸识别系统的变形攻击潜力的角度进行评估。结果与现有的SOTA合成数据集和代表性的非合成数据集进行了基准测试，并与SOTA相比表现出改进。此外，我们设计了不同的协议，并研究了在训练变形攻击检测算法中使用所提出的合成数据集的适用性。

更新时间: 2025-03-22 17:21:22

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2409.05595v2

Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification

Synthetic data augmentation via large language models (LLMs) allows researchers to leverage additional training data, thus enhancing the performance of downstream tasks, especially when real-world data is scarce. However, the generated data can deviate from the real-world data, and this misalignment can degrade results when the trained model is applied in practice. Therefore, we propose efficient weighted-loss approaches that align synthetic data with the real-world distribution by emphasizing high-quality and diversified LLM-generated data, using only a small amount of real-world data. We empirically assessed the effectiveness of our method on multiple text classification tasks, and the results show that applying our approaches to a BERT-level model robustly outperforms standard cross-entropy and other data weighting approaches, offering a potential way to effectively leverage synthetic data from any suitable data generator for model training.

Updated: 2025-03-22 17:19:57

标题: 并非所有由LLM生成的数据都是相等的：重新思考文本分类中的数据加权

摘要: 通过大型语言模型（LLMs）进行合成数据增强，使研究人员能够利用额外的训练数据，从而提高下游任务的性能，特别是在真实世界数据稀缺的情况下。然而，生成的数据可能会偏离真实世界数据，这种不一致可能会导致应用训练模型时出现不足的结果。因此，我们提出了高效的加权损失方法，通过强调由LLMs生成的高质量和多样化数据，仅使用少量真实世界数据就可以使合成数据与真实世界分布保持一致。我们在多个文本分类任务上经验性地评估了我们的方法的有效性，结果显示我们的方法在BERT级别模型上表现出色地优于标准的交叉熵和其他数据加权方法，为有效利用任何适合数据生成器生成的合成数据进行模型训练提供了潜在解决方案。

更新时间: 2025-03-22 17:19:57

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2410.21526v2
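
The weighted-loss idea in the abstract above can be illustrated with a per-example weighted cross-entropy. The abstract does not say how the weights are computed, so this sketch simply takes them as an input; the function name and the normalization by the weight sum are assumptions made for the example.

```python
import numpy as np


def weighted_cross_entropy(logits, labels, weights) -> float:
    """Cross-entropy where each example contributes in proportion to its weight.

    logits: (n, num_classes) scores, labels: (n,) integer class ids,
    weights: (n,) non-negative per-example weights (assumed given).
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=float)
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_example = -log_probs[np.arange(len(labels)), labels]
    return float((weights * per_example).sum() / weights.sum())
```

With uniform weights this reduces to the standard mean cross-entropy; lowering the weight of a synthetic example that looks off-distribution reduces its influence on the gradient.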

OvercookedV2: Rethinking Overcooked for Zero-Shot Coordination

AI agents hold the potential to transform everyday life by helping humans achieve their goals. To do this successfully, agents need to be able to coordinate with novel partners without prior interaction, a setting known as zero-shot coordination (ZSC). Overcooked has become one of the most popular benchmarks for evaluating coordination capabilities of AI agents and learning algorithms. In this work, we investigate the origins of ZSC challenges in Overcooked. We introduce a state augmentation mechanism which mixes states that might be encountered when paired with unknown partners into the training distribution, reducing the out-of-distribution challenge associated with ZSC. We show that independently trained agents under this algorithm coordinate successfully in Overcooked. Our results suggest that ZSC failure can largely be attributed to poor state coverage under self-play rather than more sophisticated coordination challenges. The Overcooked environment is therefore not suitable as a ZSC benchmark. To address these shortcomings, we introduce OvercookedV2, a new version of the benchmark, which includes asymmetric information and stochasticity, facilitating the creation of interesting ZSC scenarios. To validate OvercookedV2, we conduct experiments demonstrating that mere exhaustive state coverage is insufficient to coordinate well. Finally, we use OvercookedV2 to build a new range of coordination challenges, including ones that require test time protocol formation, and we demonstrate the need for new coordination algorithms that can adapt online. We hope that OvercookedV2 will help benchmark the next generation of ZSC algorithms and advance collaboration between AI agents and humans.

Updated: 2025-03-22 17:14:24

标题: 过度烹饪V2：重新思考零射协调的过度烹饪

摘要: AI代理人有潜力通过帮助人类实现他们的目标来改变日常生活。为了成功地做到这一点，代理人需要能够与新领域的合作伙伴协调，而无需先前的互动，这种环境被称为零射击协调（ZSC）。《Overcooked》已成为评估AI代理人和学习算法协调能力的最受欢迎的基准之一。在这项工作中，我们调查了《Overcooked》中ZSC挑战的起源。我们引入了一种状态增强机制，将可能在与未知合作伙伴配对时遇到的状态混合到训练分布中，降低了与ZSC相关的超出分布挑战。我们展示了在这种算法下独立训练的代理人在《Overcooked》中成功协调。我们的结果表明，ZSC失败主要归因于自我博弈下状态覆盖不足，而不是更复杂的协调挑战。因此，《Overcooked》环境不适合作为ZSC基准。为了解决这些缺陷，我们引入了OvercookedV2，这是基准的新版本，包括不对称信息和随机性，有助于创建有趣的ZSC场景。为了验证OvercookedV2，我们进行了实验，证明光是穷举状态覆盖是不足以很好地协调的。最后，我们利用OvercookedV2构建了一系列新的协调挑战，包括需要测试时间协议形成的挑战，并展示了需要新的在线适应协调算法。我们希望OvercookedV2将有助于基准测试下一代ZSC算法，并推动AI代理人与人类之间的合作。

更新时间: 2025-03-22 17:14:24

领域: cs.AI

下载: http://arxiv.org/abs/2503.17821v1

Don't Kill the Baby: The Case for AI in Arbitration

Since the introduction of Generative AI (GenAI) in 2022, its ability to simulate human intelligence and generate content has sparked both enthusiasm and concern. While much criticism focuses on AI's potential to perpetuate bias, create emotional dissonance, displace jobs, and raise ethical questions, these concerns often overlook the practical benefits of AI, particularly in legal contexts. This article examines the integration of AI into arbitration, arguing that the Federal Arbitration Act (FAA) allows parties to contractually choose AI-driven arbitration, despite traditional reservations. The article makes three key contributions: (1) It shifts the focus from debates over AI's personhood to the practical aspects of incorporating AI into arbitration, asserting that AI can effectively serve as an arbitrator if both parties agree; (2) It positions arbitration as an ideal starting point for broader AI adoption in the legal field, given its flexibility and the autonomy it grants parties to define their standards of fairness; and (3) It outlines future research directions, emphasizing the importance of empirically comparing AI and human arbitration, which could lead to the development of distinct systems. By advocating for the use of AI in arbitration, this article underscores the importance of respecting contractual autonomy and creating an environment that allows AI's potential to be fully realized. Drawing on the insights of Judge Richard Posner, the article argues that the ethical obligations of AI in arbitration should be understood within the context of its technological strengths and the voluntary nature of arbitration agreements. Ultimately, it calls for a balanced, open-minded approach to AI in arbitration, recognizing its potential to enhance the efficiency, fairness, and flexibility of dispute resolution.

Updated: 2025-03-22 17:00:00

标题: 不要杀死这个孩子：在仲裁中使用人工智能的案例

摘要: 自从2022年引入生成式人工智能（GenAI）以来，其模拟人类智能并生成内容的能力引发了人们的热情和关注。尽管许多批评集中在人工智能可能持续偏见、产生情感上的不协调、取代工作岗位以及引发道德问题的潜力上，但这些关注通常忽视了人工智能在法律环境中的实际好处。本文研究了人工智能在仲裁中的整合，认为《联邦仲裁法案》（FAA）允许各方合同选择基于人工智能的仲裁，尽管传统观念存在保留。本文做出了三个关键贡献：（1）它将焦点从关于人工智能人格的辩论转移到将人工智能纳入仲裁的实际方面，主张只要两个当事方同意，人工智能可以有效地担任仲裁人；（2）它将仲裁定位为法律领域更广泛采用人工智能的理想起点，鉴于其灵活性和授予当事方定义公平标准的自主权；和（3）它概述了未来的研究方向，强调了对比人工智能和人类仲裁的重要性，这可能导致开发不同的系统。通过倡导在仲裁中使用人工智能，本文强调尊重合同自主权的重要性，并创造一个允许充分实现人工智能潜力的环境。借鉴理查德·波斯纳法官的见解，本文认为人工智能在仲裁中的道德义务应当在其技术优势和仲裁协议的自愿性背景下理解。最终，它呼吁在仲裁中采取一种平衡、开放的态度，认识到人工智能在提升争端解决效率、公平性和灵活性方面的潜力。

更新时间: 2025-03-22 17:00:00

领域: cs.AI,cs.CY

下载: http://arxiv.org/abs/2408.11608v2

Joint Transmit and Pinching Beamforming for Pinching Antenna Systems (PASS): Optimization-Based or Learning-Based?

A novel pinching antenna system (PASS)-enabled downlink multi-user multiple-input single-output (MISO) framework is proposed. PASS consists of multiple waveguides spanning thousands of wavelengths, which are equipped with numerous low-cost dielectric particles, named pinching antennas (PAs), to radiate signals into free space. The positions of PAs can be reconfigured to change both the large-scale path losses and phases of signals, thus facilitating the novel pinching beamforming design. A sum rate maximization problem is formulated, which jointly optimizes the transmit and pinching beamforming to adaptively achieve constructive signal enhancement and destructive interference mitigation. To solve this highly coupled and nonconvex problem, both optimization-based and learning-based methods are proposed. 1) For the optimization-based method, a majorization-minimization and penalty dual decomposition (MM-PDD) algorithm is developed, which handles the nonconvex complex exponential component using a Lipschitz surrogate function and then invokes PDD for problem decoupling. 2) For the learning-based method, a novel Karush-Kuhn-Tucker (KKT)-guided dual learning (KDL) approach is proposed, which enables KKT solutions to be reconstructed in a data-driven manner by learning dual variables. Following this idea, a KDL-Transformer algorithm is developed, which captures both inter-PA/inter-user dependencies and channel-state-information (CSI)-beamforming dependencies by attention mechanisms. Simulation results demonstrate that: i) The proposed PASS framework significantly outperforms a conventional massive multiple-input multiple-output (MIMO) system, even with only a few PAs. ii) The proposed KDL-Transformer improves system performance by over 30% compared with the MM-PDD algorithm, while achieving a millisecond-level response on modern GPUs.

Updated: 2025-03-22 16:27:16

标题: 联合发送和捏合波束成形用于捏合天线系统（PASS）：基于优化还是基于学习？

摘要: 提出了一种新颖的捏合式天线系统（PASS）-启用的下行多用户多输入单输出（MISO）框架。PASS由跨越数千波长的多个波导组成，配备许多低成本的介质颗粒，称为捏合天线（PAs），将信号辐射到自由空间中。PAs的位置可以重新配置以改变信号的大尺度路径损耗和相位，从而促进新颖的捏合波束成形设计。制定了一个求和速率最大化问题，联合优化发送和捏合波束成形，以自适应地实现信号的构造增强和破坏性干扰抑制。为了解决这个高度耦合且非凸问题，提出了基于优化和基于学习的方法。1）对于基于优化的方法，开发了一种主化极小化和惩罚双分解（MM-PDD）算法，该算法使用Lipschitz替代函数处理非凸复指数分量，然后调用PDD进行问题解耦。2）对于基于学习的方法，提出了一种新颖的Karush-Kuhn-Tucker（KKT）引导的双学习（KDL）方法，该方法使得KKT解可以通过学习双变量以数据驱动的方式重建。根据这一思路，开发了一种KDL-Transformer算法，通过注意机制捕捉了PA之间/用户之间的依赖关系和信道状态信息（CSI）-波束成形依赖关系。模拟结果表明：i）所提出的PASS框架即使只有少量PA也明显优于传统的大规模多输入多输出（MIMO）系统。ii）所提出的KDL-Transformer比MM-PDD算法改善了30%以上的系统性能，同时在现代GPU上实现了毫秒级的响应。

更新时间: 2025-03-22 16:27:16

领域: eess.SP,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2502.08637v2

Connectedness: a dimension of security bug severity assessment for measuring uncertainty

Current frameworks for evaluating security bug severity, such as the Common Vulnerability Scoring System (CVSS), prioritize the ratio of exploitability to impact. This paper suggests that the above approach measures the "known knowns" but inadequately addresses the "known unknowns" especially when there exist multiple possible exploit paths and side effects, which introduce significant uncertainty. This paper introduces the concept of connectedness, which measures how strongly a security bug is connected with different entities, thereby reflecting the uncertainty of impact and the exploit potential. This work highlights the critical but underappreciated role connectedness plays in severity assessments.

Updated: 2025-03-22 16:25:08

标题: Connectedness: 衡量不确定性的安全漏洞严重性评估维度

摘要: 目前评估安全漏洞严重程度的框架，如通用漏洞评分系统（CVSS），优先考虑可利用性与影响之间的比率。本文建议，上述方法度量了“已知已知”，但在存在多种可能的利用路径和副作用时，尤其是引入显著不确定性时，却未能充分解决“已知未知”。本文引入了“连接性”概念，衡量安全漏洞与不同实体的联系强度，从而反映了影响和利用潜力的不确定性。这项工作强调了连接性在严重性评估中扮演的关键但未被充分重视的角色。

更新时间: 2025-03-22 16:25:08

领域: cs.CR

下载: http://arxiv.org/abs/2503.17813v1

Feather-SQL: A Lightweight NL2SQL Framework with Dual-Model Collaboration Paradigm for Small Language Models

Natural Language to SQL (NL2SQL) has seen significant advancements with large language models (LLMs). However, these models often depend on closed-source systems and high computational resources, posing challenges in data privacy and deployment. In contrast, small language models (SLMs) struggle with NL2SQL tasks, exhibiting poor performance and incompatibility with existing frameworks. To address these issues, we introduce Feather-SQL, a new lightweight framework tailored for SLMs. Feather-SQL improves SQL executability and accuracy through 1) schema pruning and linking, 2) multi-path and multi-candidate generation. Additionally, we introduce the 1+1 Model Collaboration Paradigm, which pairs a strong general-purpose chat model with a fine-tuned SQL specialist, combining strong analytical reasoning with high-precision SQL generation. Experimental results on BIRD demonstrate that Feather-SQL improves NL2SQL performance on SLMs, with around 10% boost for models without fine-tuning. The proposed paradigm raises the accuracy ceiling of SLMs to 54.76%, highlighting its effectiveness.

Updated: 2025-03-22 16:22:53

Categories: cs.CL,cs.AI,cs.DB

Download: http://arxiv.org/abs/2503.17811v1
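Illustrative sketch (not from the paper): the Feather-SQL abstract names schema pruning and multi-candidate generation as its two levers for executability. A minimal way to picture them, assuming a naive keyword match for pruning and a hand-written candidate list standing in for model output, is shown below. The helper names `prune_schema` and `first_executable`, the toy tables, and the candidate queries are all hypothetical.

```python
import sqlite3

def prune_schema(schema, question):
    """Keep only tables whose name or column names appear in the question.

    This is a naive keyword match, used here only as a stand-in for a
    learned schema pruning and linking step.
    """
    words = set(question.lower().replace("?", " ").split())
    kept = {}
    for table, columns in schema.items():
        names = {table.lower(), *(c.lower() for c in columns)}
        if names & words:
            kept[table] = columns
    return kept

def first_executable(conn, candidates):
    """Return the first candidate SQL string that SQLite can execute, with its rows."""
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # non-executable candidate: try the next one
        return sql, rows
    return None, []

# Toy database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.execute("CREATE TABLE concert (title TEXT, year INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ana", 31), ("Bo", 45)])

schema = {"singer": ["name", "age"], "concert": ["title", "year"]}
question = "What is the name of the oldest singer?"
pruned = prune_schema(schema, question)

# Candidates as a small model might produce them; the first has a typo.
candidates = [
    "SELECT nme FROM singer ORDER BY age DESC LIMIT 1",
    "SELECT name FROM singer ORDER BY age DESC LIMIT 1",
]
best_sql, rows = first_executable(conn, candidates)
```

Generating several candidates and keeping one that actually runs is a cheap way to raise executability when a single small-model output is unreliable; how Feather-SQL ranks or selects among candidates is not stated in the abstract.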

Poisson-Process Topic Model for Integrating Knowledge from Pre-trained Language Models

Topic modeling is traditionally applied to word counts without accounting for the context in which words appear. Recent advancements in large language models (LLMs) offer contextualized word embeddings, which capture deeper meaning and relationships between words. We aim to leverage such embeddings to improve topic modeling. We use a pre-trained LLM to convert each document into a sequence of word embeddings. This sequence is then modeled as a Poisson point process, with its intensity measure expressed as a convex combination of $K$ base measures, each corresponding to a topic. To estimate these topics, we propose a flexible algorithm that integrates traditional topic modeling methods, enhanced by net-rounding applied before and kernel smoothing applied after. One advantage of this framework is that it treats the LLM as a black box, requiring no fine-tuning of its parameters. Another advantage is its ability to seamlessly integrate any traditional topic modeling approach as a plug-in module, without the need for modifications. Assuming each topic is a $\beta$-H\"{o}lder smooth intensity measure on the embedded space, we establish the rate of convergence of our method. We also provide a minimax lower bound and show that the rate of our method matches the lower bound when $\beta\leq 1$. Additionally, we apply our method to several datasets, providing evidence that it offers an advantage over traditional topic modeling approaches.

Updated: 2025-03-22 16:19:04

Categories: stat.ML,cs.LG,math.ST,stat.TH,62G07

Download: http://arxiv.org/abs/2503.17809v1
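Illustrative note (notation assumed, not taken from the paper): the modeling statement in the abstract above can be written out as follows. The symbols $\Lambda_i$, $w_{i,k}$, $\Omega_k$, and $N_i$ are chosen here for illustration, and any document-length scaling the paper may use is omitted.

```latex
% Intensity measure of document i as a convex combination of K topic measures
\Lambda_i(\cdot) \;=\; \sum_{k=1}^{K} w_{i,k}\,\Omega_k(\cdot),
\qquad w_{i,k} \ge 0, \qquad \sum_{k=1}^{K} w_{i,k} = 1 .

% Defining property of a Poisson point process: for any measurable region B
% of the embedding space, the number of word embeddings of document i in B is
N_i(B) \;\sim\; \mathrm{Poisson}\bigl(\Lambda_i(B)\bigr),
% with counts in disjoint regions being independent.
```

Under this reading, each $\Omega_k$ plays the role a topic's word distribution plays in a classical topic model, but lives on the continuous embedding space instead of a finite vocabulary.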

Neural Network Approach to Stochastic Dynamics for Smooth Multimodal Density Estimation

In this paper we consider a new probability sampling method based on Langevin diffusion dynamics to address the difficulties that existing Monte Carlo algorithms face when drawing samples from high-dimensional target densities. We extend the Metropolis-Adjusted Langevin Diffusion algorithm by modelling the stochasticity of the preconditioning matrix as a random matrix. An advantage over other proposal methods is that it requires only the gradient of the log-posterior. The proposed method provides fully adaptive mechanisms that tune the proposal densities to exploit and adapt to the geometry of the local structures of statistical models. We illustrate the benefits of the new proposal by modelling the quantum probability density functions of a free particle in a plane (energy eigenfunctions). The proposed model represents a remarkable improvement in accuracy and computational time over standard MCMC methods.

Updated: 2025-03-22 16:17:12

Categories: cs.LG,cs.NA,math.NA,stat.ML

Download: http://arxiv.org/abs/2503.17807v1
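Illustrative sketch (not from the paper): the entry above builds on the Metropolis-adjusted Langevin algorithm, whose proposal needs only the gradient of the log target. The code below is the standard algorithm with a fixed scalar step size; it does not include the random preconditioning matrix the authors propose. The function name `mala`, the step size, and the Gaussian target are assumptions chosen for illustration.

```python
import numpy as np

def mala(log_p, grad_log_p, x0, step, n_steps, rng):
    """Standard MALA with proposal N(x + (step/2) * grad log p(x), step * I)."""
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    accepted = 0
    for i in range(n_steps):
        # Langevin drift toward higher density, plus Gaussian noise.
        mean_fwd = x + 0.5 * step * grad_log_p(x)
        prop = mean_fwd + np.sqrt(step) * rng.standard_normal(x.size)
        # The reverse proposal density is needed because the proposal is asymmetric.
        mean_bwd = prop + 0.5 * step * grad_log_p(prop)
        log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (2.0 * step)
        log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (2.0 * step)
        log_alpha = log_p(prop) - log_p(x) + log_q_bwd - log_q_fwd
        # Metropolis-Hastings accept/reject step.
        if np.log(rng.uniform()) < log_alpha:
            x = prop
            accepted += 1
        samples[i] = x
    return samples, accepted / n_steps

# Toy target: standard 2-D Gaussian (log density up to an additive constant).
log_p = lambda x: -0.5 * np.sum(x ** 2)
grad_log_p = lambda x: -x

rng = np.random.default_rng(0)
samples, acc_rate = mala(log_p, grad_log_p, x0=[3.0, -3.0], step=0.5,
                         n_steps=5000, rng=rng)
```

A preconditioned variant would replace the scalar `step` with a matrix in both the drift and the noise term; the abstract indicates the authors treat that matrix as random, which this sketch leaves out.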

A Roadmap Towards Improving Multi-Agent Reinforcement Learning With Causal Discovery And Inference

Causal reasoning is increasingly used in Reinforcement Learning (RL) to improve the learning process in several dimensions: efficacy of learned policies, efficiency of convergence, generalisation capabilities, safety and interpretability of behaviour. However, applications of causal reasoning to Multi-Agent RL (MARL) are still mostly unexplored. In this paper, we take the first step in investigating the opportunities and challenges of applying causal reasoning in MARL. We measure the impact of a simple form of causal augmentation in state-of-the-art MARL scenarios increasingly requiring cooperation, and with state-of-the-art MARL algorithms exploiting various degrees of collaboration between agents. Then, we discuss the positive as well as negative results achieved, giving us the chance to outline the areas where further research may help to successfully transfer causal RL to the multi-agent setting.

Updated: 2025-03-22 15:49:13

Categories: cs.LG,cs.AI,cs.MA,stat.ME

Download: http://arxiv.org/abs/2503.17803v1

Causality-oriented robustness: exploiting general noise interventions

Since distribution shifts are common in real-world applications, there is a pressing need to develop prediction models that are robust against such shifts. Existing frameworks, such as empirical risk minimization or distributionally robust optimization, either lack generalizability for unseen distributions or rely on postulated distance measures. Alternatively, causality offers a data-driven and structural perspective to robust predictions. However, the assumptions necessary for causal inference can be overly stringent, and the robustness offered by such causal models often lacks flexibility. In this paper, we focus on causality-oriented robustness and propose Distributional Robustness via Invariant Gradients (DRIG), a method that exploits general noise interventions in training data for robust predictions against unseen interventions, and naturally interpolates between in-distribution prediction and causality. In a linear setting, we prove that DRIG yields predictions that are robust among a data-dependent class of distribution shifts. Furthermore, we show that our framework includes anchor regression as a special case, and that it yields prediction models that protect against more diverse perturbations. We establish finite-sample results and extend our approach to semi-supervised domain adaptation to further improve prediction performance. Finally, we empirically validate our methods on synthetic simulations and on single-cell and intensive health care datasets.

Updated: 2025-03-22 15:37:46

Categories: stat.ME,cs.LG,stat.ML

Download: http://arxiv.org/abs/2307.10299v2

Machine Learning - Driven Materials Discovery: Unlocking Next-Generation Functional Materials - A minireview

The rapid advancement of machine learning and artificial intelligence (AI)-driven techniques is revolutionizing materials discovery, property prediction, and material design by minimizing human intervention and accelerating scientific progress. This review provides a comprehensive overview of smart, machine learning (ML)-driven approaches, emphasizing their role in predicting material properties, discovering novel compounds, and optimizing material structures. Key methodologies, ranging from deep learning, graph neural networks, and Bayesian optimization to automated generative models such as generative adversarial networks (GANs) and variational autoencoders (VAEs), enable the autonomous design of materials with tailored functionalities. By leveraging AutoML frameworks (e.g., AutoGluon, TPOT, and H2O.ai), researchers can automate model selection, hyperparameter tuning, and feature engineering, significantly improving the efficiency of materials informatics. Furthermore, the integration of AI-driven robotic laboratories and high-throughput computing has established a fully automated pipeline for rapid synthesis and experimental validation, drastically reducing the time and cost of material discovery. This review highlights real-world applications of automated ML-driven approaches in predicting mechanical, thermal, electrical, and optical properties of materials, demonstrating successful cases in superconductors, catalysts, photovoltaics, and energy storage systems. We also address key challenges, such as data quality, interpretability, and the integration of AutoML with quantum computing, which are essential for future advancements. Ultimately, the synergy between AI, automated experimentation, and computational modeling is transforming the way materials are discovered, optimized, and designed, paving the way for next-generation innovations in energy, electronics, and nanotechnology.

Updated: 2025-03-22 15:24:38

Categories: cond-mat.mtrl-sci,cs.AI,cs.LG

Download: http://arxiv.org/abs/2503.18975v1

GaussianFocus: Constrained Attention Focus for 3D Gaussian Splatting

Recent developments in 3D reconstruction and neural rendering have significantly propelled the capabilities of photo-realistic 3D scene rendering across various academic and industrial fields. The 3D Gaussian Splatting technique, alongside its derivatives, integrates the advantages of primitive-based and volumetric representations to deliver top-tier rendering quality and efficiency. Despite these advancements, the method tends to generate excessive redundant noisy Gaussians overfitted to every training view, which degrades the rendering quality. Additionally, while 3D Gaussian Splatting excels in small-scale and object-centric scenes, its application to larger scenes is hindered by constraints such as limited video memory, excessive optimization duration, and variable appearance across views. To address these challenges, we introduce GaussianFocus, an innovative approach that incorporates a patch attention algorithm to refine rendering quality and implements a Gaussian constraints strategy to minimize redundancy. Moreover, we propose a subdivision reconstruction strategy for large-scale scenes, dividing them into smaller, manageable blocks for individual training. Our results indicate that GaussianFocus significantly reduces unnecessary Gaussians and enhances rendering quality, surpassing existing State-of-The-Art (SoTA) methods. Furthermore, we demonstrate the capability of our approach to effectively manage and render large scenes, such as urban environments, whilst maintaining high fidelity in the visual output.

Updated: 2025-03-22 15:18:23

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2503.17798v1

Enhancing Fourier Neural Operators with Local Spatial Features

Partial Differential Equation (PDE) problems often exhibit strong local spatial structures, and effectively capturing these structures is critical for approximating their solutions. Recently, the Fourier Neural Operator (FNO) has emerged as an efficient approach for solving these PDE problems. By using parametrization in the frequency domain, FNOs can efficiently capture global patterns. However, this approach inherently overlooks the critical role of local spatial features, as frequency-domain parameterized convolutions primarily emphasize global interactions without encoding comprehensive localized spatial dependencies. Although several studies have attempted to address this limitation, their extracted Local Spatial Features (LSFs) remain insufficient, and computational efficiency is often compromised. To address this limitation, we introduce a convolutional neural network (CNN) preprocessor to extract LSFs directly from input data, resulting in a hybrid architecture termed \textit{Conv-FNO}. Furthermore, we introduce two novel resizing schemes to make our Conv-FNO resolution invariant. In this work, we focus on demonstrating the effectiveness of incorporating LSFs into FNOs by conducting both a theoretical analysis and extensive numerical experiments. Our findings show that this simple yet impactful modification enhances the representational capacity of FNOs and significantly improves performance on challenging PDE benchmarks.

Updated: 2025-03-22 15:11:56

Categories: cs.LG,eess.IV

Download: http://arxiv.org/abs/2503.17797v1
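Illustrative sketch (not from the paper): the Conv-FNO abstract contrasts frequency-domain convolutions, which act globally, with local spatial features extracted by a CNN. The 1-D NumPy code below shows both operations on a periodic signal. The function names, the fixed weights, and the way the two steps are chained are assumptions; a real FNO layer learns complex weights per channel and mode.

```python
import numpy as np

def spectral_conv_1d(u, weights):
    """Frequency-domain convolution that keeps only the lowest Fourier modes.

    u: real signal of shape (n,); weights: complex array of shape (modes,).
    Modes above `modes` are discarded, which is why fine local detail is lost.
    """
    n = u.shape[0]
    modes = weights.shape[0]
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)
    out_hat[:modes] = u_hat[:modes] * weights
    return np.fft.irfft(out_hat, n=n)

def local_conv_1d(u, kernel):
    """Periodic local convolution with a small odd-length kernel (length >= 3)."""
    pad = kernel.shape[0] // 2
    padded = np.concatenate([u[-pad:], u, u[:pad]])
    return np.convolve(padded, kernel, mode="valid")

n = 64
x = np.arange(n) / n
u = np.sin(2 * np.pi * x) + 0.5 * np.sin(2 * np.pi * 10 * x)

# With 4 retained modes and unit weights, the frequency-10 component vanishes.
low_pass = spectral_conv_1d(u, np.ones(4, dtype=complex))

# Hybrid ordering assumed here: local features first, then the spectral layer.
features = local_conv_1d(u, np.array([0.25, 0.5, 0.25]))
hybrid = spectral_conv_1d(features, np.ones(16, dtype=complex))
```

The example makes the limitation concrete: truncating to a few modes removes the high-frequency term entirely, so any local structure must be supplied by a separate spatial operator.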

Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models

Text-to-image generative models often struggle with long prompts detailing complex scenes, diverse objects with distinct visual characteristics and spatial relationships. In this work, we propose SCoPE (Scheduled interpolation of Coarse-to-fine Prompt Embeddings), a training-free method to improve text-to-image alignment by progressively refining the input prompt in a coarse-to-fine-grained manner. Given a detailed input prompt, we first decompose it into multiple sub-prompts which evolve from describing broad scene layout to highly intricate details. During inference, we interpolate between these sub-prompts and thus progressively introduce finer-grained details into the generated image. Our training-free plug-and-play approach significantly enhances prompt alignment and achieves an average improvement of up to +4% in Visual Question Answering (VQA) scores over the Stable Diffusion baselines on 85% of the prompts from the GenAI-Bench dataset.

Updated: 2025-03-22 15:05:21

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2503.17794v1
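Illustrative sketch (not from the paper): the SCoPE abstract describes interpolating between sub-prompt embeddings so that detail is introduced progressively during inference. One simple way to do that, assuming a piecewise-linear schedule over a normalized progress value `t`, is shown below. The function name, the schedule, and the toy 2-D embeddings are hypothetical; the abstract does not state the schedule actually used.

```python
import numpy as np

def interpolate_prompt_embeddings(sub_embeddings, t):
    """Blend coarse-to-fine sub-prompt embeddings at progress t in [0, 1].

    t = 0 returns the coarsest embedding and t = 1 the most detailed one;
    in between, the two neighbouring sub-prompts are linearly interpolated.
    """
    sub = np.asarray(sub_embeddings, dtype=float)
    n = sub.shape[0]
    if n == 1:
        return sub[0]
    pos = float(np.clip(t, 0.0, 1.0)) * (n - 1)
    lo = int(np.floor(pos))
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return (1.0 - frac) * sub[lo] + frac * sub[hi]

# Toy embeddings for three sub-prompts, ordered from coarse to fine.
coarse, medium, fine = [0.0, 0.0], [1.0, 0.0], [1.0, 1.0]
subs = [coarse, medium, fine]

# Conditioning used at each of 5 hypothetical denoising steps.
schedule = [interpolate_prompt_embeddings(subs, step / 4) for step in range(5)]
```

In a diffusion pipeline, the interpolated embedding would replace the fixed text conditioning at each denoising step, so early steps see the broad layout and later steps see the detailed description.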

Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM

Recent advancements in code large language models (LLMs) have demonstrated remarkable capabilities in code generation and understanding. It is still challenging to build a code LLM that offers both comprehensive performance and ultimate efficiency. Many models have been released in the open-source community to break the trade-off between performance and efficiency, such as the Qwen Coder series and the DeepSeek Coder series. This paper introduces yet another attempt in this area, namely Ling-Coder-Lite. We leverage the efficient Mixture-of-Experts (MoE) architecture along with a set of high-quality data curation methods (especially those based on program analytics) to build an efficient yet powerful code LLM. Ling-Coder-Lite exhibits on-par performance on 12 representative coding benchmarks compared to state-of-the-art models of similar size, such as Qwen2.5-Coder-7B and DeepSeek-Coder-V2-Lite, while offering competitive latency and throughput. In practice, we achieve a 50% reduction in deployment resources compared to a similar-sized dense model without performance loss. To facilitate further research and development in this area, we open-source our models as well as a substantial portion of high-quality data for the annealing and post-training stages. The models and data can be accessed at https://huggingface.co/inclusionAI/Ling-Coder-lite.

Updated: 2025-03-22 15:00:18

Categories: cs.LG,cs.AI,cs.CL,I.2.7

Download: http://arxiv.org/abs/2503.17793v1

Aligning Foundation Model Priors and Diffusion-Based Hand Interactions for Occlusion-Resistant Two-Hand Reconstruction

Two-hand reconstruction from monocular images faces persistent challenges due to complex and dynamic hand postures and occlusions, causing significant difficulty in achieving plausible interaction alignment. Existing approaches struggle with such alignment issues, often resulting in misalignment and penetration artifacts. To tackle this, we propose a novel framework that attempts to precisely align hand poses and interactions by synergistically integrating foundation model-driven 2D priors with diffusion-based interaction refinement for occlusion-resistant two-hand reconstruction. First, we introduce a Fusion Alignment Encoder that learns to align fused multimodal priors (keypoints, segmentation maps, and depth cues) from foundation models during training. This provides robust structured guidance and enables efficient inference without foundation models at test time while maintaining high reconstruction accuracy. Second, we employ a two-hand diffusion model explicitly trained to transform interpenetrated poses into plausible, non-penetrated interactions, leveraging gradient-guided denoising to correct artifacts and ensure realistic spatial relations. Extensive evaluations demonstrate that our method achieves state-of-the-art performance on InterHand2.6M, FreiHAND, and HIC datasets, significantly advancing occlusion handling and interaction robustness.

Updated: 2025-03-22 14:42:27

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2503.17788v1

MEPNet: Medical Entity-balanced Prompting Network for Brain CT Report Generation

The automatic generation of brain CT reports has gained widespread attention, given its potential to assist radiologists in diagnosing cranial diseases. However, brain CT scans involve extensive medical entities, such as diverse anatomical regions and lesions, exhibiting highly inconsistent spatial patterns in 3D volumetric space. This leads to biased learning of medical entities in existing methods, resulting in repetitiveness and inaccuracy in generated reports. To this end, we propose a Medical Entity-balanced Prompting Network (MEPNet), which harnesses the large language model (LLM) to fairly interpret various entities for accurate brain CT report generation. By introducing the visual embedding and the learning status of medical entities as enriched clues, our method prompts the LLM to balance the learning of diverse entities, thereby enhancing reports with comprehensive findings. First, to extract visual embeddings of entities, we propose Knowledge-driven Joint Attention to explore and distill entity patterns using both explicit and implicit medical knowledge. Then, a Learning Status Scorer is designed to evaluate the learning of entity visual embeddings, resulting in a unique learning status for each entity. Finally, these entity visual embeddings and statuses are integrated into multi-modal prompts to guide the text generation of the LLM. This process allows the LLM to self-adapt the learning process for biased-fitted entities, thereby covering detailed findings in generated reports. We conduct experiments on two brain CT report generation benchmarks, demonstrating the method's effectiveness in clinical accuracy and text coherence.

Updated: 2025-03-22 14:31:30

Categories: cs.AI

Download: http://arxiv.org/abs/2503.17784v1

Energy-Aware LLMs: A step towards sustainable AI for downstream applications

Advanced Large Language Models (LLMs) have revolutionized various fields, including communication networks, sparking an innovation wave that has led to new applications and services and to significantly enhanced solution schemes. Despite these impressive developments, most LLMs require huge computational resources, resulting in very high energy consumption. This study therefore proposes an end-to-end pipeline that investigates the trade-off between energy efficiency and model performance for an LLM during fault ticket analysis in communication networks. It further evaluates the pipeline performance using two real-world datasets for the tasks of root cause analysis and response feedback in a communication network. Our results show that an appropriate combination of quantization and pruning techniques is able to reduce energy consumption while significantly improving model performance.

Updated: 2025-03-22 14:28:29

Categories: cs.PF,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2503.17783v1
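Illustrative sketch (not from the paper): the entry above combines quantization and pruning. The two techniques in their simplest textbook form, unstructured magnitude pruning and symmetric per-tensor int8 quantization, look as follows in NumPy. The function names, the 50% sparsity, and the toy weight matrix are assumptions; the abstract does not say which variants the authors use.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Ties at the threshold are all pruned, so slightly more than the requested
    fraction can be removed when magnitudes repeat.
    """
    w = np.asarray(w, dtype=np.float32)
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w).astype(np.float32)

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization; returns (int8 values, scale)."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float32 weights."""
    return q.astype(np.float32) * scale

weights = np.array([[0.1, -0.5], [2.0, -0.05]], dtype=np.float32)
pruned = magnitude_prune(weights, sparsity=0.5)   # keeps -0.5 and 2.0
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

Pruning reduces the number of nonzero weights and quantization reduces the bits per weight; both lower memory traffic and arithmetic cost, which is where the energy savings the abstract reports would come from.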

Towards Seamless Hierarchical Federated Learning under Intermittent Client Participation: A Stagewise Decision-Making Methodology

Federated Learning (FL) offers a pioneering distributed learning paradigm that enables devices/clients to build a shared global model. This global model is obtained through frequent model transmissions between clients and a central server, which may cause high latency, energy consumption, and congestion over backhaul links. To overcome these drawbacks, Hierarchical Federated Learning (HFL) has emerged, which organizes clients into multiple clusters and utilizes edge nodes (e.g., edge servers) for intermediate model aggregations between clients and the central server. Current research on HFL mainly focuses on improving model accuracy, latency, and energy consumption in scenarios with a stable/fixed set of clients. However, addressing the dynamic availability of clients -- a critical aspect of real-world scenarios -- remains underexplored. This study delves into optimizing client selection and client-to-edge associations in HFL under intermittent client participation so as to minimize overall system costs (i.e., delay and energy), while achieving fast model convergence. We show that achieving this goal involves solving a complex NP-hard problem. To tackle this, we propose a stagewise methodology that splits the solution into two stages, referred to as Plan A and Plan B. Plan A focuses on identifying long-term clients with a high chance of participation in subsequent model training rounds. Plan B serves as a backup, selecting alternative clients when long-term clients are unavailable during model training rounds. This stagewise methodology offers a fresh perspective on client selection that can enhance both HFL and conventional FL by enabling low-overhead decision-making processes. Through evaluations on MNIST and CIFAR-10 datasets, we show that our methodology outperforms existing benchmarks in terms of model accuracy and system costs.

Updated: 2025-03-22 13:48:11

Categories: cs.LG,cs.DC

Download: http://arxiv.org/abs/2502.09303v2
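Illustrative sketch (not from the paper): the two-stage idea in the entry above, a long-term client set chosen in advance (Plan A) and a per-round fallback when those clients are offline (Plan B), can be pictured with the toy code below. The selection rules used here, a probability threshold and a cheapest-first backup fill, are assumptions made for illustration, as are all names and numbers; the abstract does not describe the authors' actual decision procedures.

```python
def plan_a(participation_prob, threshold):
    """Pick long-term clients: estimated participation probability >= threshold."""
    return [c for c, p in sorted(participation_prob.items()) if p >= threshold]

def plan_b(long_term, available, cost, needed):
    """Per-round selection: available long-term clients first, then the
    cheapest available backups until `needed` clients are chosen."""
    chosen = [c for c in long_term if c in available]
    backups = sorted(
        (c for c in available if c not in long_term),
        key=lambda c: cost[c],
    )
    chosen += backups[: max(0, needed - len(chosen))]
    return chosen[:needed]

# Hypothetical participation estimates and per-client system costs.
participation_prob = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.2}
cost = {"a": 1.0, "b": 1.0, "c": 5.0, "d": 2.0}

long_term = plan_a(participation_prob, threshold=0.5)

# In this round client "b" is offline, so a backup must be drawn.
round_clients = plan_b(long_term, available={"a", "c", "d"}, cost=cost, needed=2)
```

Deciding the long-term set once and only patching it per round keeps the per-round decision cheap, which matches the low-overhead motivation stated in the abstract.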

Renewable Energy Transition in South America: Predictive Analysis of Generation Capacity by 2050

In this research, renewable energy expansion in South America up to 2050 is predicted using machine learning models trained on past energy data. The research employs gradient boosting regression and Prophet time series forecasting to predict future generation capacities for solar, wind, hydroelectric, geothermal, biomass, and other renewable sources in South American nations. Model output analysis indicates staggering future expansion in the generation of renewable energy, with solar and wind energy registering the highest expansion rates. Geospatial visualization methods were applied to illustrate regional disparities in the utilization of renewable energy. The results forecast nearly 3-fold growth in South America's renewable energy generation by 2050, with Brazil and Chile spearheading regional development. Such projections can inform energy policy, investment strategy, and climate change mitigation throughout the region, helping developing economies transition to sustainable energy.

Updated: 2025-03-22 13:41:00

Categories: cs.LG

Download: http://arxiv.org/abs/2503.17771v1

Design and implementation of a novel cryptographically secure pseudorandom number generator

This paper presents a new design for a pseudorandom number generator (PRNG) that is cryptographically secure and passes all of the usual statistical tests referenced in the literature, and hence generates high-quality random sequences. The design is compact, easy to implement in practice, portable, and offers reasonable execution times. Our procedure achieves these objectives through a sequence of modular exponentiations followed by Feistel-like boxes that mix bits using a nonlinear function. We also present the results of extensive statistical tests on sequences of about 2^40 bits generated by our algorithm.

Updated: 2025-03-22 13:15:00

Categories: cs.CR,cs.NA,math.NA,math.NT,65C10, 11T71,D.4.6; E.3

Download: http://arxiv.org/abs/2503.17767v1
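Illustrative sketch (not the paper's design, and not secure): the entry above names two building blocks, modular exponentiation and Feistel-like mixing with a nonlinear function. The toy code below shows what each looks like and how they can be chained. Every constant here (the modulus, the base, the round keys, the round function) is an arbitrary assumption chosen for readability, and the result must not be used for cryptography.

```python
MASK32 = 0xFFFFFFFF

def round_function(half, key):
    """Toy nonlinear mixer: multiply-add with the key, xor-shift, rotate left by 7."""
    x = (half * 0x9E3779B1 + key) & MASK32
    x ^= x >> 15
    return ((x << 7) | (x >> 25)) & MASK32

def feistel64(block, keys):
    """Feistel network on a 64-bit block; invertible for any round function."""
    left, right = (block >> 32) & MASK32, block & MASK32
    for k in keys:
        left, right = right, left ^ round_function(right, k)
    return (left << 32) | right

def feistel64_inverse(block, keys):
    """Undo feistel64 by applying the rounds in reverse order."""
    left, right = (block >> 32) & MASK32, block & MASK32
    for k in reversed(keys):
        left, right = right ^ round_function(left, k), left
    return (left << 32) | right

def toy_generator(seed, count, p=2**61 - 1, g=3, keys=(1, 2, 3, 4)):
    """Chain a modular exponentiation step with Feistel mixing (toy only)."""
    state = seed % p
    outputs = []
    for _ in range(count):
        state = pow(g, state + 2, p)           # modular exponentiation step
        outputs.append(feistel64(state, keys))  # bit-mixing step
    return outputs

values = toy_generator(seed=42, count=5)
```

The Feistel structure is useful in this role because the mixing stays a bijection on the block even when the round function itself is not invertible, so no output values are lost to collisions in the mixing stage.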

Lifelong Evolution of Swarms

Adapting to task changes without forgetting previous knowledge is a key skill for intelligent systems, and a crucial aspect of lifelong learning. Swarm controllers, however, are typically designed for specific tasks, lacking the ability to retain knowledge across changing tasks. Lifelong learning, on the other hand, focuses on individual agents with limited insights into the emergent abilities of a collective like a swarm. To address this gap, we introduce a lifelong evolutionary framework for swarms, where a population of swarm controllers is evolved in a dynamic environment that incrementally presents novel tasks. This requires evolution to find controllers that quickly adapt to new tasks while retaining knowledge of previous ones, as they may reappear in the future. We discover that the population inherently preserves information about previous tasks, and it can reuse it to foster adaptation and mitigate forgetting. In contrast, the top-performing individual for a given task catastrophically forgets previous tasks. To mitigate this phenomenon, we design a regularization process for the evolutionary algorithm, reducing forgetting in top-performing individuals. Evolving swarms in a lifelong fashion raises fundamental questions on the current state of deep lifelong learning and on the robustness of swarm controllers in dynamic environments.

Updated: 2025-03-22 13:08:31

Categories: cs.NE,cs.AI

Download: http://arxiv.org/abs/2503.17763v1

Oscillatory Signatures of Parkinson's Disease: Central and Parietal EEG Alterations Across Multiple Frequency Bands

This study investigates EEG as a potential early biomarker by applying deep learning techniques to resting-state EEG recordings from 31 subjects (15 with PD and 16 healthy controls). EEG signals underwent preprocessing to remove tremor artifacts before classification with CNNs using wavelet-based electrode triplet images. Our analysis across different brain regions and frequency bands showed distinct spatial-spectral patterns of PD-related neural oscillations. We identified high classification accuracy (76%) using central electrodes (C3, Cz, C4) with full-spectrum 0.4-62.4 Hz analysis and 74% accuracy in right parietal regions (P8, CP6, P4) with 10-second windows. Bilateral centro-parietal regions showed strong performance (67%) in the theta band (4.0-7.79 Hz), while multiple areas demonstrated some sensitivity (65%) in the alpha band (7.8-15.59 Hz). We also observed a distinctive topographical pattern of gamma band (40-62.4 Hz) alterations specifically localized to central-parietal regions, which remained consistent across different temporal windows. In particular, we observed pronounced right-hemisphere involvement across several frequency bands. Unlike previous studies that achieved higher accuracies by potentially including tremor artifacts, our approach isolates genuine neurophysiological alterations in cortical activity. These findings suggest that specific EEG-based oscillatory patterns, especially in central and parietal regions and across multiple frequency bands, may provide diagnostic information for PD, potentially before the onset of motor symptoms.

Updated: 2025-03-22 13:06:50

Categories: q-bio.NC,cs.LG

Download: http://arxiv.org/abs/2503.12392v2

CODA: Repurposing Continuous VAEs for Discrete Tokenization

Discrete visual tokenizers transform images into a sequence of tokens, enabling token-based visual generation akin to language models. However, this process is inherently challenging, as it requires both compressing visual signals into a compact representation and discretizing them into a fixed set of codes. Traditional discrete tokenizers typically learn the two tasks jointly, often leading to unstable training, low codebook utilization, and limited reconstruction quality. In this paper, we introduce CODA (COntinuous-to-Discrete Adaptation), a framework that decouples compression and discretization. Instead of training discrete tokenizers from scratch, CODA adapts off-the-shelf continuous VAEs, already optimized for perceptual compression, into discrete tokenizers via a carefully designed discretization process. By primarily focusing on discretization, CODA ensures stable and efficient training while retaining the strong visual fidelity of continuous VAEs. Empirically, with 6× less training budget than standard VQGAN, our approach achieves a remarkable codebook utilization of 100% and notable reconstruction FID (rFID) of 0.43 and 1.34 for 8× and 16× compression on the ImageNet 256×256 benchmark.

Updated: 2025-03-22 12:59:00

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2503.17760v1

Enhancing Startup Success Predictions in Venture Capital: A GraphRAG Augmented Multivariate Time Series Method

In the Venture Capital (VC) industry, predicting the success of startups is challenging due to limited financial data and the need for subjective revenue forecasts. Previous methods based on time series analysis often fall short because they fail to incorporate crucial inter-company relationships such as competition and collaboration. To fill this gap, this paper introduces a novel approach based on a GraphRAG-augmented time series model. With GraphRAG, time series predictive methods are enhanced by integrating these vital relationships into the analysis framework, allowing for a more dynamic understanding of the startup ecosystem in venture capital. Our experimental results demonstrate that our model significantly outperforms previous models in startup success predictions.

Updated: 2025-03-22 12:47:44

Categories: q-fin.CP,cs.CL,cs.LG

Download: http://arxiv.org/abs/2408.09420v5

Unsupervised Structural-Counterfactual Generation under Domain Shift

Motivated by the burgeoning interest in cross-domain learning, we present a novel generative modeling challenge: generating counterfactual samples in a target domain based on factual observations from a source domain. Our approach operates within an unsupervised paradigm devoid of parallel or joint datasets, relying exclusively on distinct observational samples and causal graphs for each domain. This setting presents challenges that surpass those of conventional counterfactual generation. Central to our methodology is the disambiguation of exogenous causes into effect-intrinsic and domain-intrinsic categories. This differentiation facilitates the integration of domain-specific causal graphs into a unified joint causal graph via shared effect-intrinsic exogenous variables. We propose leveraging Neural Causal models within this joint framework to enable accurate counterfactual generation under standard identifiability assumptions. Furthermore, we introduce a novel loss function that effectively segregates effect-intrinsic from domain-intrinsic variables during model training. Given a factual observation, our framework combines the posterior distribution of effect-intrinsic variables from the source domain with the prior distribution of domain-intrinsic variables from the target domain to synthesize the desired counterfactuals, adhering to Pearl's causal hierarchy. Intriguingly, when domain shifts are restricted to alterations in causal mechanisms without accompanying covariate shifts, our training regimen parallels the resolution of a conditional optimal transport problem. Empirical evaluations on a synthetic dataset show that our framework generates counterfactuals in the target domain that very closely resemble the ground truth.

Updated: 2025-03-22 12:42:42

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2502.12013v2

Bandwidth Reservation for Time-Critical Vehicular Applications: A Multi-Operator Environment

Onsite bandwidth reservation requests often face challenges such as price fluctuations and fairness issues due to unpredictable bandwidth availability and stringent latency requirements. Requesting bandwidth in advance can mitigate the impact of these fluctuations and ensure timely access to critical resources. In a multi-Mobile Network Operator (MNO) environment, vehicles need to select cost-effective and reliable resources for their safety-critical applications. This research aims to minimize resource costs by finding the best price among multiple MNOs. It formulates multi-operator scenarios as a Markov Decision Process (MDP) and solves it with a Deep Reinforcement Learning (DRL) algorithm, specifically Dueling Deep Q-Learning. For efficient and stable learning, we propose a novel area-wise approach and an adaptive synthetic MDP that stays close to the real environment. The Temporal Fusion Transformer (TFT) is used to handle time-dependent data and model training. Furthermore, the research leverages Amazon spot price data and adopts a multi-phase training approach, with initial training on synthetic data followed by training on real-world data. These phases enable the DRL agent to make informed decisions using insights from historical data and real-time observations. The results show that our model leads to significant cost reductions, up to 40%, compared to scenarios without a policy model in such a complex environment.

Updated: 2025-03-22 12:36:23

Categories: cs.LG,cs.AI,cs.CR,cs.NE

Download: http://arxiv.org/abs/2503.17756v1
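
Dueling Deep Q-Learning, named in the abstract above, is usually described as splitting the Q-function into a state value V(s) and per-action advantages A(s, a), recombined as Q(s, a) = V(s) + A(s, a) - mean(A). The Python sketch below shows only that aggregation step. Treating each action as the choice of one operator, and the numbers used, are assumptions for illustration and are not taken from the paper.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-subtracted dueling form."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def greedy_action(q_values: List[float]) -> int:
    """Return the index of the action with the highest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Hypothetical example: one advantage per operator (three operators assumed).
q = dueling_q_values(state_value=2.0, advantages=[0.5, -0.5, 0.0])
best = greedy_action(q)
```

Subtracting the mean advantage keeps V and A identifiable: without it, a constant could be shifted between the two streams without changing Q.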

Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes

Large Language Models (LLMs) are often used as automated judges to evaluate text, but their effectiveness can be hindered by various unintentional biases. We propose using linear classifying probes, trained by leveraging differences between contrasting pairs of prompts, to directly access LLMs' latent knowledge and extract more accurate preferences. Through extensive experiments using models of varying size from four different families and six diverse datasets assessing text quality evaluation and common sense reasoning, we demonstrate that both supervised and unsupervised probing approaches consistently outperform traditional generation-based judgement while maintaining similar computational costs. These probes generalise under domain shifts and can even outperform finetuned evaluators with the same training data size. Our results suggest linear probing offers an accurate, robust and computationally efficient approach for LLM-as-judge tasks while providing interpretable insights into how models encode judgement-relevant knowledge. Our data and code will be openly released in the future.

Updated: 2025-03-22 12:35:25

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2503.17755v1
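
The probing idea in the abstract above (a linear classifier trained on differences between contrasting prompt pairs) can be illustrated with a small supervised sketch. Everything concrete here is an assumption: the "hidden-state differences" are synthetic 4-dimensional vectors, the preference direction is made up, and the probe is a plain bias-free logistic regression, which may differ from the paper's actual training procedure.

```python
import math
import random
from typing import List


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(diffs: List[List[float]], labels: List[int],
                lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Fit a bias-free logistic-regression probe with batch gradient descent."""
    dim = len(diffs[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in zip(diffs, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for k in range(dim):
                grad[k] += (p - y) * x[k]
        w = [wi - lr * g / len(diffs) for wi, g in zip(w, grad)]
    return w


def predict(w: List[float], x: List[float]) -> int:
    """Return 1 if the probe prefers the first prompt of the pair, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


# Synthetic stand-in for (hidden state of prompt A) - (hidden state of prompt B).
rng = random.Random(0)
direction = [1.0, -1.0, 0.0, 0.0]  # hypothetical latent "preference" direction
diffs, labels = [], []
for _ in range(200):
    y = rng.randint(0, 1)
    sign = 1.0 if y == 1 else -1.0
    diffs.append([sign * d + rng.gauss(0.0, 0.3) for d in direction])
    labels.append(y)

weights = train_probe(diffs[:150], labels[:150])
held_out = list(zip(diffs[150:], labels[150:]))
accuracy = sum(predict(weights, x) == y for x, y in held_out) / len(held_out)
```

Because the probe is linear, its weight vector can be read directly as the direction in activation space that separates preferred from non-preferred responses.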

Building Resource-Constrained Language Agents: A Korean Case Study on Chemical Toxicity Information

Language agents powered by large language models (LLMs) face significant deployment challenges in resource-constrained environments, particularly for specialized domains and less-common languages. This paper presents Tox-chat, a Korean chemical toxicity information agent devised within these limitations. We propose two key innovations: a context-efficient architecture that reduces token consumption through hierarchical section search, and a scenario-based dialogue generation methodology that effectively distills tool-using capabilities from larger models. Experimental evaluations demonstrate that our fine-tuned 8B parameter model substantially outperforms both untuned models and baseline approaches, in terms of DB faithfulness and preference. Our work offers valuable insights for researchers developing domain-specific language agents under practical constraints.

Updated: 2025-03-22 12:34:15

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2503.17753v1

LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation

Recently, the field of text-guided 3D scene generation has garnered significant attention. High-quality generation that aligns with physical realism and high controllability is crucial for practical 3D scene applications. However, existing methods face fundamental limitations: (i) difficulty capturing complex relationships between multiple objects described in the text, (ii) inability to generate physically plausible scene layouts, and (iii) lack of controllability and extensibility in compositional scenes. In this paper, we introduce LayoutDreamer, a framework that leverages 3D Gaussian Splatting (3DGS) to facilitate high-quality, physically consistent compositional scene generation guided by text. Specifically, given a text prompt, we convert it into a directed scene graph and adaptively adjust the density and layout of the initial compositional 3D Gaussians. Subsequently, dynamic camera adjustments are made based on the training focal point to ensure entity-level generation quality. Finally, by extracting directed dependencies from the scene graph, we tailor physical and layout energy to ensure both realism and flexibility. Comprehensive experiments demonstrate that LayoutDreamer outperforms other methods in compositional scene generation quality and semantic alignment. Specifically, it achieves state-of-the-art (SOTA) performance in the multiple objects generation metric of T3Bench.

Updated: 2025-03-22 12:12:36

Categories: cs.CV,cs.AI,cs.GR

Download: http://arxiv.org/abs/2502.01949v2

Enhancing Layer Attention Efficiency through Pruning Redundant Retrievals

Growing evidence suggests that layer attention mechanisms, which enhance interaction among layers in deep neural networks, have significantly advanced network architectures. However, existing layer attention methods suffer from redundancy, as attention weights learned by adjacent layers often become highly similar. This redundancy causes multiple layers to extract nearly identical features, reducing the model's representational capacity and increasing training time. To address this issue, we propose a novel approach to quantify redundancy by leveraging the Kullback-Leibler (KL) divergence between adjacent layers. Additionally, we introduce an Enhanced Beta Quantile Mapping (EBQM) method that accurately identifies and skips redundant layers, thereby maintaining model stability. Our proposed Efficient Layer Attention (ELA) architecture improves both training efficiency and overall performance, achieving a 30% reduction in training time while enhancing performance in tasks such as image classification and object detection.

Updated: 2025-03-22 12:05:30

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2503.06473v3
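
The abstract above quantifies redundancy with the KL divergence between the attention weights of adjacent layers. A minimal Python sketch of that measurement follows. The fixed `threshold` is a stand-in assumption: the paper selects layers with its EBQM method, which the abstract does not specify, and the attention vectors below are invented for illustration.

```python
import math
from typing import List


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def redundant_layers(attn: List[List[float]], threshold: float) -> List[int]:
    """Flag layers whose attention weights nearly match the previous layer's."""
    flagged = []
    for i in range(1, len(attn)):
        if kl_divergence(attn[i], attn[i - 1]) < threshold:
            flagged.append(i)
    return flagged


# Hypothetical attention weights for three layers (each row sums to 1).
attention = [
    [0.70, 0.20, 0.10],
    [0.69, 0.21, 0.10],  # almost identical to the layer before it
    [0.20, 0.30, 0.50],
]
skippable = redundant_layers(attention, threshold=0.01)
```

A small divergence means the later layer retrieves almost the same mixture of earlier features, so skipping its attention step loses little information.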

Bimodal Connection Attention Fusion for Speech Emotion Recognition

Multi-modal emotion recognition is challenging due to the difficulty of extracting features that capture subtle emotional differences. Understanding multi-modal interactions and connections is key to building effective bimodal speech emotion recognition systems. In this work, we propose the Bimodal Connection Attention Fusion (BCAF) method, which includes three main modules: the interactive connection network, the bimodal attention network, and the correlative attention network. The interactive connection network uses an encoder-decoder architecture to model modality connections between audio and text while leveraging modality-specific features. The bimodal attention network enhances semantic complementation and exploits intra- and inter-modal interactions. The correlative attention network reduces cross-modal noise and captures correlations between audio and text. Experiments on the MELD and IEMOCAP datasets demonstrate that the proposed BCAF method outperforms existing state-of-the-art baselines.

Updated: 2025-03-22 11:48:18

Categories: cs.SD,cs.AI,cs.CL,cs.MM,eess.AS

Download: http://arxiv.org/abs/2503.05858v3

V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction

Large Vision-Language Models (LVLMs) have made significant progress in the field of video understanding recently. However, current benchmarks uniformly lean on text prompts for evaluation, which often necessitate complex referential language and fail to provide precise spatial and temporal references. This limitation diminishes the experience and efficiency of human-model interaction. To address this limitation, we propose the Video Visual Prompt Benchmark (V2P-Bench), a comprehensive benchmark specifically designed to evaluate LVLMs' video understanding capabilities in multimodal human-model interaction scenarios. V2P-Bench includes 980 unique videos and 1,172 QA pairs, covering 5 main tasks and 12 dimensions, facilitating instance-level fine-grained understanding aligned with human cognition. Benchmarking results reveal that even the most powerful models perform poorly on V2P-Bench (65.4% for GPT-4o and 67.9% for Gemini-1.5-Pro), significantly lower than the human experts' 88.3%, highlighting the current shortcomings of LVLMs in understanding video visual prompts. We hope V2P-Bench will serve as a foundation for advancing multimodal human-model interaction and video understanding evaluation. Project page: https://github.com/gaotiexinqu/V2P-Bench.

Updated: 2025-03-22 11:30:46

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2503.17736v1

Score matching through the roof: linear, nonlinear, and latent variables causal discovery

Causal discovery from observational data holds great promise, but existing methods rely on strong assumptions about the underlying causal structure, often requiring full observability of all relevant variables. We tackle these challenges by leveraging the score function $\nabla \log p(X)$ of observed variables for causal discovery and propose the following contributions. First, we fine-tune the existing identifiability results with the score on additive noise models, showing that their assumption of nonlinearity of the causal mechanisms is not necessary. Second, we establish conditions for inferring causal relations from the score even in the presence of hidden variables; this result is two-faced: we demonstrate the score's potential to infer the equivalence class of causal graphs with hidden variables (while previous results are restricted to the fully observable setting), and we provide sufficient conditions for identifying direct causes in latent variable models. Building on these insights, we propose a flexible algorithm suited for causal discovery on linear, nonlinear, and latent variable models, which we empirically validate.

Updated: 2025-03-22 11:26:14

Categories: stat.ML,cs.AI,stat.ME

Download: http://arxiv.org/abs/2407.18755v2
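
The score function $\nabla \log p(X)$ that the abstract above builds on is simply the gradient of the log-density. As a small illustration of the definition only (not of the paper's discovery algorithm), the Python sketch below compares the closed-form score of a one-dimensional Gaussian with a finite-difference estimate. The Gaussian and its parameters are assumptions chosen for simplicity.

```python
import math
from typing import Callable


def gaussian_log_density(x: float, mean: float, std: float) -> float:
    """log p(x) for a univariate Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)


def gaussian_score(x: float, mean: float, std: float) -> float:
    """Closed-form score: d/dx log p(x) = -(x - mean) / std^2."""
    return -(x - mean) / std ** 2


def numerical_score(log_p: Callable[[float], float], x: float, h: float = 1e-5) -> float:
    """Central finite-difference approximation of d/dx log p(x)."""
    return (log_p(x + h) - log_p(x - h)) / (2.0 * h)


x0, mean, std = 1.5, 0.5, 2.0
analytic = gaussian_score(x0, mean, std)
numeric = numerical_score(lambda x: gaussian_log_density(x, mean, std), x0)
```

For a Gaussian the score is linear in x, which is one reason linear and nonlinear mechanisms leave different signatures in the score.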

Contributions to Compliance with Regulation (EU) 2024/1689 in Robotics and Autonomous Systems (Aportes para el cumplimiento del Reglamento (UE) 2024/1689 en robótica y sistemas autónomos)

Cybersecurity in robotics stands out as a key aspect within Regulation (EU) 2024/1689, also known as the Artificial Intelligence Act, which establishes specific guidelines for intelligent and automated systems. A fundamental distinction in this regulatory framework is the difference between robots with Artificial Intelligence (AI) and those that operate through automation systems without AI, since the former are subject to stricter security requirements due to their learning and autonomy capabilities. This work analyzes cybersecurity tools applicable to advanced robotic systems, with special emphasis on the protection of knowledge bases in cognitive architectures. Furthermore, a list of basic tools is proposed to guarantee the security, integrity, and resilience of these systems, and a practical case is presented, focused on the analysis of robot knowledge management, where ten evaluation criteria are defined to ensure compliance with the regulation and reduce risks in human-robot interaction (HRI) environments.

Updated: 2025-03-22 11:04:42

Categories: cs.RO,cs.AI,cs.CR

Download: http://arxiv.org/abs/2503.17730v1

NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning

Vision-and-Language Navigation (VLN), a crucial research problem in Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions. Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability. However, their predominant use in an offline manner usually suffers from a substantial domain gap between the VLN task and the LLM training corpus. This paper introduces a novel strategy called Navigational Chain-of-Thought (NavCoT), where we perform parameter-efficient in-domain training to enable self-guided navigational decisions, leading to a significant mitigation of the domain gap in a cost-effective manner. Specifically, at each timestep, the LLM is prompted to forecast the navigational chain-of-thought by: 1) acting as a world model to imagine the next observation according to the instruction, 2) selecting the candidate observation that best aligns with the imagination, and 3) determining the action based on the reasoning from the prior steps. Through constructing formalized labels for training, the LLM can learn to generate desired and reasonable chain-of-thought outputs for improving the action decision. Experimental results across various training settings and popular VLN benchmarks (e.g., Room-to-Room (R2R), Room-across-Room (RxR), Room-for-Room (R4R)) show the significant superiority of NavCoT over the direct action prediction variants. Through simple parameter-efficient finetuning, our NavCoT outperforms a recent GPT4-based approach with ~7% relative improvement on the R2R dataset. We believe that NavCoT will help unlock more task-adaptive and scalable LLM-based embodied agents, which are helpful for developing real-world robotics applications. Code is available at https://github.com/expectorlin/NavCoT.

Updated: 2025-03-22 11:04:36

Categories: cs.CV,cs.AI,cs.CL,cs.RO

Download: http://arxiv.org/abs/2403.07376v2

DynASyn: Multi-Subject Personalization Enabling Dynamic Action Synthesis

Recent advances in text-to-image diffusion models have spurred research on personalization, i.e., customized image synthesis of subjects within reference images. Although existing personalization methods are able to alter the subjects' positions or to personalize multiple subjects simultaneously, they often struggle to modify the behaviors of subjects or their dynamic interactions. The difficulty is attributable to overfitting to reference images, which worsens if only a single reference image is available. We propose DynASyn, an effective multi-subject personalization method that works from a single reference image and addresses these challenges. DynASyn preserves the subject identity in the personalization process by aligning concept-based priors with subject appearances and actions. This is achieved by regularizing the attention maps between the subject token and images through concept-based priors. In addition, we propose concept-based prompt-and-image augmentation for an enhanced trade-off between identity preservation and action diversity. We adopt SDE-based editing guided by augmented prompts to generate diverse appearances and actions while maintaining identity consistency in the augmented images. Experiments show that DynASyn is capable of synthesizing highly realistic images of subjects with novel contexts and dynamic interactions with the surroundings, and outperforms baseline methods in both quantitative and qualitative aspects.

Updated: 2025-03-22 10:56:35

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2503.17728v1

A Survey on Mathematical Reasoning and Optimization with Large Language Models

Mathematical reasoning and optimization are fundamental to artificial intelligence and computational problem-solving. Recent advancements in Large Language Models (LLMs) have significantly improved AI-driven mathematical reasoning, theorem proving, and optimization techniques. This survey explores the evolution of mathematical problem-solving in AI, from early statistical learning approaches to modern deep learning and transformer-based methodologies. We review the capabilities of pretrained language models and LLMs in performing arithmetic operations, complex reasoning, theorem proving, and structured symbolic computation. A key focus is on how LLMs integrate with optimization and control frameworks, including mixed-integer programming, linear quadratic control, and multi-agent optimization strategies. We examine how LLMs assist in problem formulation, constraint generation, and heuristic search, bridging theoretical reasoning with practical applications. We also discuss enhancement techniques such as Chain-of-Thought reasoning, instruction tuning, and tool-augmented methods that improve LLMs' problem-solving performance. Despite their progress, LLMs face challenges in numerical precision, logical consistency, and proof verification. Emerging trends such as hybrid neural-symbolic reasoning, structured prompt engineering, and multi-step self-correction aim to overcome these limitations. Future research should focus on interpretability, integration with domain-specific solvers, and improving the robustness of AI-driven decision-making. This survey offers a comprehensive review of the current landscape and future directions of mathematical reasoning and optimization with LLMs, with applications across engineering, finance, and scientific research.

Updated: 2025-03-22 10:49:32

Categories: cs.AI

Download: http://arxiv.org/abs/2503.17726v1

Be More Diverse than the Most Diverse: Optimal Mixtures of Generative Models via Mixture-UCB Bandit Algorithms

The availability of multiple training algorithms and architectures for generative models requires a selection mechanism to form a single model over a group of well-trained generation models. The selection task is commonly addressed by identifying the model that maximizes an evaluation score based on the diversity and quality of the generated data. However, such a best-model identification approach overlooks the possibility that a mixture of available models can outperform each individual model. In this work, we numerically show that a mixture of generative models on benchmark image datasets can indeed achieve a better evaluation score (based on FID and KID scores), compared to the individual models. This observation motivates the development of efficient algorithms for selecting the optimal mixture of the models. To address this, we formulate a quadratic optimization problem to find an optimal mixture model achieving the maximum of kernel-based evaluation scores including kernel inception distance (KID) and Rényi kernel entropy (RKE). To identify the optimal mixture of the models using the fewest possible sample queries, we view the selection task as a multi-armed bandit (MAB) problem and propose the Mixture Upper Confidence Bound (Mixture-UCB) algorithm that provably converges to the optimal mixture of the involved models. More broadly, the proposed Mixture-UCB can be extended to optimize every convex quadratic function of the mixture weights in a general MAB setting. We prove a regret bound for the Mixture-UCB algorithm and perform several numerical experiments to show the success of Mixture-UCB in finding the optimal mixture of text and image generative models. The project code is available at https://github.com/Rezaei-Parham/Mixture-UCB.

Updated: 2025-03-22 10:45:56

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2412.17622v2

Towards Invisible Backdoor Attack on Text-to-Image Diffusion Model

Backdoor attacks targeting text-to-image diffusion models have advanced rapidly, enabling attackers to implant malicious triggers into these models to manipulate their outputs. However, current backdoor samples often exhibit two key abnormalities compared to benign samples: 1) Semantic Consistency, where backdoor prompts tend to generate images with similar semantic content even when the prompt text varies significantly; 2) Attention Consistency, where the trigger induces consistent structural responses in the cross-attention maps. These consistencies leave detectable traces for defenders, making backdoors easier to identify. To enhance the stealthiness of backdoor samples, we propose a novel Invisible Backdoor Attack (IBA) that explicitly mitigates these consistencies. Specifically, our approach leverages syntactic structures as backdoor triggers to amplify the sensitivity to textual variations, effectively breaking down the semantic consistency. In addition, a regularization method based on Kernel Maximum Mean Discrepancy (KMMD) is proposed to align the distribution of cross-attention responses between backdoor and benign samples, thereby disrupting attention consistency. Extensive experiments demonstrate that our IBA achieves a 97.5% attack success rate while exhibiting stronger resistance to defenses, with an average of over 98% of backdoor samples bypassing three state-of-the-art detection mechanisms. The code is available at https://github.com/Robin-WZQ/IBA.
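The KMMD regularizer named in the abstract above builds on the kernel maximum mean discrepancy between two sample sets. The following is a minimal sketch of the standard biased squared-MMD estimate with a Gaussian kernel; the toy 2-D vectors standing in for cross-attention responses, the bandwidth, and all function names are assumptions for illustration, not the paper's implementation.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y)."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


# Hypothetical flattened attention responses (toy data).
benign = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05]]
backdoor_far = [[1.0, 1.1], [1.1, 1.0], [1.05, 1.05]]
backdoor_aligned = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05]]

gap_far = mmd_squared(benign, backdoor_far)
gap_aligned = mmd_squared(benign, backdoor_aligned)
print(gap_far, gap_aligned)
```

Used as a penalty during training, a term like this is small when the two response distributions match and grows as they drift apart, which is the alignment effect the abstract describes.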

Updated: 2025-03-22 10:41:46

标题: 朝向对文本到图像传播模型的隐形后门攻击

摘要: 文中介绍了针对文本到图像扩散模型的后门攻击已经迅速发展，使攻击者能够植入恶意触发器到这些模型中，从而操纵它们的输出。然而，与良性样本相比，当前的后门样本通常表现出两个关键的异常：1）语义一致性，即后门提示通常会生成具有相似语义内容的图像，即使提示的文本变化很大；2）注意力一致性，即触发器在交叉注意力图中引起一致的结构响应。这些一致性给防御者留下了可检测的痕迹，使得后门更容易被识别。为了增强后门样本的隐蔽性，作者提出了一种新颖的隐形后门攻击（IBA）方法，明确地减轻了这些一致性。具体来说，他们的方法利用句法结构作为后门触发器，增加对文本变化的敏感性，有效地打破了语义一致性。此外，提出了一种基于核最大均值差异（KMMD）的正则化方法，用于对齐后门和良性样本之间的交叉注意力响应分布，从而破坏注意力一致性。大量实验证明，他们的IBA实现了97.5%的攻击成功率，同时对抗攻击的抵抗力更强，平均超过98%的后门样本绕过了三种最先进的检测机制。该代码可在https://github.com/Robin-WZQ/IBA上找到。

更新时间: 2025-03-22 10:41:46

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.17724v1

Promoting Segment Anything Model towards Highly Accurate Dichotomous Image Segmentation

The Segment Anything Model (SAM) represents a significant breakthrough in foundation models for computer vision, providing a large-scale image segmentation model. However, despite SAM's strong zero-shot performance, its segmentation masks lack fine-grained details, particularly when it comes to accurately delineating object boundaries. It is therefore both interesting and valuable to explore whether SAM can be improved towards highly accurate object segmentation, which is known as the dichotomous image segmentation (DIS) task. To address this issue, we propose DIS-SAM, which advances SAM towards DIS with extremely accurate details. DIS-SAM is a framework specifically tailored for highly accurate segmentation that maintains SAM's promptable design. It employs a two-stage approach, integrating SAM with a modified advanced network that was previously designed to handle the prompt-free DIS task. To better train DIS-SAM, we employ a ground truth enrichment strategy that modifies the original mask annotations.

Updated: 2025-03-22 10:25:21

标题: 促进分段任意模型朝向高度准确的二元图像分割

摘要: The Segment Anything Model (SAM)代表了计算机视觉基础模型的重大突破，提供了一个大规模图像分割模型。然而，尽管SAM的零样本性能出色，其分割掩模缺乏细致的细节，尤其是在准确划定物体边界方面。因此，探索SAM是否可以改进为高度准确的物体分割，即被称为二分图像分割（DIS）任务，既有趣又有价值。为了解决这个问题，我们提出了DIS-SAM，将SAM推进至DIS，并具有极其准确的细节。DIS-SAM是一个专门定制的框架，用于高度准确的分割，保持了SAM的可推动设计。DIS-SAM采用了一个两阶段方法，将SAM与一个先前设计用于处理无提示的DIS任务的修改高级网络整合在一起。为了更好地训练DIS-SAM，我们采用了一个地面真实丰富策略，通过修改原始掩模标注。

更新时间: 2025-03-22 10:25:21

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2401.00248v3

Normalized Matching Transformer

We present a new state-of-the-art approach for sparse keypoint matching between pairs of images. Our method is fully deep-learning based: it combines a visual backbone with a SplineCNN graph neural network for feature processing, and a normalized transformer decoder that decodes keypoint correspondences together with the Sinkhorn algorithm. The model is trained with a contrastive loss and a hyperspherical loss for better feature representations, and we additionally use data augmentation during training. This comparatively simple architecture, combining extensive normalization and advanced losses, outperforms current state-of-the-art approaches (BBGM, ASAR, COMMON and GMTR) on the PascalVOC and SPair-71k datasets by $5.1\%$ and $2.2\%$ respectively, while training for at least $1.7\times$ fewer epochs.
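The abstract above relies on the Sinkhorn algorithm to turn pairwise keypoint scores into soft correspondences. As a rough illustration, the sketch below applies plain Sinkhorn normalization (alternating row and column scaling) to a toy score matrix. The scores, the temperature, and the iteration count are assumptions, and nothing here reproduces the paper's decoder.

```python
import math


def sinkhorn(scores, temperature=0.5, iterations=100):
    """Scale a square score matrix into an approximately doubly stochastic one."""
    n = len(scores)
    # Exponentiate so every entry is positive; lower temperature sharpens matches.
    P = [[math.exp(s / temperature) for s in row] for row in scores]
    for _ in range(iterations):
        # Row normalization: each row sums to 1.
        normalized = []
        for row in P:
            total = sum(row)
            normalized.append([v / total for v in row])
        P = normalized
        # Column normalization: each column sums to 1.
        col_sums = [sum(P[i][j] for i in range(n)) for j in range(n)]
        P = [[P[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return P


# Hypothetical similarity scores between 3 source and 3 target keypoints.
scores = [[0.9, 0.1, 0.2],
          [0.2, 0.8, 0.1],
          [0.1, 0.3, 0.7]]

P = sinkhorn(scores)
matches = [max(range(len(row)), key=row.__getitem__) for row in P]
print(matches)
```

Each row of `P` can be read as a soft assignment of one source keypoint over the target keypoints; taking the arg-max per row gives a hard match.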

Updated: 2025-03-22 10:09:11

标题: 标准化匹配变压器

摘要: 我们提出了一种新的稀疏关键点匹配的最先进方法。我们的方法由完全基于深度学习的方法组成，结合了视觉骨干和SplineCNN图神经网络进行特征处理，以及一个用于解码关键点对应关系的归一化变换器解码器，同时使用Sinkhorn算法。我们的方法使用对比损失和超球面损失进行训练，以获得更好的特征表示。我们还在训练过程中使用数据增强。这种相对简单的架构结合广泛的归一化和先进的损失优于PascalVOC和SPair-71k数据集上的当前最先进方法，分别比BBGM、ASAR、COMMON和GMTR高出$5.1\%$和$2.2\%$，同时训练的时期至少减少了$1.7x$。

更新时间: 2025-03-22 10:09:11

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2503.17715v1

Efficiency is Not Enough: A Critical Perspective of Environmentally Sustainable AI

Artificial intelligence (AI) is currently spearheaded by machine learning (ML) methods such as deep learning, which have accelerated progress on many tasks thought to be out of reach of AI. These recent ML methods are often compute hungry and energy intensive, and they result in significant greenhouse gas emissions, a known driver of anthropogenic climate change. Additionally, the platforms on which ML systems run are associated with environmental impacts that go beyond the carbon emissions driven by energy consumption. The primary solution lionized by both industry and the ML community to improve the environmental sustainability of ML is to increase the compute and energy efficiency with which ML systems operate. In this perspective, we argue that it is time to look beyond efficiency in order to make ML more environmentally sustainable. We present three high-level discrepancies between the many variables that influence the efficiency of ML and the environmental sustainability of ML. First, we discuss how compute efficiency does not imply energy efficiency or carbon efficiency. Second, we present the unexpected effects of efficiency on operational emissions throughout the ML model life cycle. Finally, we explore the broader environmental impacts that are not accounted for by efficiency. These discrepancies show why efficiency alone is not enough to remedy the adverse environmental impacts of ML. Instead, we argue for systems thinking as the next step towards holistically improving the environmental sustainability of ML.

Updated: 2025-03-22 10:02:59

标题: 效率不足：对环境可持续人工智能的批判性视角

摘要: 人工智能（AI）目前主要由深度学习等机器学习（ML）方法引领，这些方法已加速了许多被认为AI无法达到的任务的进展。这些最近的ML方法往往需要大量计算、消耗大量能源，并导致大量温室气体排放，这是人为气候变化的一个已知推动因素。此外，ML系统运行的平台与超出能源消耗驱动的碳排放的环境影响相关联。工业界和ML社区都推崇的解决方案是提高ML系统运行的计算和能源效率，以改善ML的环境可持续性。在这个观点中，我们认为是时候超越效率，使ML更具环境可持续性。我们提出了影响ML效率和环境可持续性的许多变量之间的三个高层次差异。首先，我们讨论计算效率并不意味着能源效率或碳效率。其次，我们介绍了效率对ML模型生命周期中运行排放的意想不到的影响。最后，我们探讨了效率未考虑的更广泛的环境影响。这些差异表明仅靠效率无法纠正ML的不利环境影响。相反，我们主张系统思维作为下一步，以更全面地改善ML的环境可持续性。

更新时间: 2025-03-22 10:02:59

领域: cs.LG,cs.CY,stat.ML

下载: http://arxiv.org/abs/2309.02065v2

Multi-modality Anomaly Segmentation on the Road

Semantic segmentation allows autonomous driving cars to understand the surroundings of the vehicle comprehensively. However, it is also crucial for the model to detect obstacles that may jeopardize the safety of autonomous driving systems. Based on our experiments, we find that current uni-modal anomaly segmentation frameworks tend to produce high anomaly scores for non-anomalous regions in images. Motivated by this empirical finding, we develop a multi-modal uncertainty-based anomaly segmentation framework, named MMRAS+, for autonomous driving systems. MMRAS+ effectively reduces the high anomaly outputs of non-anomalous classes by introducing a text modality via the CLIP text encoder. To our knowledge, MMRAS+ is the first multi-modal anomaly segmentation solution for autonomous driving. Moreover, we develop an ensemble module to further boost the anomaly segmentation performance. Experiments on the RoadAnomaly, SMIYC, and Fishyscapes validation datasets demonstrate the superior performance of our method. The code is available at https://github.com/HengGao12/MMRAS_plus.

Updated: 2025-03-22 09:55:42

标题: 多模态道路异常分割

摘要: 语义分割使自动驾驶汽车全面理解车辆周围的环境成为可能。然而，对于模型来说，检测可能危及自动驾驶系统安全的障碍也至关重要。根据我们的实验，我们发现当前的单模态异常分割框架往往会为图像中的非异常区域产生高异常分数。受到这一经验性发现的启发，我们开发了一种基于多模态不确定性的异常分割框架，名为MMRAS+，用于自动驾驶系统。MMRAS+通过引入文本模态使用CLIP文本编码器，有效降低了非异常类别的高异常输出。事实上，MMRAS+是自动驾驶的第一个多模态异常分割解决方案。此外，我们开发了一个集成模块，进一步提升异常分割性能。对RoadAnomaly、SMIYC和Fishyscapes验证数据集的实验表明了我们方法的卓越性能。代码可在https://github.com/HengGao12/MMRAS_plus上找到。

更新时间: 2025-03-22 09:55:42

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.17712v1

Slide2Text: Leveraging LLMs for Personalized Textbook Generation from PowerPoint Presentations

The rapid advancements in Large Language Models (LLMs) have revolutionized educational technology, enabling innovative approaches to automated and personalized content creation. This paper introduces Slide2Text, a system that leverages LLMs to transform PowerPoint presentations into customized textbooks. By extracting slide content using OCR, organizing it into a coherent structure, and generating tailored materials such as explanations, exercises, and references, Slide2Text streamlines the textbook creation process. Flexible customization options further enhance its adaptability to diverse educational needs. The system highlights the potential of LLMs in modernizing textbook creation and improving educational accessibility. Future developments will explore multimedia inputs and advanced user customization features.

Updated: 2025-03-22 09:42:03

标题: Slide2Text：利用LLM进行从PowerPoint演示文稿生成个性化教材

摘要: 大型语言模型（LLMs）的迅速发展已经彻底改变了教育技术，使创新的自动化和个性化内容创作方法成为可能。本文介绍了Slide2Text，这是一个利用LLMs将PowerPoint演示文稿转化为定制教科书的系统。通过使用OCR提取幻灯片内容，将其组织成连贯的结构，并生成定制材料，例如解释、练习和参考文献，Slide2Text简化了教科书创建流程。灵活的定制选项进一步增强了其适应各种教育需求的能力。该系统突出了LLMs在现代化教科书创建和提高教育可访问性方面的潜力。未来的发展将探索多媒体输入和高级用户定制功能。

更新时间: 2025-03-22 09:42:03

领域: cs.AI,eess.IV

下载: http://arxiv.org/abs/2503.17710v1

GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration

GUI agents hold significant potential to enhance the experience and efficiency of human-device interaction. However, current methods face challenges in generalizing across applications (apps) and tasks, primarily due to two fundamental limitations in existing datasets. First, these datasets overlook developer-induced structural variations among apps, limiting the transferability of knowledge across diverse software environments. Second, many of them focus solely on navigation tasks, which restricts their capacity to represent comprehensive software architectures and complex user interactions. To address these challenges, we introduce GUI-Xplore, a dataset meticulously designed to enhance cross-application and cross-task generalization via an exploration-and-reasoning framework. GUI-Xplore integrates pre-recorded exploration videos providing contextual insights, alongside five hierarchically structured downstream tasks designed to comprehensively evaluate GUI agent capabilities. To fully exploit GUI-Xplore's unique features, we propose Xplore-Agent, a GUI agent framework that combines Action-aware GUI Modeling with Graph-Guided Environment Reasoning. Further experiments indicate that Xplore-Agent achieves a 10% improvement over existing methods in unfamiliar environments, yet there remains significant potential for further enhancement towards truly generalizable GUI agents.

Updated: 2025-03-22 09:30:37

标题: GUI-Xplore：通过一次探索赋予通用GUI代理能力

摘要: GUI代理人具有显著潜力，可以增强人机交互的体验和效率。然而，当前的方法在跨应用程序（应用程序）和任务上普遍存在挑战，主要是由于现有数据集中存在两个根本限制。首先，这些数据集忽视了应用程序之间由开发人员引起的结构变化，限制了知识在不同软件环境中的可转移性。其次，许多数据集仅关注导航任务，这限制了它们代表综合软件架构和复杂用户交互的能力。为了解决这些挑战，我们引入了GUI-Xplore，这是一个通过探索和推理框架精心设计的数据集，旨在增强跨应用程序和跨任务的泛化能力。GUI-Xplore集成了提供上下文洞察力的预先录制的探索视频，以及五个分层结构的下游任务，旨在全面评估GUI代理人的能力。为了充分利用GUI-Xplore的独特特性，我们提出了Xplore-Agent，这是一个将基于动作的GUI建模与基于图形引导的环境推理相结合的GUI代理框架。进一步的实验表明，Xplore-Agent在陌生环境中实现了10%的改进，但仍然存在进一步增强真正可泛化的GUI代理人的重大潜力。

更新时间: 2025-03-22 09:30:37

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.17709v1

LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models. Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning. In this paper, we first uncover a fundamental connection between the optimization processes of LoRA and full fine-tuning: using LoRA for optimization is mathematically equivalent to full fine-tuning using a low-rank gradient for parameter updates. And this low-rank gradient can be expressed in terms of the gradients of the two low-rank matrices in LoRA. Leveraging this insight, we introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of these low-rank matrices. This adjustment allows the low-rank gradient to more accurately approximate the full fine-tuning gradient, thereby narrowing the performance gap between LoRA and full fine-tuning. Furthermore, we theoretically derive the optimal solutions for adjusting the gradients of the low-rank matrices, applying them during fine-tuning in LoRA-Pro. We conduct extensive experiments across natural language understanding, dialogue generation, mathematical reasoning, code generation, and image classification tasks, demonstrating that LoRA-Pro substantially improves LoRA's performance, effectively narrowing the gap with full fine-tuning. Code is publicly available at https://github.com/mrflogs/LoRA-Pro.
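The equivalence stated in the abstract above can be checked numerically on a toy problem. Assume the usual LoRA parameterization W = W0 + s·B·A and let g be the gradient of the loss with respect to W. By the chain rule, the gradients of the factors are g_A = s·Bᵀ·g and g_B = s·g·Aᵀ, and one gradient step on A and B changes W, to first order in the learning rate, as if W had been updated with the low-rank gradient s·(g_B·A + B·g_A). The matrices, scale, and learning rate below are arbitrary illustrative values, and the sketch covers only this equivalence, not LoRA-Pro's gradient adjustment.

```python
def matmul(X, Y):
    """Plain-Python matrix product of nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def scale(X, c):
    return [[c * v for v in row] for row in X]


def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


s, lr = 2.0, 1e-3                              # LoRA scale and learning rate (toy values)
B = [[0.5], [-1.0], [0.25]]                    # m x r = 3 x 1
A = [[1.0, -0.5]]                              # r x n = 1 x 2
g = [[0.3, -0.2], [0.1, 0.4], [-0.6, 0.5]]     # dL/dW at the current W, m x n

# Chain rule through W = W0 + s * B @ A.
g_A = scale(matmul(transpose(B), g), s)        # r x n
g_B = scale(matmul(g, transpose(A)), s)        # m x r

# Equivalent low-rank gradient acting directly on W.
g_tilde = scale(add(matmul(g_B, A), matmul(B, g_A)), s)

# One plain gradient-descent step on the factors, and the induced change in W.
A_new = add(A, scale(g_A, -lr))
B_new = add(B, scale(g_B, -lr))
delta_W = scale(add(matmul(B_new, A_new), scale(matmul(B, A), -1.0)), s)

# Largest deviation between the true change and -lr * g_tilde (second order in lr).
max_gap = max(abs(dw + lr * gt)
              for row_w, row_g in zip(delta_W, g_tilde)
              for dw, gt in zip(row_w, row_g))
print(max_gap)
```

The remaining gap comes from the lr² cross term s·lr²·g_B·g_A, which is why it shrinks quadratically as the learning rate decreases.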

Updated: 2025-03-22 09:29:15

标题: LoRA-Pro：低秩适配器是否被正确优化？

摘要: 低秩适应，也被称为LoRA，已成为参数高效微调基础模型的突出方法。尽管LoRA具有计算效率，但与完全微调相比仍然表现较差。在本文中，我们首先揭示了LoRA和完全微调的优化过程之间的基本联系：使用LoRA进行优化在数学上等价于使用低秩梯度进行参数更新的完全微调。这个低秩梯度可以用LoRA中的两个低秩矩阵的梯度来表示。利用这一见解，我们引入了LoRA-Pro，一种通过战略调整这些低秩矩阵的梯度来增强LoRA性能的方法。这种调整允许低秩梯度更准确地近似完全微调梯度，从而缩小LoRA与完全微调之间的性能差距。此外，我们在LoRA-Pro的微调过程中理论上推导出了调整低秩矩阵梯度的最优解，证明了LoRA-Pro在自然语言理解、对话生成、数学推理、代码生成和图像分类任务中进行了大量实验，表明LoRA-Pro显著提高了LoRA的性能，有效缩小了与完全微调之间的差距。代码可在https://github.com/mrflogs/LoRA-Pro上公开获取。

更新时间: 2025-03-22 09:29:15

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.18242v3

PT-PINNs: A Parametric Engineering Turbulence Solver based on Physics-Informed Neural Networks

Physics-informed neural networks (PINNs) demonstrate promising potential in parameterized engineering turbulence optimization problems but face challenges, such as high data requirements and low computational accuracy, when applied to engineering turbulence problems. This study proposes a framework that enhances the ability of PINNs to solve parametric turbulence problems without training datasets from experiments or CFD: Parametric Turbulence PINNs (PT-PINNs). Two key methods are introduced to improve the accuracy and robustness of this framework. The first is a soft constraint method for turbulent viscosity calculation. The second is a pre-training method based on the conservation of flow rate in the flow field. The effectiveness of PT-PINNs is validated using a three-dimensional backward-facing step (BFS) turbulence problem with two varying parameters (Re = 3000-200000, ER = 1.1-1.5). PT-PINNs produce predictions that closely match experimental data and computational fluid dynamics (CFD) results across various conditions. Moreover, PT-PINNs offer a computational efficiency advantage over traditional CFD methods. The total time required to construct the parametric BFS turbulence model is 39 hours, one-sixteenth of the time required by traditional numerical methods. The inference time for a single-condition prediction is just 40 seconds, only 0.5% of a single CFD computation. These findings highlight the potential of PT-PINNs for future applications in engineering turbulence optimization problems.
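The efficiency figures in the abstract above imply some baseline costs that are not stated directly. The short calculation below derives them from the quoted ratios; the derived numbers are implied by the abstract's arithmetic and are not reported values.

```python
inference_seconds = 40                       # quoted: one PT-PINNs prediction
# Quoted: 40 s is 0.5% of a single CFD computation.
cfd_seconds = inference_seconds * 100 / 0.5
cfd_hours = cfd_seconds / 3600

build_hours = 39                             # quoted: time to build the parametric model
# Quoted: 39 h is one-sixteenth of the traditional approach.
traditional_hours = build_hours * 16

print(cfd_seconds, round(cfd_hours, 2), traditional_hours)
```

Under these ratios a single CFD run would take roughly 2.2 hours, and the traditional route to the full parametric model roughly 624 hours.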

Updated: 2025-03-22 09:10:53

标题: 基于物理信息神经网络的参数化工程湍流求解器PT-PINNs

摘要: 物理信息神经网络（PINNs）在参数化工程湍流优化问题中展现出有希望的潜力，但在应用于工程湍流问题时面临挑战，如高数据需求和低计算精度。本研究提出了一个框架，增强了PINNs解决参数化湍流问题的能力，而无需从实验或CFD-参数化湍流PINNs（PT-PINNs）中训练数据集。引入了两种关键方法来改善该框架的准确性和鲁棒性。第一种是用于湍流粘度计算的软约束方法。第二种是基于流场中流量守恒的预训练方法。通过使用具有两个变化参数（Re = 3000-200000，ER = 1.1-1.5）的三维背向台阶（BFS）湍流问题验证了PT-PINNs的有效性。PT-PINNs产生的预测结果与各种条件下的实验数据和计算流体动力学（CFD）结果非常接近。此外，PT-PINNs相对于传统CFD方法提供了计算效率优势。构建参数化BFS湍流模型所需的总时间为39小时，是传统数值方法所需时间的十六分之一。单个条件预测的推断时间只有40秒，仅为单个CFD计算的0.5%。这些发现突显了PT-PINNs在未来应用于工程湍流优化问题中的潜力。

更新时间: 2025-03-22 09:10:53

领域: physics.flu-dyn,cs.AI

下载: http://arxiv.org/abs/2503.17704v1

On the (im)possibility of sustainable artificial intelligence. Why it does not make sense to move faster when heading the wrong way

Artificial intelligence (AI) is currently considered a sustainability "game-changer" within and outside of academia. In order to discuss sustainable AI, this article draws on insights from critical data and algorithm studies, STS, transformative sustainability science, critical computer science, and public interest theory. I argue that while there are indeed many sustainability-related use cases for AI, they are likely to have more overall drawbacks than benefits. To substantiate this claim, I differentiate three 'AI materialities' of the AI supply chain: first, the literal materiality (e.g. water, cobalt, lithium, energy consumption etc.), second, the informational materiality (e.g. lots of data and centralised control necessary), and third, the social materiality (e.g. exploitative data work, communities harmed by waste and pollution). In all materialities, effects are especially devastating for the global south while benefiting the global north. A second strong claim regarding sustainable AI revolves around so-called apolitical optimisation (e.g. regarding city traffic); however, the optimisation criteria (e.g. cars, bikes, emissions, commute time, health) are purely political and have to be collectively negotiated before applying AI optimisation. Hence, sustainable AI, in principle, cannot break the glass ceiling of transformation and might even distract from necessary societal change. To address that, I propose to stop 'unformation gathering' and to apply the 'small is beautiful' principle. This aims to contribute to an informed academic and collective negotiation on how to (not) integrate AI into the sustainability project while avoiding reproducing the status quo by serving hegemonic interests between useful AI use cases, techno-utopian salvation narratives, technology-centred efficiency paradigms, the exploitative and extractivist character of AI and concepts of digital degrowth.

Updated: 2025-03-22 09:01:15

标题: 关于可持续人工智能的（不）可能性。为什么在错误的路上加速前进是没有意义的。

摘要: 人工智能（AI）目前被认为是学术界内外的可持续性“游戏改变者”。为了讨论可持续AI，本文借鉴了关键数据和算法研究、STS、变革性可持续发展科学、关键计算机科学和公共利益理论的见解。我认为，虽然确实有许多与可持续发展相关的AI应用案例，但它们可能带来的整体缺点可能超过利益。为了证实这一观点，我区分了AI供应链的三种“AI物质性”：首先是字面上的物质性（例如水、钴、锂、能耗等），其次是信息物质性（例如需要大量数据和集中控制），第三是社会物质性（例如剥削性数据工作，社区因废物和污染受到损害）。在所有的物质性中，对全球南方的影响尤为严重，同时使全球北方受益。关于可持续AI的第二个有力主张围绕所谓的无政治优化（例如关于城市交通），然而优化标准（例如汽车、自行车、排放、通勤时间、健康）纯粹是政治的，必须在应用AI优化之前进行集体协商。因此，原则上，可持续AI不能突破转型的障碍，甚至可能分散对必要社会变革的注意力。为了解决这一问题，我建议停止“非信息收集”并应用“小而美”的原则。这旨在为如何（不）将AI整合到可持续性项目中进行学术和集体协商，同时避免通过为有用的AI应用案例、技术乌托邦救赎叙事、以技术为中心的效率范式、AI的剥削性和开采性特征以及数字减少概念服务来重复现状的利益之间的霸权利益。

更新时间: 2025-03-22 09:01:15

领域: cs.CY,cs.AI,68T99,K.4; I.2; H.4

下载: http://arxiv.org/abs/2503.17702v1

BOPO: Neural Combinatorial Optimization via Best-anchored and Objective-guided Preference Optimization

Neural Combinatorial Optimization (NCO) has emerged as a promising approach for NP-hard problems. However, prevailing RL-based methods suffer from low sample efficiency due to sparse rewards and underused solutions. We propose Preference Optimization for Combinatorial Optimization (POCO), a training paradigm that leverages solution preferences via objective values. It introduces: (1) an efficient preference pair construction to better explore and exploit solutions, and (2) a novel loss function that adaptively scales gradients via objective differences, removing reliance on reward models or reference policies. Experiments on Job-Shop Scheduling (JSP), Traveling Salesman (TSP), and Flexible Job-Shop Scheduling (FJSP) show that POCO outperforms state-of-the-art neural methods, substantially reducing optimality gaps with efficient inference. POCO is architecture-agnostic, enabling seamless integration with existing NCO models, and establishes preference optimization as a principled framework for combinatorial optimization.
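The abstract above describes a preference loss that needs neither a reward model nor a reference policy and whose gradients scale with objective differences. The sketch below shows one way such a loss could look for a minimization problem. The specific scaling factor (the ratio of the worse to the better objective value) is an assumed, illustrative choice and may differ from the paper's loss.

```python
import math


def preference_loss(logp_better, logp_worse, obj_better, obj_worse):
    """-log sigmoid(scale * (logp_better - logp_worse)) for a minimization task.

    scale = obj_worse / obj_better is an assumption made for illustration: the
    larger the objective gap between the two solutions, the larger the gradient.
    Only policy log-probabilities and objective values are used.
    """
    scale = obj_worse / obj_better
    margin = scale * (logp_better - logp_worse)
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical log-probabilities of two tours and their lengths (toy numbers).
tie = preference_loss(-1.0, -1.0, obj_better=100.0, obj_worse=120.0)
well_ordered = preference_loss(-0.5, -1.5, obj_better=100.0, obj_worse=120.0)
misordered_small_gap = preference_loss(-1.5, -0.5, obj_better=100.0, obj_worse=110.0)
misordered_large_gap = preference_loss(-1.5, -0.5, obj_better=100.0, obj_worse=150.0)
print(tie, well_ordered, misordered_small_gap, misordered_large_gap)
```

The loss falls when the policy assigns more probability to the better solution, and a wrongly ordered pair is penalized more heavily when the objective gap between the two solutions is larger.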

Updated: 2025-03-22 08:59:25

标题: BOPO：通过最佳锚定和目标引导的首选优化实现神经组合优化

摘要: 神经组合优化（NCO）已经成为处理NP难问题的一种有前途的方法。然而，现有的基于强化学习的方法由于奖励稀疏和解决方案未被充分利用而导致样本效率低。我们提出了Preference Optimization for Combinatorial Optimization（POCO），一种利用解决方案偏好通过目标值的训练范式。它引入了：（1）一种有效的偏好对构造方法，以更好地探索和利用解决方案，以及（2）一种新颖的损失函数，通过目标差异自适应地缩放梯度，消除对奖励模型或参考策略的依赖。在作业车间调度（JSP）、旅行商问题（TSP）和灵活作业车间调度（FJSP）上的实验表明，POCO优于最先进的神经方法，在高效推理的情况下显著减少最优性差距。POCO是与架构无关的，能够与现有的NCO模型无缝集成，并将偏好优化确立为组合优化的原则性框架。

更新时间: 2025-03-22 08:59:25

领域: cs.LG

下载: http://arxiv.org/abs/2503.07580v2

Intelligence Sequencing and the Path-Dependence of Intelligence Evolution: AGI-First vs. DCI-First as Irreversible Attractors

The trajectory of intelligence evolution is often framed around the emergence of artificial general intelligence (AGI) and its alignment with human values. This paper challenges that framing by introducing the concept of intelligence sequencing: the idea that the order in which AGI and decentralized collective intelligence (DCI) emerge determines the long-term attractor basin of intelligence. Using insights from dynamical systems, evolutionary game theory, and network models, it argues that intelligence follows a path-dependent, irreversible trajectory. Once development enters a centralized (AGI-first) or decentralized (DCI-first) regime, transitions become structurally infeasible due to feedback loops and resource lock-in. Intelligence attractors are modeled in functional state space as the co-navigation of conceptual and adaptive fitness spaces. Early-phase structuring constrains later dynamics, much like renormalization in physics. This has major implications for AI safety: traditional alignment assumes AGI will emerge and must be controlled after the fact, but this paper argues that intelligence sequencing is more foundational. If AGI-first architectures dominate before DCI reaches critical mass, hierarchical monopolization and existential risk become locked in. If DCI-first emerges, intelligence stabilizes around decentralized cooperative equilibrium. The paper further explores whether intelligence structurally biases itself toward an attractor based on its self-modeling method -- externally imposed axioms (favoring AGI) vs. recursive internal visualization (favoring DCI). Finally, it proposes methods to test this theory via simulations, historical lock-in case studies, and intelligence network analysis. The findings suggest that intelligence sequencing is a civilizational tipping point: determining whether the future is shaped by unbounded competition or unbounded cooperation.

Updated: 2025-03-22 08:09:04

标题: 智能排序和智能进化的路径依赖性：AGI-First vs. DCI-First作为不可逆吸引子

摘要: 智能演变轨迹通常围绕人工通用智能（AGI）的出现及其与人类价值观的一致性而构建。本文通过引入智能排序的概念挑战了这种框架：即AGI和分散集体智能（DCI）出现的顺序决定了智能的长期吸引子盆地。利用动力系统、进化博弈论和网络模型的见解，论文认为智能遵循一种路径依赖、不可逆的轨迹。一旦发展进入集中（首先AGI）或分散（首先DCI）的制度，由于反馈环路和资源锁定，转换变得结构上不可行。智能吸引子在功能状态空间中建模为概念和适应性适应性空间的共同导航。早期结构约束后期动态，就像物理学中的重整化一样。这对AI安全有重大影响：传统的对齐假设AGI将出现并且必须在事后进行控制，但本文认为智能排序更具基础性。如果AGI优先架构在DCI达到临界质量之前占主导地位，层次化垄断和生存风险就会固定下来。如果DCI优先出现，智能将稳定在分散合作平衡周围。本文进一步探讨智能是否在结构上偏向于根据其自我建模方法形成吸引子--外部施加的公理（支持AGI）vs.递归内部可视化（支持DCI）。最后，它提出了通过模拟、历史锁定案例研究和智能网络分析来测试这一理论的方法。研究结果表明智能排序是一个文明的转折点：决定未来是由无限竞争还是无限合作来塑造。

更新时间: 2025-03-22 08:09:04

领域: cs.AI

下载: http://arxiv.org/abs/2503.17688v1

Can LLMs Automate Fact-Checking Article Writing?

Automatic fact-checking aims to support professional fact-checkers by offering tools that can help speed up manual fact-checking. Yet, existing frameworks fail to address the key step of producing output suitable for broader dissemination to the general public: while human fact-checkers communicate their findings through fact-checking articles, automated systems typically produce little or no justification for their assessments. Here, we aim to bridge this gap. We argue for the need to extend the typical automatic fact-checking pipeline with automatic generation of full fact-checking articles. We first identify key desiderata for such articles through a series of interviews with experts from leading fact-checking organizations. We then develop QRAFT, an LLM-based agentic framework that mimics the writing workflow of human fact-checkers. Finally, we assess the practical usefulness of QRAFT through human evaluations with professional fact-checkers. Our evaluation shows that while QRAFT outperforms several previously proposed text-generation approaches, it lags considerably behind expert-written articles. We hope that our work will enable further research in this new and important direction.

Updated: 2025-03-22 07:56:50

标题: LLMs能自动化事实核查文章写作吗？

摘要: 自动事实核查旨在通过提供可以帮助加快手动事实核查的工具来支持专业事实核查员。然而，现有框架未能解决为更广泛的公众传播而产生适当输出的关键步骤：尽管人类事实核查员通过事实核查文章传达他们的发现，但自动化系统通常几乎不提供其评估的理由。在这里，我们旨在填补这一差距。我们主张需要通过自动生成全面的事实核查文章来扩展典型的自动事实核查流程。我们首先通过与领先的事实核查组织的专家进行一系列访谈，确定了这些文章的关键要求。然后，我们开发了基于LLM的代理框架QRAFT，模拟人类事实核查员的写作工作流程。最后，我们通过与专业事实核查员的人工评估来评估QRAFT的实用性。我们的评估显示，虽然QRAFT优于几种先前提出的文本生成方法，但与专家撰写的文章相比仍有很大差距。我们希望我们的工作将促进这个新而重要方向的进一步研究。

更新时间: 2025-03-22 07:56:50

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.17684v1

Decentralized Federated Dataset Dictionary Learning for Multi-Source Domain Adaptation

Decentralized Multi-Source Domain Adaptation (DMSDA) is a challenging task that aims to transfer knowledge from multiple related and heterogeneous source domains to an unlabeled target domain within a decentralized framework. Our work tackles DMSDA through a fully decentralized federated approach. In particular, we extend the Federated Dataset Dictionary Learning (FedDaDiL) framework by eliminating the necessity for a central server. FedDaDiL leverages Wasserstein barycenters to model the distributional shift across multiple clients, enabling effective adaptation while preserving data privacy. By decentralizing this framework, we enhance its robustness, scalability, and privacy, removing the risk of a single point of failure. We compare our method to its federated counterpart and other benchmark algorithms, showing that our approach effectively adapts source domains to an unlabeled target domain in a fully decentralized manner.
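The abstract above uses Wasserstein barycenters to summarize distributions held by different clients. In one dimension the 2-Wasserstein barycenter has a simple closed form: its quantile function is the weighted average of the input quantile functions, so for empirical distributions with equal sample counts it is enough to average the sorted samples. The sketch below shows only this 1-D special case on toy data; FedDaDiL itself works with higher-dimensional data, and the client names are hypothetical.

```python
def wasserstein_barycenter_1d(sample_sets, weights):
    """W2 barycenter of 1-D empirical distributions with equal sample counts.

    Averages the sorted samples (i.e. the empirical quantiles) with the given
    weights, which are expected to be non-negative and to sum to 1.
    """
    n = len(sample_sets[0])
    if any(len(s) != n for s in sample_sets):
        raise ValueError("all sample sets must have the same size")
    sorted_sets = [sorted(s) for s in sample_sets]
    return [sum(w * s[i] for w, s in zip(weights, sorted_sets)) for i in range(n)]


# Hypothetical 1-D features held by two clients.
client_a = [2.0, 0.0, 1.0]
client_b = [12.0, 10.0, 11.0]

barycenter = wasserstein_barycenter_1d([client_a, client_b], [0.5, 0.5])
print(barycenter)
```

Unlike a plain mixture of the two sample sets, the barycenter sits between them and keeps their shape, which is what makes it useful as an intermediate distribution between domains.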

Updated: 2025-03-22 07:48:48

标题: 去中心化联邦数据集字典学习用于多源领域自适应

摘要: Decentralized Multi-Source Domain Adaptation (DMSDA)是一个具有挑战性的任务，旨在通过分散的框架将知识从多个相关和异构的源领域转移到一个未标记的目标领域。我们的工作通过完全分散的联邦方法来解决DMSDA。具体来说，我们通过消除中央服务器的必要性来扩展Federated Dataset Dictionary Learning (FedDaDiL)框架。FedDaDiL利用Wasserstein barycenters来建模多个客户端之间的分布偏移，从而实现有效的适应性同时保护数据隐私。通过分散这一框架，我们增强了其稳健性、可扩展性和隐私性，消除了单点故障的风险。我们将我们的方法与联邦对应方法和其他基准算法进行比较，结果显示我们的方法有效地将源领域适应到一个未标记的目标领域中，且完全分散进行。

更新时间: 2025-03-22 07:48:48

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2503.17683v1

Safe RLHF-V: Safe Reinforcement Learning from Human Feedback in Multimodal Large Language Models

Multimodal large language models (MLLMs) are critical for developing general-purpose AI assistants, yet they face growing safety risks. How can we ensure that MLLMs are safely aligned to prevent undesired behaviors such as discrimination, misinformation, or violations of ethical standards? Going a step further, we need to explore how to fine-tune MLLMs to enhance reasoning performance while ensuring they satisfy safety constraints. Fundamentally, this can be formulated as a min-max optimization problem. In this study, we propose Safe RLHF-V, the first multimodal safety alignment framework that jointly optimizes helpfulness and safety using separate multimodal reward and cost models within a Lagrangian-based constrained optimization framework. Given the lack of preference datasets that separate helpfulness and safety in multimodal scenarios, we introduce BeaverTails-V, the first open-source dataset with dual preference annotations for helpfulness and safety, along with multi-level safety labels (minor, moderate, severe). Additionally, we design a Multi-level Guardrail System to proactively defend against unsafe queries and adversarial attacks. By applying the Beaver-Guard-V moderation for 5 rounds of filtering and re-generation on the precursor model, the overall safety of the upstream model is significantly improved by an average of 40.9%. Experimental results demonstrate that fine-tuning different MLLMs with Safe RLHF can effectively enhance model helpfulness while ensuring improved safety. Specifically, Safe RLHF-V improves model safety by 34.2% and helpfulness by 34.3%. All datasets, models, and code can be found at https://github.com/SafeRLHF-V to support the safe development of MLLMs and reduce potential societal risks.
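The Lagrangian-based min-max formulation mentioned in the abstract above can be illustrated on a one-parameter toy problem: maximize a reward subject to a cost staying under a limit, by alternating a gradient step on the parameter with a gradient step on a non-negative multiplier. The reward and cost functions, the learning rate, and the step count below are invented for illustration and have nothing to do with the actual reward and cost models in Safe RLHF-V.

```python
def cost(theta):
    """Toy cost; the constraint is cost(theta) <= limit."""
    return theta


def primal_dual(limit=1.0, lr=0.05, steps=2000):
    """Solve max_theta min_{lam >= 0} reward(theta) - lam * (cost(theta) - limit).

    The toy reward is -(theta - 2)^2, which is maximal at theta = 2, so the
    constraint theta <= 1 is active at the solution.
    """
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # Gradient ascent on theta for the Lagrangian.
        grad_theta = -2.0 * (theta - 2.0) - lam
        theta += lr * grad_theta
        # The multiplier grows while the constraint is violated and shrinks
        # otherwise; it is clipped so that it never becomes negative.
        lam = max(0.0, lam + lr * (cost(theta) - limit))
    return theta, lam


theta_star, lam_star = primal_dual()
print(theta_star, lam_star)
```

The unconstrained optimum (theta = 2) would violate the limit, so the iterates settle at the constraint boundary, with the multiplier measuring how much reward is being traded for safety.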

Updated: 2025-03-22 07:40:20

标题: 安全RLHF-V：在多模态大型语言模型中从人类反馈中进行安全强化学习

摘要: 多模态大型语言模型（MLLMs）对于开发通用人工智能助手至关重要，但面临着日益增长的安全风险。我们如何确保MLLMs安全对齐，以防止出现歧视、错误信息或违反道德标准等不良行为？在进一步的步骤中，我们需要探讨如何微调MLLMs以提高推理性能，同时确保它们符合安全约束。从根本上讲，这可以被形式化为一个极小极大优化问题。在本研究中，我们提出了Safe RLHF-V，这是第一个多模态安全对齐框架，它利用基于拉格朗日的约束优化框架中的单独多模态奖励和成本模型共同优化帮助性和安全性。鉴于在多模态场景下缺乏分离帮助性和安全性的偏好数据集，我们引入了BeaverTails-V，这是第一个具有帮助性和安全性的双重偏好标注以及多级安全标签（轻微、中等、严重）的开源数据集。此外，我们设计了一个多级防护系统，以主动防御不安全的查询和对抗性攻击。通过在先导模型上应用Beaver-Guard-V的5轮过滤和重新生成的调节，上游模型的整体安全性平均提高了40.9%。实验结果表明，使用Safe RLHF对不同MLLM进行微调可以有效提升模型的帮助性，同时确保改善安全性。具体来说，Safe RLHF-V将模型的安全性提高了34.2%，帮助性提高了34.3%。所有数据集、模型和代码均可在 https://github.com/SafeRLHF-V 找到，以支持MLLMs的安全发展，并减少潜在的社会风险。

更新时间: 2025-03-22 07:40:20

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.17682v1

Staying Alive: Online Neural Network Maintenance and Systemic Drift

We present the Subset Extended Kalman Filter (SEKF) as a method to update previously trained model weights online, rather than retraining or finetuning them, when the system a model represents drifts away from the conditions under which it was trained. We identify the parameters to be updated using the gradient of the loss function and use the SEKF to update only these parameters. We compare finetuning and SEKF for online model maintenance in the presence of systemic drift through four dynamic regression case studies and find that the SEKF is able to maintain model accuracy as well as, if not better than, finetuning while requiring significantly less time per iteration and less hyperparameter tuning.
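The idea in the abstract above (pick a subset of parameters from the loss gradient, then apply a Kalman-style update to that subset only) can be sketched for a scalar-output linear model. Everything specific here is an assumption for illustration: the top-k selection rule, the measurement variance, the toy data, and the simplification of ignoring cross-covariances with parameters outside the subset. It is not the paper's algorithm.

```python
def sekf_step(w, P, x, y, k=2, meas_var=0.1):
    """One subset-EKF-style update for the scalar model y_hat = w . x."""
    n = len(w)
    err = y - sum(wi * xi for wi, xi in zip(w, x))
    # Gradient of 0.5 * err^2 w.r.t. w is -err * x; keep the k largest magnitudes.
    grad = [-err * xi for xi in x]
    subset = sorted(range(n), key=lambda i: abs(grad[i]), reverse=True)[:k]
    # Measurement Jacobian restricted to the subset (d y_hat / d w_i = x_i).
    H = [x[i] for i in subset]
    # P_sub @ H^T, using only the subset block of the covariance.
    PH = [sum(P[i][j] * H[b] for b, j in enumerate(subset)) for i in subset]
    S = sum(h * ph for h, ph in zip(H, PH)) + meas_var   # innovation variance
    gain = [ph / S for ph in PH]                         # Kalman gain
    w_new = list(w)
    for a, i in enumerate(subset):
        w_new[i] += gain[a] * err
    # Covariance update P <- P - K H P, applied to the subset block only.
    P_new = [row[:] for row in P]
    for a, i in enumerate(subset):
        for b, j in enumerate(subset):
            P_new[i][j] -= gain[a] * PH[b]
    return w_new, P_new, subset


# Hypothetical weights, identity covariance, and one new observation after drift.
w = [1.0, 0.0, 0.5, -0.5]
P = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x, y = [1.0, 0.1, 2.0, 0.05], 3.0

err_before = y - sum(wi * xi for wi, xi in zip(w, x))
w_new, P_new, subset = sekf_step(w, P, x, y)
err_after = y - sum(wi * xi for wi, xi in zip(w_new, x))
print(subset, err_before, err_after)
```

Only the selected weights move, and the Kalman update shrinks the prediction error on the new observation by the factor meas_var / S.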

Updated: 2025-03-22 07:38:44

标题: 保持存活：在线神经网络维护和系统漂移

摘要: 我们提出了Subset Extended Kalman Filter（SEKF）作为一种方法，用于在线更新先前训练的模型权重，而不是在模型表示的系统漂离训练条件时重新训练或微调它们。我们使用损失函数的梯度识别要更新的参数，并使用SEKF仅更新这些参数。通过四个动态回归案例研究，我们比较了微调和SEKF在存在系统漂移的情况下进行在线模型维护，并发现SEKF能够保持模型准确性，甚至比微调更好，同时每次迭代所需的时间显著较少，超参数调整也更少。

更新时间: 2025-03-22 07:38:44

领域: cs.LG

下载: http://arxiv.org/abs/2503.17681v1

BiMarker: Enhancing Text Watermark Detection for Large Language Models with Bipolar Watermarks

The rapid growth of Large Language Models (LLMs) raises concerns about distinguishing AI-generated text from human content. Existing watermarking techniques, like KGW, struggle with low watermark strength and stringent false-positive requirements. Our analysis reveals that current methods rely on coarse estimates of non-watermarked text, limiting watermark detectability. To address this, we propose Bipolar Watermark (BiMarker), which splits generated text into positive and negative poles, enhancing detection without requiring additional computational resources or knowledge of the prompt. Theoretical analysis and experimental results demonstrate BiMarker's effectiveness and compatibility with existing optimization techniques, providing a new optimization dimension for watermarking in LLM-generated content.
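For context on the baseline the abstract above compares against, green-list watermark detectors in the KGW style typically count how many tokens fall in a pseudo-random "green" list and compare that count with what unwatermarked text would produce by chance. The sketch below implements that one-sided z-test; the token counts and the decision threshold are illustrative assumptions, and the bipolar detection scheme proposed in the paper is not reproduced here.

```python
import math


def green_list_z_score(green_count, total_tokens, gamma=0.5):
    """z = (g - gamma * T) / sqrt(T * gamma * (1 - gamma)).

    gamma is the expected fraction of green tokens in unwatermarked text.
    """
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected) / std


# Hypothetical counts: 75 green tokens out of 100 scored tokens.
z = green_list_z_score(green_count=75, total_tokens=100)
threshold = 4.0                      # assumed threshold for a low false-positive rate
is_flagged = z > threshold
print(z, is_flagged)
```

A weak watermark or a short text yields a small z, which is the low-strength regime where the abstract says existing detectors struggle.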

Updated: 2025-03-22 07:18:40

标题: BiMarker：使用双极水印增强大型语言模型的文本水印检测

摘要: 大型语言模型（LLMs）的快速增长引发了人们对区分人工智能生成的文本和人类内容的担忧。现有的数字水印技术，如KGW，在水印强度低和严格的假阳性要求方面存在困难。我们的分析表明，当前的方法依赖于对未加水印文本的粗略估计，限制了水印的可检测性。为了解决这个问题，我们提出了双极水印（BiMarker），将生成的文本分为正极和负极，增强了检测能力，而无需额外的计算资源或对提示的了解。理论分析和实验结果表明BiMarker的有效性，并与现有的优化技术兼容，为LLM生成的内容中的水印提供了一个新的优化维度。

更新时间: 2025-03-22 07:18:40

Categories: cs.LG

Download: http://arxiv.org/abs/2501.12174v5
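
To make the "coarse estimate of non-watermarked text" in the BiMarker abstract concrete, here is a sketch of the usual green-list detection statistic, assuming the abstract's baseline refers to the KGW-style scheme. Unwatermarked text is modeled as hitting the green list with a fixed probability `gamma`, and detection is a one-sided z-test on the green-token count. The function name and the default `gamma` are assumptions for illustration; BiMarker's bipolar split is not reproduced here because the abstract does not specify it.

```python
import math


def green_list_z_score(green_count, total_tokens, gamma=0.5):
    """z-statistic of the green-token count under the no-watermark null.

    The null assumes each token lands in the green list with probability gamma,
    so the count is Binomial(total_tokens, gamma).
    """
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected) / std
```

A text is flagged as watermarked when the score exceeds a threshold chosen for the target false-positive rate. Weak watermarks shift the count only slightly above `gamma * total_tokens`, which is where a sharper estimate of the unwatermarked baseline would help.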

Reducing Class-wise Confusion for Incremental Learning with Disentangled Manifolds

Class incremental learning (CIL) aims to enable models to continuously learn new classes without catastrophically forgetting old ones. A promising direction is to learn and use prototypes of classes during incremental updates. Despite their simplicity and intuitiveness, we find that such methods suffer from inadequate representation capability and undesirable feature overlap. These two factors cause class-wise confusion and limited performance. In this paper, we develop a Confusion-REduced AuTo-Encoder classifier (CREATE) for CIL. Specifically, our method employs a lightweight auto-encoder module to learn a compact manifold for each class in the latent subspace, constraining samples to be well reconstructed only on the semantically correct auto-encoder. Thus, the representation stability and capability of class distributions are enhanced, alleviating the potential class-wise confusion problem. To further distinguish the overlapped features, we propose a confusion-aware latent space separation loss that ensures samples are closely distributed in their corresponding low-dimensional manifold while keeping away from the distributions of features from other classes. Our method demonstrates stronger representational capacity and discrimination ability by learning disentangled manifolds and reduces class confusion. Extensive experiments on multiple datasets and settings show that CREATE outperforms other state-of-the-art methods by up to 5.41%.

Updated: 2025-03-22 07:07:15

Categories: cs.LG

Download: http://arxiv.org/abs/2503.17677v1
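
The CREATE abstract above relies on one auto-encoder per class, with a sample expected to reconstruct well only on the auto-encoder of its own class. A toy sketch of the resulting decision rule is shown below. It assumes each class "auto-encoder" is just a projection onto a single unit vector, which is a deliberate simplification: the learned lightweight auto-encoders and the confusion-aware separation loss from the paper are not reproduced, and all names are hypothetical.

```python
def reconstruct(x, u):
    """Toy auto-encoder: encode x as its coordinate along unit vector u, then decode."""
    code = sum(xi * ui for xi, ui in zip(x, u))
    return [code * ui for ui in u]


def reconstruction_error(x, u):
    """Squared distance between x and its reconstruction on the class manifold."""
    rec = reconstruct(x, u)
    return sum((xi - ri) ** 2 for xi, ri in zip(x, rec))


def classify(x, class_directions):
    """Return the class whose toy auto-encoder reconstructs x with the smallest error."""
    return min(class_directions,
               key=lambda c: reconstruction_error(x, class_directions[c]))
```

Adding a new class in this toy setting only means adding one more entry to `class_directions`, which hints at why per-class manifolds suit the incremental setting.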

MultiScale Contextual Bandits for Long Term Objectives

The feedback that AI systems (e.g., recommender systems, chatbots) collect from user interactions is a crucial source of training data. While short-term feedback (e.g., clicks, engagement) is widely used for training, there is ample evidence that optimizing short-term feedback does not necessarily achieve the desired long-term objectives. Unfortunately, directly optimizing for long-term objectives is challenging, and we identify the disconnect between the timescales of short-term interventions (e.g., rankings) and long-term feedback (e.g., user retention) as one of the key obstacles. To overcome this disconnect, we introduce the framework of MultiScale Policy Learning to contextually reconcile the need for AI systems to act and optimize feedback at multiple interdependent timescales. For any two levels, our formulation selects the shorter-term objective at the next lower scale to optimize the longer-term objective at the next higher scale. As a result, the policies at all levels effectively optimize for the long term. We instantiate the framework with MultiScale Off-Policy Bandit Learning (MSBL) and demonstrate its effectiveness on three tasks relating to recommender systems and text generation.

Updated: 2025-03-22 07:03:45

Categories: cs.LG

Download: http://arxiv.org/abs/2503.17674v1
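
The off-policy bandit learning mentioned in the abstract above rests on estimating how a new policy would have performed from logs collected by a different policy. A standard building block for this is the inverse-propensity-scoring (IPS) estimator, sketched below. The abstract does not say which estimator MSBL uses at each level, so this is a generic illustration; `ips_value` and the layout of the log tuples are hypothetical.

```python
def ips_value(logs, target_policy):
    """Inverse-propensity-scored estimate of a target policy's expected reward.

    logs: iterable of (context, action, reward, logging_propensity) tuples, where
          logging_propensity is the probability the logging policy chose `action`.
    target_policy(context, action): probability the target policy picks `action`.
    """
    total = 0.0
    n = 0
    for context, action, reward, propensity in logs:
        # Reweight each logged reward by how much more (or less) likely the
        # target policy is to take the logged action than the logging policy was.
        total += target_policy(context, action) / propensity * reward
        n += 1
    return total / n
```

In a multi-scale setting the `reward` column could be a short-term signal such as clicks at the lower level and a longer-term signal such as retention at the higher level, but that mapping is an assumption here rather than something the abstract spells out.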

ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation

ComfyUI provides a widely adopted, workflow-based interface that enables users to customize various image generation tasks through an intuitive node-based architecture. However, the intricate connections between nodes and diverse modules often present a steep learning curve for users. In this paper, we introduce ComfyGPT, the first self-optimizing multi-agent system designed to automatically generate ComfyUI workflows from task descriptions. ComfyGPT comprises four specialized agents: ReformatAgent, FlowAgent, RefineAgent, and ExecuteAgent. The core innovation of ComfyGPT lies in two key aspects. First, it focuses on generating individual node links rather than entire workflows, significantly improving generation precision. Second, we propose FlowAgent, an LLM-based workflow generation agent that uses both supervised fine-tuning (SFT) and reinforcement learning (RL) to improve workflow generation accuracy. Moreover, we introduce FlowDataset, a large-scale dataset containing 13,571 workflow-description pairs, and FlowBench, a comprehensive benchmark for evaluating workflow generation systems. We also propose four novel evaluation metrics: Format Validation (FV), Pass Accuracy (PA), Pass Instruct Alignment (PIA), and Pass Node Diversity (PND). Experimental results demonstrate that ComfyGPT significantly outperforms existing LLM-based methods in workflow generation.

Updated: 2025-03-22 06:48:50

Categories: cs.MA,cs.AI

Download: http://arxiv.org/abs/2503.17671v1

Multi-Modality Representation Learning for Antibody-Antigen Interactions Prediction

While deep learning models play a crucial role in predicting antibody-antigen interactions (AAI), the scarcity of publicly available sequence-structure pairings constrains their generalization. Current AAI methods often focus on residue-level static details, overlooking fine-grained structural representations of antibodies and their inter-antibody similarities. To tackle this challenge, we introduce a multi-modality representation approach that integrates 3D structural and 1D sequence data to unravel intricate intra-antibody hierarchical relationships. By harnessing these representations, we present MuLAAIP, an AAI prediction framework that utilizes graph attention networks to illuminate graph-level structural features and normalized adaptive graph convolution networks to capture inter-antibody sequence associations. Furthermore, we have curated an AAI benchmark dataset comprising both structural and sequence information along with interaction labels. Through extensive experiments on this benchmark, our results demonstrate that MuLAAIP outperforms current state-of-the-art methods in terms of predictive performance. The implementation code and dataset are publicly available at https://github.com/trashTian/MuLAAIP for reproducibility.

Updated: 2025-03-22 06:23:51

Categories: cs.LG,q-bio.QM

Download: http://arxiv.org/abs/2503.17666v1

CardioTabNet: A Novel Hybrid Transformer Model for Heart Disease Prediction using Tabular Medical Data

The early detection and prediction of cardiovascular diseases are crucial for reducing the severe morbidity and mortality associated with these conditions worldwide. Transformers use a multi-headed self-attention mechanism, widely used in natural language processing (NLP), to understand feature interactions in feature spaces. However, the relationships between various features within biological systems remain ambiguous in these spaces. We address this issue with CardioTabNet, which exploits the strength of the tab transformer to extract a feature space that carries a strong understanding of clinical cardiovascular data and its feature ranking. As a result, the performance of downstream classical models improved markedly. Our study utilizes an open-source dataset for heart disease prediction with 1190 instances and 11 features. The 11 features are divided into numerical (age, resting blood pressure, cholesterol, maximum heart rate, old peak, weight, and fasting blood sugar) and categorical (resting ECG, exercise angina, and ST slope). The tab transformer was used to extract important features, which were ranked using the random forest (RF) feature ranking algorithm. Ten machine-learning models were used to predict heart disease from the selected features. After extracting high-quality features, the top downstream model (a hyper-tuned ExtraTree classifier) achieved an average accuracy of 94.1% and an average Area Under Curve (AUC) of 95.0%. Furthermore, a nomogram analysis was conducted to evaluate the model's effectiveness in cardiovascular risk assessment. A benchmarking study was conducted using state-of-the-art models to evaluate our transformer-driven framework.

Updated: 2025-03-22 06:17:08

Categories: cs.LG

Download: http://arxiv.org/abs/2503.17664v1

A Qualitative Study of User Perception of M365 AI Copilot

Adopting AI copilots in professional workflows presents opportunities for enhanced productivity, efficiency, and decision-making. In this paper, we present results from a six-month trial of M365 Copilot conducted at our organisation in 2024. A qualitative interview study was carried out with 27 participants. The study explored user perceptions of M365 Copilot's effectiveness, productivity impact, evolving expectations, ethical concerns, and overall satisfaction. Initial enthusiasm for the tool was met with mixed post-trial experiences. While some users found M365 Copilot beneficial for tasks such as email coaching, meeting summaries, and content retrieval, others reported unmet expectations in areas requiring deeper contextual understanding, reasoning, and integration with existing workflows. Ethical concerns were a recurring theme, with users highlighting issues related to data privacy, transparency, and AI bias. While M365 Copilot demonstrated value in specific operational areas, its broader impact remained constrained by usability limitations and the need for human oversight to validate AI-generated outputs.

Updated: 2025-03-22 06:11:10

Categories: cs.CY,cs.AI

Download: http://arxiv.org/abs/2503.17661v1

On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation

The LLaMA-Adapter has recently emerged as an efficient fine-tuning technique for LLaMA models, leveraging zero-initialized attention to stabilize training and enhance performance. However, despite its empirical success, the theoretical foundations of zero-initialized attention remain largely unexplored. In this paper, we provide a rigorous theoretical analysis, establishing a connection between zero-initialized attention and mixture-of-expert models. We prove that both linear and non-linear prompts, along with gating functions, can be optimally estimated, with non-linear prompts offering greater flexibility for future applications. Empirically, we validate our findings on the open LLM benchmarks, demonstrating that non-linear prompts outperform linear ones. Notably, even with limited training data, both prompt types consistently surpass vanilla attention, highlighting the robustness and adaptability of zero-initialized attention.

Updated: 2025-03-22 06:05:33

Categories: cs.LG

Download: http://arxiv.org/abs/2502.03029v2
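
For readers unfamiliar with the mechanism analyzed in the abstract above, the sketch below shows zero-initialized attention for a single query, loosely following the published LLaMA-Adapter description. Attention scores for the learnable prompt slots and for the ordinary tokens are softmax-normalized separately, and the prompt part is scaled by a gating factor that starts at zero. Squashing the gate with `tanh` and all function names are assumptions for illustration; the paper's theoretical setup may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def gated_attention_weights(prompt_scores, token_scores, gate):
    """Attention weights over prompt slots followed by ordinary tokens.

    The two groups are normalized separately; the prompt weights are then
    scaled by tanh(gate). With gate == 0.0 the prompt contributes nothing,
    so training starts from the unmodified pre-trained attention.
    """
    g = math.tanh(gate)
    prompt_w = [g * w for w in softmax(prompt_scores)]
    token_w = softmax(token_scores)
    return prompt_w + token_w
```

At initialization the returned prompt weights are all zero, and as the gate is trained away from zero the prompt's total weight grows to `tanh(gate)`. This gradual injection is the stabilizing effect the abstract attributes to zero-initialized attention.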

Sentinel: Multi-Patch Transformer with Temporal and Channel Attention for Time Series Forecasting

Transformer-based time series forecasting has recently gained strong interest due to the ability of transformers to model sequential data. Most of the state-of-the-art architectures exploit either temporal or inter-channel dependencies, limiting their effectiveness in multivariate time-series forecasting where both types of dependencies are crucial. We propose Sentinel, a full transformer-based architecture composed of an encoder able to extract contextual information from the channel dimension, and a decoder designed to capture causal relations and dependencies across the temporal dimension. Additionally, we introduce a multi-patch attention mechanism, which leverages the patching process to structure the input sequence in a way that can be naturally integrated into the transformer architecture, replacing the multi-head splitting process. Extensive experiments on standard benchmarks demonstrate that Sentinel, because of its ability to "monitor" both the temporal and the inter-channel dimension, achieves better or comparable performance with respect to state-of-the-art approaches.

Updated: 2025-03-22 06:01:50

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2503.17658v1

Demo-Craft: Using In-Context Learning to Improve Code Generation in Large Language Models

Generating executable code from natural language instructions using Large Language Models (LLMs) poses challenges such as semantic ambiguity and understanding task-specific contexts. To address these issues, we propose a system called DemoCraft, which enhances code generation by leveraging in-context learning and demonstration selection, combined with latent concept learning. Latent concept learning introduces additional concept tokens, which are trainable embeddings that capture task-specific knowledge. We then test our system on two major datasets: MBPP and HumanEval. Our experimental results demonstrate that the proposed system achieves an approximate 2x increase in the pass@k metric compared to baseline models. Furthermore, we introduce two novel evaluation metrics: correctness@k and similarity@k. Our empirical studies indicate that our system attains nearly a 3x improvement in these metrics as well.

Updated: 2025-03-22 05:52:26

Categories: cs.SE,cs.AI,cs.CL

Download: http://arxiv.org/abs/2411.00865v2
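
The pass@k metric reported in the abstract above is commonly computed with the unbiased estimator introduced alongside HumanEval: draw `n` samples per problem, count the `c` that pass the unit tests, and estimate the chance that at least one of `k` randomly chosen samples passes. The sketch below implements that estimator for one problem. Whether DemoCraft uses exactly this estimator is an assumption, and the newly proposed correctness@k and similarity@k are not defined in the abstract, so they are not sketched.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number that pass the tests, k: budget (k <= n).
    Computes 1 - C(n - c, k) / C(n, k), the probability that a random size-k
    subset of the n samples contains at least one passing sample.
    """
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The benchmark-level score is the mean of `pass_at_k` over all problems, so a "2x increase" corresponds to doubling that mean.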

Infighting in the Dark: Multi-Label Backdoor Attack in Federated Learning

Federated Learning (FL), a privacy-preserving decentralized machine learning framework, has been shown to be vulnerable to backdoor attacks. Current research primarily focuses on the Single-Label Backdoor Attack (SBA), wherein adversaries share a consistent target. However, a critical fact is overlooked: adversaries may be non-cooperative, have distinct targets, and operate independently, which constitutes a more practical scenario called Multi-Label Backdoor Attack (MBA). Unfortunately, prior works are ineffective in the MBA scenario since non-cooperative attackers exclude each other. In this work, we conduct an in-depth investigation to uncover the inherent constraints of the exclusion: similar backdoor mappings are constructed for different targets, resulting in conflicts among backdoor functions. To address this limitation, we propose Mirage, the first non-cooperative MBA strategy in FL that allows attackers to inject effective and persistent backdoors into the global model without collusion by constructing in-distribution (ID) backdoor mapping. Specifically, we introduce an adversarial adaptation method to bridge the backdoor features and the target distribution in an ID manner. Additionally, we further leverage a constrained optimization method to ensure the ID mapping survives in the global training dynamics. Extensive evaluations demonstrate that Mirage outperforms various state-of-the-art attacks and bypasses existing defenses, achieving an average ASR greater than 97% and maintaining over 90% after 900 rounds. This work aims to alert researchers to this potential threat and inspire the design of effective defense mechanisms. Code has been made open-source.

Updated: 2025-03-22 05:51:50

Categories: cs.CR

Download: http://arxiv.org/abs/2409.19601v3

NaFM: Pre-training a Foundation Model for Small-Molecule Natural Products

Natural products, as metabolites from microorganisms, animals, or plants, exhibit diverse biological activities, making them crucial for drug discovery. Existing deep learning methods for natural products research primarily rely on supervised learning approaches designed for specific downstream tasks. However, such a one-model-for-a-task paradigm often lacks generalizability and leaves significant room for performance improvement. Additionally, existing molecular characterization methods are not well-suited for the unique tasks associated with natural products. To address these limitations, we have pre-trained a foundation model for natural products based on their unique properties. Our approach employs a novel pretraining strategy that is especially tailored to natural products. By incorporating contrastive learning and masked graph learning objectives, we emphasize evolutionary information from molecular scaffolds while capturing side-chain information. Our framework achieves state-of-the-art (SOTA) results in various downstream tasks related to natural product mining and drug discovery. We first compare taxonomy classification with synthesized molecule-focused baselines to demonstrate that current models are inadequate for understanding natural synthesis. Furthermore, by diving into a fine-grained analysis at both the gene and microbial levels, NaFM demonstrates the ability to capture evolutionary information. Finally, we evaluate our method on virtual screening, illustrating informative natural product representations that can lead to more effective identification of potential drug candidates.

Updated: 2025-03-22 05:32:03

Categories: q-bio.QM,cs.AI,cs.LG

Download: http://arxiv.org/abs/2503.17656v1

VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms

Escape rooms present a unique cognitive challenge that demands exploration-driven planning: players should actively search their environment, continuously update their knowledge based on new discoveries, and connect disparate clues to determine which elements are relevant to their objectives. Motivated by this, we introduce VisEscape, a benchmark of 20 virtual escape rooms specifically designed to evaluate AI models under these challenging conditions, where success depends not only on solving isolated puzzles but also on iteratively constructing and refining spatial-temporal knowledge of a dynamically changing environment. On VisEscape, we observe that even state-of-the-art multimodal models generally fail to escape the rooms, showing considerable variation in their levels of progress and trajectories. To address this issue, we propose VisEscaper, which effectively integrates Memory, Feedback, and ReAct modules, demonstrating significant improvements by performing 3.7 times more effectively and 4.9 times more efficiently on average compared to baseline agents.

Updated: 2025-03-22 05:06:18

Categories: cs.AI

Download: http://arxiv.org/abs/2503.14427v2

The Federation Strikes Back: A Survey of Federated Learning Privacy Attacks, Defenses, Applications, and Policy Landscape

Deep learning has shown incredible potential across a wide array of tasks, and this growth has been accompanied by an insatiable appetite for data. However, a large amount of the data needed to enable deep learning is stored on personal devices, and recent privacy concerns have further highlighted the challenges of accessing such data. As a result, federated learning (FL) has emerged as an important privacy-preserving technology that enables collaborative training of machine learning models without the need to send the raw, potentially sensitive, data to a central server. However, the fundamental premise that sending model updates to a server is privacy-preserving only holds if the updates cannot be "reverse engineered" to infer information about the private training data. It has been shown under a wide variety of settings that this privacy premise does not hold. In this survey paper, we provide a comprehensive literature review of the different privacy attacks and defense methods in FL. We identify the current limitations of these attacks and highlight the settings in which the privacy of an FL client can be broken. We further dissect some of the successful industry applications of FL and draw lessons for future successful adoption. We survey the emerging landscape of privacy regulation for FL and conclude with future directions for taking FL toward the cherished goal of generating accurate models while preserving the privacy of the data from its participants.

Updated: 2025-03-22 04:46:17

Categories: cs.CR,cs.LG,I.2; H.4; I.5

Download: http://arxiv.org/abs/2405.03636v3

Erasing Conceptual Knowledge from Language Models

In this work, we propose Erasure of Language Memory (ELM), an approach for concept-level unlearning built on the principle of matching the distribution defined by an introspective classifier. Our key insight is that effective unlearning should leverage the model's ability to evaluate its own knowledge, using the model itself as a classifier to identify and reduce the likelihood of generating content related to undesired concepts. ELM applies this framework to create targeted low-rank updates that reduce generation probabilities for concept-specific content while preserving the model's broader capabilities. We demonstrate ELM's efficacy on biosecurity, cybersecurity, and literary domain erasure tasks. Comparative analysis shows that ELM achieves superior performance across key metrics, including near-random scores on erased topic assessments, maintained coherence in text generation, preserved accuracy on unrelated benchmarks, and robustness under adversarial attacks. Our code, data, and trained models are available at https://elm.baulab.info

Updated: 2025-03-22 04:42:36

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2410.02760v2

Automated diagnosis of lung diseases using vision transformer: a comparative study on chest x-ray classification

Background: Lung disease is a significant health issue, particularly in children and elderly individuals. It often results from lung infections and is one of the leading causes of mortality in children. Globally, lung-related diseases claim many lives each year, making early and accurate diagnoses crucial. Radiographs are valuable tools for the diagnosis of such conditions. The most prevalent lung diseases, including pneumonia, asthma, allergies, chronic obstructive pulmonary disease (COPD), bronchitis, emphysema, and lung cancer, represent significant public health challenges. Early prediction of these conditions is critical, as it allows for the identification of risk factors and implementation of preventive measures to reduce the likelihood of disease onset. Methods: In this study, we utilized a dataset comprising 3,475 chest X-ray images sourced from Mendeley Data provided by Talukder, M. A. (2023) [14], categorized into three classes: normal, lung opacity, and pneumonia. We applied five pre-trained deep learning models (CNN, ResNet50, DenseNet, CheXNet, and U-Net) as well as two transfer learning algorithms, Vision Transformer (ViT) and Shifted Window (Swin), to classify these images. This approach aims to address diagnostic issues in lung abnormalities by reducing reliance on human intervention through automated classification systems. Our analysis was conducted in both binary and multiclass settings. Results: In the binary classification, we focused on distinguishing between normal and viral pneumonia cases, whereas in the multiclass classification, all three classes (normal, lung opacity, and viral pneumonia) were included. Our proposed methodology (ViT) achieved remarkable performance, with accuracy rates of 99% for binary classification and 95.25% for multiclass classification.

Updated: 2025-03-22 04:35:17

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2503.18973v1

A Modular Dataset to Demonstrate LLM Abstraction Capability

Large language models (LLMs) exhibit impressive capabilities but struggle with reasoning errors due to hallucinations and flawed logic. To investigate their internal representations of reasoning, we introduce ArrangementPuzzle, a novel puzzle dataset with structured solutions and automated stepwise correctness verification. We trained a classifier model on LLM activations on this dataset and found that it achieved over 80% accuracy in predicting reasoning correctness, implying that LLMs internally distinguish between correct and incorrect reasoning steps, with the strongest representations in middle-late Transformer layers. Further analysis reveals that LLMs encode abstract reasoning concepts within the middle activation layers of the transformer architecture, distinguishing logical from semantic equivalence. These findings provide insights into LLM reasoning mechanisms and contribute to improving AI reliability and interpretability, thereby offering the possibility to manipulate and refine LLM reasoning.

Updated: 2025-03-22 04:25:30

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2503.17645v1

Closing the Intent-to-Behavior Gap via Fulfillment Priority Logic

Practitioners designing reinforcement learning policies face a fundamental challenge: translating intended behavioral objectives into representative reward functions. This challenge stems from behavioral intent requiring simultaneous achievement of multiple competing objectives, typically addressed through labor-intensive linear reward composition that yields brittle results. Consider the ubiquitous robotics scenario where performance maximization directly conflicts with energy conservation. Such competitive dynamics are resistant to simple linear reward combinations. In this paper, we present the concept of objective fulfillment upon which we build Fulfillment Priority Logic (FPL). FPL allows practitioners to define logical formulas representing their intentions and priorities within multi-objective reinforcement learning. Our novel Balanced Policy Gradient algorithm leverages FPL specifications to achieve up to 500% better sample efficiency compared to Soft Actor Critic. Notably, this work constitutes the first implementation of non-linear utility scalarization design, specifically for continuous control problems.

Updated: 2025-03-22 04:22:47

Categories: cs.LG,cs.RO

Download: http://arxiv.org/abs/2503.05818v2

On The Sample Complexity Bounds In Bilevel Reinforcement Learning

Bilevel reinforcement learning (BRL) has emerged as a powerful mathematical framework for studying generative AI alignment and related problems. While several principled algorithmic frameworks have been proposed, key theoretical foundations, particularly those related to sample complexity, remain underexplored. Understanding and deriving tight sample complexity bounds are crucial for bridging the gap between theory and practice, guiding the development of more efficient algorithms. In this work, we present the first sample complexity result for BRL, achieving a bound of $\epsilon^{-4}$. This result extends to standard bilevel optimization problems, providing an interesting theoretical contribution with practical implications. To address the computational challenges associated with hypergradient estimation in bilevel optimization, we develop a first-order Hessian-free algorithm that does not rely on costly hypergradient computations. By leveraging matrix-free techniques and constrained optimization methods, our approach ensures scalability and practicality. Our findings pave the way for improved methods in AI alignment and other fields reliant on bilevel optimization.

Updated: 2025-03-22 04:22:04

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2503.17644v1

Privacy-Preserving Federated Learning with Differentially Private Hyperdimensional Computing

Federated Learning (FL) has become a key method for preserving data privacy in Internet of Things (IoT) environments, as it trains Machine Learning (ML) models locally while transmitting only model updates. Despite this design, FL remains susceptible to threats such as model inversion and membership inference attacks, which can reveal private training data. Differential Privacy (DP) techniques are often introduced to mitigate these risks, but simply injecting DP noise into black-box ML models can compromise accuracy, particularly in dynamic IoT contexts, where continuous, lifelong learning leads to excessive noise accumulation. To address this challenge, we propose Federated HyperDimensional computing with Privacy-preserving (FedHDPrivacy), an eXplainable Artificial Intelligence (XAI) framework that integrates neuro-symbolic computing and DP. Unlike conventional approaches, FedHDPrivacy actively monitors the cumulative noise across learning rounds and adds only the additional noise required to satisfy privacy constraints. In a real-world application for monitoring manufacturing machining processes, FedHDPrivacy maintains high performance while surpassing standard FL frameworks - Federated Averaging (FedAvg), Federated Proximal (FedProx), Federated Normalized Averaging (FedNova), and Federated Optimization (FedOpt) - by up to 37%. Looking ahead, FedHDPrivacy offers a promising avenue for further enhancements, such as incorporating multimodal data fusion.
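The idea of tracking cumulative noise and adding only the shortfall can be sketched as follows. This is a toy accounting model, not FedHDPrivacy's actual mechanism: it assumes independent Gaussian noise (so variances add across rounds) and a hypothetical per-round schedule of required noise variances; all names are illustrative.

```python
import math
import random


def additional_noise_variance(required_var, accumulated_var):
    """Variance still missing to reach the required level (never negative)."""
    return max(0.0, required_var - accumulated_var)


def run_rounds(required_vars, value=0.0, seed=0):
    """Perturb a scalar over several rounds, adding only the missing noise."""
    rng = random.Random(seed)
    accumulated = 0.0  # total variance of the noise injected so far
    added = []
    for required in required_vars:
        extra = additional_noise_variance(required, accumulated)
        # Fresh, independent Gaussian noise carrying exactly the missing variance.
        value += rng.gauss(0.0, math.sqrt(extra))
        accumulated += extra
        added.append(extra)
    return added, accumulated, value


# Hypothetical schedule of required noise variances for four rounds.
added, accumulated, noisy_value = run_rounds([1.0, 1.5, 1.5, 4.0])
```

With this schedule the third round adds nothing, because the noise already present satisfies the requirement; a scheme that re-injected the full amount every round would accumulate far more noise over a long training run.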

Updated: 2025-03-22 04:10:19

Categories: cs.LG,cs.AI,cs.CR,stat.ML

Download: http://arxiv.org/abs/2411.01140v3

On the Hopf-Cole Transform for Control-affine Schrödinger Bridge

The purpose of this note is to clarify the importance of the relation $\boldsymbol{gg}^{\top}\propto \boldsymbol{\sigma\sigma}^{\top}$ in solving control-affine Schrödinger bridge problems via the Hopf-Cole transform, where $\boldsymbol{g},\boldsymbol{\sigma}$ are the control and noise coefficients, respectively. We show that the Hopf-Cole transform applied to the conditions of optimality for generic control-affine Schrödinger bridge problems, i.e., without the assumption $\boldsymbol{gg}^{\top}\propto\boldsymbol{\sigma\sigma}^{\top}$, gives a pair of forward-backward PDEs that are neither linear nor equation-level decoupled. We explain how the resulting PDEs can be interpreted as nonlinear forward-backward advection-diffusion-reaction equations, where the nonlinearity stems from additional drift and reaction terms involving the gradient of the log-likelihood, a.k.a. the score. These additional drift and reaction terms vanish when $\boldsymbol{gg}^{\top}\propto\boldsymbol{\sigma\sigma}^{\top}$, and the resulting boundary-coupled system of linear PDEs can then be solved by dynamic Sinkhorn recursions. A key takeaway of our work is that the numerical solution of the generic control-affine Schrödinger bridge requires further algorithmic development, possibly generalizing the dynamic Sinkhorn recursion or otherwise.
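For intuition about the Sinkhorn recursion mentioned above, the sketch below shows its classical static, discrete counterpart: entropic optimal transport between two probability vectors, solved by alternately rescaling a Gibbs kernel. It is not the dynamic recursion of the note, which alternates forward and backward PDE solves; the marginals, cost matrix, and regularization strength are illustrative assumptions.

```python
import math


def sinkhorn(a, b, cost, eps=1.0, n_iter=500):
    """Entropic optimal transport between discrete marginals a and b."""
    n, m = len(a), len(b)
    # Gibbs kernel built from the cost matrix.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows so that the plan's row sums match a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the plan's column sums match b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


a = [0.5, 0.3, 0.2]
b = [0.2, 0.2, 0.6]
cost = [[(i - j) ** 2 for j in range(3)] for i in range(3)]

plan = sinkhorn(a, b, cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(3)) for j in range(3)]
```

The two scaling vectors play the role of the pair of boundary-coupled factors: each update is linear in one factor given the other, which is the structure that is lost when the proportionality condition fails.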

Updated: 2025-03-22 04:08:10

Categories: math.OC,cs.AI,cs.LG,cs.SY,eess.SY,stat.ML

Download: http://arxiv.org/abs/2503.17640v1

Large Deviation Upper Bounds and Improved MSE Rates of Nonlinear SGD: Heavy-tailed Noise and Power of Symmetry

We study large deviation upper bounds and mean-squared error (MSE) guarantees of a general framework of nonlinear stochastic gradient methods in the online setting, in the presence of heavy-tailed noise. Unlike existing works that rely on the closed form of a nonlinearity (typically clipping), our framework treats the nonlinearity in a black-box manner, allowing us to provide unified guarantees for a broad class of bounded nonlinearities, including many popular ones, like sign, quantization, normalization, as well as component-wise and joint clipping. We provide several strong results for a broad range of step-sizes in the presence of heavy-tailed noise with symmetric probability density function, positive in a neighbourhood of zero and potentially unbounded moments. In particular, for non-convex costs we provide a large deviation upper bound for the minimum norm-squared of gradients, showing an asymptotic tail decay on an exponential scale, at a rate $\sqrt{t} / \log(t)$. We establish the accompanying rate function, showing an explicit dependence on the choice of step-size, nonlinearity, noise and problem parameters. Next, for non-convex costs and the minimum norm-squared of gradients, we derive the optimal MSE rate $\widetilde{\mathcal{O}}(t^{-1/2})$. Moreover, for strongly convex costs and the last iterate, we provide an MSE rate that can be made arbitrarily close to the optimal rate $\mathcal{O}(t^{-1})$, improving on the state-of-the-art results in the presence of heavy-tailed noise. Finally, we establish almost sure convergence of the minimum norm-squared of gradients, providing an explicit rate, which can be made arbitrarily close to $o(t^{-1/4})$.
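The bounded nonlinearities listed above can be written down directly. The sketch below implements sign, component-wise clipping, joint clipping, and normalization, and plugs any of them into an SGD loop that treats the nonlinearity as a black box. The quadratic cost, the standard Cauchy noise (symmetric, positive density near zero, no finite mean), and the step-size schedule a / t^delta are illustrative assumptions, not the paper's experimental setup.

```python
import math
import random


def sign_nl(g):
    """Component-wise sign."""
    return [(v > 0) - (v < 0) for v in g]


def component_clip(g, m):
    """Clip every coordinate to the interval [-m, m]."""
    return [max(-m, min(m, v)) for v in g]


def joint_clip(g, m):
    """Rescale the whole vector so that its Euclidean norm is at most m."""
    norm = math.sqrt(sum(v * v for v in g))
    return list(g) if norm <= m else [v * m / norm for v in g]


def normalize(g):
    """Rescale the vector to unit Euclidean norm (zero vector left unchanged)."""
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g] if norm > 0 else list(g)


def nonlinear_sgd(nonlinearity, x0, steps=2000, a=1.0, delta=0.75, seed=0):
    """SGD on f(x) = 0.5 * ||x||^2 with a black-box nonlinearity on noisy gradients."""
    rng = random.Random(seed)
    x = list(x0)
    for t in range(1, steps + 1):
        # Exact gradient is x; add standard Cauchy noise via the inverse CDF.
        noisy = [xi + math.tan(math.pi * (rng.random() - 0.5)) for xi in x]
        step = a / t ** delta
        x = [xi - step * gi for xi, gi in zip(x, nonlinearity(noisy))]
    return x


x_final = nonlinear_sgd(lambda g: joint_clip(g, 1.0), [5.0, -5.0])
```

Because every nonlinearity returns a bounded vector, a single extreme noise sample can move the iterate by at most one bounded step, which is the property such analyses rely on.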

Updated: 2025-03-22 03:59:25

Categories: cs.LG,math.OC,math.PR

Download: http://arxiv.org/abs/2410.15637v2

FairFlow: Mitigating Dataset Biases through Undecided Learning

Language models are prone to dataset biases, known as shortcuts and spurious correlations in data, which often result in a performance drop on new data. We present a new debiasing framework called "FairFlow" that mitigates dataset biases by learning to be undecided in its predictions for data samples or representations associated with known or unknown biases. The framework introduces two key components: a suite of data and model perturbation operations that generate different biased views of input samples, and a contrastive objective that learns debiased and robust representations from the resulting biased views of samples. Experiments show that FairFlow outperforms existing debiasing methods, particularly on out-of-domain and hard test samples, without compromising in-domain performance.

Updated: 2025-03-22 03:35:51

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2503.17632v1

LLMs as Planning Modelers: A Survey for Leveraging Large Language Models to Construct Automated Planning Models

Large Language Models (LLMs) excel in various natural language tasks but often struggle with long-horizon planning problems requiring structured reasoning. This limitation has drawn interest in integrating neuro-symbolic approaches within the Automated Planning (AP) and Natural Language Processing (NLP) communities. However, identifying optimal AP deployment frameworks can be daunting. This paper aims to provide a timely survey of the current research with an in-depth analysis, positioning LLMs as tools for extracting and refining planning models to support reliable AP planners. By systematically reviewing the current state of research, we highlight methodologies, and identify critical challenges and future directions, hoping to contribute to the joint research on NLP and Automated Planning.

Updated: 2025-03-22 03:35:44

Categories: cs.AI

Download: http://arxiv.org/abs/2503.18971v1

A Comprehensive Survey on Self-Interpretable Neural Networks

Neural networks have achieved remarkable success across various fields. However, the lack of interpretability limits their practical use, particularly in critical decision-making scenarios. Post-hoc interpretability, which provides explanations for pre-trained models, often suffers from limited robustness and fidelity. This has inspired a rising interest in self-interpretable neural networks, which inherently reveal the prediction rationale through the model structures. Although there exist surveys on post-hoc interpretability, a comprehensive and systematic survey of self-interpretable neural networks is still missing. To address this gap, we first collect and review existing works on self-interpretable neural networks and provide a structured summary of their methodologies from five key perspectives: attribution-based, function-based, concept-based, prototype-based, and rule-based self-interpretation. We also present concrete, visualized examples of model explanations and discuss their applicability across diverse scenarios, including image, text, graph data, and deep reinforcement learning. Additionally, we summarize existing evaluation metrics for self-interpretability and identify open challenges in this field, offering insights for future research. To support ongoing developments, we present a publicly accessible resource to track advancements in this domain: https://github.com/yangji721/Awesome-Self-Interpretable-Neural-Network.

Updated: 2025-03-22 03:32:46

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2501.15638v2

Data Reconstruction Attacks and Defenses: A Systematic Evaluation

Reconstruction attacks and defenses are essential in understanding the data leakage problem in machine learning. However, prior work has centered around empirical observations of gradient inversion attacks, lacks theoretical grounding, and cannot disentangle the usefulness of defending methods from the computational limitation of attacking methods. In this work, we propose to view the problem as an inverse problem, enabling us to theoretically and systematically evaluate the data reconstruction attack. On various defense methods, we derived the algorithmic upper bound and the matching (in feature dimension and architecture dimension) information-theoretical lower bound on the reconstruction error for two-layer neural networks. To complement the theoretical results and investigate the utility-privacy trade-off, we defined a natural evaluation metric of the defense methods with similar utility loss among the strongest attacks. We further propose a strong reconstruction attack that helps update some previous understanding of the strength of defense methods under our proposed evaluation metric.

Updated: 2025-03-22 03:29:27

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2402.09478v3

Enhancing Job Salary Prediction with Disentangled Composition Effect Modeling: A Neural Prototyping Approach

In the era of the knowledge economy, understanding how job skills influence salary is crucial for promoting recruitment with competitive salary systems and aligned salary expectations. Despite efforts on salary prediction based on job positions and talent demographics, methods that effectively discern the intricate composition effect of set-structured skills on job salary are still lacking. While recent advances in neural networks have significantly improved the accuracy of set-based quantitative modeling, their lack of explainability hinders obtaining insights into the skills' composition effects. Indeed, model explanation for set data is challenging due to its combinatorial nature, rich semantics, and unique format. To this end, in this paper, we propose a novel intrinsically explainable set-based neural prototyping approach, namely LGDESetNet, for explainable salary prediction that can reveal disentangled skill sets that impact salary from both local and global perspectives. Specifically, we propose a skill graph-enhanced disentangled discrete subset selection layer to identify multi-faceted influential input subsets with varied semantics. Furthermore, we propose a set-oriented prototype learning method to extract globally influential prototypical sets. The resulting output is transparently derived from the semantic interplay between these input subsets and global prototypes. Extensive experiments on four real-world datasets demonstrate that our method achieves superior performance to state-of-the-art baselines in salary prediction while providing explainable insights into salary-influencing patterns.

Updated: 2025-03-22 03:28:19

Categories: cs.LG

Download: http://arxiv.org/abs/2503.12978v2

Generating Realistic, Diverse, and Fault-Revealing Inputs with Latent Space Interpolation for Testing Deep Neural Networks

Deep Neural Networks (DNNs) have been widely employed across various domains, including safety-critical systems, necessitating comprehensive testing to ensure their reliability. Although numerous DNN model testing methods have been proposed to generate adversarial samples that are capable of revealing faults, existing methods typically perturb samples in the input space and then mutate these based on feedback from the DNN model. These methods often produce test samples that are unrealistic and that reveal faults only with low probability. To address these limitations, we propose a black-box DNN test input generation method, ARGUS, to generate realistic, diverse, and fault-revealing test inputs. ARGUS first compresses samples into a continuous latent space and then perturbs the original samples by interpolating these with samples of different classes. Subsequently, we employ a vector quantizer and decoder to reconstruct adversarial samples back into the input space. Additionally, we employ discriminators both in the latent space and in the input space to ensure the realism of the generated samples. Evaluation of ARGUS against state-of-the-art black-box and white-box testing methods shows that ARGUS excels in generating realistic and diverse adversarial samples relative to the target dataset, successfully perturbs all original samples, and achieves up to 4 times higher error rate than the best baseline method. Furthermore, using these adversarial samples for model retraining can improve model classification accuracy.

Updated: 2025-03-22 03:19:55

Categories: cs.LG,cs.SE

Download: http://arxiv.org/abs/2503.17630v1

Threshold Adaptation in Spiking Networks Enables Shortest Path Finding and Place Disambiguation

Efficient spatial navigation is a hallmark of the mammalian brain, inspiring the development of neuromorphic systems that mimic biological principles. Despite progress, implementing key operations like back-tracing and handling ambiguity in bio-inspired spiking neural networks remains an open challenge. This work proposes a mechanism for activity back-tracing in arbitrary, uni-directional spiking neuron graphs. We extend the existing replay mechanism of the spiking hierarchical temporal memory (S-HTM) by our spike timing-dependent threshold adaptation (STDTA), which enables us to perform path planning in networks of spiking neurons. We further present an ambiguity dependent threshold adaptation (ADTA) for identifying places in an environment with less ambiguity, enhancing the localization estimate of an agent. Combined, these methods enable efficient identification of the shortest path to an unambiguous target. Our experiments show that a network trained on sequences reliably computes shortest paths with fewer replays than the steps required to reach the target. We further show that we can identify places with reduced ambiguity in multiple, similar environments. These contributions advance the practical application of biologically inspired sequential learning algorithms like the S-HTM towards neuromorphic localization and navigation.

Updated: 2025-03-22 03:18:44

Categories: cs.NE,cs.AI,cs.RO

Download: http://arxiv.org/abs/2503.21795v1

Planning and Learning in Average Risk-aware MDPs

For continuing tasks, average cost Markov decision processes have well-documented value and can be solved using efficient algorithms. However, this framework explicitly assumes that the agent is risk-neutral. In this work, we extend risk-neutral algorithms to accommodate the more general class of dynamic risk measures. Specifically, we propose a relative value iteration (RVI) algorithm for planning and design two model-free Q-learning algorithms, namely a generic algorithm based on the multi-level Monte Carlo (MLMC) method, and an off-policy algorithm dedicated to utility-based shortfall risk measures. Both the RVI and MLMC-based Q-learning algorithms are proven to converge to optimality. Numerical experiments validate our analysis, empirically confirm the convergence of the off-policy algorithm, and demonstrate that our approach enables the identification of policies that are finely tuned to the intricate risk-awareness of the agent that they serve.
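As background for the RVI algorithm mentioned above, here is a minimal sketch of classical, risk-neutral relative value iteration for an average-cost MDP. The risk-aware extension from the abstract is not reproduced; the two-state, two-action MDP and all names are illustrative assumptions.

```python
def relative_value_iteration(P, c, ref=0, tol=1e-10, max_iter=10_000):
    """Risk-neutral RVI. P[a][s][t] = transition probability, c[s][a] = cost."""
    n_states = len(c)
    n_actions = len(c[0])
    h = [0.0] * n_states  # relative (bias) values, pinned to 0 at state `ref`
    gain = 0.0
    for _ in range(max_iter):
        # Bellman backup: one-step cost plus expected relative value, minimized over actions.
        Th = [
            min(
                c[s][a] + sum(P[a][s][t] * h[t] for t in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        gain = Th[ref]  # running estimate of the optimal average cost
        new_h = [v - gain for v in Th]
        diff = max(abs(x - y) for x, y in zip(new_h, h))
        h = new_h
        if diff < tol:
            break
    return gain, h


# Toy MDP: both actions mix the two states uniformly; action 0 is cheaper everywhere.
P = [
    [[0.5, 0.5], [0.5, 0.5]],  # action 0
    [[0.5, 0.5], [0.5, 0.5]],  # action 1
]
c = [
    [1.0, 5.0],  # costs in state 0 for actions 0 and 1
    [3.0, 4.0],  # costs in state 1 for actions 0 and 1
]

gain, h = relative_value_iteration(P, c)
```

Here the optimal policy always takes action 0, the stationary distribution is uniform, and the average cost is (1 + 3) / 2 = 2. Subtracting the value at the reference state each iteration keeps the iterates bounded, which plain value iteration does not do in the average-cost setting.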

Updated: 2025-03-22 03:18:09

Categories: cs.LG,math.OC

Download: http://arxiv.org/abs/2503.17629v1

Implicit Image-to-Image Schrodinger Bridge for Image Restoration

Diffusion-based models have demonstrated remarkable effectiveness in image restoration tasks; however, their iterative denoising process, which starts from Gaussian noise, often leads to slow inference speeds. The Image-to-Image Schrödinger Bridge (I$^2$SB) offers a promising alternative by initializing the generative process from corrupted images while leveraging training techniques from score-based diffusion models. In this paper, we introduce the Implicit Image-to-Image Schrödinger Bridge (I$^3$SB) to further accelerate the generative process of I$^2$SB. I$^3$SB restructures the generative process into a non-Markovian framework by incorporating the initial corrupted image at each generative step, effectively preserving and utilizing its information. To enable direct use of pretrained I$^2$SB models without additional training, we ensure consistency in marginal distributions. Extensive experiments across many image corruptions, including noise, low resolution, JPEG compression, and sparse sampling, and multiple image modalities, such as natural, human face, and medical images, demonstrate the acceleration benefits of I$^3$SB. Compared to I$^2$SB, I$^3$SB achieves the same perceptual quality with fewer generative steps, while maintaining or improving fidelity to the ground truth.

Updated: 2025-03-22 03:07:11

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2403.06069v3

Hamiltonian Monte Carlo Inference of Marginalized Linear Mixed-Effects Models

Bayesian reasoning in linear mixed-effects models (LMMs) is challenging and often requires advanced sampling techniques like Markov chain Monte Carlo (MCMC). A common approach is to write the model in a probabilistic programming language and then sample via Hamiltonian Monte Carlo (HMC). However, there are many ways a user can transform a model that make inference more or less efficient. In particular, marginalizing some variables can greatly improve inference but is difficult for users to do manually. We develop an algorithm to easily marginalize random effects in LMMs. A naive approach introduces cubic time operations within an inference algorithm like HMC, but we reduce the running time to linear using fast linear algebra techniques. We show that marginalization is always beneficial when applicable and highlight improvements in various models, especially ones from cognitive sciences.
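To see why marginalization need not cost cubic time, consider the simplest special case: a random-intercept model in which each group's marginal covariance is sigma2 * I + tau2 * 1 1^T. The Sherman-Morrison identity and the matrix determinant lemma then give the Gaussian log-likelihood from two running sums per group. The sketch below covers only this special case and is not the paper's general algorithm; the function and parameter names are hypothetical.

```python
import math


def marginal_loglik(residuals_by_group, sigma2, tau2):
    """Log-likelihood of a random-intercept model with the intercepts integrated out.

    Each group has covariance sigma2 * I + tau2 * 1 1^T, so its inverse and
    determinant are available in closed form and the total cost is linear in
    the number of observations.
    """
    total = 0.0
    for r in residuals_by_group:
        n = len(r)
        s1 = sum(r)                  # 1^T r
        s2 = sum(v * v for v in r)   # r^T r
        denom = sigma2 + n * tau2
        # Matrix determinant lemma: det = sigma2^(n-1) * (sigma2 + n * tau2).
        logdet = (n - 1) * math.log(sigma2) + math.log(denom)
        # Sherman-Morrison: r^T Sigma^{-1} r = (r^T r - tau2 * (1^T r)^2 / denom) / sigma2.
        quad = (s2 - tau2 * s1 * s1 / denom) / sigma2
        total += -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)
    return total


# One group with residuals (y minus fixed effects) of 1 and 2, unit variances.
ll = marginal_loglik([[1.0, 2.0]], sigma2=1.0, tau2=1.0)
```

For this group the covariance is [[2, 1], [1, 2]], with determinant 3 and quadratic form 2, so the result can be checked by hand against the dense formula.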

Updated: 2025-03-22 03:06:20

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2410.24079v3

Transferable Latent-to-Latent Locomotion Policy for Efficient and Versatile Motion Control of Diverse Legged Robots

Reinforcement learning (RL) has demonstrated remarkable capability in acquiring robot skills, but learning each new skill still requires substantial data collection for training. The pretrain-and-finetune paradigm offers a promising approach for efficiently adapting to new robot entities and tasks. Inspired by the idea that acquired knowledge can accelerate learning new tasks with the same robot and help a new robot master a trained task, we propose a latent training framework where a transferable latent-to-latent locomotion policy is pretrained alongside diverse task-specific observation encoders and action decoders. This policy in latent space processes encoded latent observations to generate latent actions to be decoded, with the potential to learn general abstract motion skills. To retain essential information for decision-making and control, we introduce a diffusion recovery module that minimizes information reconstruction loss during the pretraining stage. During the fine-tuning stage, the pretrained latent-to-latent locomotion policy remains fixed, while only the lightweight task-specific encoder and decoder are optimized for efficient adaptation. Our method allows a robot to leverage its own prior experience across different tasks as well as the experience of other morphologically diverse robots to accelerate adaptation. We validate our approach through extensive simulations and real-world experiments, demonstrating that the pretrained latent-to-latent locomotion policy effectively generalizes to new robot entities and tasks with improved efficiency.

Updated: 2025-03-22 03:01:25

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2503.17626v1

AI-Based Screening for Depression and Social Anxiety Through Eye Tracking: An Exploratory Study

Well-being is a dynamic construct that evolves over time and fluctuates within individuals, presenting challenges for accurate quantification. Reduced well-being is often linked to depression or anxiety disorders, which are characterised by biases in visual attention towards specific stimuli, such as human faces. This paper introduces a novel approach to AI-assisted screening of affective disorders by analysing visual attention scan paths using convolutional neural networks (CNNs). Data were collected from two studies examining (1) attentional tendencies in individuals diagnosed with major depression and (2) social anxiety. These data were processed using residual CNNs through images generated from eye-gaze patterns. Experimental results, obtained with ResNet architectures, demonstrated an average accuracy of 48% for a three-class system and 62% for a two-class system. Based on these exploratory findings, we propose that this method could be employed in rapid, ecological, and effective mental health screening systems to assess well-being through eye-tracking.

Updated: 2025-03-22 02:53:02

Categories: cs.CV,cs.AI,cs.CY,cs.HC,cs.LG,68U01,J.3; I.2; I.5; H.4; C.3

Download: http://arxiv.org/abs/2503.17625v1

Towards Dynamic Trend Filtering through Trend Point Detection with Reinforcement Learning

Trend filtering simplifies complex time series data by applying smoothness to filter out noise while emphasizing proximity to the original data. However, existing trend filtering methods fail to reflect abrupt changes in the trend due to 'approximateness', resulting in constant smoothness. This approximateness uniformly filters out the tail distribution of time series data, characterized by extreme values, including both abrupt changes and noise. In this paper, we propose Trend Point Detection formulated as a Markov Decision Process (MDP), a novel approach to identifying essential points that should be reflected in the trend, departing from approximations. We term these essential points Dynamic Trend Points (DTPs) and extract trends by interpolating them. To identify DTPs, we utilize Reinforcement Learning (RL) within a discrete action space, with a forecasting sum-of-squares loss function as the reward; we refer to the resulting model as the Dynamic Trend Filtering network (DTF-net). DTF-net integrates flexible noise filtering, preserving critical original subsequences while removing noise as required for other subsequences. We demonstrate that DTF-net excels at capturing abrupt changes compared to other trend filtering algorithms and enhances forecasting performance, as abrupt changes are predicted rather than smoothed out.
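The step of extracting a trend by interpolating selected points can be sketched in a few lines. The sketch assumes piecewise-linear interpolation and takes the selected indices as given; how DTF-net chooses them with RL is not reproduced, and the series and indices below are made up for illustration.

```python
def interpolate_trend(series, point_indices):
    """Piecewise-linear trend through the selected points of a series.

    `point_indices` must be sorted and include the first and last index.
    """
    if point_indices[0] != 0 or point_indices[-1] != len(series) - 1:
        raise ValueError("selected points must include both endpoints")
    trend = [0.0] * len(series)
    for i0, i1 in zip(point_indices, point_indices[1:]):
        for t in range(i0, i1 + 1):
            frac = (t - i0) / (i1 - i0)
            # Linear blend between the two neighbouring selected points.
            trend[t] = series[i0] + frac * (series[i1] - series[i0])
    return trend


series = [0.0, 1.0, 10.0, 3.0, 4.0]
selected = [0, 2, 4]  # hypothetical trend points; index 2 is the abrupt jump
trend = interpolate_trend(series, selected)
# trend == [0.0, 5.0, 10.0, 7.0, 4.0]
```

Because the jump at index 2 is one of the selected points, the trend passes through it exactly instead of smoothing it away, while the unselected values at indices 1 and 3 are replaced by interpolated ones.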

Updated: 2025-03-22 02:52:18

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.03665v2
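
The abstract above says that trends are extracted by interpolating the selected Dynamic Trend Points. The following is a minimal sketch of that final step only, assuming linear interpolation and a hand-picked list of point indices. In the paper the points are chosen by the RL agent, which is not modelled here, and `interpolate_trend` and its arguments are hypothetical names.

```python
from typing import List, Sequence


def interpolate_trend(series: Sequence[float], trend_points: Sequence[int]) -> List[float]:
    """Linearly interpolate between selected indices (assumed interpolation scheme).

    `trend_points` must be strictly increasing and must start at index 0 and
    end at the last index, so that the trend covers the whole series.
    """
    if not trend_points or trend_points[0] != 0 or trend_points[-1] != len(series) - 1:
        raise ValueError("trend_points must start at 0 and end at len(series) - 1")
    if any(b <= a for a, b in zip(trend_points, trend_points[1:])):
        raise ValueError("trend_points must be strictly increasing")

    trend: List[float] = []
    for left, right in zip(trend_points, trend_points[1:]):
        for i in range(left, right):
            weight = (i - left) / (right - left)  # 0 at `left`, approaches 1 near `right`
            trend.append((1 - weight) * series[left] + weight * series[right])
    trend.append(series[trend_points[-1]])  # close the final segment
    return trend


# A series with a small bump (index 1) and an abrupt level shift (index 3).
series = [0, 1, 0, 10, 11, 10]
trend = interpolate_trend(series, [0, 2, 3, 5])
```

Because indices 2 and 3 are both kept as trend points, the jump between them survives in `trend`, while the small bumps at indices 1 and 4 are flattened. A filter with constant smoothness would blur both.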

Unraveling Pedestrian Fatality Patterns: A Comparative Study with Explainable AI

Road fatalities pose significant public safety and health challenges worldwide, with pedestrians being particularly vulnerable in vehicle-pedestrian crashes due to disparities in physical and performance characteristics. This study employs explainable artificial intelligence (XAI) to identify key factors contributing to pedestrian fatalities across the five U.S. states with the highest crash rates (2018-2022) and compares them with the five states with the lowest fatality rates. Using data from the Fatality Analysis Reporting System (FARS), the study applies machine learning techniques, including Decision Trees, Gradient Boosting Trees, Random Forests, and XGBoost, to predict contributing factors to pedestrian fatalities. To address data imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) is utilized, while SHapley Additive Explanations (SHAP) values enhance model interpretability. The results indicate that age, alcohol and drug use, location, and environmental conditions are significant predictors of pedestrian fatalities. The XGBoost model outperformed the others, achieving a balanced accuracy of 98%, accuracy of 90%, precision of 92%, recall of 90%, and an F1 score of 91%. Findings reveal that pedestrian fatalities are more common in mid-block locations and areas with poor visibility, with older adults and substance-impaired individuals at higher risk. These insights can inform policymakers and urban planners in implementing targeted safety measures, such as improved lighting, enhanced pedestrian infrastructure, and stricter traffic law enforcement, to reduce fatalities and improve public safety.

Updated: 2025-03-22 02:44:41

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2503.17623v1

Debunking the CUDA Myth Towards GPU-based AI Systems

This paper presents a comprehensive evaluation of Intel Gaudi NPUs as an alternative to NVIDIA GPUs, which are currently the de facto standard in AI system design. First, we create a suite of microbenchmarks to compare Intel Gaudi-2 with NVIDIA A100, showing that Gaudi-2 achieves competitive performance not only in primitive AI compute, memory, and communication operations but also in executing several important AI workloads end-to-end. We then assess the programmability of Gaudi NPUs by discussing several software-level optimization strategies for implementing critical FBGEMM operators and vLLM, evaluating their efficiency against GPU-optimized counterparts. Results indicate that Gaudi-2 achieves energy efficiency comparable to A100, though there are notable areas for improvement in terms of software maturity. Overall, we conclude that, with effective integration into high-level AI frameworks, Gaudi NPUs could challenge the dominance of NVIDIA GPUs in the AI server market, though further improvements are necessary to fully compete with NVIDIA's robust software ecosystem.

Updated: 2025-03-22 02:32:17

Categories: cs.DC,cs.AI,cs.AR

Download: http://arxiv.org/abs/2501.00210v2

BAMDP Shaping: a Unified Framework for Intrinsic Motivation and Reward Shaping

Intrinsic motivation and reward shaping guide reinforcement learning (RL) agents by adding pseudo-rewards, which can lead to useful emergent behaviors. However, they can also encourage counterproductive exploits, e.g., fixation on noisy TV screens. Here we provide a theoretical model which anticipates these behaviors, and provides broad criteria under which adverse effects can be bounded. We characterize all pseudo-rewards as reward shaping in Bayes-Adaptive Markov Decision Processes (BAMDPs), which formulate the problem of learning in MDPs as an MDP over the agent's knowledge. Optimal exploration maximizes BAMDP state value, which we decompose into the value of the information gathered and the prior value of the physical state. Pseudo-rewards guide RL agents by rewarding behavior that increases these value components, while they hinder exploration when they align poorly with the actual value. We extend potential-based shaping theory to prove BAMDP Potential-based shaping Functions (BAMPFs) are immune to reward-hacking (convergence to behaviors maximizing composite rewards to the detriment of real rewards) in meta-RL, and show empirically how a BAMPF helps a meta-RL agent learn optimal RL algorithms for a Bernoulli Bandit domain. We finally prove that BAMPFs with bounded monotone increasing potentials also resist reward-hacking in the regular RL setting. We show that it is straightforward to retrofit or design new pseudo-reward terms in this form, and provide an empirical demonstration in the Mountain Car environment.

Updated: 2025-03-22 02:05:56

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.05358v2
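
The abstract above builds on potential-based shaping. As a hedged illustration, the sketch below uses the classical form of that idea in an ordinary MDP, where the pseudo-reward for a transition is gamma * Phi(s') - Phi(s). It is not the paper's BAMPF construction, whose potentials are defined over BAMDP states that include the agent's knowledge. The function names and the toy potential are hypothetical.

```python
from typing import Callable, List, Sequence


def shaped_rewards(
    rewards: Sequence[float],
    states: Sequence[int],
    potential: Callable[[int], float],
    gamma: float,
) -> List[float]:
    """Add the classical potential-based pseudo-reward to each transition."""
    if len(states) != len(rewards) + 1:
        raise ValueError("need one more state than rewards (s_0 ... s_T)")
    return [
        r + gamma * potential(s_next) - potential(s)
        for r, s, s_next in zip(rewards, states, states[1:])
    ]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over the trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Toy trajectory with a hypothetical potential Phi(s) = 0.5 * s.
gamma = 0.9
states = [0, 1, 2, 3]
rewards = [1.0, 0.0, 2.0]
phi = lambda s: 0.5 * s

shaped = shaped_rewards(rewards, states, phi, gamma)
gap = discounted_return(shaped, gamma) - discounted_return(rewards, gamma)
```

The pseudo-rewards telescope, so `gap` equals `gamma**3 * phi(3) - phi(0)` and depends only on the first and last states of the trajectory, not on the path between them. This is the intuition behind why shaping of this form does not change which behavior is optimal.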

A Survey on Structured State Space Sequence (S4) Models

Recent advancements in sequence modeling have led to the emergence of Structured State Space Models (SSMs) as an efficient alternative to Recurrent Neural Networks (RNNs) and Transformers, addressing challenges in long-range dependency modeling and computational efficiency. While RNNs suffer from vanishing gradients and sequential inefficiencies, and Transformers face quadratic complexity, SSMs leverage structured recurrence and state-space representations to achieve superior long-sequence processing with linear or near-linear complexity. This survey provides a comprehensive review of SSMs, tracing their evolution from the foundational S4 model to its successors like Mamba, Simplified Structured State Space Sequence Model (S5), and Jamba, highlighting their improvements in computational efficiency, memory optimization, and inference speed. By comparing SSMs with traditional sequence models across domains such as natural language processing (NLP), speech recognition, vision, and time-series forecasting, we demonstrate their advantages in handling long-range dependencies while reducing computational overhead. Despite their potential, challenges remain in areas such as training optimization, hybrid modeling, and interpretability. This survey serves as a structured guide for researchers and practitioners, detailing the advancements, trade-offs, and future directions of SSM-based architectures in AI and deep learning.

Updated: 2025-03-22 01:55:32

Categories: cs.LG

Download: http://arxiv.org/abs/2503.18970v1

Explainable identification of similarities between entities for discovery in large text

With the availability of a virtually infinite number of text documents in digital format, automatic comparison of textual data is essential for extracting meaningful insights that are difficult to identify manually. Many existing tools, including AI and large language models, struggle to provide precise and explainable insights into textual similarities. In many cases they determine the similarity between documents as reflected by the text, rather than the similarities between the subjects being discussed in these documents. This study addresses these limitations by developing an n-gram analysis framework designed to compare documents automatically and uncover explainable similarities. A scoring formula assigns each n-gram a weight that is higher when the n-gram is more frequent in both documents, but is penalized when the n-gram is more frequent in the English language. Visualization tools like word clouds enhance the representation of these patterns, providing clearer insights. The findings demonstrate that this framework effectively uncovers similarities between text documents, offering explainable insights that are often difficult to identify manually. This non-parametric approach provides a deterministic solution for identifying similarities across various fields, including biographies, scientific literature, historical texts, and more. Code for the method is publicly available.

Updated: 2025-03-22 01:20:43

Categories: cs.IR,cs.LG

Download: http://arxiv.org/abs/2503.17605v1

OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery

Large Language Models (LLMs) have demonstrated remarkable potential in advancing scientific knowledge and addressing complex challenges. In this work, we introduce OmniScience, a specialized large reasoning model for general science, developed through three key components: (1) domain adaptive pretraining on a carefully curated corpus of scientific literature, (2) instruction tuning on a specialized dataset to guide the model in following domain-specific tasks, and (3) reasoning-based knowledge distillation through fine-tuning to significantly enhance its ability to generate contextually relevant and logically sound responses. We demonstrate the versatility of OmniScience by developing a battery agent that efficiently ranks molecules as potential electrolyte solvents or additives. Comprehensive evaluations reveal that OmniScience is competitive with state-of-the-art large reasoning models on the GPQA Diamond and domain-specific battery benchmarks, while outperforming all public reasoning and non-reasoning models with similar parameter counts. We further demonstrate via ablation experiments that domain adaptive pretraining and reasoning-based knowledge distillation are critical to attain our performance levels, across benchmarks.

Updated: 2025-03-22 01:18:59

Categories: cs.AI

Download: http://arxiv.org/abs/2503.17604v1

A Generative Caching System for Large Language Models

Caching has the potential to be of significant benefit for accessing large language models (LLMs) due to their high latencies which typically range from a small number of seconds to well over a minute. Furthermore, many LLMs charge money for queries; caching thus has a clear monetary benefit. This paper presents a new caching system for improving user experiences with LLMs. In addition to reducing both latencies and monetary costs for accessing LLMs, our system also provides important features that go beyond the performance benefits typically associated with caches. A key feature we provide is generative caching, wherein multiple cached responses can be synthesized to provide answers to queries which have never been seen before. Our generative caches function as repositories of valuable information which can be mined and analyzed. We also improve upon past semantic caching techniques by tailoring the caching algorithms to optimally balance cost and latency reduction with the quality of responses provided. Performance tests indicate that our caches are considerably faster than GPTcache.

Updated: 2025-03-22 01:17:56

Categories: cs.DB,cs.AI,cs.DC,cs.NI

Download: http://arxiv.org/abs/2503.17603v1

Where are we in audio deepfake detection? A systematic analysis over generative and detection models

Recent advances in Text-to-Speech (TTS) and Voice-Conversion (VC) using generative Artificial Intelligence (AI) technology have made it possible to generate high-quality and realistic human-like audio. This poses growing challenges in distinguishing AI-synthesized speech from the genuine human voice and could raise concerns about misuse for impersonation, fraud, spreading misinformation, and scams. However, existing detection methods for AI-synthesized audio have not kept pace and often fail to generalize across diverse datasets. In this paper, we introduce SONAR, a synthetic AI-Audio Detection Framework and Benchmark, aiming to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. SONAR includes a novel evaluation dataset sourced from 9 diverse audio synthesis platforms, including leading TTS providers and state-of-the-art TTS models. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based detection systems. Through extensive experiments, (1) we reveal the limitations of existing detection methods and demonstrate that foundation models exhibit stronger generalization capabilities, likely due to their model size and the scale and quality of pretraining data. (2) Speech foundation models demonstrate robust cross-lingual generalization capabilities, maintaining strong performance across diverse languages despite being fine-tuned solely on English speech data. This finding also suggests that the primary challenges in audio deepfake detection are more closely tied to the realism and quality of synthetic audio rather than language-specific characteristics. (3) We explore the effectiveness and efficiency of few-shot fine-tuning in improving generalization, highlighting its potential for tailored applications, such as personalized detection systems for specific entities or individuals.

Updated: 2025-03-22 01:10:56

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2410.04324v4

GPBench: A Comprehensive and Fine-Grained Benchmark for Evaluating Large Language Models as General Practitioners

General practitioners (GPs) serve as the cornerstone of primary healthcare systems by providing continuous and comprehensive medical services. However, due to the community-oriented nature of their practice, uneven training, and resource gaps, clinical proficiency among GPs can vary significantly across regions and healthcare settings. Large Language Models (LLMs) have demonstrated great potential in clinical and medical applications, making them a promising tool for supporting general practice. However, most existing benchmarks and evaluation frameworks focus on exam-style assessments, typically multiple-choice questions, and lack comprehensive assessment sets that accurately mirror the real-world scenarios encountered by GPs. To evaluate how effectively LLMs can make decisions in the daily work of GPs, we designed GPBench, which consists of both test questions from clinical practice and a novel evaluation framework. The test set includes multiple-choice questions that assess fundamental knowledge of general practice, as well as realistic, scenario-based problems. All questions are meticulously annotated by experts, incorporating rich fine-grained information related to clinical management. The proposed LLM evaluation framework is based on the competency model for general practice, providing a comprehensive methodology for assessing LLM performance in real-world settings. As the first large-model evaluation set targeting GP decision-making scenarios, GPBench allows us to evaluate current mainstream LLMs. Expert assessment reveals that in areas such as disease staging, complication recognition, treatment detail, and medication usage, these models exhibit at least ten major shortcomings. Overall, existing LLMs are not yet suitable for independent use in real-world GP working scenarios without human oversight.

Updated: 2025-03-22 01:02:44

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2503.17599v1

Benchmark Dataset for Pore-Scale CO2-Water Interaction

Accurately capturing the complex interaction between CO2 and water in porous media at the pore scale is essential for various geoscience applications, including carbon capture and storage (CCS). We introduce a comprehensive dataset generated from high-fidelity numerical simulations to capture the intricate interaction between CO2 and water at the pore scale. The dataset consists of 624 2D samples, each of size 512x512 with a resolution of 35 μm, covering 100 time steps under a constant CO2 injection rate. It includes various levels of heterogeneity, represented by different grain sizes with random variation in spacing, offering a robust testbed for developing predictive models. This dataset provides high-resolution temporal and spatial information crucial for benchmarking machine learning models.

Updated: 2025-03-22 00:42:42

Categories: physics.chem-ph,cs.LG,physics.comp-ph

Download: http://arxiv.org/abs/2503.17592v1

Coarse Set Theory for AI Ethics and Decision-Making: A Mathematical Framework for Granular Evaluations

As artificial intelligence (AI) systems become increasingly embedded in ethically sensitive domains such as education, healthcare, and transportation, the need to balance accuracy and interpretability in decision-making has become a central concern. Coarse Ethics (CE) is a theoretical framework that justifies coarse-grained evaluations, such as letter grades or warning labels, as ethically appropriate under cognitive and contextual constraints. However, CE has lacked mathematical formalization. This paper introduces Coarse Set Theory (CST), a novel mathematical framework that models coarse-grained decision-making using totally ordered structures and coarse partitions. CST defines hierarchical relations among sets and uses information-theoretic tools, such as Kullback-Leibler Divergence, to quantify the trade-off between simplification and information loss. We demonstrate CST through applications in educational grading and explainable AI (XAI), showing how it enables more transparent and context-sensitive evaluations. By grounding coarse evaluations in set theory and probabilistic reasoning, CST contributes to the ethical design of interpretable AI systems. This work bridges formal methods and human-centered ethics, offering a principled approach to balancing comprehensibility, fairness, and informational integrity in AI-driven decisions.

Updated: 2025-03-22 00:30:11

Categories: cs.AI,cs.IT,math.IT,math.LO,math.PR

Download: http://arxiv.org/abs/2502.07347v5
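
The abstract above says CST uses Kullback-Leibler divergence to quantify the trade-off between simplification and information loss. A minimal sketch of that idea follows, under a stated assumption: the coarse view of a fine-grained distribution is modelled by spreading each block's probability mass uniformly over the block. The paper's own construction may differ, and `coarse_grain` and `kl_divergence` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def coarse_grain(p: Sequence[float], blocks: Sequence[Sequence[int]]) -> List[float]:
    """Replace p by a distribution that is uniform within each block of a partition."""
    q = [0.0] * len(p)
    for block in blocks:
        mass = sum(p[i] for i in block)
        for i in block:
            q[i] = mass / len(block)
    return q


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Probability of four ordered scores (0..3), e.g. a tiny grading scale.
p = [0.1, 0.2, 0.3, 0.4]

loss_fine = kl_divergence(p, coarse_grain(p, [[0], [1], [2], [3]]))  # no coarsening
loss_two = kl_divergence(p, coarse_grain(p, [[0, 1], [2, 3]]))       # e.g. fail / pass
loss_one = kl_divergence(p, coarse_grain(p, [[0, 1, 2, 3]]))         # a single grade
```

Under this assumption the loss is zero for the finest partition and grows as blocks are merged, so the three values give a simple numerical handle on how much information a letter-grade style evaluation discards.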

LEMIX: Enabling Testing of Embedded Applications as Linux Applications

Dynamic analysis, through rehosting, is an important capability for security assessment in embedded systems software. Existing rehosting techniques aim to provide high-fidelity execution by accurately emulating hardware and peripheral interactions. However, these techniques face challenges in adoption due to the increasing number of available peripherals and the complexities involved in designing emulation models for diverse hardware. Additionally, contrary to the prevailing belief that guides existing works, our analysis of reported bugs shows that high-fidelity execution is not required to expose most bugs in embedded software. Our key hypothesis is that security vulnerabilities are more likely to arise at higher abstraction levels. To substantiate our hypothesis, we introduce LEMIX, a framework enabling dynamic analysis of embedded applications by rehosting them as x86 Linux applications decoupled from hardware dependencies. Enabling embedded applications to run natively on Linux facilitates security analysis using available techniques and takes advantage of the powerful hardware available on the Linux platform for higher testing throughput. We develop various techniques to address the challenges involved in converting embedded applications to Linux applications. We evaluated LEMIX on 18 real-world embedded applications across four RTOSes and found 21 new bugs in 12 of the applications and all 4 of the RTOS kernels. We report that LEMIX is superior to existing state-of-the-art techniques both in terms of code coverage (~2x more coverage) and bug detection (18 more bugs).

Updated: 2025-03-22 00:14:47

Categories: cs.CR,cs.OS,D.4.6; D.4.9; K.6.5

Download: http://arxiv.org/abs/2503.17588v1

ConSol: Sequential Probability Ratio Testing to Find Consistent LLM Reasoning Paths Efficiently

Recent advancements in large language models (LLMs) integrating explicit reasoning, such as OpenAI's o3-mini, DeepSeek-R1, and QWQ-32B, enable smaller models to solve complex tasks by generating intermediate reasoning steps prior to providing answers. However, this approach significantly increases computational costs, both monetarily and environmentally. The widely used self-consistency method further exacerbates these costs by aggregating multiple reasoning paths to improve accuracy, often requiring between 40 and 64 samples per task. Although aggregation effectively reduces variance and bias, additional sampling can lead to diminishing returns when early samples yield consistent results. To address these inefficiencies, we propose leveraging Sequential Probability Ratio Testing (SPRT) to dynamically terminate sampling once sufficient consistency is achieved. We calibrate SPRT parameters specifically for LLM applications, accounting for sensitivity to detect the mode of the distribution. Our experiments demonstrate that incorporating SPRT significantly enhances token efficiency, achieving comparable accuracy to self-consistency methods but at a substantially reduced computational cost. To promote transparency and facilitate reproducibility, we have made the source code and datasets used in our experiments publicly available at our GitHub repository: https://github.com/LiuzLab/consol, or available as a PyPI package: pip install consol. We hope that this resource will support further research and encourage the development of new methods building upon our work.

Updated: 2025-03-22 00:07:28

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2503.17587v1
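
The abstract above proposes stopping self-consistency sampling early with a Sequential Probability Ratio Test. The sketch below shows the general shape of such a stopping rule using Wald's classical Bernoulli SPRT. It is not the `consol` package API, and the hypotheses (`p0`, `p1`), error rates, and sample budget are illustrative assumptions rather than the calibrated values from the paper. Re-selecting the leading answer after each sample is also a simplification that a rigorous test would need to account for.

```python
import math
from collections import Counter
from itertools import islice
from typing import Iterable, Optional, Tuple


def sprt_self_consistency(
    answers: Iterable[str],
    p0: float = 0.5,        # H0: the leading answer is no better than a coin flip
    p1: float = 0.8,        # H1: the leading answer clearly dominates
    alpha: float = 0.05,    # tolerated false-positive rate
    beta: float = 0.05,     # tolerated false-negative rate
    max_samples: int = 40,  # sampling budget
) -> Tuple[Optional[str], int, str]:
    """Consume sampled answers one at a time and stop once the test decides."""
    upper = math.log((1 - beta) / alpha)      # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))      # accept H0 at or below this
    step_hit = math.log(p1 / p0)              # log-likelihood ratio of an agreeing sample
    step_miss = math.log((1 - p1) / (1 - p0)) # ... and of a disagreeing sample

    counts: Counter = Counter()
    n = 0
    for answer in islice(answers, max_samples):
        counts[answer] += 1
        n += 1
        mode, k = counts.most_common(1)[0]
        llr = k * step_hit + (n - k) * step_miss
        if llr >= upper:
            return mode, n, "consistent"
        if llr <= lower:
            return mode, n, "inconsistent"

    mode = counts.most_common(1)[0][0] if counts else None
    return mode, n, "budget_exhausted"


# `answers` would normally be a generator that queries the model lazily.
result = sprt_self_consistency(iter(["42"] * 40))
```

Passing a lazy generator matters here: once the function returns, no further model calls are made, which is where the token savings come from.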

Samudra: An AI Global Ocean Emulator for Climate

AI emulators for forecasting have emerged as powerful tools that can outperform conventional numerical predictions. The next frontier is to build emulators for long climate simulations with skill across a range of spatiotemporal scales, a particularly important goal for the ocean. Our work builds a skillful global emulator of the ocean component of a state-of-the-art climate model. We emulate key ocean variables (sea surface height, horizontal velocities, temperature, and salinity) across their full depth. We use a modified ConvNeXt UNet architecture trained on multiple depth levels of ocean data. We show that the ocean emulator, Samudra, which exhibits no drift relative to the truth, can reproduce the depth structure of ocean variables and their interannual variability. Samudra is stable for centuries and 150 times faster than the original ocean model. However, Samudra struggles to capture the correct magnitude of the forcing trends while remaining stable, which requires further work.

Updated: 2025-03-22 00:03:21

Categories: physics.ao-ph,cs.LG

Download: http://arxiv.org/abs/2412.03795v3