Arxiv Day: Article

Who Are You Behind the Screen? Implicit MBTI and Gender Detection Using Artificial Intelligence

In personalized technology and psychological research, precisely detecting demographic features and personality traits from digital interactions becomes ever more important. This work investigates implicit categorization, inferring personality and gender variables directly from linguistic patterns in Telegram conversation data, while conventional personality prediction techniques mostly depend on explicitly self-reported labels. We refine a Transformer-based language model (RoBERTa) to capture complex linguistic cues indicative of personality traits and gender differences using a dataset comprising 138,866 messages from 1,602 users annotated with MBTI types and 195,016 messages from 2,598 users annotated with gender. Confidence levels help to greatly raise model accuracy to 86.16\%, hence proving RoBERTa's capacity to consistently identify implicit personality types from conversational text data. Our results highlight the usefulness of Transformer topologies for implicit personality and gender classification, hence stressing their efficiency and stressing important trade-offs between accuracy and coverage in realistic conversational environments. With regard to gender classification, the model obtained an accuracy of 74.4\%, therefore capturing gender-specific language patterns. Personality dimension analysis showed that people with introverted and intuitive preferences are especially more active in text-based interactions. This study emphasizes practical issues in balancing accuracy and data coverage as Transformer-based models show their efficiency in implicit personality and gender prediction tasks from conversational texts.

Updated: 2025-03-14 23:59:45

标题: 你在屏幕后面是谁？使用人工智能进行隐式MBTI和性别检测

摘要: 在个性化技术和心理研究中，精确地从数字交互中检测人口特征和个性特征变得越来越重要。本文调查了隐式分类，从Telegram对话数据中直接推断人格和性别变量，而传统的人格预测技术大多依赖于明确的自我报告标签。我们通过一个包含来自1,602名用户的138,866条消息并标有MBTI类型的数据集，以及来自2,598名用户并标有性别的195,016条消息，精炼了一个基于Transformer的语言模型（RoBERTa），以捕捉表明个性特征和性别差异的复杂语言提示。置信水平有助于大幅提高模型的准确性达到86.16％，从而证明RoBERTa能够始终从对话文本数据中准确识别隐含的个性类型。我们的结果强调了Transformer拓扑在隐含个性和性别分类中的实用性，强调了它们在现实对话环境中准确性和覆盖率之间的重要权衡。就性别分类而言，模型获得了74.4％的准确性，因此捕获了性别特定的语言模式。人格维度分析显示，具有内向和直觉偏好的人在基于文本的互动中特别活跃。这项研究强调了在Transformer模型在对话文本中展示其效率时，平衡准确性和数据覆盖率的实际问题。

更新时间: 2025-03-14 23:59:45

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.09853v2

Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation

Mitigating reward hacking--where AI systems misbehave due to flaws or misspecifications in their learning objectives--remains a key challenge in constructing capable and aligned models. We show that we can monitor a frontier reasoning model, such as OpenAI o3-mini, for reward hacking in agentic coding environments by using another LLM that observes the model's chain-of-thought (CoT) reasoning. CoT monitoring can be far more effective than monitoring agent actions and outputs alone, and we further found that a LLM weaker than o3-mini, namely GPT-4o, can effectively monitor a stronger model. Because CoT monitors can be effective at detecting exploits, it is natural to ask whether those exploits can be suppressed by incorporating a CoT monitor directly into the agent's training objective. While we show that integrating CoT monitors into the reinforcement learning reward can indeed produce more capable and more aligned agents in the low optimization regime, we find that with too much optimization, agents learn obfuscated reward hacking, hiding their intent within the CoT while still exhibiting a significant rate of reward hacking. Because it is difficult to tell when CoTs have become obfuscated, it may be necessary to pay a monitorability tax by not applying strong optimization pressures directly to the chain-of-thought, ensuring that CoTs remain monitorable and useful for detecting misaligned behavior.

Updated: 2025-03-14 23:50:34

标题: 监测推理模型中的不当行为和促进混淆风险

摘要: 缓解奖励欺骗-即AI系统由于学习目标中的缺陷或错误规范而表现不当-仍然是构建功能强大和对齐的模型的关键挑战。我们展示了我们可以通过使用观察模型的思维链(CoT)推理的另一个LLM来监视一个前沿推理模型，例如OpenAI o3-mini，在代理编码环境中进行奖励欺骗。CoT监视可能比仅监视代理动作和输出更有效，我们进一步发现，比o3-mini弱的LLM，即GPT-4o，可以有效监视一个更强大的模型。由于CoT监视器可以有效地检测到漏洞，自然而然地问是否这些漏洞可以通过将CoT监视器直接纳入代理的训练目标来抑制。虽然我们展示了将CoT监视器整合到强化学习奖励中确实可以在低优化区域产生更有能力和更对齐的代理，但我们发现，如果优化过度，代理会学会模糊的奖励欺骗，将其意图隐藏在CoT中，同时仍然表现出显著的奖励欺骗率。由于很难确定何时CoT已变得模糊，因此可能需要通过不直接向思维链施加强大的优化压力来支付可监控性税，确保CoT保持可监控且有用于检测不对齐行为。

更新时间: 2025-03-14 23:50:34

领域: cs.AI

下载: http://arxiv.org/abs/2503.11926v1

Greener GRASS: Enhancing GNNs with Encoding, Rewiring, and Attention

Graph Neural Networks (GNNs) have become important tools for machine learning on graph-structured data. In this paper, we explore the synergistic combination of graph encoding, graph rewiring, and graph attention, by introducing Graph Attention with Stochastic Structures (GRASS), a novel GNN architecture. GRASS utilizes relative random walk probabilities (RRWP) encoding and a novel decomposed variant (D-RRWP) to efficiently capture structural information. It rewires the input graph by superimposing a random regular graph to enhance long-range information propagation. It also employs a novel additive attention mechanism tailored for graph-structured data. Our empirical evaluations demonstrate that GRASS achieves state-of-the-art performance on multiple benchmark datasets, including a 20.3% reduction in mean absolute error on the ZINC dataset.

Updated: 2025-03-14 23:47:53

标题: 更绿的GRASS：通过编码、重连和注意力增强GNNs

摘要: 图形神经网络（GNNs）已成为处理图结构数据的重要工具。本文探讨了图编码、图重连和图注意力的协同组合，引入了一种新颖的GNN架构——具有随机结构的图注意力（GRASS）。GRASS利用相对随机行走概率（RRWP）编码和一种新颖的分解变体（D-RRWP）来有效捕捉结构信息。它通过叠加一个随机正则图对输入图进行重连，以增强远程信息传播。它还采用了一种针对图结构数据量身定制的新颖的加性注意力机制。我们的实证评估表明，GRASS在多个基准数据集上取得了最先进的性能，包括在ZINC数据集上平均绝对误差减少了20.3%。

更新时间: 2025-03-14 23:47:53

领域: cs.LG,cs.AI,cs.NE

下载: http://arxiv.org/abs/2407.05649v5

REGEN: A Dataset and Benchmarks with Natural Language Critiques and Narratives

This paper introduces a novel dataset REGEN (Reviews Enhanced with GEnerative Narratives), designed to benchmark the conversational capabilities of recommender Large Language Models (LLMs), addressing the limitations of existing datasets that primarily focus on sequential item prediction. REGEN extends the Amazon Product Reviews dataset by inpainting two key natural language features: (1) user critiques, representing user "steering" queries that lead to the selection of a subsequent item, and (2) narratives, rich textual outputs associated with each recommended item taking into account prior context. The narratives include product endorsements, purchase explanations, and summaries of user preferences. Further, we establish an end-to-end modeling benchmark for the task of conversational recommendation, where models are trained to generate both recommendations and corresponding narratives conditioned on user history (items and critiques). For this joint task, we introduce a modeling framework LUMEN (LLM-based Unified Multi-task Model with Critiques, Recommendations, and Narratives) which uses an LLM as a backbone for critiquing, retrieval and generation. We also evaluate the dataset's quality using standard auto-rating techniques and benchmark it by training both traditional and LLM-based recommender models. Our results demonstrate that incorporating critiques enhances recommendation quality by enabling the recommender to learn language understanding and integrate it with recommendation signals. Furthermore, LLMs trained on our dataset effectively generate both recommendations and contextual narratives, achieving performance comparable to state-of-the-art recommenders and language models.

Updated: 2025-03-14 23:47:46

标题: REGEN：一个带有自然语言评论和叙述的数据集和基准测试

摘要: 这篇论文介绍了一个新颖的数据集REGEN（Reviews Enhanced with GEnerative Narratives），旨在评估推荐型大型语言模型（LLMs）的对话能力，解决现有数据集主要集中在序贯项目预测方面的局限性。REGEN通过填充两个关键的自然语言特征来扩展亚马逊产品评论数据集：（1）用户评论，代表用户“引导”查询，导致选择后续项目的关键，以及（2）叙述，与每个推荐项目相关联的丰富文本输出，考虑先前的上下文。这些叙述包括产品认可、购买解释和用户偏好摘要。此外，我们建立了一个端到端的建模基准，用于对话推荐任务，模型被训练生成推荐和相应叙述，条件是用户历史（项目和评论）。对于这个联合任务，我们引入了一个建模框架LUMEN（基于LLM的统一多任务模型，包括评论、推荐和叙述），它使用LLM作为批评、检索和生成的支撑。我们还通过使用标准自动评级技术来评估数据集的质量，并通过训练传统和基于LLM的推荐模型来进行基准测试。我们的结果表明，整合评论可以通过使推荐者学习语言理解并将其与推荐信号整合，从而提高推荐质量。此外，我们在我们的数据集上训练的LLMs有效地生成了推荐和上下文叙述，实现了与最先进的推荐者和语言模型相媲美的性能。

更新时间: 2025-03-14 23:47:46

领域: cs.CL,cs.AI,cs.IR,cs.LG

下载: http://arxiv.org/abs/2503.11924v1

Unclonable Functional Encryption

In a functional encryption (FE) scheme, a user that holds a ciphertext and a function key can learn the result of applying the function to the plaintext message. Security requires that the user does not learn anything beyond the function evaluation. We extend this notion to the quantum setting by providing definitions and a construction for a quantum functional encryption (QFE) scheme which allows for the evaluation of polynomialy-sized circuits on arbitrary quantum messages. Our construction is built upon quantum garbled circuits [BY22]. We also investigate the relationship of QFE to the seemingly unrelated notion of unclonable encryption (UE) and find that any QFE scheme universally achieves the property of unclonable functional encryption (UFE). In particular we assume the existence of an unclonable encryption scheme with quantum decryption keys which was recently constructed by [AKY24]. Our UFE guarantees that two parties cannot simultaneously recover the correct function outputs using two independently sampled function secret keys. As an application we give the first construction for public-key UE with variable decryption keys. Lastly, we establish a connection between quantum indistinguishability obfuscation (qiO) and quantum functional encryption (QFE); Showing that any multi-input indistinguishability-secure quantum functional encryption scheme unconditionally implies the existence of qiO.

Updated: 2025-03-14 23:27:01

标题: 不可克隆的功能加密

摘要: 在功能加密（FE）方案中，持有密文和函数密钥的用户可以了解将函数应用于明文消息的结果。安全性要求用户除了函数评估外不得了解任何信息。我们通过提供定义和构建一个量子功能加密（QFE）方案将这个概念扩展到量子设置中，该方案允许在任意量子消息上评估多项式大小的电路。我们的构建基于量子混淆电路[BＹ22]。我们还研究了QFE与看似无关的不可克隆加密（UE）概念之间的关系，并发现任何QFE方案都普遍实现了不可克隆功能加密（UFE）属性。特别是我们假设存在一种具有量子解密密钥的不可克隆加密方案，该方案最近由[ＡＫＹ２４]构建。我们的UFE保证两个参与方不能同时使用两个独立抽样的函数秘密密钥恢复正确的函数输出。作为应用，我们提供了具有可变解密密钥的公钥UE的第一个构建。最后，我们建立了量子不可区分混淆（qiO）和量子功能加密（QFE）之间的联系；表明任何多输入不可区分安全的量子功能加密方案无条件地暗示了qiO的存在。

更新时间: 2025-03-14 23:27:01

领域: quant-ph,cs.CR

下载: http://arxiv.org/abs/2410.06029v2

DRAPER: Towards a Robust Robot Deployment and Reliable Evaluation for Quasi-Static Pick-and-Place Cloth-Shaping Neural Controllers

Comparing robotic cloth-manipulation systems in a real-world setup is challenging. The fidelity gap between simulation-trained cloth neural controllers and real-world operation hinders the reliable deployment of these methods in physical trials. Inconsistent experimental setups and hardware limitations among different approaches obstruct objective evaluations. This study demonstrates a reliable real-world comparison of different simulation-trained neural controllers on both flattening and folding tasks with different types of fabrics varying in material, size, and colour. We introduce the DRAPER framework to enable this comprehensive study, which reliably reflects the true capabilities of these neural controllers. It specifically addresses real-world grasping errors, such as misgrasping and multilayer grasping, through real-world adaptations of the simulation environment to provide data trajectories that closely reflect real-world grasping scenarios. It also employs a special set of vision processing techniques to close the simulation-to-reality gap in the perception. Furthermore, it achieves robust grasping by adopting a tweezer-extended gripper and a grasping procedure. We demonstrate DRAPER's generalisability across different deep-learning methods and robotic platforms, offering valuable insights to the cloth manipulation research community.

Updated: 2025-03-14 23:15:09

标题: DRAPER：面向准静态拾取和放置布料成形神经控制器的稳健机器人部署和可靠评估

摘要: 在真实世界的环境中比较机器人布料操作系统是具有挑战性的。模拟训练的布料神经控制器与真实世界操作之间的保真度差距阻碍了这些方法在物理试验中的可靠部署。不同方法之间不一致的实验设置和硬件限制阻碍了客观评估。本研究展示了不同模拟训练的神经控制器在不同类型的织物上进行平整和折叠任务的可靠真实世界比较，这些织物在材料、尺寸和颜色上有所不同。我们引入了DRAPER框架来实现这一全面研究，可可靠地反映这些神经控制器的真实能力。它特别解决了真实世界抓取误差，如错误抓取和多层抓取，通过对模拟环境进行真实世界的调整，提供紧密反映真实世界抓取情景的数据轨迹。它还采用一套特殊的视觉处理技术来缩小感知中的模拟与现实之间的差距。此外，通过采用一种镊子延伸夹具和抓取程序，实现了稳健的抓取。我们展示了DRAPER在不同深度学习方法和机器人平台上的普适性，为布料操作研究社区提供了宝贵的见解。

更新时间: 2025-03-14 23:15:09

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2409.15159v2

RePanda: Pandas-powered Tabular Verification and Reasoning

Fact-checking tabular data is essential for ensuring the accuracy of structured information. However, existing methods often rely on black-box models with opaque reasoning. We introduce RePanda, a structured fact verification approach that translates claims into executable pandas queries, enabling interpretable and verifiable reasoning. To train RePanda, we construct PanTabFact, a structured dataset derived from the TabFact train set, where claims are paired with executable queries generated using DeepSeek-Chat and refined through automated error correction. Fine-tuning DeepSeek-coder-7B-instruct-v1.5 on PanTabFact, RePanda achieves 84.09% accuracy on the TabFact test set. To evaluate Out-of-Distribution (OOD) generalization, we interpret question-answer pairs from WikiTableQuestions as factual claims and refer to this dataset as WikiFact. Without additional fine-tuning, RePanda achieves 84.72% accuracy on WikiFact, significantly outperforming all other baselines and demonstrating strong OOD robustness. Notably, these results closely match the zero-shot performance of DeepSeek-Chat (671B), indicating that our fine-tuning approach effectively distills structured reasoning from a much larger model into a compact, locally executable 7B model. Beyond fact verification, RePanda extends to tabular question answering by generating executable queries that retrieve precise answers. To support this, we introduce PanWiki, a dataset mapping WikiTableQuestions to pandas queries. Fine-tuning on PanWiki, RePanda achieves 75.1% accuracy in direct answer retrieval. These results highlight the effectiveness of structured execution-based reasoning for tabular verification and question answering. We have publicly released the dataset on Hugging Face at datasets/AtoosaChegini/PanTabFact.

Updated: 2025-03-14 23:12:36

标题: RePanda：基于Pandas的表格验证和推理

摘要: 事实核查表格数据对于确保结构化信息的准确性至关重要。然而，现有方法通常依赖于不透明推理的黑盒模型。我们介绍了RePanda，这是一种结构化事实验证方法，它将声明转换为可执行的pandas查询，实现可解释和可验证的推理。为了训练RePanda，我们构建了PanTabFact，这是从TabFact训练集派生的结构化数据集，其中声明与使用DeepSeek-Chat生成的可执行查询配对，并通过自动错误校正进行了细化。在PanTabFact上微调DeepSeek-coder-7B-instruct-v1.5，RePanda在TabFact测试集上实现了84.09%的准确性。为了评估WikiTableQuestions中的问题-答案对的分布外（OOD）泛化，我们将WikiTableQuestions中的问题-答案对解释为事实声明，并将此数据集称为WikiFact。在没有额外微调的情况下，RePanda在WikiFact上实现了84.72%的准确性，明显优于所有其他基线，并展示了强大的OOD鲁棒性。值得注意的是，这些结果与DeepSeek-Chat（671B）的零样本性能非常接近，表明我们的微调方法有效地将结构化推理从一个更大的模型中提炼出来，转化为一个紧凑的、本地可执行的7B模型。除了事实验证，RePanda还扩展到表格问答，通过生成可执行查询来检索精确答案。为了支持这一点，我们引入了PanWiki，这是一个将WikiTableQuestions映射到pandas查询的数据集。在PanWiki上微调，RePanda在直接答案检索中实现了75.1%的准确性。这些结果突显了基于结构化执行的推理对于表格验证和问答的有效性。我们已经在Hugging Face上公开发布了数据集，路径为datasets/AtoosaChegini/PanTabFact。

更新时间: 2025-03-14 23:12:36

领域: cs.LG

下载: http://arxiv.org/abs/2503.11921v1

Practical Implications of Implementing Local Differential Privacy for Smart grids

Recent smart grid advancements enable near-realtime reporting of electricity consumption, raising concerns about consumer privacy. Differential privacy (DP) has emerged as a viable privacy solution, where a calculated amount of noise is added to the data by a trusted third party, or individual users perturb their information locally, and only send the randomized data to an aggregator for analysis safeguarding users and aggregators privacy. However, the practical implementation of a Local DP-based (LDP) privacy model for smart grids has its own challenges. In this paper, we discuss the challenges of implementing an LDP-based model for smart grids. We compare existing LDP mechanisms in smart grids for privacy preservation of numerical data and discuss different methods for selecting privacy parameters in the existing literature, their limitations and the non-existence of an optimal method for selecting the privacy parameters. We also discuss the challenges of translating theoretical models of LDP into a practical setting for smart grids for different utility functions, the impact of the size of data set on privacy and accuracy, and vulnerability of LDP-based smart grids to manipulation attacks. Finally, we discuss future directions in research for better practical applications in LDP based models for smart grids.

Updated: 2025-03-14 23:11:46

标题: 实施本地差分隐私对智能电网的实际影响

摘要: 最近智能电网的进展使得电力消耗可以几乎实时报告，引发了消费者隐私的担忧。差分隐私（DP）已经成为一种可行的隐私解决方案，其中由可信第三方向数据添加计算的噪声，或者个人用户在本地干扰其信息，仅将随机化数据发送给聚合器进行分析，从而保护用户和聚合器的隐私。然而，为智能电网实现基于本地差分隐私（LDP）的隐私模型也面临着挑战。在本文中，我们讨论了为智能电网实现基于LDP的模型所面临的挑战。我们比较了现有智能电网中用于保护数值数据隐私的LDP机制，并讨论了现有文献中选择隐私参数的不同方法、它们的局限性以及选择隐私参数的最优方法不存在的问题。我们还讨论了将LDP的理论模型转化为智能电网实际环境的挑战，对不同实用功能的影响、数据集大小对隐私和准确性的影响，以及LDP-based智能电网对操纵攻击的脆弱性。最后，我们讨论了未来在LDP-based模型为智能电网提供更好实际应用的研究方向。

更新时间: 2025-03-14 23:11:46

领域: cs.CR

下载: http://arxiv.org/abs/2503.11920v1

Sketch-to-Skill: Bootstrapping Robot Learning with Human Drawn Trajectory Sketches

Training robotic manipulation policies traditionally requires numerous demonstrations and/or environmental rollouts. While recent Imitation Learning (IL) and Reinforcement Learning (RL) methods have reduced the number of required demonstrations, they still rely on expert knowledge to collect high-quality data, limiting scalability and accessibility. We propose Sketch-to-Skill, a novel framework that leverages human-drawn 2D sketch trajectories to bootstrap and guide RL for robotic manipulation. Our approach extends beyond previous sketch-based methods, which were primarily focused on imitation learning or policy conditioning, limited to specific trained tasks. Sketch-to-Skill employs a Sketch-to-3D Trajectory Generator that translates 2D sketches into 3D trajectories, which are then used to autonomously collect initial demonstrations. We utilize these sketch-generated demonstrations in two ways: to pre-train an initial policy through behavior cloning and to refine this policy through RL with guided exploration. Experimental results demonstrate that Sketch-to-Skill achieves ~96% of the performance of the baseline model that leverages teleoperated demonstration data, while exceeding the performance of a pure reinforcement learning policy by ~170%, only from sketch inputs. This makes robotic manipulation learning more accessible and potentially broadens its applications across various domains.

Updated: 2025-03-14 23:08:29

标题: 从草图到技能：利用人类绘制的轨迹草图来启动机器人学习

摘要: 传统上，训练机器人操纵策略需要大量的演示和/或环境部署。尽管最近的模仿学习（IL）和强化学习（RL）方法已经减少了所需演示的数量，但它们仍然依赖于专家知识来收集高质量数据，从而限制了可扩展性和可访问性。我们提出了一种新颖的框架Sketch-to-Skill，利用人工绘制的2D草图轨迹来启动和引导机器人操纵的RL。我们的方法超越了以前基于草图的方法，这些方法主要集中在模仿学习或策略调节上，限于特定训练任务。Sketch-to-Skill采用了一个Sketch-to-3D轨迹生成器，将2D草图转化为3D轨迹，然后用于自主收集初始演示。我们以两种方式利用这些由草图生成的演示：通过行为克隆预训练初始策略，并通过带有引导探索的RL来完善这个策略。实验结果表明，Sketch-to-Skill实现了基于远程操作演示数据的基准模型性能的约96%，同时超过了纯强化学习策略的性能约170%，仅从草图输入中获得。这使得机器人操纵学习更加易于访问，可能扩展了其在各个领域的应用。

更新时间: 2025-03-14 23:08:29

领域: cs.RO,cs.AI,cs.HC

下载: http://arxiv.org/abs/2503.11918v1

Measuring Bias of Web-filtered Text Datasets and Bias Propagation Through Training

We investigate biases in pretraining datasets for large language models (LLMs) through dataset classification experiments. Building on prior work demonstrating the existence of biases in popular computer vision datasets, we analyze popular open-source pretraining datasets for LLMs derived from CommonCrawl including C4, RefinedWeb, DolmaCC, RedPajama-V2, FineWeb, and DCLM-Baseline. Despite those datasets being obtained with similar curation steps, neural networks can classify surprisingly well which dataset a single text sequence belongs to, significantly better than a human can. This indicates that small differences in filtering and processing pipelines induce fingerprints evident in formatting, vocabulary, and content distributions. Those biases remain even when the text is rewritten with LLMs. Moreover, these biases propagate through training: Random sequences generated by models trained on those datasets can be classified well by a classifier trained on the original datasets. This can be leveraged to estimate the pretraining mixture proportions of the data sources.

Updated: 2025-03-14 23:07:45

标题: 测量网络过滤文本数据集的偏见以及通过训练传播的偏见

摘要: 我们通过数据集分类实验调查了大型语言模型（LLMs）的预训练数据集中存在的偏见。在之前的研究基础上，证明了流行的计算机视觉数据集存在偏见，我们分析了源自CommonCrawl的流行开源预训练数据集，包括C4、RefinedWeb、DolmaCC、RedPajama-V2、FineWeb和DCLM-Baseline。尽管这些数据集是通过相似的筛选步骤获得的，神经网络可以出奇地很好地对单个文本序列属于哪个数据集进行分类，远远优于人类。这表明在过滤和处理管道中存在微小差异会导致在格式、词汇和内容分布中可见的指纹。即使通过LLMs重新编写文本，这些偏见仍然存在。此外，这些偏见会通过训练传播：由这些数据集训练的模型生成的随机序列可以被训练在原始数据集上的分类器很好地分类。这可以用来估计数据源的预训练混合比例。

更新时间: 2025-03-14 23:07:45

领域: cs.LG

下载: http://arxiv.org/abs/2412.02857v2

A Framework for Evaluating Emerging Cyberattack Capabilities of AI

As frontier models become more capable, the community has attempted to evaluate their ability to enable cyberattacks. Performing a comprehensive evaluation and prioritizing defenses are crucial tasks in preparing for AGI safely. However, current cyber evaluation efforts are ad-hoc, with no systematic reasoning about the various phases of attacks, and do not provide a steer on how to use targeted defenses. In this work, we propose a novel approach to AI cyber capability evaluation that (1) examines the end-to-end attack chain, (2) helps to identify gaps in the evaluation of AI threats, and (3) helps defenders prioritize targeted mitigations and conduct AI-enabled adversary emulation to support red teaming. To achieve these goals, we propose adapting existing cyberattack chain frameworks to AI systems. We analyze over 12,000 instances of real-world attempts to use AI in cyberattacks catalogued by Google's Threat Intelligence Group. Using this analysis, we curate a representative collection of seven cyberattack chain archetypes and conduct a bottleneck analysis to identify areas of potential AI-driven cost disruption. Our evaluation benchmark consists of 50 new challenges spanning different phases of cyberattacks. Based on this, we devise targeted cybersecurity model evaluations, report on the potential for AI to amplify offensive cyber capabilities across specific attack phases, and conclude with recommendations on prioritizing defenses. In all, we consider this to be the most comprehensive AI cyber risk evaluation framework published so far.

Updated: 2025-03-14 23:05:02

标题: 一个评估人工智能新兴网络攻击能力的框架

摘要: 随着边界模型变得更加强大，社区已经尝试评估它们能够实现网络攻击的能力。进行全面评估并优先考虑防御措施是为了安全准备AGI至关重要的任务。然而，目前的网络评估工作是临时性的，没有关于攻击各个阶段的系统推理，并且没有提供如何使用有针对性的防御的指导。在这项工作中，我们提出了一种新颖的AI网络能力评估方法，(1)考察了端到端的攻击链，(2)有助于识别对AI威胁评估的空白，并(3)帮助防御者优先考虑有针对性的缓解措施，并进行支持红队行动的AI对手仿真。为了实现这些目标，我们提出了将现有的网络攻击链框架调整为AI系统的方法。我们分析了谷歌威胁情报组所记录的超过12,000次尝试在网络攻击中使用AI的真实实例。通过这项分析，我们整理了代表性的七种网络攻击链原型，并进行了瓶颈分析，以确定潜在的AI驱动成本扰乱领域。我们的评估基准包括50个涵盖网络攻击不同阶段的新挑战。基于此，我们制定了有针对性的网络安全模型评估，报告了AI在特定攻击阶段放大攻击性网络能力的潜力，并最后提出了关于优先考虑防御措施的建议。总的来说，我们认为这是迄今为止发表的最全面的AI网络风险评估框架。

更新时间: 2025-03-14 23:05:02

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2503.11917v1

Diorama: Unleashing Zero-shot Single-view 3D Indoor Scene Modeling

Reconstructing structured 3D scenes from RGB images using CAD objects unlocks efficient and compact scene representations that maintain compositionality and interactability. Existing works propose training-heavy methods relying on either expensive yet inaccurate real-world annotations or controllable yet monotonous synthetic data that do not generalize well to unseen objects or domains. We present Diorama, the first zero-shot open-world system that holistically models 3D scenes from single-view RGB observations without requiring end-to-end training or human annotations. We show the feasibility of our approach by decomposing the problem into subtasks and introduce robust, generalizable solutions to each: architecture reconstruction, 3D shape retrieval, object pose estimation, and scene layout optimization. We evaluate our system on both synthetic and real-world data to show we significantly outperform baselines from prior work. We also demonstrate generalization to internet images and the text-to-scene task.

Updated: 2025-03-14 22:54:30

标题: 迪奥拉玛：释放零镜头单视角3D室内场景建模

摘要: 从RGB图像中使用CAD对象重建结构化的3D场景可以解锁高效且紡紧凑的场景表示，保持构成性和互动性。现有作品提出了依赖于昂贵但不准确的真实世界标注或可控制但单调的合成数据的训练密集方法，这些方法不能很好地泛化到未见过的对象或领域。我们提出了Diorama，这是第一个零射开放世界系统，可以从单视角RGB观察中全面建模3D场景，而无需端到端训练或人类标注。我们通过将问题分解为子任务，介绍了对每个子任务的坚固、可泛化解决方案：架构重建、3D形状检索、对象姿势估计和场景布局优化。我们在合成和真实世界数据上评估我们的系统，显示我们明显优于先前工作的基线。我们还展示了对互联网图像和文本到场景任务的泛化能力。

更新时间: 2025-03-14 22:54:30

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.19492v2

How Problematic Writer-AI Interactions (Rather than Problematic AI) Hinder Writers' Idea Generation

Writing about a subject enriches writers' understanding of that subject. This cognitive benefit of writing -- known as constructive learning -- is essential to how students learn in various disciplines. However, does this benefit persist when students write with generative AI writing assistants? Prior research suggests the answer varies based on the type of AI, e.g., auto-complete systems tend to hinder ideation, while assistants that pose Socratic questions facilitate it. This paper adds an additional perspective. Through a case study, we demonstrate that the impact of genAI on students' idea development depends not only on the AI but also on the students and, crucially, their interactions in between. Students who proactively explored ideas gained new ideas from writing, regardless of whether they used auto-complete or Socratic AI assistants. Those who engaged in prolonged, mindless copyediting developed few ideas even with a Socratic AI. These findings suggest opportunities in designing AI writing assistants, not merely by creating more thought-provoking AI, but also by fostering more thought-provoking writer-AI interactions.

Updated: 2025-03-14 22:53:53

标题: 问题作家-AI交互如何阻碍作家的创意生成（而不是问题AI）

摘要: 写作有助于丰富写作者对该主题的理解。这种写作的认知益处被称为建设性学习，对学生在各个学科中学习至关重要。然而，当学生与生成AI写作助手一起写作时，这种益处是否会持续呢？先前的研究表明，答案取决于AI的类型，例如，自动完成系统往往会阻碍构思，而提出苏格拉底式问题的助手则有助于促进构思。本文增加了另一个观点。通过一个案例研究，我们证明了genAI对学生构思发展的影响不仅取决于AI，还取决于学生以及他们之间的互动。那些积极探索想法的学生从写作中获得了新的想法，无论他们使用自动完成还是苏格拉底式的AI助手。那些进行长时间的毫无意义的编辑工作的学生即使使用苏格拉底式AI也几乎没有产生新的想法。这些发现表明，在设计AI写作助手时存在机会，不仅仅可以通过创建更具启发性的AI来实现，还可以通过促进更具启发性的写作者-AI互动来实现。

更新时间: 2025-03-14 22:53:53

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2503.11915v1

Implementation of classical client universal blind quantum computation with 8-state RSP in current architecture

The future of quantum computing architecture is most likely the one in which a large number of clients are either fully classical or have a very limited quantum capability while a very small number of servers having the capability to perform quantum computations and most quantum computational tasks are delegated to these quantum servers. In this architecture, it becomes very crucial that a classical/semi-classical client is able to keep the delegated data/ computation secure against eavesdroppers as well as the server itself, known as the blindness feature. In 2009, A. Broadbent et. al proposed a universal blind quantum computation (UBQC) protocol based on measurement-based quantum computation (MBQC) that enables a semi-classical client to delegate universal quantum computation to a quantum server, interactively and fetch the results while the computation itself remains blind to the server. In this work, we propose an implementation (with examples) of UBQC in the current quantum computing architecture, a fully classical client, a quantum server (IBM Quantum) and the computation does not proceed interactively (projective measurement basis is not decided by previous measurement outcome). We combined UBQC with the 8-state remote state preparation (RSP) protocol, to blindly prepare the initial cluster state, which is an initial resource state in UBQC protocol, to allow a completely classical client to perform delegated blind quantum computation. Such an implementation has already been shown to be secure in a game-based security setting, which is the weakest security model.

Updated: 2025-03-14 22:52:02

标题: 在当前体系结构中使用8态RSP实现经典客户端通用盲量子计算

摘要: 量子计算架构的未来很可能是这样的：大多数客户要么完全经典，要么具有非常有限的量子能力，而只有少数具有量子计算能力的服务器能够执行量子计算任务，大部分量子计算任务都委托给这些量子服务器。在这种架构中，经典/半经典客户能够保护被委托的数据/计算免受窃听者和服务器本身的攻击，这被称为盲目特性。2009年，A. Broadbent等人提出了一种基于基于测量的量子计算（MBQC）的通用盲目量子计算（UBQC）协议，使得半经典客户能够将通用量子计算委托给量子服务器，并在计算过程中保持对服务器的盲目性。在这项工作中，我们提出了在当前量子计算架构中实现UBQC的方法（附有示例），其中包括一个完全经典的客户、一个量子服务器（IBM Quantum）以及计算不是交互进行的（投影测量基础不是由先前的测量结果决定）。我们将UBQC与8态远程态制备（RSP）协议相结合，以盲目准备初始聚类态，这是UBQC协议中的一个初始资源状态，以允许完全经典的客户执行委托的盲目量子计算。这种实现已经被证明在基于游戏的安全设置中是安全的，这是最弱的安全模型。

更新时间: 2025-03-14 22:52:02

领域: cs.CR,quant-ph

下载: http://arxiv.org/abs/2503.11913v1

Impact of Noisy Supervision in Foundation Model Learning

Foundation models are usually pre-trained on large-scale datasets and then adapted to downstream tasks through tuning. However, the large-scale pre-training datasets, often inaccessible or too expensive to handle, can contain label noise that may adversely affect the generalization of the model and pose unexpected risks. This paper stands out as the first work to comprehensively understand and analyze the nature of noise in pre-training datasets and then effectively mitigate its impacts on downstream tasks. Specifically, through extensive experiments of fully-supervised and image-text contrastive pre-training on synthetic noisy ImageNet-1K, YFCC15M, and CC12M datasets, we demonstrate that, while slight noise in pre-training can benefit in-domain (ID) performance, where the training and testing data share a similar distribution, it always deteriorates out-of-domain (OOD) performance, where training and testing distributions are significantly different. These observations are agnostic to scales of pre-training datasets, pre-training noise types, model architectures, pre-training objectives, downstream tuning methods, and downstream applications. We empirically ascertain that the reason behind this is that the pre-training noise shapes the feature space differently. We then propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization, which is applicable in both parameter-efficient and black-box tuning manners. We additionally conduct extensive experiments on popular vision and language models, including APIs, which are supervised and self-supervised pre-trained on realistic noisy data for evaluation. Our analysis and results demonstrate the importance of this novel and fundamental research direction, which we term as Noisy Model Learning.

Updated: 2025-03-14 22:46:43

标题: 基金模型学习中嘈杂监督的影响

摘要: 基础模型通常在大规模数据集上进行预训练，然后通过调整适应下游任务。然而，大规模预训练数据集往往难以获取或成本过高，可能包含标签噪音，可能会对模型的泛化产生不利影响，并带来意想不到的风险。本文是第一部全面了解和分析预训练数据集中噪音性质，并有效减轻其对下游任务影响的工作。具体来说，通过在合成噪声ImageNet-1K、YFCC15M和CC12M数据集上进行全监督和图像-文本对比预训练的大量实验，我们证明了虽然预训练中的轻微噪音可以有利于领域内（ID）性能，在那里训练和测试数据共享相似分布，但总是会恶化领域外（OOD）性能，在那里训练和测试分布显著不同。这些观察结果与预训练数据集的规模、预训练噪音类型、模型架构、预训练目标、下游调整方法和下游应用无关。我们凭经验证实，这是因为预训练噪音以不同方式塑造特征空间。然后，我们提出了一种调整方法（NMTune）来使特征空间变换，以减轻噪音的恶性影响并提高泛化性能，适用于参数高效和黑盒调整方式。我们还在流行的视觉和语言模型上进行了大量实验，包括在真实嘈杂数据上进行监督和自监督预训练的API，以进行评估。我们的分析和结果证明了这一新颖而基础研究方向的重要性，我们将其称为嘈杂模型学习。

更新时间: 2025-03-14 22:46:43

领域: cs.LG,cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2403.06869v2

Order Fairness Evaluation of DAG-based ledgers

Order fairness in distributed ledgers refers to properties that relate the order in which transactions are sent or received to the order in which they are eventually finalized, i.e., totally ordered. The study of such properties is relatively new and has been especially stimulated by the rise of Maximal Extractable Value (MEV) attacks in blockchain environments. Indeed, in many classical blockchain protocols, leaders are responsible for selecting the transactions to be included in blocks, which creates a clear vulnerability and opportunity for transaction order manipulation. Unlike blockchains, DAG-based ledgers allow participants in the network to independently propose blocks, which are then arranged as vertices of a directed acyclic graph. Interestingly, leaders in DAG-based ledgers are elected only after the fact, once transactions are already part of the graph, to determine their total order. In other words, transactions are not chosen by single leaders; instead, they are collectively validated by the nodes, and leaders are only elected to establish an ordering. This approach intuitively reduces the risk of transaction manipulation and enhances fairness. In this paper, we aim to quantify the capability of DAG-based ledgers to achieve order fairness. To this end, we define new variants of order fairness adapted to DAG-based ledgers and evaluate the impact of an adversary capable of compromising a limited number of nodes (below the one-third threshold) to reorder transactions. We analyze how often our order fairness properties are violated under different network conditions and parameterizations of the DAG algorithm, depending on the adversary's power. Our study shows that DAG-based ledgers are still vulnerable to reordering attacks, as an adversary can coordinate a minority of Byzantine nodes to manipulate the DAG's structure.

Updated: 2025-03-14 22:43:17

标题: DAG基础账本的订单公平性评估

摘要: 分布式账本中的订单公平性指的是与交易发送或接收的顺序与最终确定的顺序（即完全有序）有关的属性。对这类属性的研究相对较新，特别是受到区块链环境中最大可提取价值（MEV）攻击兴起的刺激。实际上，在许多经典的区块链协议中，领导者负责选择要包含在区块中的交易，这创造了一个明显的漏洞和交易顺序操纵的机会。与区块链不同，基于DAG的账本允许网络中的参与者独立提出区块，然后将其排列为有向无环图的顶点。有趣的是，基于DAG的账本中的领导者仅在事后选出，一旦交易已经成为图的一部分，才确定它们的总顺序。换句话说，交易不是由单个领导者选择的；相反，它们由节点共同验证，领导者只是被选出来建立顺序。这种方法直观地降低了交易操纵的风险，并增强了公平性。在本文中，我们旨在量化基于DAG的账本实现订单公平性的能力。为此，我们定义了适用于基于DAG的账本的订单公平性的新变体，并评估了一个能够破坏有限数量节点（低于三分之一阈值）的对手对重新排序交易的影响。我们分析了在不同网络条件和DAG算法参数化下，根据对手的能力，我们的订单公平性属性被违反的频率。我们的研究表明，基于DAG的账本仍然容易受到重新排序攻击的影响，因为对手可以协调少数拜占庭节点来操纵DAG的结构。

更新时间: 2025-03-14 22:43:17

领域: cs.CR,cs.DC,cs.MA

下载: http://arxiv.org/abs/2502.17270v2

RTD-Lite: Scalable Topological Analysis for Comparing Weighted Graphs in Learning Tasks

Topological methods for comparing weighted graphs are valuable in various learning tasks but often suffer from computational inefficiency on large datasets. We introduce RTD-Lite, a scalable algorithm that efficiently compares topological features, specifically connectivity or cluster structures at arbitrary scales, of two weighted graphs with one-to-one correspondence between vertices. Using minimal spanning trees in auxiliary graphs, RTD-Lite captures topological discrepancies with $O(n^2)$ time and memory complexity. This efficiency enables its application in tasks like dimensionality reduction and neural network training. Experiments on synthetic and real-world datasets demonstrate that RTD-Lite effectively identifies topological differences while significantly reducing computation time compared to existing methods. Moreover, integrating RTD-Lite into neural network training as a loss function component enhances the preservation of topological structures in learned representations. Our code is publicly available at https://github.com/ArGintum/RTD-Lite

Updated: 2025-03-14 22:42:13

标题: RTD-Lite：用于在学习任务中比较加权图的可扩展拓扑分析

摘要: 拓扑方法用于比较加权图在各种学习任务中非常有价值，但往往在大型数据集上存在计算效率低下的问题。我们介绍了RTD-Lite，这是一种可扩展的算法，可以高效比较具有顶点之间一对一对应关系的两个加权图的拓扑特征，特别是连接性或集群结构在任意尺度上。通过在辅助图中使用最小生成树，RTD-Lite以$O(n^2)$的时间和内存复杂度捕捉拓扑差异。这种效率使其能够应用于降维和神经网络训练等任务。对合成和真实数据集的实验表明，与现有方法相比，RTD-Lite有效地识别了拓扑差异，并显著减少了计算时间。此外，将RTD-Lite集成到神经网络训练中作为损失函数组件，有助于保持学习表示中的拓扑结构。我们的代码可以在https://github.com/ArGintum/RTD-Lite 上公开获取。

更新时间: 2025-03-14 22:42:13

领域: cs.LG,cs.AI,math.SG

下载: http://arxiv.org/abs/2503.11910v1

Revisiting FastMap: New Applications

FastMap was first introduced in the Data Mining community for generating Euclidean embeddings of complex objects. In this dissertation, we first present FastMap to generate Euclidean embeddings of graphs in near-linear time: The pairwise Euclidean distances approximate a desired graph-based distance function on the vertices. We then apply the graph version of FastMap to efficiently solve various graph-theoretic problems of significant interest in AI: including facility location, top-K centrality computations, community detection and block modeling, and graph convex hull computations. We also present a novel learning framework, called FastMapSVM, by combining FastMap and Support Vector Machines. We then apply FastMapSVM to predict the satisfiability of Constraint Satisfaction Problems and to classify seismograms in Earthquake Science.

Updated: 2025-03-14 22:29:10

标题: 重新审视FastMap：新应用 (Note: FastMap is likely a specific algorithm or method mentioned in the document)

摘要: FastMap首次在数据挖掘社区中引入，用于生成复杂对象的欧几里得嵌入。在这篇论文中，我们首先介绍了FastMap，以在接近线性时间内生成图的欧几里得嵌入：成对的欧几里得距离近似于顶点上的所需基于图的距离函数。然后，我们将图版本的FastMap应用于有效解决人工智能中具有重要意义的各种图论问题：包括设施选址、top-K中心性计算、社区检测和块建模，以及图凸壳计算。我们还提出了一种新颖的学习框架，称为FastMapSVM，通过结合FastMap和支持向量机。然后，我们将FastMapSVM应用于预测约束满足问题的可满足性，并对地震科学中的地震图进行分类。

更新时间: 2025-03-14 22:29:10

领域: cs.DM,cs.AI

下载: http://arxiv.org/abs/2503.11908v1

Constraint-Generation Policy Optimization (CGPO): Nonlinear Programming for Policy Optimization in Mixed Discrete-Continuous MDPs

We propose the Constraint-Generation Policy Optimization (CGPO) framework to optimize policy parameters within compact and interpretable policy classes for mixed discrete-continuous Markov Decision Processes (DC-MDP). CGPO can not only provide bounded policy error guarantees over an infinite range of initial states for many DC-MDPs with expressive nonlinear dynamics, but it can also provably derive optimal policies in cases where it terminates with zero error. Furthermore, CGPO can generate worst-case state trajectories to diagnose policy deficiencies and provide counterfactual explanations of optimal actions. To achieve such results, CGPO proposes a bilevel mixed-integer nonlinear optimization framework for optimizing policies in defined expressivity classes (e.g. piecewise linear) and reduces it to an optimal constraint generation methodology that adversarially generates worst-case state trajectories. Furthermore, leveraging modern nonlinear optimizers, CGPO can obtain solutions with bounded optimality gap guarantees. We handle stochastic transitions through chance constraints, providing high-probability performance guarantees. We also present a roadmap for understanding the computational complexities of different expressivity classes of policy, reward, and transition dynamics. We experimentally demonstrate the applicability of CGPO across various domains, including inventory control, management of a water reservoir system, and physics control. In summary, CGPO provides structured, compact and explainable policies with bounded performance guarantees, enabling worst-case scenario generation and counterfactual policy diagnostics.

Updated: 2025-03-14 22:23:32

标题: 约束生成策略优化（CGPO）：混合离散-连续MDPs中的策略优化非线性规划

摘要: 我们提出了Constraint-Generation Policy Optimization（CGPO）框架，用于优化混合离散-连续马尔可夫决策过程（DC-MDP）中紧凑且可解释的政策类的政策参数。CGPO不仅可以为具有表达力非线性动态的许多DC-MDP提供在无限范围的初始状态下的有界政策误差保证，还可以在终止时以零误差推导出最优政策。此外，CGPO可以生成最坏情况状态轨迹以诊断政策缺陷并提供最优动作的反事实解释。为了实现这些结果，CGPO提出了一个用于在定义的表达类（例如分段线性）中优化政策的双层混合整数非线性优化框架，并将其简化为一种最优约束生成方法，该方法对抗性地生成最坏情况状态轨迹。此外，利用现代非线性优化器，CGPO可以获得具有有界最优性差距保证的解决方案。我们通过机会约束处理随机转换，提供高概率性能保证。我们还提出了了解不同表达类的政策、奖励和转换动态的计算复杂性的路线图。我们在不同领域实验证明了CGPO的适用性，包括库存控制、水库系统管理和物理控制。总之，CGPO提供了结构化、紧凑且可解释的政策，并提供有界性能保证，从而实现最坏情况生成和反事实政策诊断。

更新时间: 2025-03-14 22:23:32

领域: math.OC,cs.LG,cs.RO,cs.SC,cs.SY,eess.SY

下载: http://arxiv.org/abs/2401.12243v2

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

Efficient image tokenization with high compression ratios remains a critical challenge for training generative models. We present SoftVQ-VAE, a continuous image tokenizer that leverages soft categorical posteriors to aggregate multiple codewords into each latent token, substantially increasing the representation capacity of the latent space. When applied to Transformer-based architectures, our approach compresses 256x256 and 512x512 images using as few as 32 or 64 1-dimensional tokens. Not only does SoftVQ-VAE show consistent and high-quality reconstruction, more importantly, it also achieves state-of-the-art and significantly faster image generation results across different denoising-based generative models. Remarkably, SoftVQ-VAE improves inference throughput by up to 18x for generating 256x256 images and 55x for 512x512 images while achieving competitive FID scores of 1.78 and 2.21 for SiT-XL. It also improves the training efficiency of the generative models by reducing the number of training iterations by 2.3x while maintaining comparable performance. With its fully-differentiable design and semantic-rich latent space, our experiment demonstrates that SoftVQ-VAE achieves efficient tokenization without compromising generation quality, paving the way for more efficient generative models. Code and model are released.

Updated: 2025-03-14 22:22:40

标题: SoftVQ-VAE：高效的一维连续分词器

摘要: 高效的图像标记化与高压缩比仍然是训练生成模型的一个关键挑战。我们提出了SoftVQ-VAE，这是一个连续图像标记器，利用软分类后验将多个码字聚合到每个潜在标记中，大幅增加了潜在空间的表示能力。当应用于基于Transformer的架构时，我们的方法使用32个或64个1维标记即可压缩256x256和512x512的图像。SoftVQ-VAE不仅显示出一致且高质量的重建，更重要的是，它还在基于去噪的生成模型中实现了最先进且显著更快的图像生成结果。值得注意的是，SoftVQ-VAE在生成256x256图像时将推断吞吐量提高了高达18倍，而在生成512x512图像时提高了55倍，同时实现了SiT-XL的竞争性FID分数为1.78和2.21。它还通过减少训练迭代次数2.3倍同时保持可比性能，提高了生成模型的训练效率。通过完全可微的设计和语义丰富的潜在空间，我们的实验表明SoftVQ-VAE实现了高效的标记化而不会影响生成质量，为更高效的生成模型铺平了道路。代码和模型已发布。

更新时间: 2025-03-14 22:22:40

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2412.10958v3

A Survey on SAR ship classification using Deep Learning

Deep learning (DL) has emerged as a powerful tool for Synthetic Aperture Radar (SAR) ship classification. This survey comprehensively analyzes the diverse DL techniques employed in this domain. We identify critical trends and challenges, highlighting the importance of integrating handcrafted features, utilizing public datasets, data augmentation, fine-tuning, explainability techniques, and fostering interdisciplinary collaborations to improve DL model performance. This survey establishes a first-of-its-kind taxonomy for categorizing relevant research based on DL models, handcrafted feature use, SAR attribute utilization, and the impact of fine-tuning. We discuss the methodologies used in SAR ship classification tasks and the impact of different techniques. Finally, the survey explores potential avenues for future research, including addressing data scarcity, exploring novel DL architectures, incorporating interpretability techniques, and establishing standardized performance metrics. By addressing these challenges and leveraging advancements in DL, researchers can contribute to developing more accurate and efficient ship classification systems, ultimately enhancing maritime surveillance and related applications.

Updated: 2025-03-14 22:19:24

标题: 使用深度学习进行SAR船舶分类的调查

摘要: 深度学习（DL）已经成为合成孔径雷达（SAR）船舶分类的强大工具。本调查全面分析了在该领域中采用的多样化DL技术。我们识别了关键趋势和挑战，强调整合手工特征、利用公共数据集、数据增强、微调、可解释性技术以及促进跨学科合作以提高DL模型性能的重要性。本调查建立了一个首创的分类法，用于根据DL模型、手工特征使用、SAR属性利用和微调的影响对相关研究进行分类。我们讨论了用于SAR船舶分类任务的方法论以及不同技术的影响。最后，该调查探讨了未来研究的潜在途径，包括解决数据稀缺性、探索新颖的DL架构、整合可解释性技术以及建立标准化的性能指标。通过解决这些挑战并利用DL的进展，研究人员可以为开发更准确和高效的船舶分类系统做出贡献，最终提升海上监视和相关应用的能力。

更新时间: 2025-03-14 22:19:24

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.11906v1

Upcycling Text-to-Image Diffusion Models for Multi-Task Capabilities

Text-to-image synthesis has witnessed remarkable advancements in recent years. Many attempts have been made to adopt text-to-image models to support multiple tasks. However, existing approaches typically require resource-intensive re-training or additional parameters to accommodate for the new tasks, which makes the model inefficient for on-device deployment. We propose Multi-Task Upcycling (MTU), a simple yet effective recipe that extends the capabilities of a pre-trained text-to-image diffusion model to support a variety of image-to-image generation tasks. MTU replaces Feed-Forward Network (FFN) layers in the diffusion model with smaller FFNs, referred to as experts, and combines them with a dynamic routing mechanism. To the best of our knowledge, MTU is the first multi-task diffusion modeling approach that seamlessly blends multi-tasking with on-device compatibility, by mitigating the issue of parameter inflation. We show that the performance of MTU is on par with the single-task fine-tuned diffusion models across several tasks including image editing, super-resolution, and inpainting, while maintaining similar latency and computational load (GFLOPs) as the single-task fine-tuned models.

Updated: 2025-03-14 22:19:20

标题: 再循环利用文本到图像扩散模型以实现多任务能力

摘要: 文本到图像合成在最近几年取得了显著的进展。许多尝试已经被做出来采用文本到图像模型来支持多个任务。然而，现有方法通常需要资源密集型的重新训练或额外的参数来适应新任务，这使得模型在设备部署上效率低下。我们提出了多任务循环利用（MTU），这是一个简单而有效的方法，扩展了一个预训练的文本到图像扩散模型的能力，以支持各种图像生成任务。MTU用更小的前馈网络（FFN）替换了扩散模型中的FFN层，称为专家，并将它们与动态路由机制结合起来。据我们所知，MTU是第一个能够无缝结合多任务和设备兼容性的扩散建模方法，通过减轻参数膨胀的问题。我们展示了MTU的性能与单任务微调的扩散模型在包括图像编辑、超分辨率和修补等多个任务上是相当的，同时保持与单任务微调模型相似的延迟和计算负载（GFLOPs）。

更新时间: 2025-03-14 22:19:20

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.11905v1

Characterizing GPU Resilience and Impact on AI/HPC Systems

In this study, we characterize GPU failures in Delta, the current large-scale AI system with over 600 petaflops of peak compute throughput. The system comprises GPU and non-GPU nodes with modern AI accelerators, such as NVIDIA A40, A100, and H100 GPUs. The study uses two and a half years of data on GPU errors. We evaluate the resilience of GPU hardware components to determine the vulnerability of different GPU components to failure and their impact on the GPU and node availability. We measure the key propagation paths in GPU hardware, GPU interconnect (NVLink), and GPU memory. Finally, we evaluate the impact of the observed GPU errors on user jobs. Our key findings are: (i) Contrary to common beliefs, GPU memory is over 30x more reliable than GPU hardware in terms of MTBE (mean time between errors). (ii) The newly introduced GSP (GPU System Processor) is the most vulnerable GPU hardware component. (iii) NVLink errors did not always lead to user job failure, and we attribute it to the underlying error detection and retry mechanisms employed. (iv) We show multiple examples of hardware errors originating from one of the key GPU hardware components, leading to application failure. (v) We project the impact of GPU node availability on larger scales with emulation and find that significant overprovisioning between 5-20% would be necessary to handle GPU failures. If GPU availability were improved to 99.9%, the overprovisioning would be reduced by 4x.

Updated: 2025-03-14 22:14:18

标题: 表征GPU的弹性及其对人工智能/高性能计算系统的影响

摘要: 在这项研究中，我们对Delta系统中的GPU故障进行了表征，该系统是当前具有超过600 petaflops峰值计算吞吐量的大规模人工智能系统。该系统由具有现代人工智能加速器的GPU和非GPU节点组成，例如NVIDIA A40、A100和H100 GPU。该研究使用了两年半的GPU错误数据。我们评估了GPU硬件组件的弹性，以确定不同GPU组件对故障的脆弱性及其对GPU和节点可用性的影响。我们测量了GPU硬件、GPU互连（NVLink）和GPU内存中的关键传播路径。最后，我们评估了观察到的GPU错误对用户作业的影响。我们的主要发现包括：（i）与普遍观念相反，就MTBE（错误之间的平均时间）而言，GPU内存的可靠性超过GPU硬件30倍。（ii）新引入的GSP（GPU系统处理器）是最脆弱的GPU硬件组件。（iii）NVLink错误并不总是导致用户作业失败，我们将其归因于所采用的底层错误检测和重试机制。（iv）我们展示了多个硬件错误示例源自其中一个关键GPU硬件组件，导致应用程序失败。（v）我们通过仿真预测了GPU节点可用性对更大规模的影响，并发现需要在5-20%之间的显著过度配置来处理GPU故障。如果GPU可用性提高到99.9%，则过度配置将减少4倍。

更新时间: 2025-03-14 22:14:18

领域: cs.DC,cs.AI

下载: http://arxiv.org/abs/2503.11901v1

Heterogenous graph neural networks for species distribution modeling

Species distribution models (SDMs) are necessary for measuring and predicting occurrences and habitat suitability of species and their relationship with environmental factors. We introduce a novel presence-only SDM with graph neural networks (GNN). In our model, species and locations are treated as two distinct node sets, and the learning task is predicting detection records as the edges that connect locations to species. Using GNN for SDM allows us to model fine-grained interactions between species and the environment. We evaluate the potential of this methodology on the six-region dataset compiled by National Center for Ecological Analysis and Synthesis (NCEAS) for benchmarking SDMs. For each of the regions, the heterogeneous GNN model is comparable to or outperforms previously-benchmarked single-species SDMs as well as a feed-forward neural network baseline model.

Updated: 2025-03-14 22:08:30

标题: 异质图神经网络用于物种分布建模

摘要: 物种分布模型（SDMs）对于测量和预测物种的出现和栖息地适宜性以及它们与环境因素的关系是必不可少的。我们引入了一种新颖的基于存在性的SDM，其中使用了图神经网络（GNN）。在我们的模型中，物种和位置被视为两种不同的节点集，学习任务是预测将位置与物种连接的检测记录作为边。使用GNN进行SDM允许我们对物种与环境之间的精细相互作用进行建模。我们通过国家生态分析与综合中心（NCEAS）编制的六个地区数据集对这种方法的潜力进行评估，用于SDM基准测试。对于每个地区，异质的GNN模型与以前基准测试的单物种SDMs以及前馈神经网络基准模型相比具有可比性或表现更好。

更新时间: 2025-03-14 22:08:30

领域: cs.LG,q-bio.PE,stat.ML,92B20 (Primary) 68T07, 92D40 (Secondary),I.2.1; J.3

下载: http://arxiv.org/abs/2503.11900v1

Spatio-temporal Fourier Transformer (StFT) for Long-term Dynamics Prediction

Simulating the long-term dynamics of multi-scale and multi-physics systems poses a significant challenge in understanding complex phenomena across science and engineering. The complexity arises from the intricate interactions between scales and the interplay of diverse physical processes. Neural operators have emerged as promising models for predicting such dynamics due to their flexibility and computational efficiency. However, they often fail to effectively capture multi-scale interactions or quantify the uncertainties inherent in the predictions. These limitations lead to rapid error accumulation, particularly in long-term forecasting of systems characterized by complex and coupled dynamics. To address these challenges, we propose a spatio-temporal Fourier transformer (StFT), in which each transformer block is designed to learn dynamics at a specific scale. By leveraging a structured hierarchy of StFT blocks, the model explicitly captures dynamics across both macro- and micro- spatial scales. Furthermore, a generative residual correction mechanism is integrated to estimate and mitigate predictive uncertainties, enhancing both the accuracy and reliability of long-term forecasts. Evaluations conducted on three benchmark datasets (plasma, fluid, and atmospheric dynamics) demonstrate the advantages of our approach over state-of-the-art ML methods.

Updated: 2025-03-14 22:04:03

标题: 时空傅立叶变换器（StFT）用于长期动态预测

摘要: 模拟多尺度和多物理系统的长期动态在理解跨科学和工程中的复杂现象方面面临着重大挑战。复杂性来自于不同尺度之间的复杂相互作用和多样物理过程的相互作用。由于其灵活性和计算效率，神经操作员已经成为预测这种动态的有希望的模型。然而，它们经常无法有效捕捉多尺度的相互作用或量化预测中固有的不确定性。这些限制导致错误迅速积累，特别是在长期预测复杂和耦合动态系统的情况下。为了解决这些挑战，我们提出了一个时空Fourier变换器（StFT），其中每个变换器块被设计用于学习特定尺度上的动态。通过利用结构化的StFT块层次结构，该模型明确地捕捉了宏观和微观空间尺度上的动态。此外，还集成了一个生成式残差校正机制来估计和减轻预测不确定性，提高了长期预测的准确性和可靠性。对三个基准数据集（等离子体、流体和大气动力学）进行的评估表明，我们的方法优于最先进的机器学习方法。

更新时间: 2025-03-14 22:04:03

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2503.11899v1

Scalable Mechanistic Neural Networks for Differential Equations and Machine Learning

We propose Scalable Mechanistic Neural Network (S-MNN), an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences. By reformulating the original Mechanistic Neural Network (MNN) (Pervez et al., 2024), we reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear. This significant improvement enables efficient modeling of long-term dynamics without sacrificing accuracy or interpretability. Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources. Consequently, S-MNN can drop-in replace the original MNN in applications, providing a practical and efficient tool for integrating mechanistic bottlenecks into neural network models of complex dynamical systems. Source code is available at https://github.com/IST-DASLab/ScalableMNN .

Updated: 2025-03-14 22:00:28

标题: 可扩展的差分方程和机器学习机制神经网络

摘要: 我们提出了可扩展的机制神经网络（S-MNN），这是一个增强的神经网络框架，专为涉及长时间序列的科学机器学习应用而设计。通过重新构造原始的机制神经网络（MNN）（Pervez等，2024年），我们将计算时间和空间复杂性从分别相对于序列长度的立方和二次减少到线性。这一显著改进使得能够高效建模长期动态而不损失准确性或可解释性。大量实验表明，S-MNN在精度上与原始MNN相匹配，同时大幅减少了计算资源的消耗。因此，S-MNN可以在应用中取代原始MNN，为将机制瓶颈整合到复杂动态系统的神经网络模型中提供了一种实用且高效的工具。源代码可在https://github.com/IST-DASLab/ScalableMNN找到。

更新时间: 2025-03-14 22:00:28

领域: cs.LG

下载: http://arxiv.org/abs/2410.06074v2

LLMs for Translation: Historical, Low-Resourced Languages and Contemporary AI Models

Large Language Models (LLMs) have demonstrated remarkable adaptability in performing various tasks, including machine translation (MT), without explicit training. Models such as OpenAI's GPT-4 and Google's Gemini are frequently evaluated on translation benchmarks and utilized as translation tools due to their high performance. This paper examines Gemini's performance in translating an 18th-century Ottoman Turkish manuscript, Prisoner of the Infidels: The Memoirs of Osman Agha of Timisoara, into English. The manuscript recounts the experiences of Osman Agha, an Ottoman subject who spent 11 years as a prisoner of war in Austria, and includes his accounts of warfare and violence. Our analysis reveals that Gemini's safety mechanisms flagged between 14 and 23 percent of the manuscript as harmful, resulting in untranslated passages. These safety settings, while effective in mitigating potential harm, hinder the model's ability to provide complete and accurate translations of historical texts. Through real historical examples, this study highlights the inherent challenges and limitations of current LLM safety implementations in the handling of sensitive and context-rich materials. These real-world instances underscore potential failures of LLMs in contemporary translation scenarios, where accurate and comprehensive translations are crucial-for example, translating the accounts of modern victims of war for legal proceedings or humanitarian documentation.

Updated: 2025-03-14 21:59:12

标题: LLMs用于翻译：历史悠久的、资源匮乏的语言和当代人工智能模型

摘要: 大型语言模型（LLMs）已经在执行各种任务中展示了出色的适应性，包括机器翻译（MT），而无需明确训练。诸如OpenAI的GPT-4和Google的Gemini等模型经常在翻译基准上进行评估，并被用作翻译工具，因为它们具有高性能。本文考察了Gemini在将18世纪奥斯曼土耳其手稿《异教徒的俘虏：蒂米什瓦拉的奥斯曼阿加的回忆录》翻译成英语时的表现。该手稿讲述了奥斯曼阿加的经历，他是奥斯曼帝国的臣民，在奥地利作为战俘度过了11年，并包括了他的战争和暴力经历。我们的分析显示，Gemini的安全机制将手稿的14%至23%标记为有害，导致了未翻译的段落。尽管这些安全设置在减少潜在危害方面有效，但阻碍了模型提供历史文本的完整和准确翻译的能力。通过真实的历史案例，本研究突显了当前LLM安全实施在处理敏感和富有背景的材料时的固有挑战和限制。这些现实举例强调了LLM在当代翻译场景中的潜在失败，其中准确和全面的翻译至关重要，例如为法律诉讼或人道主义文件翻译现代战争受害者的描述。

更新时间: 2025-03-14 21:59:12

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.11898v1

PREAMBLE: Private and Efficient Aggregation of Block Sparse Vectors and Applications

We revisit the problem of secure aggregation of high-dimensional vectors in a two-server system such as Prio. These systems are typically used to aggregate vectors such as gradients in private federated learning, where the aggregate itself is protected via noise addition to ensure differential privacy. Existing approaches require communication scaling with the dimensionality, and thus limit the dimensionality of vectors one can efficiently process in this setup. We propose PREAMBLE: Private Efficient Aggregation Mechanism for BLock-sparse Euclidean Vectors. PREAMBLE is a novel extension of distributed point functions that enables communication- and computation-efficient aggregation of block-sparse vectors, which are sparse vectors where the non-zero entries occur in a small number of clusters of consecutive coordinates. We then show that PREAMBLE can be combined with random sampling and privacy amplification by sampling results, to allow asymptotically optimal privacy-utility trade-offs for vector aggregation, at a fraction of the communication cost. When coupled with recent advances in numerical privacy accounting, our approach incurs a negligible overhead in noise variance, compared to the Gaussian mechanism used with Prio.

Updated: 2025-03-14 21:58:15

标题: 序言：区块稀疏向量的私密高效聚合及其应用

摘要: 我们重新审视了在像Prio这样的两服务器系统中安全聚合高维向量的问题。这些系统通常用于在私人联邦学习中聚合梯度等向量，其中聚合本身受到噪声添加的保护，以确保差分隐私。现有方法要求与维度成比例的通信量，从而限制了一个人可以在此设置中高效处理的向量的维度。我们提出了PREAMBLE：用于块稀疏欧几里德向量的私人高效聚合机制。PREAMBLE是分布式点函数的一种新颖扩展，它实现了对块稀疏向量的通信和计算高效聚合，其中非零条目出现在少数连续坐标的簇中。然后我们展示了PREAMBLE可以与随机抽样和通过抽样结果的隐私增强相结合，以允许向量聚合的渐近最优隐私-效用权衡，通信成本的一小部分。当结合最近进展的数字隐私核算时，与Prio使用的高斯机制相比，我们的方法在噪声方差方面产生了微不足道的开销。

更新时间: 2025-03-14 21:58:15

领域: cs.CR,cs.DS,cs.LG

下载: http://arxiv.org/abs/2503.11897v1

Accessibility Considerations in the Development of an AI Action Plan

We argue that there is a need for Accessibility to be represented in several important domains: - Capitalize on the new capabilities AI provides - Support for open source development of AI, which can allow disabled and disability focused professionals to contribute, including - Development of Accessibility Apps which help realise the promise of AI in accessibility domains - Open Source Model Development and Validation to ensure that accessibility concerns are addressed in these algorithms - Data Augmentation to include accessibility in data sets used to train models - Accessible Interfaces that allow disabled people to use any AI app, and to validate its outputs - Dedicated Functionality and Libraries that can make it easy to integrate AI support into a variety of settings and apps. - Data security and privacy and privacy risks including data collected by AI based accessibility technologies; and the possibility of disability disclosure. - Disability-specific AI risks and biases including both direct bias (during AI use by the disabled person) and indirect bias (when AI is used by someone else on data relating to a disabled person).

Updated: 2025-03-14 21:57:23

标题: 在制定AI行动计划中的可访问性考虑

摘要: 我们认为有必要在几个重要领域中代表无障碍性： - 充分利用人工智能提供的新能力 - 支持人工智能的开源开发，这可以让残疾人和残疾人专业人士参与贡献，包括 - 开发有助于实现无障碍性领域中人工智能承诺的无障碍性应用程序 - 开源模型开发和验证，以确保这些算法中解决无障碍性问题 - 数据增强，将无障碍性包括在用于训练模型的数据集中 - 可让残疾人使用任何人工智能应用程序并验证其输出的可访问界面 - 专用功能和库，可使将人工智能支持集成到各种设置和应用程序中变得容易。- 数据安全和隐私以及由基于人工智能的无障碍技术收集的数据所涉及的隐私风险；以及可能的残疾披露。 - 残疾特定的人工智能风险和偏见，包括直接偏见（残疾人员使用人工智能时）和间接偏见（他人在与残疾人相关的数据上使用人工智能时）。

更新时间: 2025-03-14 21:57:23

领域: cs.CY,cs.AI,cs.HC

下载: http://arxiv.org/abs/2503.14522v1

Expressive Music Data Processing and Generation

Musical expressivity and coherence are indispensable in music composition and performance, while often neglected in modern AI generative models. In this work, we introduce a listening-based data-processing technique that captures the expressivity in musical performance. This technique derived from Weber's law reflects the human perceptual truth of listening and preserves musical subtlety and expressivity in the training input. To facilitate musical coherence, we model the output interdependencies among multiple arguments in the music data such as pitch, duration, velocity, etc. in the neural networks based on the probabilistic chain rule. In practice, we decompose the multi-output sequential model into single-output submodels and condition previously sampled outputs on the subsequent submodels to induce conditional distributions. Finally, to select eligible sequences from all generations, a tentative measure based on the output entropy was proposed. The entropy sequence is set as a criterion to select predictable and stable generations, which is further studied under the context of informational aesthetic measures to quantify musical pleasure and information gain along the music tendency.

Updated: 2025-03-14 21:56:07

标题: 音乐表现数据处理与生成

摘要: 音乐表现力和连贯性在音乐创作和表演中是不可或缺的，然而在现代人工智能生成模型中常常被忽视。在这项工作中，我们引入了一种基于听觉的数据处理技术，捕捉了音乐表现力。这种技术源于韦伯定律，反映了人类听觉的真实性，并保留了训练输入中的音乐微妙和表现力。为了促进音乐的连贯性，我们在神经网络中基于概率链规则对音乐数据中的音高、持续时间、速度等多个参数之间的输出相互依赖性进行建模。在实践中，我们将多输出顺序模型分解为单输出子模型，并在后续子模型上对先前采样的输出进行条件设定，以诱导条件分布。最后，为了从所有生成的序列中选择合格的序列，提出了一种基于输出熵的临时度量。熵序列被设定为选择可预测和稳定的生成物的标准，进一步在信息美学度量的背景下研究，以量化音乐趋势中的音乐愉悦和信息增益。

更新时间: 2025-03-14 21:56:07

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2503.11896v1

Resolving UnderEdit & OverEdit with Iterative & Neighbor-Assisted Model Editing

Large Language Models (LLMs) are used in various downstream language tasks, making it crucial to keep their knowledge up-to-date, but both retraining and fine-tuning the model can be costly. Model editing offers an efficient and effective alternative by a single update to only a key subset of model parameters. While being efficient, these methods are not perfect. Sometimes knowledge edits are unsuccessful, i.e., UnderEdit, or the edit contaminated neighboring knowledge that should remain unchanged, i.e., OverEdit. To address these limitations, we propose iterative model editing, based on our hypothesis that a single parameter update is often insufficient, to mitigate UnderEdit, and neighbor-assisted model editing, which incorporates neighboring knowledge during editing to minimize OverEdit. Extensive experiments demonstrate that our methods effectively reduce UnderEdit up to 38 percentage points and OverEdit up to 6 percentage points across multiple model editing algorithms, LLMs, and benchmark datasets.

Updated: 2025-03-14 21:53:12

标题: 使用迭代和邻居辅助模型编辑解决编辑不足和编辑过度

摘要: 大型语言模型（LLMs）被用于各种下游语言任务，保持其知识最新是至关重要的，但重新训练和微调模型都可能代价高昂。模型编辑提供了一种高效和有效的替代方案，通过对模型的关键子集进行单次更新。虽然这些方法高效，但并非完美。有时知识编辑是不成功的，即UnderEdit，或者编辑污染了应该保持不变的相邻知识，即OverEdit。为了解决这些限制，我们提出了迭代模型编辑，基于我们的假设，单个参数更新通常是不足的，以减轻UnderEdit，并且邻居辅助模型编辑，通过在编辑过程中整合相邻知识以最小化OverEdit。大量实验证明，我们的方法有效地将UnderEdit减少了高达38个百分点，将OverEdit减少了高达6个百分点，涵盖了多个模型编辑算法、LLMs和基准数据集。

更新时间: 2025-03-14 21:53:12

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2503.11895v1

Training Diagonal Linear Networks with Stochastic Sharpness-Aware Minimization

We analyze the landscape and training dynamics of diagonal linear networks in a linear regression task, with the network parameters being perturbed by small isotropic normal noise. The addition of such noise may be interpreted as a stochastic form of sharpness-aware minimization (SAM) and we prove several results that relate its action on the underlying landscape and training dynamics to the sharpness of the loss. In particular, the noise changes the expected gradient to force balancing of the weight matrices at a fast rate along the descent trajectory. In the diagonal linear model, we show that this equates to minimizing the average sharpness, as well as the trace of the Hessian matrix, among all possible factorizations of the same matrix. Further, the noise forces the gradient descent iterates towards a shrinkage-thresholding of the underlying true parameter, with the noise level explicitly regulating both the shrinkage factor and the threshold.

Updated: 2025-03-14 21:45:12

标题: 用随机锐度感知最小化训练对角线线性网络

摘要: 我们分析了对角线性网络在线性回归任务中的景观和训练动态，网络参数受到小的各向同性正态噪声的扰动。这种噪声的添加可以被解释为一种随机形式的锐度感知最小化（SAM），我们证明了几个与其对基础景观和训练动态的作用与损失的锐度相关的结果。特别地，噪声改变了期望梯度以迫使在沿着下降轨迹的快速速率上平衡权重矩阵。在对角线性模型中，我们表明这等同于在所有可能的矩阵因子化中最小化平均锐度以及海森矩阵的迹。此外，噪声迫使梯度下降迭代朝向基础真实参数的收缩阈值，噪声水平明确调节着收缩因子和阈值。

更新时间: 2025-03-14 21:45:12

领域: cs.LG,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2503.11891v1

Identifying Likely-Reputable Blockchain Projects on Ethereum

Identifying reputable Ethereum projects remains a critical challenge within the expanding blockchain ecosystem. The ability to distinguish between legitimate initiatives and potentially fraudulent schemes is non-trivial. This work presents a systematic approach that integrates multiple data sources with advanced analytics to evaluate credibility, transparency, and overall trustworthiness. The methodology applies machine learning techniques to analyse transaction histories on the Ethereum blockchain. The study classifies accounts based on a dataset comprising 2,179 entities linked to illicit activities and 3,977 associated with reputable projects. Using the LightGBM algorithm, the approach achieves an average accuracy of 0.984 and an average AUC of 0.999, validated through 10-fold cross-validation. Key influential factors include time differences between transactions and received_tnx. The proposed methodology provides a robust mechanism for identifying reputable Ethereum projects, fostering a more secure and transparent investment environment. By equipping stakeholders with data-driven insights, this research enables more informed decision-making, risk mitigation, and the promotion of legitimate blockchain initiatives. Furthermore, it lays the foundation for future advancements in trust assessment methodologies, contributing to the continued development and maturity of the Ethereum ecosystem.

Updated: 2025-03-14 21:43:25

标题: 在以太坊上识别可能信誉良好的区块链项目

摘要: 在不断扩大的区块链生态系统中，识别值得信赖的以太坊项目仍然是一个关键挑战。区分合法倡议和潜在的欺诈计划并非易事。本文提出了一个系统方法，将多个数据源与先进的分析技术相结合，评估可信度、透明度和整体信誉度。该方法应用机器学习技术分析以太坊区块链上的交易历史。该研究根据一个包含2,179个与非法活动有关的实体和3,977个与值得信赖项目有关的数据集对账户进行分类。采用LightGBM算法，该方法通过10折交叉验证实现了平均准确率为0.984和平均AUC为0.999。关键影响因素包括交易之间的时间差异和received_tnx。提出的方法为识别值得信赖的以太坊项目提供了一个强大的机制，促进了更安全、透明的投资环境。通过为利益相关者提供数据驱动的见解，这项研究使决策更加明智，风险缓解，并推动合法的区块链倡议。此外，它为信任评估方法的未来发展奠定了基础，有助于以太坊生态系统的持续发展和成熟。

更新时间: 2025-03-14 21:43:25

领域: cs.CR,cs.AI,cs.ET

下载: http://arxiv.org/abs/2503.15542v1

Advanced Deep Learning Methods for Protein Structure Prediction and Design

After AlphaFold won the Nobel Prize, protein prediction with deep learning once again became a hot topic. We comprehensively explore advanced deep learning methods applied to protein structure prediction and design. It begins by examining recent innovations in prediction architectures, with detailed discussions on improvements such as diffusion based frameworks and novel pairwise attention modules. The text analyses key components including structure generation, evaluation metrics, multiple sequence alignment processing, and network architecture, thereby illustrating the current state of the art in computational protein modelling. Subsequent chapters focus on practical applications, presenting case studies that range from individual protein predictions to complex biomolecular interactions. Strategies for enhancing prediction accuracy and integrating deep learning techniques with experimental validation are thoroughly explored. The later sections review the industry landscape of protein design, highlighting the transformative role of artificial intelligence in biotechnology and discussing emerging market trends and future challenges. Supplementary appendices provide essential resources such as databases and open source tools, making this volume a valuable reference for researchers and students.

Updated: 2025-03-14 21:28:29

标题: 先进的深度学习方法用于蛋白质结构预测和设计

摘要: 在AlphaFold获得诺贝尔奖之后，利用深度学习进行蛋白质预测再次成为热门话题。我们全面探讨了应用于蛋白质结构预测和设计的先进深度学习方法。本文首先审视了预测架构的最新创新，详细讨论了基于扩散的框架和新型成对注意力模块等改进。文中分析了关键组成部分，包括结构生成、评估指标、多序列比对处理和网络架构，从而展示了计算蛋白质建模的最新技术水平。随后的章节着重于实际应用，展示了从个体蛋白质预测到复杂生物分子相互作用的案例研究。深入探讨了提高预测准确性和将深度学习技术与实验验证相结合的策略。后续部分回顾了蛋白质设计的行业现状，突出了人工智能在生物技术中的转变作用，讨论了新兴市场趋势和未来挑战。附录提供了数据库和开源工具等必要资源，使本书成为研究人员和学生的宝贵参考资料。

更新时间: 2025-03-14 21:28:29

领域: q-bio.BM,cs.AI,cs.LG

下载: http://arxiv.org/abs/2503.13522v1

FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-the-World LoRA

Fine-tuning large language models (LLMs) in federated settings enables privacy-preserving adaptation but suffers from cross-client interference due to model aggregation. Existing federated LoRA fine-tuning methods, primarily based on FedAvg, struggle with data heterogeneity, leading to harmful cross-client interference and suboptimal personalization. In this work, we propose \textbf{FedALT}, a novel personalized federated LoRA fine-tuning algorithm that fundamentally departs from FedAvg. Instead of using an aggregated model to initialize local training, each client continues training its individual LoRA while incorporating shared knowledge through a separate Rest-of-the-World (RoTW) LoRA component. To effectively balance local adaptation and global information, FedALT introduces an adaptive mixer that dynamically learns input-specific weightings between the individual and RoTW LoRA components using the Mixture-of-Experts (MoE) principle. Through extensive experiments on NLP benchmarks, we demonstrate that FedALT significantly outperforms state-of-the-art personalized federated LoRA fine-tuning methods, achieving superior local adaptation without sacrificing computational efficiency.

Updated: 2025-03-14 21:07:46

标题: FedALT: 通过适应性本地训练和全球剩余LoRA进行联邦微调

摘要: 在联邦设置中微调大型语言模型（LLMs）可以实现隐私保护的适应性，但由于模型聚合而遭受跨客户端干扰。现有的基于FedAvg的联邦LoRA微调方法主要面临数据异质性的挑战，导致有害的跨客户端干扰和个性化效果不佳。在这项工作中，我们提出了FedALT，这是一种新颖的个性化联邦LoRA微调算法，与FedAvg有根本性的不同。每个客户端在不使用聚合模型初始化本地训练的情况下继续训练其个人LoRA，同时通过单独的“世界其他地方”（RoTW）LoRA组件融入共享知识。为了有效平衡本地适应性和全局信息，FedALT引入了一个自适应混合器，通过Mixture-of-Experts（MoE）原则动态学习个体和RoTW LoRA组件之间的输入特定加权。通过在自然语言处理基准测试上进行大量实验，我们证明了FedALT明显优于最先进的个性化联邦LoRA微调方法，实现了卓越的本地适应性而不牺牲计算效率。

更新时间: 2025-03-14 21:07:46

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2503.11880v1

Counterfactual Realizability

It is commonly believed that, in a real-world environment, samples can only be drawn from observational and interventional distributions, corresponding to Layers 1 and 2 of the Pearl Causal Hierarchy. Layer 3, representing counterfactual distributions, is believed to be inaccessible by definition. However, Bareinboim, Forney, and Pearl (2015) introduced a procedure that allows an agent to sample directly from a counterfactual distribution, leaving open the question of what other counterfactual quantities can be estimated directly via physical experimentation. We resolve this by introducing a formal definition of realizability, the ability to draw samples from a distribution, and then developing a complete algorithm to determine whether an arbitrary counterfactual distribution is realizable given fundamental physical constraints, such as the inability to go back in time and subject the same unit to a different experimental condition. We illustrate the implications of this new framework for counterfactual data collection using motivating examples from causal fairness and causal reinforcement learning. While the baseline approach in these motivating settings typically follows an interventional or observational strategy, we show that a counterfactual strategy provably dominates both.

Updated: 2025-03-14 20:54:27

标题: 反事实实现可能性

摘要: 人们普遍认为，在现实世界环境中，样本只能从观察和干预分布中抽取，对应于Pearl因果层次的第一层和第二层。第三层代表反事实分布，据信根据定义是无法访问的。然而，Bareinboim、Forney和Pearl（2015）引入了一种程序，允许代理直接从反事实分布中抽样，留下一个问题，即通过物理实验可以直接估计其他反事实量是什么。我们通过引入一个形式化的实现性定义，即从分布中抽样的能力，然后开发一个完整的算法来确定是否在基本物理约束条件下一个任意的反事实分布是可实现的，例如不能回到过去并将同一个单位置于不同的实验条件下。我们通过使用因果公平性和因果强化学习的激励性示例说明了这种新框架对反事实数据收集的影响。虽然在这些激励性设置中的基线方法通常遵循干预或观察策略，但我们表明，反事实策略可以证明优于两者。

更新时间: 2025-03-14 20:54:27

领域: cs.AI,cs.LG,F.4.1; G.3

下载: http://arxiv.org/abs/2503.11870v1

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

Auditory foundation models, including auditory large language models (LLMs), process all sound inputs equally, independent of listener perception. However, human auditory perception is inherently selective: listeners focus on specific speakers while ignoring others in complex auditory scenes. Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned responses. To address this, we introduce Intention-Informed Auditory Scene Understanding (II-ASU) and present Auditory Attention-Driven LLM (AAD-LLM), a prototype system that integrates brain signals to infer listener attention. AAD-LLM extends an auditory LLM by incorporating intracranial electroencephalography (iEEG) recordings to decode which speaker a listener is attending to and refine responses accordingly. The model first predicts the attended speaker from neural activity, then conditions response generation on this inferred attentional state. We evaluate AAD-LLM on speaker description, speech transcription and extraction, and question answering in multitalker scenarios, with both objective and subjective ratings showing improved alignment with listener intention. By taking a first step toward intention-aware auditory AI, this work explores a new paradigm where listener perception informs machine listening, paving the way for future listener-centered auditory systems. Demo and code available: https://aad-llm.github.io.

Updated: 2025-03-14 20:46:33

标题: AAD-LLM：神经注意力驱动的听觉场景理解

摘要: 听觉基础模型，包括听觉大型语言模型（LLMs），处理所有声音输入，独立于听者的感知。然而，人类听觉感知本质上是有选择性的：听者在复杂的听觉场景中专注于特定的讲话者，而忽略其他人。现有模型没有将这种选择性纳入考虑，限制了它们生成与感知对齐的响应的能力。为了解决这个问题，我们引入了意图感知听觉场景理解（II-ASU），并提出了Auditory Attention-Driven LLM（AAD-LLM），这是一个原型系统，它集成了大脑信号来推断听者的注意力。AAD-LLM通过结合颅内脑电图（iEEG）记录来扩展听觉LLM，以解码听者正在关注的讲话者，并相应地优化响应。该模型首先从神经活动中预测关注的说话者，然后根据这个推断的注意状态来调整响应生成。我们在多讲话者场景中评估了AAD-LLM在说话者描述、语音转录和提取以及问题回答方面的表现，客观和主观评分均显示了与听者意图的改进对齐。通过迈出意图感知听觉AI的第一步，这项工作探索了一个新的范式，即听者感知指导机器听力，为未来以听者为中心的听觉系统铺平了道路。演示和代码可在以下网址找到：https://aad-llm.github.io。

更新时间: 2025-03-14 20:46:33

领域: cs.SD,cs.AI,cs.CL,cs.HC,eess.AS

下载: http://arxiv.org/abs/2502.16794v2

Banking on Feedback: Text Analysis of Mobile Banking iOS and Google App Reviews

The rapid growth of mobile banking (m-banking), especially after the COVID-19 pandemic, has reshaped the financial sector. This study analyzes consumer reviews of m-banking apps from five major Canadian banks, collected from Google Play and iOS App stores. Sentiment analysis and topic modeling classify reviews as positive, neutral, or negative, highlighting user preferences and areas for improvement. Data pre-processing was performed with NLTK, a Python language processing tool, and topic modeling used Latent Dirichlet Allocation (LDA). Sentiment analysis compared methods, with Long Short-Term Memory (LSTM) achieving 82\% accuracy for iOS reviews and Multinomial Naive Bayes 77\% for Google Play. Positive reviews praised usability, reliability, and features, while negative reviews identified login issues, glitches, and dissatisfaction with updates.This is the first study to analyze both iOS and Google Play m-banking app reviews, offering insights into app strengths and weaknesses. Findings underscore the importance of user-friendly designs, stable updates, and better customer service. Advanced text analytics provide actionable recommendations for improving user satisfaction and experience.

Updated: 2025-03-14 20:41:17

标题: 依靠反馈：对移动银行iOS和谷歌应用评论的文本分析

摘要: 移动银行（m-banking）的快速增长，特别是在COVID-19大流行之后，已经重塑了金融行业。本研究分析了来自五家加拿大主要银行的消费者对m-banking应用程序的评论，这些评论是从Google Play和iOS App商店收集的。情感分析和主题建模将评论分类为积极的、中性的或消极的，突出了用户偏好和改进领域。数据预处理使用了NLTK，一种Python语言处理工具，主题建模使用了隐含狄利克雷分布（LDA）。情感分析比较了方法，其中长短期记忆（LSTM）在iOS评论中达到了82\%的准确率，而多项式朴素贝叶斯在Google Play中达到了77\%。积极的评论赞扬了易用性、可靠性和功能，而消极的评论则指出了登录问题、故障和对更新不满。这是第一项分析iOS和Google Play m-banking应用评论的研究，为应用程序的优势和劣势提供了见解。研究结果强调了用户友好设计、稳定更新和更好的客户服务的重要性。高级文本分析提供了改善用户满意度和体验的可行建议。

更新时间: 2025-03-14 20:41:17

领域: cs.LG,cs.HC,cs.IT,math.IT

下载: http://arxiv.org/abs/2503.11861v1

Bayes and Biased Estimators Without Hyper-parameter Estimation: Comparable Performance to the Empirical-Bayes-Based Regularized Estimator

Regularized system identification has become a significant complement to more classical system identification. It has been numerically shown that kernel-based regularized estimators often perform better than the maximum likelihood estimator in terms of minimizing mean squared error (MSE). However, regularized estimators often require hyper-parameter estimation. This paper focuses on ridge regression and the regularized estimator by employing the empirical Bayes hyper-parameter estimator. We utilize the excess MSE to quantify the MSE difference between the empirical-Bayes-based regularized estimator and the maximum likelihood estimator for large sample sizes. We then exploit the excess MSE expressions to develop both a family of generalized Bayes estimators and a family of closed-form biased estimators. They have the same excess MSE as the empirical-Bayes-based regularized estimator but eliminate the need for hyper-parameter estimation. Moreover, we conduct numerical simulations to show that the performance of these new estimators is comparable to the empirical-Bayes-based regularized estimator, while computationally, they are more efficient.

Updated: 2025-03-14 20:33:08

标题: 贝叶斯和有偏估计器无需超参数估计: 与基于经验贝叶斯的正则化估计器性能相当

摘要: 正则化系统识别已成为更经典系统识别的重要补充。已经通过数值方法证明，基于核的正则化估计器在最小化均方误差（MSE）方面通常优于最大似然估计器。然而，正则化估计器通常需要超参数估计。本文关注岭回归和通过使用经验贝叶斯超参数估计器的正则化估计器。我们利用超过MSE来量化基于经验贝叶斯的正则化估计器与最大似然估计器在大样本量下的MSE差异。然后，我们利用超过MSE表达式开发了一系列广义贝叶斯估计器和一系列封闭形式的有偏估计器。它们与基于经验贝叶斯的正则化估计器具有相同的超过MSE，但消除了超参数估计的需要。此外，我们进行数值模拟表明，这些新估计器的性能与基于经验贝叶斯的正则化估计器可比，而在计算上更加高效。

更新时间: 2025-03-14 20:33:08

领域: stat.ML,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2503.11854v1

Visual Modality Prompt for Adapting Vision-Language Object Detectors

The zero-shot performance of object detectors degrades when tested on different modalities, such as infrared and depth. While recent work has explored image translation techniques to adapt detectors to new modalities, these methods are limited to a single modality and apply only to traditional detectors. Recently, vision-language detectors, such as YOLO-World and Grounding DINO, have shown promising zero-shot capabilities, however, they have not yet been adapted for other visual modalities. Traditional fine-tuning approaches compromise the zero-shot capabilities of the detectors. The visual prompt strategies commonly used for classification with vision-language models apply the same linear prompt translation to each image, making them less effective. To address these limitations, we propose ModPrompt, a visual prompt strategy to adapt vision-language detectors to new modalities without degrading zero-shot performance. In particular, an encoder-decoder visual prompt strategy is proposed, further enhanced by the integration of inference-friendly modality prompt decoupled residual, facilitating a more robust adaptation. Empirical benchmarking results show our method for modality adaptation on two vision-language detectors, YOLO-World and Grounding DINO, and on challenging infrared (LLVIP, FLIR) and depth (NYUv2) datasets, achieving performance comparable to full fine-tuning while preserving the model's zero-shot capability. Code available at: https://github.com/heitorrapela/ModPrompt.

Updated: 2025-03-14 20:32:12

标题: 视觉模态提示用于调整视觉-语言对象检测器

摘要: 目前使用对象检测器进行零样本性能测试时，当在不同的模态下进行测试，例如红外和深度时，其性能会下降。最近的研究探索了图像翻译技术，以适应检测器到新的模态，但这些方法仅限于单一模态，并且仅适用于传统检测器。最近，像YOLO-World和Grounding DINO这样的视觉-语言检测器展示了有希望的零样本能力，然而，它们尚未被适应到其他视觉模态上。传统的微调方法会损害检测器的零样本能力。通常用于视觉-语言模型分类的视觉提示策略将相同的线性提示翻译应用于每个图像，使其效果较差。为了解决这些限制，我们提出了ModPrompt，一种视觉提示策略，可以在不降低零样本性能的情况下适应视觉-语言检测器到新的模态。具体来说，提出了一种编码器-解码器视觉提示策略，通过集成推理友好的模态提示解耦残差来进一步增强适应性，促进更强大的适应性。实证基准结果显示了我们针对两个视觉-语言检测器，YOLO-World和Grounding DINO，在具有挑战性的红外（LLVIP，FLIR）和深度（NYUv2）数据集上进行模态适应的方法，实现了与完全微调相当的性能，同时保留了模型的零样本能力。代码可在https://github.com/heitorrapela/ModPrompt 上找到。

更新时间: 2025-03-14 20:32:12

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2412.00622v2

Local Pan-Privacy for Federated Analytics

Pan-privacy was proposed by Dwork et al. as an approach to designing a private analytics system that retains its privacy properties in the face of intrusions that expose the system's internal state. Motivated by federated telemetry applications, we study local pan-privacy, where privacy should be retained under repeated unannounced intrusions on the local state. We consider the problem of monitoring the count of an event in a federated system, where event occurrences on a local device should be hidden even from an intruder on that device. We show that under reasonable constraints, the goal of providing information-theoretic differential privacy under intrusion is incompatible with collecting telemetry information. We then show that this problem can be solved in a scalable way using standard cryptographic primitives.

Updated: 2025-03-14 20:18:33

标题: 联邦分析的本地面向隐私

摘要: Pan-privacy是由Dwork等人提出的一种方法，用于设计一个保持隐私性质的私人分析系统，即使系统的内部状态遭到暴露也能保持隐私性质。受联邦遥测应用的启发，我们研究了本地pan-privacy，其中隐私性应该在本地状态被重复未经通知的侵入时得以保持。我们考虑在一个联邦系统中监测事件发生次数的问题，本地设备上的事件发生应该即使对于该设备上的入侵者也是隐藏的。我们表明，在合理的约束条件下，提供信息论差分隐私在入侵下是与收集遥测信息不兼容的目标。然后我们展示了这个问题可以通过使用标准的加密原语以可扩展的方式解决。

更新时间: 2025-03-14 20:18:33

领域: cs.CR,cs.DS,cs.LG

下载: http://arxiv.org/abs/2503.11850v1

Optimization-Augmented Machine Learning for Vehicle Operations in Emergency Medical Services

Minimizing response times to meet legal requirements and serve patients in a timely manner is crucial for Emergency Medical Service (EMS) systems. Achieving this goal necessitates optimizing operational decision-making to efficiently manage ambulances. Against this background, we study a centrally controlled EMS system for which we learn an online ambulance dispatching and redeployment policy that aims at minimizing the mean response time of ambulances within the system by dispatching an ambulance upon receiving an emergency call and redeploying it to a waiting location upon the completion of its service. We propose a novel combinatorial optimization-augmented machine learning pipeline that allows to learn efficient policies for ambulance dispatching and redeployment. In this context, we further show how to solve the underlying full-information problem to generate training data and propose an augmentation scheme that improves our pipeline's generalization performance by mitigating a possible distribution mismatch with respect to the considered state space. Compared to existing methods that rely on augmentation during training, our approach offers substantial runtime savings of up to 87.9% while yielding competitive performance. To evaluate the performance of our pipeline against current industry practices, we conduct a numerical case study on the example of San Francisco's 911 call data. Results show that the learned policies outperform the online benchmarks across various resource and demand scenarios, yielding a reduction in mean response time of up to 30%.

Updated: 2025-03-14 20:15:26

标题: 紧急医疗服务车辆运营的优化增强机器学习

摘要: 为了满足法律要求并及时为患者提供服务，减少响应时间对急救服务系统至关重要。实现这一目标需要优化运营决策，以有效管理救护车。在此背景下，我们研究了一个集中控制的急救服务系统，我们学习了一种在线救护车调度和再部署政策，旨在通过在接到紧急呼叫时派遣救护车，并在完成服务后将其重新部署到等待位置，以最小化系统内救护车的平均响应时间。我们提出了一种新颖的组合优化增强机器学习流程，可以学习出高效的救护车调度和再部署政策。在这种情况下，我们进一步展示了如何解决基础的全信息问题以生成训练数据，并提出了一种增强方案，通过减轻与考虑的状态空间可能存在的分布不匹配，提高了我们的流程的泛化性能。与依赖于训练期间增强的现有方法相比，我们的方法在减少运行时间方面节省了高达87.9%，同时保持了竞争性能。为了评估我们的流程与当前行业实践的性能，我们在旧金山911呼叫数据的例子上进行了数字案例研究。结果显示，学习的政策在各种资源和需求情景下优于在线基准，使平均响应时间减少了高达30%。

更新时间: 2025-03-14 20:15:26

领域: cs.LG

下载: http://arxiv.org/abs/2503.11848v1

From Pixels to Histopathology: A Graph-Based Framework for Interpretable Whole Slide Image Analysis

The histopathological classification of whole-slide images (WSIs) is a fundamental task in digital pathology; yet it requires extensive time and expertise from specialists. While deep learning methods show promising results, they typically process WSIs by dividing them into artificial patches, which inherently prevents a network from learning from the entire image context, disregards natural tissue structures and compromises interpretability. Our method overcomes this limitation through a novel graph-based framework that constructs WSI graph representations. The WSI-graph efficiently captures essential histopathological information in a compact form. We build tissue representations (nodes) that follow biological boundaries rather than arbitrary patches all while providing interpretable features for explainability. Through adaptive graph coarsening guided by learned embeddings, we progressively merge regions while maintaining discriminative local features and enabling efficient global information exchange. In our method's final step, we solve the diagnostic task through a graph attention network. We empirically demonstrate strong performance on multiple challenging tasks such as cancer stage classification and survival prediction, while also identifying predictive factors using Integrated Gradients. Our implementation is publicly available at https://github.com/HistoGraph31/pix2pathology

Updated: 2025-03-14 20:15:04

标题: 从像素到组织病理学：一种基于图形的可解释全切片图像分析框架

摘要: 这篇文献摘要介绍了关于整张切片图像（WSIs）的组织病理分类，这是数字病理学中的一个基本任务，但需要专家的大量时间和专业知识。虽然深度学习方法显示出有希望的结果，但它们通常通过将WSIs分成人工补丁来处理，这本质上阻止了网络从整个图像环境中学习，忽视了自然组织结构并损害了可解释性。我们的方法通过一个新颖的基于图的框架克服了这一限制，构建了WSI图表示。WSI图有效地以紧凑的形式捕获了关键的组织病理学信息。我们构建了遵循生物边界而不是任意补丁的组织表示（节点），同时提供了可解释的特征以便解释。通过学习嵌入引导的自适应图粗化，我们逐渐合并区域，同时保持区分性的局部特征并实现高效的全局信息交换。在我们方法的最后一步，我们通过图注意力网络解决了诊断任务。我们在多个具有挑战性的任务上进行了实证表现，例如癌症分期分类和存活预测，同时使用综合梯度识别了预测因素。我们的实现公开可用在https://github.com/HistoGraph31/pix2pathology。

更新时间: 2025-03-14 20:15:04

领域: eess.IV,cs.AI,cs.CV,cs.LG,q-bio.QM

下载: http://arxiv.org/abs/2503.11846v1

Systematic Classification of Studies Investigating Social Media Conversations about Long COVID Using a Novel Zero-Shot Transformer Framework

Long COVID continues to challenge public health by affecting a considerable number of individuals who have recovered from acute SARS-CoV-2 infection yet endure prolonged and often debilitating symptoms. Social media has emerged as a vital resource for those seeking real-time information, peer support, and validating their health concerns related to Long COVID. This paper examines recent works focusing on mining, analyzing, and interpreting user-generated content on social media platforms to capture the broader discourse on persistent post-COVID conditions. A novel transformer-based zero-shot learning approach serves as the foundation for classifying research papers in this area into four primary categories: Clinical or Symptom Characterization, Advanced NLP or Computational Methods, Policy Advocacy or Public Health Communication, and Online Communities and Social Support. This methodology achieved an average confidence of 0.7788, with the minimum and maximum confidence being 0.1566 and 0.9928, respectively. This model showcases the ability of advanced language models to categorize research papers without any training data or predefined classification labels, thus enabling a more rapid and scalable assessment of existing literature. This paper also highlights the multifaceted nature of Long COVID research by demonstrating how advanced computational techniques applied to social media conversations can reveal deeper insights into the experiences, symptoms, and narratives of individuals affected by Long COVID.

Updated: 2025-03-14 20:13:08

标题: 基于新颖的零样本转换器框架的系统分类研究：探讨关于长期COVID-19的社交媒体对话

摘要: 长期COVID继续挑战公共卫生，影响了许多已经从急性SARS-CoV-2感染中恢复但仍遭受长期且常常令人痛苦的症状的个体。社交媒体已经成为那些寻找实时信息、同侪支持以及验证其与长期COVID相关健康问题的关键资源。本文审视了最近关注挖掘、分析和解释社交媒体平台上用户生成内容的作品，以捕捉有关持续性后COVID状况的更广泛讨论。一种基于转换器的零样本学习方法被用作将该领域的研究论文分类为四个主要类别的基础：临床或症状特征化、高级NLP或计算方法、政策倡导或公共卫生传播，以及在线社区和社会支持。该方法的平均置信度为0.7788，最小和最大置信度分别为0.1566和0.9928。这种模型展示了先进语言模型分类研究论文的能力，而无需任何训练数据或预定义的分类标签，从而实现对现有文献的更快速、更可扩展的评估。本文还通过展示如何将先进的计算技术应用于社交媒体对话，揭示受长期COVID影响的个人的经历、症状和叙述的更深层次见解，突显了长期COVID研究的多方面性质。

更新时间: 2025-03-14 20:13:08

领域: cs.SI,cs.CY,cs.LG,I.2.7; I.2.8; I.5.4; K.4.2; H.2.8; I.2.6

下载: http://arxiv.org/abs/2503.11845v1

Over-Squashing in Graph Neural Networks: A Comprehensive survey

Graph Neural Networks (GNNs) revolutionize machine learning for graph-structured data, effectively capturing complex relationships. They disseminate information through interconnected nodes, but long-range interactions face challenges known as "over-squashing". This survey delves into the challenge of over-squashing in Graph Neural Networks (GNNs), where long-range information dissemination is hindered, impacting tasks reliant on intricate long-distance interactions. It comprehensively explores the causes, consequences, and mitigation strategies for over-squashing. Various methodologies are reviewed, including graph rewiring, novel normalization, spectral analysis, and curvature-based strategies, with a focus on their trade-offs and effectiveness. The survey also discusses the interplay between over-squashing and other GNN limitations, such as over-smoothing, and provides a taxonomy of models designed to address these issues in node and graph-level tasks. Benchmark datasets for performance evaluation are also detailed, making this survey a valuable resource for researchers and practitioners in the GNN field.

Updated: 2025-03-14 20:10:31

标题: 图神经网络中的过度压缩：一项全面调查

摘要: 图神经网络（GNNs）彻底改变了图结构数据的机器学习领域，有效地捕捉了复杂的关系。它们通过相互连接的节点传播信息，但长距离交互面临着被称为“过度压缩”的挑战。本调查深入探讨了图神经网络（GNNs）中过度压缩的挑战，即长距离信息传播受阻，影响依赖于复杂长距离交互的任务。它全面探讨了过度压缩的原因、后果和缓解策略。回顾了各种方法论，包括图重连、新颖的归一化、谱分析和基于曲率的策略，重点关注它们的权衡和有效性。本调查还讨论了过度压缩与其他GNN限制之间的相互作用，如过度平滑，并提供了设计用于解决这些问题的节点和图级任务的模型分类。详细介绍了用于性能评估的基准数据集，使本调查成为GNN领域的研究人员和从业者的宝贵资源。

更新时间: 2025-03-14 20:10:31

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2308.15568v7

Test-Time Training Provably Improves Transformers as In-context Learners

Test-time training (TTT) methods explicitly update the weights of a model to adapt to the specific test instance, and they have found success in a variety of settings, including most recently language modeling and reasoning. To demystify this success, we investigate a gradient-based TTT algorithm for in-context learning, where we train a transformer model on the in-context demonstrations provided in the test prompt. Specifically, we provide a comprehensive theoretical characterization of linear transformers when the update rule is a single gradient step. Our theory (i) delineates the role of alignment between pretraining distribution and target task, (ii) demystifies how TTT can alleviate distribution shift, and (iii) quantifies the sample complexity of TTT including how it can significantly reduce the eventual sample size required for in-context learning. As our empirical contribution, we study the benefits of TTT for TabPFN, a tabular foundation model. In line with our theory, we demonstrate that TTT significantly reduces the required sample size for tabular classification (3 to 5 times fewer) unlocking substantial inference efficiency with a negligible training cost.

Updated: 2025-03-14 20:06:37

标题: 测试时间训练可证明地提升变压器作为上下文学习者

摘要: 测试时间训练（TTT）方法明确地更新模型的权重以适应特定的测试实例，并且它们在各种设置中取得了成功，包括最近的语言建模和推理。为了揭示这种成功背后的原因，我们研究了一种基于梯度的TTT算法，用于上下文学习，在这种学习中，我们在测试提示中提供的上下文演示中训练一个变压器模型。具体来说，我们对线性变压器在更新规则为单个梯度步骤时进行了全面的理论表征。我们的理论（i）勾画了预训练分布与目标任务之间的对齐作用，（ii）揭示了TTT如何缓解分布偏移，（iii）量化了TTT的样本复杂性，包括它如何显著减少上下文学习所需的最终样本大小。作为我们的经验贡献，我们研究了TTT对TabPFN（表格基础模型）的好处。与我们的理论一致，我们证明TTT显著减少了表格分类所需的样本大小（减少3到5倍），从而以微不足道的训练成本实现了大量推理效率。

更新时间: 2025-03-14 20:06:37

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2503.11842v1

Trust Under Siege: Label Spoofing Attacks against Machine Learning for Android Malware Detection

Machine learning (ML) malware detectors rely heavily on crowd-sourced AntiVirus (AV) labels, with platforms like VirusTotal serving as a trusted source of malware annotations. But what if attackers could manipulate these labels to classify benign software as malicious? We introduce label spoofing attacks, a new threat that contaminates crowd-sourced datasets by embedding minimal and undetectable malicious patterns into benign samples. These patterns coerce AV engines into misclassifying legitimate files as harmful, enabling poisoning attacks against ML-based malware classifiers trained on those data. We demonstrate this scenario by developing AndroVenom, a methodology for polluting realistic data sources, causing consequent poisoning attacks against ML malware detectors. Experiments show that not only state-of-the-art feature extractors are unable to filter such injection, but also various ML models experience Denial of Service already with 1% poisoned samples. Additionally, attackers can flip decisions of specific unaltered benign samples by modifying only 0.015% of the training data, threatening their reputation and market share and being unable to be stopped by anomaly detectors on training data. We conclude our manuscript by raising the alarm on the trustworthiness of the training process based on AV annotations, requiring further investigation on how to produce proper labels for ML malware detectors.

Updated: 2025-03-14 20:05:56

标题: 信任受到威胁：针对Android恶意软件检测的机器学习标签欺骗攻击

摘要: 机器学习（ML）恶意软件检测器严重依赖于众包反病毒（AV）标签，像VirusTotal这样的平台被视为恶意软件注释的可信来源。但如果攻击者能够操纵这些标签，将良性软件归类为恶意软件呢？我们引入了标签欺骗攻击，这是一种新的威胁，通过将极小且不可检测的恶意模式嵌入到良性样本中，污染了众包数据集。这些模式迫使AV引擎错误地将合法文件归类为有害，从而使得对这些数据进行训练的基于ML的恶意软件分类器受到毒化攻击。我们通过开发AndroVenom来展示这种情况，这是一种污染现实数据源的方法，导致对ML恶意软件检测器进行毒化攻击。实验证明，不仅最先进的特征提取器无法过滤这种注入，而且各种ML模型在只有1%受污染样本时就已经遭受到服务拒绝攻击。此外，攻击者可以通过仅修改0.015%的训练数据来翻转特定未更改良性样本的决定，威胁到它们的声誉和市场份额，并且无法被训练数据上的异常检测器所阻止。我们通过提醒人们对基于AV注释的训练过程的可信度提出警告，需要进一步调查如何为ML恶意软件检测器生成适当的标签。

更新时间: 2025-03-14 20:05:56

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2503.11841v1

Transfer Learning for Automated Feedback Generation on Small Datasets

Feedback is a very important part the learning process. However, it is challenging to make this feedback both timely and accurate when relying on human markers. This is the challenge that Automated Feedback Generation attempts to address. In this paper, a technique to train such a system on a very small dataset with very long sequences is presented. Both of these attributes make this a very challenging task, however, by using a three stage transfer learning pipeline state-of-the-art results can be achieved with qualitatively accurate but unhuman sounding results. The use of both Automated Essay Scoring and Automated Feedback Generation systems in the real world is also discussed.

Updated: 2025-03-14 19:57:54

标题: 在小数据集上进行自动反馈生成的迁移学习

摘要: 反馈是学习过程中非常重要的一部分。然而，当依赖人类评分时，要使这种反馈既及时又准确是具有挑战性的。这就是自动反馈生成试图解决的挑战。本文提出了一种在非常小的数据集和非常长的序列上训练这种系统的技术。这两个属性使得这是一个非常具有挑战性的任务，然而，通过使用一个三阶段的迁移学习管道，可以实现最先进的结果，结果具有定性准确但不像人类的声音。在现实世界中使用自动作文评分和自动反馈生成系统也被讨论了。

更新时间: 2025-03-14 19:57:54

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.11836v1

Adaptive Stochastic Gradient Descents on Manifolds with an Application on Weighted Low-Rank Approximation

We prove a convergence theorem for stochastic gradient descents on manifolds with adaptive learning rate and apply it to the weighted low-rank approximation problem.

Updated: 2025-03-14 19:56:07

标题: 在流形上的自适应随机梯度下降及其在加权低秩逼近中的应用

摘要: 我们证明了一种在具有自适应学习率的流形上的随机梯度下降的收敛定理，并将其应用于加权低秩逼近问题。

更新时间: 2025-03-14 19:56:07

领域: math.OC,cs.AI,cs.LG,41A60, 53Z50, 62L20, 68T05

下载: http://arxiv.org/abs/2503.11833v1

Policy Frameworks for Transparent Chain-of-Thought Reasoning in Large Language Models

Chain-of-Thought (CoT) reasoning enhances large language models (LLMs) by decomposing complex problems into step-by-step solutions, improving performance on reasoning tasks. However, current CoT disclosure policies vary widely across different models in frontend visibility, API access, and pricing strategies, lacking a unified policy framework. This paper analyzes the dual-edged implications of full CoT disclosure: while it empowers small-model distillation, fosters trust, and enables error diagnosis, it also risks violating intellectual property, enabling misuse, and incurring operational costs. We propose a tiered-access policy framework that balances transparency, accountability, and security by tailoring CoT availability to academic, business, and general users through ethical licensing, structured reasoning outputs, and cross-tier safeguards. By harmonizing accessibility with ethical and operational considerations, this framework aims to advance responsible AI deployment while mitigating risks of misuse or misinterpretation.

Updated: 2025-03-14 19:54:18

标题: 大型语言模型中透明链式推理的政策框架

摘要: 链式思维（CoT）推理通过将复杂问题分解为逐步解决方案，增强了大型语言模型（LLMs）在推理任务上的性能。然而，当前CoT披露政策在前端可见性、API访问和定价策略方面存在广泛差异，缺乏统一的政策框架。本文分析了完整CoT披露的双重影响：虽然它能够赋予小型模型提炼能力，培养信任，以及实现错误诊断，但也存在侵犯知识产权、可能被滥用，以及产生运营成本的风险。我们提出了一个分层访问政策框架，通过将CoT的可用性调整到学术、商业和普通用户之间，通过道德许可、结构化推理输出和跨层安全保障来平衡透明度、问责制和安全性。通过将可访问性与道德和运营考虑相协调，这一框架旨在推进负责任的AI部署，同时减轻滥用或误解的风险。

更新时间: 2025-03-14 19:54:18

领域: cs.CY,cs.AI,cs.CL

下载: http://arxiv.org/abs/2503.14521v1

Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning

Recent vision-language models (VLMs) have made remarkable strides in generative modeling with multimodal inputs, particularly text and images. However, their susceptibility to generating harmful content when exposed to unsafe queries raises critical safety concerns. While current alignment strategies primarily rely on supervised safety fine-tuning with curated datasets, we identify a fundamental limitation we call the "safety mirage" where supervised fine-tuning inadvertently reinforces spurious correlations between superficial textual patterns and safety responses, rather than fostering deep, intrinsic mitigation of harm. We show that these spurious correlations leave fine-tuned VLMs vulnerable even to a simple one-word modification-based attack, where substituting a single word in text queries with a spurious correlation-inducing alternative can effectively bypass safeguards. Additionally, these correlations contribute to the over prudence, causing fine-tuned VLMs to refuse benign queries unnecessarily. To address this issue, we show machine unlearning (MU) as a powerful alternative to supervised safety fine-tuning as it avoids biased feature-label mappings and directly removes harmful knowledge from VLMs while preserving their general capabilities. Extensive evaluations across safety benchmarks show that under one-word attacks, MU-based alignment reduces the attack success rate by up to 60.17% and cuts unnecessary rejections by over 84.20%. Codes are available at https://github.com/OPTML-Group/VLM-Safety-MU. WARNING: There exist AI generations that may be offensive in nature.

Updated: 2025-03-14 19:52:08

标题: 安全幻觉：虚假相关性如何破坏VLM安全微调

摘要: 最近的视觉语言模型（VLMs）在生成建模方面取得了显著进展，特别是对于文本和图像等多模态输入。然而，当暴露于不安全的查询时，它们生成有害内容的易受性引发了关键的安全问题。尽管当前的对齐策略主要依赖于使用策划数据集进行监督安全微调，但我们发现了一个我们称之为“安全幻觉”的基本限制，即监督微调无意中会加强表面文本模式和安全响应之间的虚假相关性，而不是促进深层次的、内在的危害缓解。我们表明，这些虚假相关性使经过微调的VLMs即使对于简单的基于单词修改的攻击也容易受到攻击，其中用虚假相关性诱导的替代词替换文本查询中的单个词可以有效地绕过安全防护。此外，这些相关性导致过于谨慎，导致微调的VLMs不必要地拒绝良性查询。为了解决这个问题，我们将机器遗忘（MU）作为监督安全微调的强大替代方案，因为它避免了有偏见的特征-标签映射，并直接从VLMs中删除有害知识，同时保留它们的一般能力。在各种安全基准测试中进行的广泛评估表明，在单词攻击下，基于MU的对齐将攻击成功率降低了高达60.17%，并将不必要的拒绝率降低了超过84.20%。代码可在https://github.com/OPTML-Group/VLM-Safety-MU找到。警告：存在可能具有冒犯性质的AI生成。

更新时间: 2025-03-14 19:52:08

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2503.11832v1

Performance Analysis of Decentralized Federated Learning Deployments

The widespread adoption of smartphones and smart wearable devices has led to the widespread use of Centralized Federated Learning (CFL) for training powerful machine learning models while preserving data privacy. However, CFL faces limitations due to its overreliance on a central server, which impacts latency and system robustness. Decentralized Federated Learning (DFL) is introduced to address these challenges. It facilitates direct collaboration among participating devices without relying on a central server. Each device can independently connect with other devices and share model parameters. This work explores crucial factors influencing the convergence and generalization capacity of DFL models, emphasizing network topologies, non-IID data distribution, and training strategies. We first derive the convergence rate of different DFL model deployment strategies. Then, we comprehensively analyze various network topologies (e.g., linear, ring, star, and mesh) with different degrees of non-IID data and evaluate them over widely adopted machine learning models (e.g., classical, deep neural networks, and Large Language Models) and real-world datasets. The results reveal that models converge to the optimal one for IID data. However, the convergence rate is inversely proportional to the degree of non-IID data distribution. Our findings will serve as valuable guidelines for designing effective DFL model deployments in practical applications.

Updated: 2025-03-14 19:37:13

标题: 分散式联邦学习部署的性能分析

摘要: 智能手机和智能可穿戴设备的广泛应用导致了集中式联邦学习（Centralized Federated Learning，CFL）的广泛使用，用于训练强大的机器学习模型同时保护数据隐私。然而，CFL由于过度依赖中央服务器而面临限制，影响延迟和系统稳健性。为了解决这些挑战，引入了分散式联邦学习（Decentralized Federated Learning，DFL）。它促进了参与设备之间的直接协作，而无需依赖中央服务器。每个设备可以独立地连接其他设备并共享模型参数。本文探讨了影响DFL模型收敛和泛化能力的关键因素，重点在于网络拓扑结构、非独立同分布数据分布和训练策略。我们首先推导了不同DFL模型部署策略的收敛速度。然后，我们全面分析了不同网络拓扑结构（如线性、环形、星形和网状）与不同程度的非独立同分布数据，并在广泛采用的机器学习模型（如传统模型、深度神经网络和大型语言模型）以及真实世界数据集上对它们进行评估。结果表明，模型对于独立同分布数据收敛到最佳模型。然而，收敛速度与非独立同分布数据分布程度成反比。我们的研究结果将为在实际应用中设计有效的DFL模型部署提供宝贵的指导。

更新时间: 2025-03-14 19:37:13

领域: cs.LG,cs.DC,cs.NI

下载: http://arxiv.org/abs/2503.11828v1

Semi-Supervised Co-Training of Time and Time-Frequency Models: Application to Bearing Fault Diagnosis

Neural networks require massive amounts of annotated data to train intelligent solutions. Acquiring many labeled data in industrial applications is often difficult; therefore, semi-supervised approaches are preferred. We propose a new semi-supervised co-training method, which combines time and time-frequency (TF) machine learning models to improve performance and reliability. The developed framework collaboratively co-trains fast time-domain models by utilizing high-performing TF techniques without increasing the inference complexity. Besides, it operates in cloud-edge networks and offers holistic support for many applications covering edge-real-time monitoring and cloud-based updates and corrections. Experimental results on bearing fault diagnosis verify the superiority of our technique compared to a competing self-training method. The results from two case studies show that our method outperforms self-training for different noise levels and amounts of available data with accuracy gains reaching from 10.6% to 33.9%. They demonstrate that fusing time-domain and TF-based models offers opportunities for developing high-performance industrial solutions.

Updated: 2025-03-14 19:24:38

标题: 半监督时间和时频模型的联合训练：应用于轴承故障诊断

摘要: 神经网络需要大量注释数据来训练智能解决方案。在工业应用中获取大量标记数据通常很困难，因此，半监督方法更受青睐。我们提出了一种新的半监督协同训练方法，将时间和时频（TF）机器学习模型结合起来，以提高性能和可靠性。该开发的框架通过利用高性能TF技术协同训练快速时间域模型，而不增加推理复杂性。此外，它在云边网络中运行，并为涵盖边缘实时监测和基于云的更新和校正的许多应用提供全面支持。在轴承故障诊断的实验结果验证了我们的技术相对于竞争的自训练方法的优越性。两个案例研究的结果显示，我们的方法在不同噪声水平和可用数据量方面优于自训练，准确度提高了10.6%至33.9%。它们表明，融合时间域和基于TF的模型为开发高性能工业解决方案提供了机会。

更新时间: 2025-03-14 19:24:38

领域: cs.LG,cs.AI,eess.SP

下载: http://arxiv.org/abs/2503.11824v1

Proactive Adversarial Defense: Harnessing Prompt Tuning in Vision-Language Models to Detect Unseen Backdoored Images

Backdoor attacks pose a critical threat by embedding hidden triggers into inputs, causing models to misclassify them into target labels. While extensive research has focused on mitigating these attacks in object recognition models through weight fine-tuning, much less attention has been given to detecting backdoored samples directly. Given the vast datasets used in training, manual inspection for backdoor triggers is impractical, and even state-of-the-art defense mechanisms fail to fully neutralize their impact. To address this gap, we introduce a groundbreaking method to detect unseen backdoored images during both training and inference. Leveraging the transformative success of prompt tuning in Vision Language Models (VLMs), our approach trains learnable text prompts to differentiate clean images from those with hidden backdoor triggers. Experiments demonstrate the exceptional efficacy of this method, achieving an impressive average accuracy of 86% across two renowned datasets for detecting unseen backdoor triggers, establishing a new standard in backdoor defense.

Updated: 2025-03-14 19:24:34

标题: 主动对抗性防御：利用视觉语言模型中的即时调整来检测未知的后门图像

摘要: 后门攻击通过将隐藏触发器嵌入输入中，导致模型错误地将其误分类为目标标签，构成了一个重大威胁。虽然在减轻这些攻击方面进行了大量研究，主要集中在通过权重微调来减轻物体识别模型中的攻击，但在直接检测植入后门样本方面却受到了较少关注。鉴于在训练中使用的大量数据集，手动检查后门触发器是不切实际的，即使最先进的防御机制也无法完全中和它们的影响。为了解决这一差距，我们引入了一种开创性的方法，在训练和推断期间检测未见过的植入后门的图像。利用视觉语言模型（VLMs）中提示调整的变革性成功，我们的方法训练可学习的文本提示，以区分干净图像和具有隐藏后门触发器的图像。实验证明了这种方法的卓越有效性，在检测未见过的后门触发器方面，在两个著名数据集上取得了令人印象深刻的平均准确率，建立了后门防御的新标准。

更新时间: 2025-03-14 19:24:34

领域: cs.CV,cs.AI,cs.CR,cs.LG

下载: http://arxiv.org/abs/2412.08755v3

An Algebraic Approach to Moralisation and Triangulation of Probabilistic Graphical Models

Moralisation and Triangulation are transformations allowing to switch between different ways of factoring a probability distribution into a graphical model. Moralisation allows to view a Bayesian network (a directed model) as a Markov network (an undirected model), whereas triangulation works in the opposite direction. We present a categorical framework where these transformations are modelled as functors between a category of Bayesian networks and one of Markov networks. The two kinds of network (the objects of these categories) are themselves represented as functors, from a `syntax' domain to a `semantics' codomain. Notably, moralisation and triangulation are definable inductively on such syntax, and operate as a form of functor pre-composition. This approach introduces a modular, algebraic perspective in the theory of probabilistic graphical models.

Updated: 2025-03-14 19:16:41

标题: 一种代数方法用于概率图模型的道德化与三角化

摘要: 道德化和三角化是一种转换，允许在不同的方式之间切换将概率分布分解为图模型。道德化允许将贝叶斯网络（有向模型）视为马尔科夫网络（无向模型），而三角化则相反。我们提出了一个范畴框架，在这个框架中，这些转换被建模为从贝叶斯网络范畴到马尔科夫网络范畴的函子。这两种网络（这些范畴的对象）本身被表示为从“语法”域到“语义”共域的函子。值得注意的是，道德化和三角化可以在这种语法上归纳定义，并且作为函子预合成的一种形式运作。这种方法在概率图模型理论中引入了一种模块化的、代数的视角。

更新时间: 2025-03-14 19:16:41

领域: cs.AI,cs.LO,math.CT

下载: http://arxiv.org/abs/2503.11820v1

Online Assortment and Price Optimization Under Contextual Choice Models

We consider an assortment selection and pricing problem in which a seller has $N$ different items available for sale. In each round, the seller observes a $d$-dimensional contextual preference information vector for the user, and offers to the user an assortment of $K$ items at prices chosen by the seller. The user selects at most one of the products from the offered assortment according to a multinomial logit choice model whose parameters are unknown. The seller observes which, if any, item is chosen at the end of each round, with the goal of maximizing cumulative revenue over a selling horizon of length $T$. For this problem, we propose an algorithm that learns from user feedback and achieves a revenue regret of order $\widetilde{O}(d \sqrt{K T} / L_0 )$ where $L_0$ is the minimum price sensitivity parameter. We also obtain a lower bound of order $\Omega(d \sqrt{T}/ L_0)$ for the regret achievable by any algorithm.

Updated: 2025-03-14 19:15:33

标题: 在线情境选择模型下的产品组合和价格优化

摘要: 我们考虑了一个关于商品选择和定价的问题，一个卖家有$N$种不同的商品可供销售。在每一轮中，卖家观察到用户的$d$维情境偏好信息向量，并向用户提供由卖家选择价格的$K$种商品组合。用户根据一个未知参数的多项式对数选择模型从所提供的商品组合中最多选择一种产品。卖家在每一轮结束时观察到用户选择了哪种商品，目标是在销售周期长度为$T$的情况下最大化累积收入。针对这个问题，我们提出了一种算法，通过学习用户反馈，实现了一个收入遗憾的阶数为$\widetilde{O}(d \sqrt{K T} / L_0)$，其中$L_0$是最小价格敏感性参数。我们还得到了任何算法可实现的遗憾下界为$\Omega(d \sqrt{T}/ L_0)$。

更新时间: 2025-03-14 19:15:33

领域: cs.LG,cs.GT,econ.TH,stat.ML

下载: http://arxiv.org/abs/2503.11819v1

Navigable Graphs for High-Dimensional Nearest Neighbor Search: Constructions and Limits

There has been significant recent interest in graph-based nearest neighbor search methods, many of which are centered on the construction of navigable graphs over high-dimensional point sets. A graph is navigable if we can successfully move from any starting node to any target node using a greedy routing strategy where we always move to the neighbor that is closest to the destination according to a given distance function. The complete graph is navigable for any point set, but the important question for applications is if sparser graphs can be constructed. While this question is fairly well understood in low-dimensions, we establish some of the first upper and lower bounds for high-dimensional point sets. First, we give a simple and efficient way to construct a navigable graph with average degree $O(\sqrt{n \log n })$ for any set of $n$ points, in any dimension, for any distance function. We compliment this result with a nearly matching lower bound: even under the Euclidean metric in $O(\log n)$ dimensions, a random point set has no navigable graph with average degree $O(n^{\alpha})$ for any $\alpha < 1/2$. Our lower bound relies on sharp anti-concentration bounds for binomial random variables, which we use to show that the near-neighborhoods of a set of random points do not overlap significantly, forcing any navigable graph to have many edges.

Updated: 2025-03-14 19:01:23

标题: 可导航图用于高维最近邻搜索：构建和限制

摘要: 最近对基于图的最近邻搜索方法产生了相当大的兴趣，其中许多方法都集中在构建高维点集上的可导航图。如果我们可以成功地从任何起始节点移动到任何目标节点，使用一种贪婪的路由策略，总是移动到离目的地最近的邻居，那么图就是可导航的。对于任何点集来说，完整的图都是可导航的，但对于应用程序来说，重要的问题是是否可以构建更稀疏的图。虽然在低维度下这个问题已经比较清楚，但我们在高维点集中建立了一些首次的上界和下界。首先，我们提供了一种简单高效的方法，可以为任何点集构建平均度为$O(\sqrt{n \log n})$的可导航图，无论是在任何维度，任何距离函数下。我们用一个近乎匹配的下界补充了这个结果：即使在$O(\log n)$维的欧氏度量下，一个随机点集也没有平均度为$O(n^{\alpha})$的可导航图，其中$\alpha < 1/2$。我们的下界依赖于二项随机变量的尖锐反集中度界限，我们利用这一点来展示随机点集的近邻域不会有显著的重叠，从而迫使任何可导航图具有许多边。

更新时间: 2025-03-14 19:01:23

领域: cs.DS,cs.CG,cs.DB,cs.LG

下载: http://arxiv.org/abs/2405.18680v4

The Architecture and Evaluation of Bayesian Neural Networks

As modern neural networks get more complex, specifying a model with high predictive performance and sound uncertainty quantification becomes a more challenging task. Despite some promising theoretical results on the true posterior predictive distribution of Bayesian neural networks, the properties of even the most commonly used posterior approximations are often questioned. Computational burdens and intractable posteriors expose miscalibrated Bayesian neural networks to poor accuracy and unreliable uncertainty estimates. Approximate Bayesian inference aims to replace unknown and intractable posterior distributions with some simpler but feasible distributions. The dimensions of modern deep models coupled with the lack of identifiability make Markov chain Monte Carlo tremendously expensive and unable to fully explore the multimodal posterior. On the other hand, variational inference benefits from improved computational complexity but lacks the asymptotical guarantees of sampling-based inference and tends to concentrate around a single mode. The performance of both approaches heavily depends on architectural choices; this paper aims to shed some light on this, by considering the computational costs, accuracy and uncertainty quantification in different scenarios including large width and out-of-sample data. To improve posterior exploration, different model averaging and ensembling techniques are studied, along with their benefits on predictive performance. In our experiments, variational inference overall provided better uncertainty quantification than Markov chain Monte Carlo; further, stacking and ensembles of variational approximations provided comparable to Markov chain Monte Carlo accuracy at a much-reduced cost.

Updated: 2025-03-14 18:55:48

标题: 贝叶斯神经网络的架构和评估

摘要: 随着现代神经网络变得越来越复杂，指定具有高预测性能和可靠不确定性量化的模型变得更具挑战性。尽管一些关于贝叶斯神经网络真后验预测分布的理论结果颇具前景，但即使是最常用的后验逼近的性质也经常受到质疑。计算负担和难以处理的后验使得贝叶斯神经网络的校准不足，导致精度不佳和不可靠的不确定性估计。近似贝叶斯推断旨在用一些简单但可行的分布替换未知和难以处理的后验分布。现代深度模型的维度与可识别性的缺乏使得马尔可夫链蒙特卡洛法的代价极高，无法完全探索多峰后验。另一方面，变分推断在改进计算复杂性方面具有优势，但缺乏基于采样的推断的渐近保证，并且倾向于集中在单一模式周围。两种方法的性能在很大程度上取决于架构选择；本文旨在通过考虑不同情景下的计算成本、准确性和不确定性量化来阐明这一点，包括大宽度和样外数据。为了改善后验探索，研究了不同的模型平均和集成技术，以及它们对预测性能的好处。在我们的实验中，变分推断总体提供了比马尔可夫链蒙特卡洛法更好的不确定性量化；此外，变分逼近的堆叠和集成在大大降低成本的情况下提供了与马尔可夫链蒙特卡洛法相当的准确性。

更新时间: 2025-03-14 18:55:48

领域: cs.LG,stat.ME,stat.ML

下载: http://arxiv.org/abs/2503.11808v1

Evaluating the Process Modeling Abilities of Large Language Models -- Preliminary Foundations and Results

Large language models (LLM) have revolutionized the processing of natural language. Although first benchmarks of the process modeling abilities of LLM are promising, it is currently under debate to what extent an LLM can generate good process models. In this contribution, we argue that the evaluation of the process modeling abilities of LLM is far from being trivial. Hence, available evaluation results must be taken carefully. For example, even in a simple scenario, not only the quality of a model should be taken into account, but also the costs and time needed for generation. Thus, an LLM does not generate one optimal solution, but a set of Pareto-optimal variants. Moreover, there are several further challenges which have to be taken into account, e.g. conceptualization of quality, validation of results, generalizability, and data leakage. We discuss these challenges in detail and discuss future experiments to tackle these challenges scientifically.

Updated: 2025-03-14 18:52:18

标题: 评估大型语言模型的过程建模能力--初步基础和结果

摘要: 大型语言模型(LLM)已经彻底改变了自然语言处理。虽然LLM的过程建模能力的首批基准测试结果令人鼓舞，但目前存在争议，即LLM能够生成多好的过程模型。在这篇论文中，我们认为评估LLM的过程建模能力远非易事。因此，必须谨慎对待现有的评估结果。例如，在一个简单的场景中，不仅应考虑模型的质量，还应考虑生成所需的成本和时间。因此，LLM并不生成一个最佳解决方案，而是一组帕累托最优变体。此外，还有许多其他挑战需要考虑，例如质量的概念化、结果的验证、普遍性和数据泄漏。我们详细讨论了这些挑战，并讨论了未来实验来科学地解决这些挑战。

更新时间: 2025-03-14 18:52:18

领域: cs.CL,cs.LG,cs.SE

下载: http://arxiv.org/abs/2503.13520v1

Boosting Hierarchical Reinforcement Learning with Meta-Learning for Complex Task Adaptation

Hierarchical Reinforcement Learning (HRL) is well-suitedd for solving complex tasks by breaking them down into structured policies. However, HRL agents often struggle with efficient exploration and quick adaptation. To overcome these limitations, we propose integrating meta-learning into HRL to enable agents to learn and adapt hierarchical policies more effectively. Our method leverages meta-learning to facilitate rapid task adaptation using prior experience, while intrinsic motivation mechanisms drive efficient exploration by rewarding the discovery of novel states. Specifically, our agent employs a high-level policy to choose among multiple low-level policies within custom-designed grid environments. By incorporating gradient-based meta-learning with differentiable inner-loop updates, we optimize performance across a curriculum of progressively challenging tasks. Experimental results highlight that our metalearning-enhanced hierarchical agent significantly outperforms standard HRL approaches lacking meta-learning and intrinsic motivation. The agent demonstrates faster learning, greater cumulative rewards, and higher success rates in complex grid-based scenarios. These Findings underscore the effectiveness of combining meta-learning, curriculum learning, and intrinsic motivation to enhance the capability of HRL agents in tackling complex tasks.

Updated: 2025-03-14 18:52:03

标题: 使用元学习增强层次强化学习，实现复杂任务适应

摘要: Hierarchical Reinforcement Learning（HRL）很适合通过将复杂任务分解为结构化策略来解决问题。然而，HRL代理通常在有效的探索和快速适应方面存在困难。为了克服这些限制，我们提出将元学习整合到HRL中，以使代理能够更有效地学习和适应分层策略。我们的方法利用元学习来促进通过先前经验的快速任务适应，同时内在动机机制通过奖励发现新领域来驱动有效的探索。具体而言，我们的代理使用高级策略在定制设计的网格环境中选择多个低级策略。通过将基于梯度的元学习与可区分内循环更新相结合，我们优化了在一系列逐渐具有挑战性的任务中的表现。实验结果突显出，我们增强了元学习的分层代理明显优于缺乏元学习和内在动机的标准HRL方法。代理在复杂的基于网格的场景中表现出更快的学习、更高的累积奖励和更高的成功率。这些发现强调了将元学习、课程学习和内在动机结合起来以增强HRL代理在处理复杂任务方面的能力的有效性。

更新时间: 2025-03-14 18:52:03

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.07921v2

Simplifying Deep Temporal Difference Learning

Q-learning played a foundational role in the field reinforcement learning (RL). However, TD algorithms with off-policy data, such as Q-learning, or nonlinear function approximation like deep neural networks require several additional tricks to stabilise training, primarily a large replay buffer and target networks. Unfortunately, the delayed updating of frozen network parameters in the target network harms the sample efficiency and, similarly, the large replay buffer introduces memory and implementation overheads. In this paper, we investigate whether it is possible to accelerate and simplify off-policy TD training while maintaining its stability. Our key theoretical result demonstrates for the first time that regularisation techniques such as LayerNorm can yield provably convergent TD algorithms without the need for a target network or replay buffer, even with off-policy data. Empirically, we find that online, parallelised sampling enabled by vectorised environments stabilises training without the need for a large replay buffer. Motivated by these findings, we propose PQN, our simplified deep online Q-Learning algorithm. Surprisingly, this simple algorithm is competitive with more complex methods like: Rainbow in Atari, PPO-RNN in Craftax, QMix in Smax, and can be up to 50x faster than traditional DQN without sacrificing sample efficiency. In an era where PPO has become the go-to RL algorithm, PQN reestablishes off-policy Q-learning as a viable alternative.

Updated: 2025-03-14 18:51:52

标题: 简化深度时间差异学习

摘要: Q学习在强化学习领域发挥了基础性作用。然而，具有离线数据的TD算法，如Q学习，或类似深度神经网络的非线性函数逼近，需要几种额外技巧来稳定训练，主要是一个大的重播缓冲区和目标网络。不幸的是，目标网络中冻结网络参数的延迟更新损害了样本效率，同样，大型重播缓冲区引入了内存和实现开销。在本文中，我们研究是否可以加速和简化离线策略TD训练，同时保持其稳定性。我们的关键理论结果首次表明，诸如LayerNorm之类的正则化技术可以产生具有收敛性的TD算法，而无需目标网络或重播缓冲区，即使使用离线数据。在经验上，我们发现由矢量化环境实现的在线并行采样可以稳定训练，而无需大型重播缓冲区。受这些发现的启发，我们提出了PQN，我们简化的深度在线Q学习算法。令人惊讶的是，这个简单的算法在Atari中与更复杂的方法（如：Rainbow）、在Craftax中与PPO-RNN、在Smax中与QMix相竞争，并且比传统的DQN快50倍，而不牺牲样本效率。在PPO已成为首选的强化学习算法的时代，PQN重新确立了离线策略Q学习作为一个可行的替代方案。

更新时间: 2025-03-14 18:51:52

领域: cs.LG

下载: http://arxiv.org/abs/2407.04811v4

Mitigating Bad Ground Truth in Supervised Machine Learning based Crop Classification: A Multi-Level Framework with Sentinel-2 Images

In agricultural management, precise Ground Truth (GT) data is crucial for accurate Machine Learning (ML) based crop classification. Yet, issues like crop mislabeling and incorrect land identification are common. We propose a multi-level GT cleaning framework while utilizing multi-temporal Sentinel-2 data to address these issues. Specifically, this framework utilizes generating embeddings for farmland, clustering similar crop profiles, and identification of outliers indicating GT errors. We validated clusters with False Colour Composite (FCC) checks and used distance-based metrics to scale and automate this verification process. The importance of cleaning the GT data became apparent when the models were trained on the clean and unclean data. For instance, when we trained a Random Forest model with the clean GT data, we achieved upto 70\% absolute percentage points higher for the F1 score metric. This approach advances crop classification methodologies, with potential for applications towards improving loan underwriting and agricultural decision-making.

Updated: 2025-03-14 18:50:30

标题: 减轻基于监督式机器学习的作物分类中的错误地面真实性：基于Sentinel-2图像的多层框架

摘要: 在农业管理中，精确的地面真实数据对于基于机器学习的作物分类至关重要。然而，作物错误标记和土地错误识别等问题很常见。我们提出了一个多层次的地面真实数据清洗框架，同时利用多时相的Sentinel-2数据来解决这些问题。具体而言，该框架利用生成农田嵌入、聚类相似作物特征和识别异常值来指示真实数据错误。我们通过False Colour Composite（FCC）检查验证了聚类，并使用基于距离的指标来扩展和自动化这个验证过程。当模型在清洁和未清洁的数据上训练时，清洁地面真实数据的重要性变得明显。例如，当我们使用干净的地面真实数据训练随机森林模型时，我们的F1分数指标高出70\%的绝对百分点。这种方法推进了作物分类方法，有潜力应用于改进贷款核准和农业决策。

更新时间: 2025-03-14 18:50:30

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2503.11807v1

Diffuse-CLoC: Guided Diffusion for Physics-based Character Look-ahead Control

We present Diffuse-CLoC, a guided diffusion framework for physics-based look-ahead control that enables intuitive, steerable, and physically realistic motion generation. While existing kinematics motion generation with diffusion models offer intuitive steering capabilities with inference-time conditioning, they often fail to produce physically viable motions. In contrast, recent diffusion-based control policies have shown promise in generating physically realizable motion sequences, but the lack of kinematics prediction limits their steerability. Diffuse-CLoC addresses these challenges through a key insight: modeling the joint distribution of states and actions within a single diffusion model makes action generation steerable by conditioning it on the predicted states. This approach allows us to leverage established conditioning techniques from kinematic motion generation while producing physically realistic motions. As a result, we achieve planning capabilities without the need for a high-level planner. Our method handles a diverse set of unseen long-horizon downstream tasks through a single pre-trained model, including static and dynamic obstacle avoidance, motion in-betweening, and task-space control. Experimental results show that our method significantly outperforms the traditional hierarchical framework of high-level motion diffusion and low-level tracking.

Updated: 2025-03-14 18:42:29

标题: Diffuse-CLoC：基于物理的角色预先控制的引导扩散

摘要: 我们提出了Diffuse-CLoC，这是一个基于物理学的前瞻控制引导扩散框架，可以实现直观、可控和物理真实的运动生成。虽然现有的基于扩散模型的运动生成在推理时间条件下提供直观的转向能力，但它们经常无法产生物理上可行的运动。相反，最近基于扩散的控制策略显示出在生成物理可实现的运动序列方面有潜力，但缺乏运动学预测限制了它们的可控性。Diffuse-CLoC通过一个关键的洞察力解决了这些挑战：在一个单一的扩散模型中建模状态和动作的联合分布使动作生成通过在预测的状态上进行条件设置而变得可控。这种方法使我们能够利用来自运动学运动生成的已建立的条件技术，同时产生物理真实的运动。因此，我们实现了规划能力，无需高级计划者。我们的方法通过一个经过预训练的模型处理一系列不同的未见的长期下游任务，包括静态和动态障碍物避免、运动插值和任务空间控制。实验结果表明，我们的方法明显优于传统的高级运动扩散和低级跟踪的分层框架。

更新时间: 2025-03-14 18:42:29

领域: cs.GR,cs.LG,cs.RO

下载: http://arxiv.org/abs/2503.11801v1

Semantic-Clipping: Efficient Vision-Language Modeling with Semantic-Guidedd Visual Selection

Vision-Language Models (VLMs) leverage aligned visual encoders to transform images into visual tokens, allowing them to be processed similarly to text by the backbone large language model (LLM). This unified input paradigm enables VLMs to excel in vision-language tasks such as visual question answering (VQA). To improve fine-grained visual reasoning, recent advancements in vision-language modeling introduce image cropping techniques that feed all encoded sub-images into the model. However, this approach significantly increases the number of visual tokens, leading to inefficiency and potential distractions for the LLM. To address the generalization challenges of image representation in VLMs, we propose a lightweight, universal framework that seamlessly integrates with existing VLMs to enhance their ability to process finegrained details. Our method leverages textual semantics to identify key visual areas, improving VQA performance without requiring any retraining of the VLM. Additionally, it incorporates textual signals into the visual encoding process, enhancing both efficiency and effectiveness. The proposed method, SEMCLIP, strengthens the visual understanding of a 7B VLM, LLaVA-1.5 by 3.3% on average across 7 benchmarks, and particularly by 5.3% on the challenging detailed understanding benchmark V*.

Updated: 2025-03-14 18:33:31

标题: 语义剪辑：具有语义引导视觉选择的高效视觉-语言建模

摘要: 视觉-语言模型(VLMs)利用对齐的视觉编码器将图像转换为视觉标记，使它们能够像文本一样通过主干大型语言模型(LLM)进行处理。这种统一的输入范式使VLMs在视觉-语言任务中表现出色，如视觉问题回答(VQA)。为了改进细粒度的视觉推理，最近在视觉-语言建模方面的进展引入了图像裁剪技术，将所有编码的子图像馈送到模型中。然而，这种方法显著增加了视觉标记的数量，导致LLM的效率和潜在分散注意力。为了解决VLMs中图像表示的泛化挑战，我们提出了一个轻量级的通用框架，与现有的VLMs无缝集成，以增强它们处理细粒度细节的能力。我们的方法利用文本语义来识别关键的视觉区域，提高VQA的性能，而无需对VLM进行任何重新训练。此外，它将文本信号整合到视觉编码过程中，提高了效率和有效性。所提出的方法SEMCLIP，通过7个基准测试，在7B VLM LLaVA-1.5上平均提高了3.3%，特别是在具有挑战性的详细理解基准测试V*上提高了5.3%。

更新时间: 2025-03-14 18:33:31

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2503.11794v1

Performative Reinforcement Learning with Linear Markov Decision Process

We study the setting of \emph{performative reinforcement learning} where the deployed policy affects both the reward, and the transition of the underlying Markov decision process. Prior work~\parencite{MTR23} has addressed this problem under the tabular setting and established last-iterate convergence of repeated retraining with iteration complexity explicitly depending on the number of states. In this work, we generalize the results to \emph{linear Markov decision processes} which is the primary theoretical model of large-scale MDPs. The main challenge with linear MDP is that the regularized objective is no longer strongly convex and we want a bound that scales with the dimension of the features, rather than states which can be infinite. Our first result shows that repeatedly optimizing a regularized objective converges to a \emph{performatively stable policy}. In the absence of strong convexity, our analysis leverages a new recurrence relation that uses a specific linear combination of optimal dual solutions for proving convergence. We then tackle the finite sample setting where the learner has access to a set of trajectories drawn from the current policy. We consider a reparametrized version of the primal problem, and construct an empirical Lagrangian which is to be optimized from the samples. We show that, under a \emph{bounded coverage} condition, repeatedly solving a saddle point of this empirical Lagrangian converges to a performatively stable solution, and also construct a primal-dual algorithm that solves the empirical Lagrangian efficiently. Finally, we show several applications of the general framework of performative RL including multi-agent systems.

Updated: 2025-03-14 18:32:50

标题: 线性马尔可夫决策过程中的执行强化学习

摘要: 我们研究了\emph{执行增强学习}的设置，其中部署的策略影响了底层马尔可夫决策过程的奖励和转移。先前的工作~\parencite{MTR23}已经在表格设置下解决了这个问题，并建立了通过重复训练实现最后迭代收敛的结果，迭代复杂度明确取决于状态的数量。在这项工作中，我们将结果推广到\emph{线性马尔可夫决策过程}，这是大规模MDP的主要理论模型。线性MDP的主要挑战在于正则化的目标不再是强凸的，我们希望得到一个随着特征维度而不是状态数量（可能是无限的）而缩放的界限。我们的第一个结果表明，重复优化正则化目标会收敛到一个\emph{执行稳定策略}。在缺乏强凸性的情况下，我们的分析利用了一个新的递归关系，该关系使用了用于证明收敛性的一组最优对偶解的特定线性组合。然后我们解决了有限样本设置，其中学习者可以访问一组从当前策略中抽取的轨迹。我们考虑了原始问题的重新参数化版本，并构建了一个经验拉格朗日函数，该函数需要从样本中进行优化。我们表明，在\emph{有界覆盖}条件下，反复求解这个经验拉格朗日函数的鞍点会收敛到一个执行稳定的解，还构建了一个能够有效解决经验拉格朗日函数的原始-对偶算法。最后，我们展示了包括多智能体系统在内的执行RL通用框架的几个应用。

更新时间: 2025-03-14 18:32:50

领域: cs.LG,cs.GT

下载: http://arxiv.org/abs/2411.05234v2

Visualizing Thought: Conceptual Diagrams Enable Robust Planning in LMMs

Human reasoning relies on constructing and manipulating mental models-simplified internal representations of situations that we use to understand and solve problems. Conceptual diagrams (for example, sketches drawn by humans to aid reasoning) externalize these mental models, abstracting irrelevant details to efficiently capture relational and spatial information. In contrast, Large Language Models (LLMs) and Large Multimodal Models (LMMs) predominantly reason through textual representations, limiting their effectiveness in complex multi-step combinatorial and planning tasks. In this paper, we propose a zero-shot fully automatic framework that enables LMMs to reason through multiple chains of self-generated intermediate conceptual diagrams, significantly enhancing their combinatorial planning capabilities. Our approach does not require any human initialization beyond a natural language description of the task. It integrates both textual and diagrammatic reasoning within an optimized graph-of-thought inference framework, enhanced by beam search and depth-wise backtracking. Evaluated on multiple challenging PDDL planning domains, our method substantially improves GPT-4o's performance (for example, from 35.5% to 90.2% in Blocksworld). On more difficult planning domains with solution depths up to 40, our approach outperforms even the o1-preview reasoning model (for example, over 13% improvement in Parking). These results highlight the value of conceptual diagrams as a complementary reasoning medium in LMMs.

Updated: 2025-03-14 18:27:02

标题: 可视化思维：概念图表在LMMs中实现强大规划

摘要: 人类推理依赖于构建和操纵心理模型-简化的内部对情况的表示，我们用来理解和解决问题。概念图（例如，人类绘制的草图来帮助推理）外部化这些心理模型，抽象出不相关的细节，以有效地捕捉关系和空间信息。相比之下，大型语言模型（LLMs）和大型多模态模型（LMMs）主要通过文本表示进行推理，限制了它们在复杂的多步组合和规划任务中的有效性。在本文中，我们提出了一个零-shot完全自动化的框架，使LMMs能够通过多个自动生成的中间概念图链进行推理，显著增强它们的组合规划能力。我们的方法不需要任何人类初始化，只需自然语言描述任务。它在优化的思维图推断框架中集成了文本和图解推理，通过波束搜索和深度回溯进行增强。在多个具有挑战性的PDDL规划领域进行评估，我们的方法显著提高了GPT-4o的性能（例如，在Blocksworld中从35.5%提高到90.2%）。在解决深度高达40的更困难的规划领域中，我们的方法甚至优于o1-preview推理模型（例如，在停车场中超过13%的改进）。这些结果突显了概念图作为LMMs中补充推理媒介的价值。

更新时间: 2025-03-14 18:27:02

领域: cs.AI

下载: http://arxiv.org/abs/2503.11790v1

Examples as the Prompt: A Scalable Approach for Efficient LLM Adaptation in E-Commerce

Prompting LLMs offers an efficient way to guide output generation without explicit model training. In the e-commerce domain, prompting-based applications are widely used for tasks such as query understanding, recommender systems, and customer support. However, adapting LLMs to different tasks often requires extensive prompt engineering by domain experts, along with frequent updates to align with evolving business needs. Additionally, crafting fully unbiased natural language prompts remains a challenge for humans. To address these challenges, we propose a novel framework, Examples as the Prompt (EaP) which leverages labeled data to enhance prompts. Specifically, EaP automatically selects the most representative examples to maximize the few-shot capability of LLMs. It is efficient due to its unsupervised example selection and adaptive to potential data distribution shifts. We validate EaP on four real-world production use cases, demonstrating that it achieves comparable or even superior performance comparing to hand-crafted prompts designed by domain experts. Additionally, we introduce EaP_lite, which entirely replaces the natural language components of prompts with labeled examples. EaP_lite improves LLM inference speed by up to 70% without compromising performance. Latest online A/B test shows that using EaP and EaP_lite for data labeling can bring significant composite revenue gain by 0.06%.

Updated: 2025-03-14 18:22:43

标题: 以示例为提示：一种在电子商务中实现高效LLM自适应的可扩展方法

摘要: 提示LLMs提供了一种有效的方法来引导输出生成，而无需明确进行模型训练。在电子商务领域，基于提示的应用广泛用于诸如查询理解、推荐系统和客户支持等任务。然而，将LLMs调整到不同任务通常需要领域专家进行大量的提示工程，并需要频繁更新以与不断发展的业务需求保持一致。此外，为人类制定完全无偏见的自然语言提示仍然是一个挑战。为解决这些挑战，我们提出了一个新颖的框架，即Examples as the Prompt (EaP)，利用标记数据来增强提示。具体来说，EaP自动选择最具代表性的示例，以最大化LLMs的少样本能力。由于其无监督示例选择和适应潜在数据分布变化，EaP是高效的。我们在四个真实的生产用例上验证了EaP，表明它在与领域专家设计的手工提示相比实现了可比甚至更优越的性能。此外，我们引入了EaP_lite，它完全用标记示例替换提示的自然语言组件。EaP_lite提高了LLM推断速度高达70%，而不影响性能。最新的在线A/B测试显示，使用EaP和EaP_lite进行数据标记可以带来0.06%的显著综合收入增益。

更新时间: 2025-03-14 18:22:43

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.13518v1

Tensor Convolutional Network for Higher-Order Interaction Prediction in Sparse Tensors

Many real-world data, such as recommendation data and temporal graphs, can be represented as incomplete sparse tensors where most entries are unobserved. For such sparse tensors, identifying the top-k higher-order interactions that are most likely to occur among unobserved ones is crucial. Tensor factorization (TF) has gained significant attention in various tensor-based applications, serving as an effective method for finding these top-k potential interactions. However, existing TF methods primarily focus on effectively fusing latent vectors of entities, which limits their expressiveness. Since most entities in sparse tensors have only a few interactions, their latent representations are often insufficiently trained. In this paper, we propose TCN, an accurate and compatible tensor convolutional network that integrates seamlessly with existing TF methods for predicting higher-order interactions. We design a highly effective encoder to generate expressive latent vectors of entities. To achieve this, we propose to (1) construct a graph structure derived from a sparse tensor and (2) develop a relation-aware encoder, TCN, that learns latent representations of entities by leveraging the graph structure. Since TCN complements traditional TF methods, we seamlessly integrate TCN with existing TF methods, enhancing the performance of predicting top-k interactions. Extensive experiments show that TCN integrated with a TF method outperforms competitors, including TF methods and a hyperedge prediction method. Moreover, TCN is broadly compatible with various TF methods and GNNs (Graph Neural Networks), making it a versatile solution.

Updated: 2025-03-14 18:22:20

标题: 张量卷积网络用于稀疏张量中高阶交互预测

摘要: 许多现实世界的数据，例如推荐数据和时间图，可以表示为不完整的稀疏张量，其中大多数条目未被观察到。对于这样的稀疏张量，识别最有可能出现在未观察到的条目之间的前k个高阶交互作用至关重要。张量分解（TF）在各种基于张量的应用中引起了广泛关注，是一种有效的方法，用于发现这些前k个潜在的交互作用。然而，现有的TF方法主要集中在有效地融合实体的潜在向量，这限制了它们的表达能力。由于稀疏张量中的大多数实体只有少数交互作用，它们的潜在表示往往训练不足。在本文中，我们提出了TCN，一种准确且兼容的张量卷积网络，可以与现有的TF方法无缝集成，用于预测高阶交互作用。我们设计了一个高效的编码器来生成实体的表达力强的潜在向量。为了实现这一目标，我们提出（1）构建一个从稀疏张量派生的图结构，并且（2）开发了一个关系感知的编码器TCN，通过利用图结构来学习实体的潜在表示。由于TCN补充了传统的TF方法，我们将TCN与现有的TF方法无缝集成，增强了预测前k个交互作用的性能。大量实验证明，集成了TF方法的TCN胜过了竞争对手，包括TF方法和一个超边预测方法。此外，TCN与各种TF方法和GNNs（图神经网络）广泛兼容，使其成为一个多功能解决方案。

更新时间: 2025-03-14 18:22:20

领域: cs.LG

下载: http://arxiv.org/abs/2503.11786v1

ShEPhERD: Diffusing shape, electrostatics, and pharmacophores for bioisosteric drug design

Engineering molecules to exhibit precise 3D intermolecular interactions with their environment forms the basis of chemical design. In ligand-based drug design, bioisosteric analogues of known bioactive hits are often identified by virtually screening chemical libraries with shape, electrostatic, and pharmacophore similarity scoring functions. We instead hypothesize that a generative model which learns the joint distribution over 3D molecular structures and their interaction profiles may facilitate 3D interaction-aware chemical design. We specifically design ShEPhERD, an SE(3)-equivariant diffusion model which jointly diffuses/denoises 3D molecular graphs and representations of their shapes, electrostatic potential surfaces, and (directional) pharmacophores to/from Gaussian noise. Inspired by traditional ligand discovery, we compose 3D similarity scoring functions to assess ShEPhERD's ability to conditionally generate novel molecules with desired interaction profiles. We demonstrate ShEPhERD's potential for impact via exemplary drug design tasks including natural product ligand hopping, protein-blind bioactive hit diversification, and bioisosteric fragment merging.

Updated: 2025-03-14 18:13:25

标题: ShEPhERD: 将形状、静电和药效团扩散用于生物等效药物设计

摘要: 将分子工程化以展现与其环境具有精确的三维分子间相互作用，是化学设计的基础。在配体基药物设计中，通常通过使用形状、静电和药效团相似性评分函数，利用虚拟筛选化学库来识别已知生物活性配体的生物同功异构体类似物。我们相反地假设，一个能够学习三维分子结构及其相互作用概况的联合分布的生成模型，可能有助于三维相互作用意识的化学设计。我们特别设计了ShEPhERD，一个SE(3)等变扩散模型，它同时扩散/去噪三维分子图和形状、静电势表面以及（定向）药效团的表示，从高斯噪声中生成/去噪。受传统配体发现的启发，我们组合了三维相似性评分函数，评估ShEPhERD生成具有所需相互作用概况的新分子的能力。我们通过示范药物设计任务展示了ShEPhERD的潜力，包括天然产物配体跳跃、蛋白盲生物活性配体多样化以及生物同功异构片段融合。

更新时间: 2025-03-14 18:13:25

领域: q-bio.BM,cs.LG

下载: http://arxiv.org/abs/2411.04130v2

Enhancing Resiliency of Sketch-based Security via LSB Sharing-based Dynamic Late Merging

With the exponentially growing Internet traffic, sketch data structure with a probabilistic algorithm has been expected to be an alternative solution for non-compromised (non-selective) security monitoring. While facilitating counting within a confined memory space, the sketch's memory efficiency and accuracy were further pushed to their limit through finer-grained and dynamic control of constrained memory space to adapt to the data stream's inherent skewness (i.e., Zipf distribution), namely small counters with extensions. In this paper, we unveil a vulnerable factor of the small counter design by introducing a new sketch-oriented attack, which threatens a stream of state-of-the-art sketches and their security applications. With the root cause analyses, we propose Siamese Counter with enhanced adversarial resiliency and verified feasibility with extensive experimental and theoretical analyses. Under a sketch pollution attack, Siamese Counter delivers 47% accurate results than a state-of-the-art scheme, and demonstrates up to 82% more accurate estimation under normal measurement scenarios.

Updated: 2025-03-14 18:12:14

标题: 通过LSB共享动态后期合并增强基于草图的安全性的弹性

摘要: 随着互联网流量的指数增长，基于概率算法的草图数据结构被期望成为非损害（非选择性）安全监控的替代解决方案。在在有限内存空间内进行计数的同时，通过对受限内存空间的细粒度和动态控制，草图的内存效率和准确性被进一步推到极限，以适应数据流固有的倾斜性（即Zipf分布），即具有扩展功能的小计数器。在本文中，我们揭示了小计数器设计的一个脆弱因素，引入了一种新的针对草图的攻击，威胁着一系列最先进的草图及其安全应用。通过根本原因分析，我们提出了具有增强对抗性的Siamese Counter，并通过广泛的实验和理论分析验证了其可行性。在草图污染攻击下，Siamese Counter比最先进方案提供了47%的准确结果，并在正常测量场景下展示了高达82%更准确的估计。

更新时间: 2025-03-14 18:12:14

领域: cs.CR,cs.NI

下载: http://arxiv.org/abs/2503.11777v1

Neural Geometry Processing via Spherical Neural Surfaces

Neural surfaces (e.g., neural map encoding, deep implicits and neural radiance fields) have recently gained popularity because of their generic structure (e.g., multi-layer perceptron) and easy integration with modern learning-based setups. Traditionally, we have a rich toolbox of geometry processing algorithms designed for polygonal meshes to analyze and operate on surface geometry. In the absence of an analogous toolbox, neural representations are typically discretized and converted into a mesh, before applying any geometry processing algorithm. This is unsatisfactory and, as we demonstrate, unnecessary. In this work, we propose a spherical neural surface representation for genus-0 surfaces and demonstrate how to compute core geometric operators directly on this representation. Namely, we estimate surface normals and first and second fundamental forms of the surface, as well as compute surface gradient, surface divergence and Laplace-Beltrami operator on scalar/vector fields defined on the surface. Our representation is fully seamless, overcoming a key limitation of similar explicit representations such as Neural Surface Maps [Morreale et al. 2021]. These operators, in turn, enable geometry processing directly on the neural representations without any unnecessary meshing. We demonstrate illustrative applications in (neural) spectral analysis, heat flow and mean curvature flow, and evaluate robustness to isometric shape variations. We propose theoretical formulations and validate their numerical estimates, against analytical estimates, mesh-based baselines, and neural alternatives, where available. By systematically linking neural surface representations with classical geometry processing algorithms, we believe that this work can become a key ingredient in enabling neural geometry processing. Code is accessible from the project webpage.

Updated: 2025-03-14 18:11:29

标题: 通过球形神经表面进行神经几何处理

摘要: 神经表面（例如，神经映射编码、深度隐式和神经辐射场）最近因其通用结构（例如，多层感知器）和与现代基于学习的设置的轻松集成而受到青睐。传统上，我们有丰富的几何处理算法工具箱，旨在分析和操作表面几何的多边形网格。在缺乏类似工具箱的情况下，神经表示通常会被离散化并转换为网格，然后再应用任何几何处理算法。这是不令人满意的，并且正如我们所展示的那样，是不必要的。在这项工作中，我们提出了一种适用于零亏格表面的球形神经表面表示，并演示了如何直接在该表示上计算核心几何运算符。具体来说，我们估计表面法线和表面的第一和第二基本形式，并计算表面梯度、表面散度和拉普拉斯-贝尔特拉米算子在表面上定义的标量/矢量场上。我们的表示是完全无缝的，克服了类似显式表示（如神经表面映射[Morreale等人，2021]）的一个关键限制。这些运算符反过来使几何处理能够直接在神经表示上进行，而无需任何不必要的网格化。我们演示了在（神经）谱分析、热流和平均曲率流中的应用，并评估了对等距形状变化的稳健性。我们提出了理论公式，并针对解析估计、基于网格的基线和神经替代方案进行了验证。通过系统地将神经表面表示与经典的几何处理算法联系起来，我们相信这项工作可以成为实现神经几何处理的关键要素。代码可从项目网页访问。

更新时间: 2025-03-14 18:11:29

领域: cs.GR,cs.AI,cs.CV,I.3.5

下载: http://arxiv.org/abs/2407.07755v3

Evaluating Synthetic Tabular Data Generated To Augment Small Sample Datasets

This work proposes a method to evaluate synthetic tabular data generated to augment small sample datasets. While data augmentation techniques can increase sample counts for machine learning applications, traditional validation approaches fail when applied to extremely limited sample sizes. Our experiments across four datasets reveal significant inconsistencies between global metrics and topological measures, with statistical tests producing unreliable significance values due to insufficient sample sizes. We demonstrate that common metrics like propensity scoring and MMD often suggest similarity where fundamental topological differences exist. Our proposed normalized Bottleneck distance based metric provides complementary insights but suffers from high variability across experimental runs and occasional values exceeding theoretical bounds, showing inherent instability in topological approaches for very small datasets. These findings highlight the critical need for multi-faceted evaluation methodologies when validating synthetic data generated from limited samples, as no single metric reliably captures both distributional and structural similarity.

Updated: 2025-03-14 18:08:54

标题: 评估生成的合成表格数据以增加小样本数据集

摘要: 这项工作提出了一种评估合成表格数据的方法，以增加小样本数据集的样本数量。虽然数据增强技术可以增加机器学习应用的样本数量，但传统的验证方法在应用于极为有限的样本大小时会出现失败。我们在四个数据集上的实验揭示了全局指标和拓扑度量之间的显著不一致性，统计测试产生的显著性值不可靠，因为样本容量不足。我们证明了常见的度量标准如倾向评分和MMD经常表明相似性，而实际上存在基本的拓扑差异。我们提出的基于归一化瓶颈距离的度量标准提供了补充见解，但在实验运行中存在较高的可变性，偶尔出现超出理论边界的值，显示了对于非常小的数据集而言，拓扑方法的固有不稳定性。这些发现突显了在验证从有限样本生成的合成数据时，需要多方面的评估方法的关键性需求，因为没有单一度量能可靠地捕捉分布和结构相似性。

更新时间: 2025-03-14 18:08:54

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2211.10760v5

Automated Verification of Equivalence Properties in Advanced Logic Programs -- Bachelor Thesis

With the increase in industrial applications using Answer Set Programming, the need for formal verification tools, particularly for critical applications, has also increased. During the program optimisation process, it would be desirable to have a tool which can automatically verify whether an optimised subprogram can replace the original subprogram. Formally this corresponds to the problem of verifying the strong equivalence of two programs. In order to do so, the translation tool anthem was developed. It can be used in conjunction with an automated theorem prover for classical logic to verify that two programs are strongly equivalent. With the current version of anthem, only the strong equivalence of positive programs with a restricted input language can be verified. This is a result of the translation $\tau^*$ implemented in anthem that produces formulas in the logic of here-and-there, which coincides with classical logic only for positive programs. This thesis extends anthem in order to overcome these limitations. First, the transformation $\sigma^*$ is presented, which transforms formulas from the logic of here-and-there to classical logic. A theorem formalises how $\sigma^*$ can be used to express equivalence in the logic of here-and-there in classical logic. Second, the translation $\tau^*$ is extended to programs containing pools. Another theorem shows how $\sigma^*$ can be combined with $\tau^*$ to express the strong equivalence of two programs in classical logic. With $\sigma^*$ and the extended $\tau^*$, it is possible to express the strong equivalence of logic programs containing negation, simple choices, and pools. Both the extended $\tau^*$ and $\sigma^*$ are implemented in a new version of anthem. Several examples of logic programs containing pools, negation, and simple choice rules, which the new version of anthem can translate to classical logic, are presented. Some a...

Updated: 2025-03-14 18:06:10

标题: 高级逻辑程序中等价性属性的自动验证 -- 学士论文

摘要: 随着越来越多的工业应用采用答案集编程，尤其是对于关键应用而言，形式验证工具的需求也在增加。在程序优化过程中，希望能够拥有一个工具，可以自动验证优化的子程序是否可以替换原始子程序。从形式上讲，这对应于验证两个程序的强等价性的问题。为了实现这一点，开发了翻译工具anthem。它可以与经典逻辑的自动定理证明器一起使用，以验证两个程序是否强等价。目前版本的anthem只能验证具有受限输入语言的正程序的强等价性。这是anthem中实现的翻译$\tau^*$的结果，它生成了在此处和那里逻辑中的公式，与经典逻辑仅在正程序中相符。本论文扩展了anthem，以克服这些限制。首先，介绍了变换$\sigma^*$，它将此处和那里逻辑中的公式转换为经典逻辑。一个定理规范了如何使用$\sigma^*$来在经典逻辑中表达此处和那里逻辑中的等价性。其次，将翻译$\tau^*$扩展为包含池的程序。另一个定理展示了如何将$\sigma^*$与$\tau^*$结合起来，在经典逻辑中表达两个程序的强等价性。使用$\sigma^*$和扩展的$\tau^*$，可以表达包含否定、简单选择和池的逻辑程序的强等价性。扩展的$\tau^*$和$\sigma^*$均已在新版本的anthem中实现。展示了几个包含池、否定和简单选择规则的逻辑程序示例，新版本的anthem可以将其转换为经典逻辑。一些...

更新时间: 2025-03-14 18:06:10

领域: cs.LO,cs.AI

下载: http://arxiv.org/abs/2310.19806v5

Learn to Teach: Sample-Efficient Privileged Learning for Humanoid Locomotion over Diverse Terrains

Humanoid robots promise transformative capabilities for industrial and service applications. While recent advances in Reinforcement Learning (RL) yield impressive results in locomotion, manipulation, and navigation, the proposed methods typically require enormous simulation samples to account for real-world variability. This work proposes a novel one-stage training framework-Learn to Teach (L2T)-which unifies teacher and student policy learning. Our approach recycles simulator samples and synchronizes the learning trajectories through shared dynamics, significantly reducing sample complexities and training time while achieving state-of-the-art performance. Furthermore, we validate the RL variant (L2T-RL) through extensive simulations and hardware tests on the Digit robot, demonstrating zero-shot sim-to-real transfer and robust performance over 12+ challenging terrains without depth estimation modules.

Updated: 2025-03-14 18:05:18

标题: 学习教学：用于人形机器人在多样化地形上行走的高效特权学习

摘要: 人形机器人在工业和服务应用中具有变革性的能力。尽管最近强化学习（RL）方面取得了在步行、操纵和导航方面令人印象深刻的成果，但所提出的方法通常需要大量的仿真样本来考虑现实世界的变化。本文提出了一种新颖的一阶段训练框架-Learn to Teach（L2T）-将教师和学生策略学习统一起来。我们的方法通过共享动态重新利用模拟器样本，并通过同步学习轨迹显著降低了样本复杂性和训练时间，同时实现了最先进的性能。此外，我们通过广泛的模拟和硬件测试对RL变体（L2T-RL）进行验证，在Digit机器人上展示了零次迁移和对12个以上具有挑战性的地形的稳健性能，而无需深度估计模块。

更新时间: 2025-03-14 18:05:18

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2402.06783v2

UBMF: Uncertainty-Aware Bayesian Meta-Learning Framework for Fault Diagnosis with Imbalanced Industrial Data

Fault diagnosis of mechanical equipment involves data collection, feature extraction, and pattern recognition but is often hindered by the imbalanced nature of industrial data, introducing significant uncertainty and reducing diagnostic reliability. To address these challenges, this study proposes the Uncertainty-Aware Bayesian Meta-Learning Framework (UBMF), which integrates four key modules: data perturbation injection for enhancing feature robustness, cross-task self-supervised feature extraction for improving transferability, uncertainty-based sample filtering for robust out-of-domain generalization, and Bayesian meta-knowledge integration for fine-grained classification. Experimental results on ten open-source datasets under various imbalanced conditions, including cross-task, small-sample, and unseen-sample scenarios, demonstrate the superiority of UBMF, achieving an average improvement of 42.22% across ten Any-way 1-5-shot diagnostic tasks. This integrated framework effectively enhances diagnostic accuracy, generalization, and adaptability, providing a reliable solution for complex industrial fault diagnosis.

Updated: 2025-03-14 18:05:18

标题: UBMF：基于不平衡工业数据的不确定性感知贝叶斯元学习故障诊断框架

摘要: 机械设备的故障诊断涉及数据收集、特征提取和模式识别，但常常受制于工业数据的不平衡性，引入了显著的不确定性，降低了诊断可靠性。为了解决这些挑战，本研究提出了一种不确定性感知的贝叶斯元学习框架（UBMF），该框架整合了四个关键模块：数据扰动注入以增强特征的稳健性，跨任务自监督特征提取以提高可转移性，基于不确定性的样本筛选以实现领域外泛化的稳健性，以及贝叶斯元知识整合以进行细粒度分类。在包括跨任务、小样本和未见样本场景在内的各种不平衡条件下，对十个开源数据集进行的实验结果表明，UBMF的优越性，实现了十个Any-way 1-5-shot诊断任务的平均改进率为42.22%。这一综合框架有效提高了诊断准确性、泛化性和适应性，为复杂工业故障诊断提供了可靠的解决方案。

更新时间: 2025-03-14 18:05:18

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2503.11774v1

Ranking and Selection with Simultaneous Input Data Collection

In this paper, we propose a general and novel formulation of ranking and selection with the existence of streaming input data. The collection of multiple streams of such data may consume different types of resources, and hence can be conducted simultaneously. To utilize the streaming input data, we aggregate simulation outputs generated under heterogeneous input distributions over time to form a performance estimator. By characterizing the asymptotic behavior of the performance estimators, we formulate two optimization problems to optimally allocate budgets for collecting input data and running simulations. We then develop a multi-stage simultaneous budget allocation procedure and provide its statistical guarantees such as consistency and asymptotic normality. We conduct several numerical studies to demonstrate the competitive performance of the proposed procedure.

Updated: 2025-03-14 18:04:55

标题: 排名和选择与同时输入数据收集

摘要: 在本文中，我们提出了一个关于排名和选择的一般性和新颖的公式，该公式考虑了流式输入数据的存在。这些数据流的集合可能消耗不同类型的资源，因此可以同时进行。为了利用流式输入数据，我们聚合在时间内由异构输入分布生成的模拟输出，形成性能估计器。通过表征性能估计器的渐近行为，我们制定了两个优化问题，以最佳分配预算来收集输入数据和运行模拟。然后，我们开发了一个多阶段同时预算分配程序，并提供其统计保证，如一致性和渐近正态性。我们进行了几项数值研究，以展示所提出的程序的竞争性表现。

更新时间: 2025-03-14 18:04:55

领域: stat.ML,cs.LG,math.OC

下载: http://arxiv.org/abs/2503.11773v1

Does calibration mean what they say it means; or, the reference class problem rises again

Discussions of statistical criteria for fairness commonly convey the normative significance of calibration within groups by invoking what risk scores "mean." On the Same Meaning picture, group-calibrated scores "mean the same thing" (on average) across individuals from different groups and accordingly, guard against disparate treatment of individuals based on group membership. My contention is that calibration guarantees no such thing. Since concrete actual people belong to many groups, calibration cannot ensure the kind of consistent score interpretation that the Same Meaning picture implies matters for fairness, unless calibration is met within every group to which an individual belongs. Alas only perfect predictors may meet this bar. The Same Meaning picture thus commits a reference class fallacy by inferring from calibration within some group to the "meaning" or evidential value of an individual's score, because they are a member of that group. The reference class answer it presumes does not only lack justification; it is very likely wrong. I then show that the reference class problem besets not just calibration but other group statistical criteria that claim a close connection to fairness. Reflecting on the origins of this oversight opens a wider lens onto the predominant methodology in algorithmic fairness based on stylized cases.

Updated: 2025-03-14 18:04:21

标题: 校准是否意味着他们所说的意思；或者，参考类问题再次出现

摘要: 讨论公平性的统计标准通常通过调用风险评分的“含义”来传达组内校准的规范意义。在相同含义的观点中，组内校准评分在不同组的个体之间“平均意义上”意味着相同，并且据此防止基于群体成员身份的个体差异对待。我的观点是，校准并不能保证这一点。由于具体的实际人属于许多群体，校准不能确保与相同含义的观点所暗示的公平有关，除非在个体所属的每个群体内都满足校准。可惜只有完美的预测者才能达到这个标准。因此，相同含义的观点通过从某些群体内的校准推断出个体评分的“含义”或证据价值，因为他们是该群体的成员，从而犯了参考类错误。它所假设的参考类答案不仅缺乏理据，而且很可能是错误的。然后，我展示了参考类问题不仅困扰校准，还困扰其他声称与公平密切相关的群体统计标准。反思这一疏忽的起源为基于简化案例的算法公平性中主导的方法论打开了更广泛的视角。

更新时间: 2025-03-14 18:04:21

领域: cs.LG

下载: http://arxiv.org/abs/2412.16769v2

Centaur: Robust End-to-End Autonomous Driving with Test-Time Training

How can we rely on an end-to-end autonomous vehicle's complex decision-making system during deployment? One common solution is to have a ``fallback layer'' that checks the planned trajectory for rule violations and replaces it with a pre-defined safe action if necessary. Another approach involves adjusting the planner's decisions to minimize a pre-defined ``cost function'' using additional system predictions such as road layouts and detected obstacles. However, these pre-programmed rules or cost functions cannot learn and improve with new training data, often resulting in overly conservative behaviors. In this work, we propose Centaur (Cluster Entropy for Test-time trAining using Uncertainty) which updates a planner's behavior via test-time training, without relying on hand-engineered rules or cost functions. Instead, we measure and minimize the uncertainty in the planner's decisions. For this, we develop a novel uncertainty measure, called Cluster Entropy, which is simple, interpretable, and compatible with state-of-the-art planning algorithms. Using data collected at prior test-time time-steps, we perform an update to the model's parameters using a gradient that minimizes the Cluster Entropy. With only this sole gradient update prior to inference, Centaur exhibits significant improvements, ranking first on the navtest leaderboard with notable gains in safety-critical metrics such as time to collision. To provide detailed insights on a per-scenario basis, we also introduce navsafe, a challenging new benchmark, which highlights previously undiscovered failure modes of driving models.

Updated: 2025-03-14 17:59:41

标题: 半人马：具有测试时间训练的端到端自动驾驶的稳健性

摘要: 我们如何在部署过程中依赖端到端自主车辆复杂决策系统？一个常见的解决方案是拥有一个“备用层”，检查计划的轨迹是否违反规则，并在必要时用预定义的安全动作替换它。另一种方法涉及调整规划者的决策，通过使用额外的系统预测，如道路布局和检测到的障碍物，以最小化预定义的“成本函数”。然而，这些预先编程的规则或成本函数无法通过新的训练数据进行学习和改进，通常导致过于保守的行为。在这项工作中，我们提出了Centaur（使用不确定性进行测试时训练的簇熵）的概念，该概念通过测试时训练更新规划者的行为，而不依赖于手工设计的规则或成本函数。相反，我们衡量并最小化规划者的决策中的不确定性。为此，我们开发了一种新颖的不确定性测量方法，称为簇熵，它简单、可解释，并与最先进的规划算法兼容。利用之前测试时步骤收集的数据，我们通过使用最小化簇熵的梯度对模型参数进行更新。仅在推断之前进行这个唯一的梯度更新，Centaur表现出显著的改进，在navtest排行榜上排名第一，并在诸如碰撞时间等安全关键指标方面取得显著增长。为了提供基于每个场景的详细见解，我们还引入了navsafe，一个具有挑战性的新基准，突显了驾驶模型之前未发现的故障模式。

更新时间: 2025-03-14 17:59:41

领域: cs.RO,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2503.11650v1

Privacy Violations in Election Results

After an election, should election officials release a copy of each anonymous ballot? Some policymakers have championed public disclosure to counter distrust, but others worry that it might undermine ballot secrecy. We introduce the term vote revelation to refer to the linkage of a vote on an anonymous ballot to the voter's name in the public voter file, and detail how such revelation could theoretically occur. Using the 2020 election in Maricopa County, Arizona, as a case study, we show that the release of individual ballot records would lead to no revelation of any vote choice for 99.83% of voters as compared to 99.95% under Maricopa's current practice of reporting aggregate results by precinct and method of voting. Further, revelation is overwhelmingly concentrated among the few voters who cast provisional ballots or federal-only ballots. We discuss the potential benefits of transparency, compare remedies to reduce or eliminate privacy violations, and highlight the privacy-transparency tradeoff inherent in all election reporting.

Updated: 2025-03-14 17:59:14

标题: 选举结果中的隐私侵犯

摘要: 在选举之后，选举官员是否应该发布每张匿名选票的副本？一些政策制定者支持公开披露以对抗不信任，但其他人担心这可能会损害选票保密性。我们引入了“投票揭示”一词，指的是将匿名选票上的选票与公共选民档案中的选民姓名联系起来，以及如何理论上发生这种揭示。以亚利桑那州马里科帕县2020年选举为案例研究，我们展示了发布个人选票记录将导致99.83%的选民的任何投票选择未被揭示，而与马里科帕当前按选区和投票方式报告聚合结果的做法相比，这一比例为99.95%。此外，揭示主要集中在少数投票者身上，他们投了临时选票或仅联邦选票。我们讨论透明性的潜在好处，比较减少或消除隐私侵犯的补救措施，并强调所有选举报道中固有的隐私-透明度权衡。

更新时间: 2025-03-14 17:59:14

领域: cs.CR,stat.AP

下载: http://arxiv.org/abs/2308.04100v5

Machine learning-based identification of Gaia astrometric exoplanet orbits

The third Gaia data release (DR3) contains $\sim$170\,000 astrometric orbit solutions of two-body systems located within $\sim$500 pc of the Sun. Determining component masses in these systems, in particular of stars hosting exoplanets, usually hinges on incorporating complementary observations in addition to the astrometry, e.g. spectroscopy and radial velocities. Several Gaia DR3 two-body systems with exoplanet, brown-dwarf, stellar, and black-hole components have been confirmed in this way. We developed an alternative machine learning approach that uses only the Gaia DR3 orbital solutions with the aim of identifying the best candidates for exoplanets and brown-dwarf companions. Based on confirmed substellar companions in the literature, we use semi-supervised anomaly detection methods in combination with extreme gradient boosting and random forest classifiers to determine likely low-mass outliers in the population of non-single sources. We employ and study feature importance to investigate the method's plausibility and produced a list of 20 best candidates of which two are exoplanet candidates and another five are either very-massive brown dwarfs or very-low mass stars. Three candidates, including one initial exoplanet candidate, correspond to false-positive solutions where longer-period binary star motion was fitted with a biased shorter-period orbit. We highlight nine candidates with brown-dwarf companions for preferential follow-up. The companion around the Sun-like star G\,15-6 could be confirmed as a genuine brown dwarf using external radial-velocity data. This new approach is a powerful complement to the traditional identification methods for substellar companions among Gaia astrometric orbits. It is particularly relevant in the context of Gaia DR4 and its expected exoplanet discovery yield.

Updated: 2025-03-14 17:59:04

标题: 基于机器学习的Gaia恒星测量外行星轨道的识别

摘要: 第三次盖亚数据发布（DR3）包含大约170,000个二体系统的天体轨道解，这些系统位于距离太阳大约500光年的范围内。确定这些系统中的组件质量，特别是承载外行星的恒星，通常依赖于将补充观测数据（如光谱学和径向速度）与天体测量学相结合。通过这种方式，已确认了几个具有外行星、棕矮星、恒星和黑洞组分的Gaia DR3二体系统。我们开发了一种仅使用Gaia DR3轨道解的替代机器学习方法，旨在识别外行星和棕矮星伴星的最佳候选者。基于文献中已确认的次恒星伴星，我们结合极端梯度增强和随机森林分类器使用半监督异常检测方法，确定非单一源群体中可能的低质量异常值。我们使用和研究特征重要性来调查该方法的合理性，并生成了一个包含20个最佳候选者的列表，其中两个是外行星候选者，另外五个是非常大质量的棕矮星或非常低质量的恒星。其中三个候选者，包括一个最初的外行星候选者，对应于虚假阳性解，其中更长周期的双星运动被拟合为偏向较短周期轨道。我们强调了九个具有棕矮星伴星的候选者，建议进行优先后续跟踪。围绕类似太阳的恒星G 15-6的伴星可以使用外部径向速度数据确认为真实的棕矮星。这种新方法是Gaia天体测量轨道中次恒星伴星的传统识别方法的有力补充。在Gaia DR4及其预期的外行星发现产量方面，这一方法尤为重要。

更新时间: 2025-03-14 17:59:04

领域: astro-ph.EP,astro-ph.IM,astro-ph.SR,cs.LG

下载: http://arxiv.org/abs/2404.09350v2

On the phase diagram of extensive-rank symmetric matrix denoising beyond rotational invariance

Matrix denoising is central to signal processing and machine learning. Its statistical analysis when the matrix to infer has a factorised structure with a rank growing proportionally to its dimension remains a challenge, except when it is rotationally invariant. In this case the information theoretic limits and an efficient Bayes-optimal denoising algorithm, called rotational invariant estimator [1,2], are known. Beyond this setting few results can be found. The reason is that the model is not a usual spin system because of the growing rank dimension, nor a matrix model (as appearing in high-energy physics) due to the lack of rotation symmetry, but rather a hybrid between the two. Here we make progress towards the understanding of Bayesian matrix denoising when the signal is a factored matrix $XX^\intercal$ that is not rotationally invariant. Monte Carlo simulations suggest the existence of a \emph{denoising-factorisation transition} separating a phase where denoising using the rotational invariant estimator remains Bayes-optimal due to universality properties of the same nature as in random matrix theory, from one where universality breaks down and better denoising is possible, though algorithmically hard. We argue that it is only beyond the transition that factorisation, i.e., estimating $X$ itself, becomes possible up to irresolvable ambiguities. On the theory side, we combine mean-field techniques in an interpretable multiscale fashion in order to access the minimum mean-square error and mutual information. Interestingly, our alternative method yields equations reproducible by the replica approach of [3]. Using numerical insights, we delimit the portion of phase diagram where we conjecture the mean-field theory to be exact, and correct it using universality when it is not. Our complete ansatz matches well the numerics in the whole phase diagram when considering finite size effects.

Updated: 2025-03-14 17:58:33

标题: 关于超出旋转不变性的广义秩对称矩阵去噪的相图

摘要: 矩阵去噪是信号处理和机器学习的核心。当需要推断的矩阵具有随着其维度成比例增长的因式结构时，其统计分析仍然是一个挑战，除非它是旋转不变的。在这种情况下，已知信息理论限制和一个高效的贝叶斯最优去噪算法，称为旋转不变估计器。在这个设置之外，很少能找到结果。原因在于该模型不是一个常规的自旋系统，因为秩维度增长，也不是一个矩阵模型（如出现在高能物理中）由于缺乏旋转对称性，而是两者之间的混合体。在这里，我们在理解信号是非旋转不变的因子矩阵$XX^\intercal$时，取得了进展。蒙特卡洛模拟表明存在一个分隔阶段，在这个阶段中，使用旋转不变估计器进行去噪仍然是贝叶斯最优的，这是由于与随机矩阵理论相同性质的普遍性属性。另一方面，我们认为只有在过渡阶段之后，因子化，即估计$X$本身，才变得可能直到无法解决的歧义。在理论方面，我们以可解释的多尺度方式结合平均场技术，以便访问最小均方误差和互信息。有趣的是，我们的替代方法产生了可以通过[3]中的复制方法再现的方程。利用数值见解，我们限定了相图中我们推测平均场理论精确的部分，并在不精确时使用普遍性进行校正。我们的完整假设与有限尺寸效应相关的数值非常匹配。

更新时间: 2025-03-14 17:58:33

领域: cond-mat.dis-nn,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2411.01974v2

Making Every Step Effective: Jailbreaking Large Vision-Language Models Through Hierarchical KV Equalization

In the realm of large vision-language models (LVLMs), adversarial jailbreak attacks serve as a red-teaming approach to identify safety vulnerabilities of these models and their associated defense mechanisms. However, we identify a critical limitation: not every adversarial optimization step leads to a positive outcome, and indiscriminately accepting optimization results at each step may reduce the overall attack success rate. To address this challenge, we introduce HKVE (Hierarchical Key-Value Equalization), an innovative jailbreaking framework that selectively accepts gradient optimization results based on the distribution of attention scores across different layers, ensuring that every optimization step positively contributes to the attack. Extensive experiments demonstrate HKVE's significant effectiveness, achieving attack success rates of 75.08% on MiniGPT4, 85.84% on LLaVA and 81.00% on Qwen-VL, substantially outperforming existing methods by margins of 20.43\%, 21.01\% and 26.43\% respectively. Furthermore, making every step effective not only leads to an increase in attack success rate but also allows for a reduction in the number of iterations, thereby lowering computational costs. Warning: This paper contains potentially harmful example data.

Updated: 2025-03-14 17:57:42

标题: 使每一步有效：通过分层KV均衡来越狱大型视觉语言模型

摘要: 在大型视觉语言模型（LVLMs）领域，对抗性越狱攻击作为一种红队测试方法，用于识别这些模型及其相关防御机制的安全漏洞。然而，我们发现一个关键限制：并非每个对抗性优化步骤都会产生积极结果，而盲目接受每个步骤的优化结果可能会降低整体攻击成功率。为了解决这一挑战，我们引入了HKVE（分层键值等化），这是一个创新的越狱框架，根据不同层之间注意力分数的分布选择性地接受梯度优化结果，确保每个优化步骤都对攻击产生积极贡献。大量实验证明HKVE的显著有效性，在MiniGPT4上实现了75.08%的攻击成功率，在LLaVA上实现了85.84%的攻击成功率，在Qwen-VL上实现了81.00%的攻击成功率，较现有方法分别提高了20.43\%，21.01\%和26.43\%。此外，使每个步骤都有效不仅会增加攻击成功率，还将减少迭代次数，从而降低计算成本。警告：本文包含潜在有害示例数据。

更新时间: 2025-03-14 17:57:42

领域: cs.CV,cs.CR

下载: http://arxiv.org/abs/2503.11750v1

Enhancing Deep Learning Based Structured Illumination Microscopy Reconstruction with Light Field Awareness

Structured illumination microscopy (SIM) is a pivotal technique for dynamic subcellular imaging in live cells. Conventional SIM reconstruction algorithms depend on accurately estimating the illumination pattern and can introduce artefacts when this estimation is imprecise. Although recent deep learning-based SIM reconstruction methods have improved speed, accuracy, and robustness, they often struggle with out-of-distribution data. To address this limitation, we propose an Awareness-of-Light-field SIM (AL-SIM) reconstruction approach that directly estimates the actual light field to correct for errors arising from data distribution shifts. Through comprehensive experiments on both simulated filament structures and live BSC1 cells, our method demonstrates a 7% reduction in the normalized root mean square error (NRMSE) and substantially lowers reconstruction artefacts. By minimizing these artefacts and improving overall accuracy, AL-SIM broadens the applicability of SIM for complex biological systems.

Updated: 2025-03-14 17:56:49

标题: 利用光场感知增强基于深度学习的结构光显微镜重建

摘要: 结构化照明显微镜（SIM）是动态亚细胞成像在活细胞中的关键技术。传统SIM重建算法依赖于准确估计照明模式，当该估计不精确时可能引入伪影。虽然最近基于深度学习的SIM重建方法改善了速度、准确性和鲁棒性，但它们往往在超出分布数据方面遇到困难。为了解决这一限制，我们提出了一种感知光场SIM（AL-SIM）重建方法，直接估计实际光场以纠正由数据分布偏移引起的错误。通过对模拟的丝状结构和活体BSC1细胞进行全面实验，我们的方法表现出7%的标准均方根误差（NRMSE）降低，并显著降低重建伪影。通过最小化这些伪影并提高总体准确性，AL-SIM扩展了SIM在复杂生物系统中的适用性。

更新时间: 2025-03-14 17:56:49

领域: physics.optics,cs.AI

下载: http://arxiv.org/abs/2503.11640v1

CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning

Scientific problem-solving involves synthesizing information while applying expert knowledge. We introduce CURIE, a scientific long-Context Understanding,Reasoning and Information Extraction benchmark to measure the potential of Large Language Models (LLMs) in scientific problem-solving and assisting scientists in realistic workflows. This benchmark introduces ten challenging tasks with a total of 580 problems and solution pairs curated by experts in six disciplines - materials science, condensed matter physics, quantum computing, geospatial analysis, biodiversity, and proteins - covering both experimental and theoretical work-flows in science. We evaluate a range of closed and open LLMs on tasks in CURIE which requires domain expertise, comprehension of long in-context information,and multi-step reasoning. While Gemini Flash 2.0 and Claude-3 show consistent high comprehension across domains, the popular GPT-4o and command-R+ fail dramatically on protein sequencing tasks. With the best performance at 32% there is much room for improvement for all models. We hope that insights gained from CURIE can guide the future development of LLMs in sciences. Evaluation code and data are in https://github.com/google/curie

Updated: 2025-03-14 17:53:03

标题: CURIE：评估LLMs在多任务科学长篇背景理解和推理中的表现

摘要: 科学问题解决涉及综合信息并应用专业知识。我们引入CURIE，一个科学长上下文理解、推理和信息提取基准，以衡量大型语言模型（LLMs）在科学问题解决和协助科学家进行现实工作流程中的潜力。该基准引入了十个具有挑战性的任务，总共包含580个问题和解决方案对，由六个学科的专家策划 - 材料科学、凝聚态物理、量子计算、地理空间分析、生物多样性和蛋白质 - 涵盖科学中的实验和理论工作流程。我们评估了一系列封闭和开放的LLMs在CURIE中的任务，这些任务要求领域专业知识、理解长篇上下文信息和多步推理。尽管Gemini Flash 2.0和Claude-3在各领域展现出一致高的理解能力，但流行的GPT-4o和命令-R+在蛋白质测序任务上表现出严重失败。在最佳表现为32%的情况下，所有模型均有很大改进空间。我们希望从CURIE中获得的见解可以指导未来科学领域LLMs的发展。评估代码和数据位于https://github.com/google/curie。

更新时间: 2025-03-14 17:53:03

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.13517v1

Translating Between the Common Haar Random State Model and the Unitary Model

Black-box separations are a cornerstone of cryptography, indicating barriers to various goals. A recent line of work has explored black-box separations for quantum cryptographic primitives. Namely, a number of separations are known in the Common Haar Random State (CHRS) model, though this model is not considered a complete separation, but rather a starting point. A few very recent works have attempted to lift these separations to a unitary separation, which are considered complete separations. Unfortunately, we find significant errors in some of these lifting results. We prove general conditions under which CHRS separations can be generically lifted, thereby giving simple, modular, and bug-free proofs of complete unitary separations between various quantum primitives. Our techniques allow for simpler proofs of existing separations as well as new separations that were previously only known in the CHRS model.

Updated: 2025-03-14 17:52:48

标题: 在通用Haar随机态模型和幺正模型之间的翻译

摘要: 黑匣子分离是密码学的基础，表明了各种目标的障碍。最近的一系列工作探讨了量子密码原语的黑匣子分离。即，在Common Haar Random State（CHRS）模型中已知一些分离，尽管这个模型并不被认为是完全分离，而是一个起点。最近一些工作试图将这些分离提升到一个幺正分离，这被认为是完全分离。不幸的是，我们发现一些提升结果中存在显著错误。我们证明了通用条件，根据这些条件，可以将CHRS分离通用地提升，从而给出了各种量子原语之间的完全幺正分离的简单、模块化和无错误的证明。我们的技术允许更简单地证明现有的分离，以及之前只在CHRS模型中已知的新的分离。

更新时间: 2025-03-14 17:52:48

领域: quant-ph,cs.CR

下载: http://arxiv.org/abs/2503.11634v1

GPT for Games: An Updated Scoping Review (2020-2024)

Due to GPT's impressive generative capabilities, its applications in games are expanding rapidly. To offer researchers a comprehensive understanding of the current applications and identify both emerging trends and unexplored areas, this paper introduces an updated scoping review of 177 articles, 122 of which were published in 2024, to explore GPT's potential for games. By coding and synthesizing the papers, we identify five prominent applications of GPT in current game research: procedural content generation, mixed-initiative game design, mixed-initiative gameplay, playing games, and game user research. Drawing on insights from these application areas and emerging research, we propose future studies should focus on expanding the technical boundaries of the GPT models and exploring the complex interaction dynamics between them and users. This review aims to illustrate the state of the art in innovative GPT applications in games, offering a foundation to enrich game development and enhance player experiences through cutting-edge AI innovations.

Updated: 2025-03-14 17:50:19

标题: 游戏中的GPT：一项更新的范围回顾（2020-2024）

摘要: 由于GPT令人印象深刻的生成能力，其在游戏中的应用正在迅速扩展。为了为研究人员提供对当前应用的全面了解，并识别新兴趋势和未开发领域，本文介绍了一项更新的范围性回顾，共包括177篇文章，其中122篇发表于2024年，以探索GPT在游戏中的潜力。通过对这些论文进行编码和综合，我们确定了GPT在当前游戏研究中的五个突出应用领域：程序内容生成、混合倡议游戏设计、混合倡议游戏玩法、玩游戏以及游戏用户研究。借鉴这些应用领域和新兴研究的见解，我们建议未来的研究应侧重于拓展GPT模型的技术边界，并探索它们与用户之间复杂的交互动态。本回顾旨在展示游戏中创新GPT应用的最新发展，为通过尖端人工智能创新丰富游戏开发并增强玩家体验提供基础。

更新时间: 2025-03-14 17:50:19

领域: cs.AI,A.1

下载: http://arxiv.org/abs/2411.00308v2

Are Deep Speech Denoising Models Robust to Adversarial Noise?

Deep noise suppression (DNS) models enjoy widespread use throughout a variety of high-stakes speech applications. However, in this paper, we show that four recent DNS models can each be reduced to outputting unintelligible gibberish through the addition of imperceptible adversarial noise. Furthermore, our results show the near-term plausibility of targeted attacks, which could induce models to output arbitrary utterances, and over-the-air attacks. While the success of these attacks varies by model and setting, and attacks appear to be strongest when model-specific (i.e., white-box and non-transferable), our results highlight a pressing need for practical countermeasures in DNS systems.

Updated: 2025-03-14 17:46:34

标题: 深度语音降噪模型是否对对抗性噪声具有鲁棒性？

摘要: 深度噪声抑制（DNS）模型在各种高风险语音应用中被广泛使用。然而，在本文中，我们展示了最近的四种DNS模型可以通过添加不可察觉的对抗性噪声来输出无法理解的胡言乱语。此外，我们的结果显示了有针对性攻击的近期可能性，这可能导致模型输出任意话语，以及通过空气传播的攻击。尽管这些攻击的成功率因模型和环境而异，攻击在模型特定（即，白盒和不可转移）时表现最强，我们的结果突显了DNS系统中实际对抗措施的迫切需求。

更新时间: 2025-03-14 17:46:34

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2503.11627v1

Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense

Deploying large vision-language models (LVLMs) introduces a unique vulnerability: susceptibility to malicious attacks via visual inputs. However, existing defense methods suffer from two key limitations: (1) They solely focus on textual defenses, fail to directly address threats in the visual domain where attacks originate, and (2) the additional processing steps often incur significant computational overhead or compromise model performance on benign tasks. Building on these insights, we propose ESIII (Embedding Security Instructions Into Images), a novel methodology for transforming the visual space from a source of vulnerability into an active defense mechanism. Initially, we embed security instructions into defensive images through gradient-based optimization, obtaining security instructions in the visual dimension. Subsequently, we integrate security instructions from visual and textual dimensions with the input query. The collaboration between security instructions from different dimensions ensures comprehensive security protection. Extensive experiments demonstrate that our approach effectively fortifies the robustness of LVLMs against such attacks while preserving their performance on standard benign tasks and incurring an imperceptible increase in time costs.

Updated: 2025-03-14 17:39:45

标题: 以牙还牙：通过对抗性防御保护大视觉-语言模型免受越狱攻击

摘要: 部署大型视觉语言模型（LVLMs）引入了一种独特的漏洞：对视觉输入的恶意攻击易受影响。然而，现有的防御方法存在两个关键限制：（1）它们仅专注于文本防御，未能直接解决攻击发源地的视觉领域的威胁，（2）额外的处理步骤往往会带来显著的计算开销或者损害模型在良性任务上的性能。基于这些观点，我们提出了ESIII（将安全指令嵌入图像），这是一种将视觉空间从一个漏洞源转变为主动防御机制的新方法。最初，我们通过基于梯度的优化将安全指令嵌入防御图像中，从而获得视觉维度上的安全指令。随后，我们将来自视觉和文本维度的安全指令与输入查询进行整合。来自不同维度的安全指令的协作确保了全面的安全保护。大量实验证明，我们的方法有效地加强了LVLMs对此类攻击的鲁棒性，同时保持了它们在标准良性任务上的性能，并且时间成本的增加几乎是不可察觉的。

更新时间: 2025-03-14 17:39:45

领域: cs.CR

下载: http://arxiv.org/abs/2503.11619v1

ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning

Analysis and comprehension of assembly code are crucial in various applications, such as reverse engineering. However, the low information density and lack of explicit syntactic structures in assembly code pose significant challenges. Pioneering approaches with masked language modeling (MLM)-based methods have been limited by facilitating natural language interaction. While recent methods based on decoder-focused large language models (LLMs) have significantly enhanced semantic representation, they still struggle to capture the nuanced and sparse semantics in assembly code. In this paper, we propose Assembly Augmented Tuning (ASMA-Tune), an end-to-end structural-semantic instruction-tuning framework. Our approach synergizes encoder architectures with decoder-based LLMs through projector modules to enable comprehensive code understanding. Experiments show that ASMA-Tune outperforms existing benchmarks, significantly enhancing assembly code comprehension and instruction-following abilities. Our model and dataset are public at https://github.com/wxy3596/ASMA-Tune.

Updated: 2025-03-14 17:36:08

标题: ASMA-Tune：通过结构-语义指令调整解锁LLMs的汇编代码理解

摘要: 对汇编代码的分析和理解在各种应用中至关重要，比如逆向工程。然而，汇编代码中的信息密度低，缺乏明确的语法结构，这带来了重大挑战。以基于掩模语言建模（MLM）的方法为先导的方法受到了促进自然语言交互的限制。而基于解码器聚焦的大型语言模型（LLMs）的最近方法已经显著增强了语义表示，但仍然难以捕捉汇编代码中微妙且稀疏的语义。在本文中，我们提出了一种名为Assembly Augmented Tuning（ASMA-Tune）的端到端结构-语义指令调整框架。我们的方法通过投影模块将编码器架构与基于解码器的LLMs协同，实现了对代码的全面理解。实验证明，ASMA-Tune优于现有基准，显著增强了汇编代码的理解和指令遵循能力。我们的模型和数据集可在https://github.com/wxy3596/ASMA-Tune上公开访问。

更新时间: 2025-03-14 17:36:08

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2503.11617v1

From Denoising Score Matching to Langevin Sampling: A Fine-Grained Error Analysis in the Gaussian Setting

Sampling from an unknown distribution, accessible only through discrete samples, is a fundamental problem at the core of generative AI. The current state-of-the-art methods follow a two-step process: first estimating the score function (the gradient of a smoothed log-distribution) and then applying a gradient-based sampling algorithm. The resulting distribution's correctness can be impacted by several factors: the generalization error due to a finite number of initial samples, the error in score matching, and the diffusion error introduced by the sampling algorithm. In this paper, we analyze the sampling process in a simple yet representative setting-sampling from Gaussian distributions using a Langevin diffusion sampler. We provide a sharp analysis of the Wasserstein sampling error that arises from the multiple sources of error throughout the pipeline. This allows us to rigorously track how the anisotropy of the data distribution (encoded by its power spectrum) interacts with key parameters of the end-to-end sampling method, including the noise amplitude, the step sizes in both score matching and diffusion, and the number of initial samples. Notably, we show that the Wasserstein sampling error can be expressed as a kernel-type norm of the data power spectrum, where the specific kernel depends on the method parameters. This result provides a foundation for further analysis of the tradeoffs involved in optimizing sampling accuracy, such as adapting the noise amplitude to the choice of step sizes.

Updated: 2025-03-14 17:35:00

标题: 从去噪评分匹配到朗维范例采样：在高斯设定中的细粒度误差分析

摘要: 从仅通过离散样本访问的未知分布中进行抽样是生成式人工智能核心问题中的一个基本问题。当前最先进的方法遵循一个两步过程：首先估计得分函数（平滑对数分布的梯度），然后应用基于梯度的抽样算法。由于有限数量的初始样本导致的泛化误差、得分匹配误差以及抽样算法引入的扩散误差等因素可能会影响结果分布的正确性。本文分析了在一个简单但具有代表性的设置中进行抽样过程，即使用Langevin扩散抽样器从高斯分布中抽样。我们对由整个流程中的多个误差来源引起的Wasserstein抽样误差进行了深入分析。这使我们能够严格跟踪数据分布的各向异性（由其功率谱编码）与端到端抽样方法的关键参数（包括噪声幅度、得分匹配和扩散的步长以及初始样本数量）之间的相互作用。值得注意的是，我们展示了Wasserstein抽样误差可以表示为数据功率谱的核型范数，其中具体的核取决于方法参数。这一结果为进一步分析优化抽样精度所涉及的权衡奠定了基础，比如将噪声幅度调整为步长的选择。

更新时间: 2025-03-14 17:35:00

领域: cs.LG,math.OC,68Q32

下载: http://arxiv.org/abs/2503.11615v1

Enforcing MAVLink Safety & Security Properties Via Refined Multiparty Session Types

A compromised system component can issue message sequences that are legal while also leading the overall system into unsafe states. Such stealthy attacks are challenging to characterize, because message interfaces in standard languages specify each individual message separately but do not specify safe sequences of messages. We present initial results from ongoing work applying refined multiparty session types as a mechanism for expressing and enforcing proper message usage to exclude unsafe sequences. We illustrate our approach by using refined multiparty session types to mitigate safety and security issues in the MAVLink protocol commonly used in UAVs.

Updated: 2025-03-14 17:33:20

标题: 通过完善的多方会话类型强制执行MAVLink的安全性和安全性属性

摘要: 一个受损的系统组件可以发出合法的消息序列，同时也会导致整个系统进入不安全状态。这种隐蔽攻击很难进行特征化，因为标准语言中的消息接口是分别指定每个单独的消息，但并没有指定安全的消息序列。我们通过应用精细的多方会话类型作为一种机制来表达和执行正确的消息使用，以排除不安全的序列，并展示了正在进行中的工作的初始结果。我们通过使用精细的多方会话类型来减轻常用于无人机的MAVLink协议中的安全和安全问题。

更新时间: 2025-03-14 17:33:20

领域: cs.CR,cs.PL

下载: http://arxiv.org/abs/2501.18874v2

Enhanced Soups for Graph Neural Networks

Graph Neural Networks (GNN) have demonstrated state-of-the-art performance in numerous scientific and high-performance computing (HPC) applications. Recent work suggests that "souping" (combining) individually trained GNNs into a single model can improve performance without increasing compute and memory costs during inference. However, existing souping algorithms are often slow and memory-intensive, which limits their scalability. We introduce Learned Souping for GNNs, a gradient-descent-based souping strategy that substantially reduces time and memory overhead compared to existing methods. Our approach is evaluated across multiple Open Graph Benchmark (OGB) datasets and GNN architectures, achieving up to 1.2% accuracy improvement and 2.1X speedup. Additionally, we propose Partition Learned Souping, a novel partition-based variant of learned souping that significantly reduces memory usage. On the ogbn-products dataset with GraphSAGE, partition learned souping achieves a 24.5X speedup and a 76% memory reduction without compromising accuracy.

Updated: 2025-03-14 17:29:27

标题: 增强型汤用于图神经网络

摘要: 图神经网络（GNN）已在许多科学和高性能计算（HPC）应用中展示出最先进的性能。最近的研究表明，“混合”（将单独训练的GNN组合）可以在推理过程中提高性能而不增加计算和内存成本。然而，现有的混合算法通常速度较慢且占用内存较高，这限制了它们的可扩展性。我们引入了一种基于梯度下降的学习混合策略，称为GNN的学习混合，与现有方法相比，大大减少了时间和内存开销。我们的方法在多个Open Graph Benchmark（OGB）数据集和GNN架构上进行了评估，实现了高达1.2％的准确率提升和2.1倍的加速。此外，我们提出了Partition Learned Souping，这是一种基于分区的学习混合的新变体，显著减少了内存使用。在使用GraphSAGE的ogbn-products数据集上，分区学习混合实现了24.5倍的加速和76％的内存减少，而不影响准确性。

更新时间: 2025-03-14 17:29:27

领域: cs.LG

下载: http://arxiv.org/abs/2503.11612v1

Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval-Augmented Generation

While retrieval-augmented generation (RAG) has been shown to enhance factuality of large language model (LLM) outputs, LLMs still suffer from hallucination, generating incorrect or irrelevant information. A common detection strategy involves prompting the LLM again to assess whether its response is grounded in the retrieved evidence, but this approach is costly. Alternatively, lightweight natural language inference (NLI) models for efficient grounding verification can be used at inference time. While existing pre-trained NLI models offer potential solutions, their performance remains subpar compared to larger models on realistic RAG inputs. RAG inputs are more complex than most datasets used for training NLI models and have characteristics specific to the underlying knowledge base, requiring adaptation of the NLI models to a specific target domain. Additionally, the lack of labeled instances in the target domain makes supervised domain adaptation, e.g., through fine-tuning, infeasible. To address these challenges, we introduce Automatic Generative Domain Adaptation (Auto-GDA). Our framework enables unsupervised domain adaptation through synthetic data generation. Unlike previous methods that rely on handcrafted filtering and augmentation strategies, Auto-GDA employs an iterative process to continuously improve the quality of generated samples using weak labels from less efficient teacher models and discrete optimization to select the most promising augmented samples. Experimental results demonstrate the effectiveness of our approach, with models fine-tuned on synthetic data using Auto-GDA often surpassing the performance of the teacher model and reaching the performance level of LLMs at 10% of their computational cost.

Updated: 2025-03-14 17:27:00

标题: Auto-GDA：检索增强生成中高效接地验证的自动域适应

摘要: 检索增强生成（RAG）已被证明可以提高大型语言模型（LLM）输出的真实性，但LLMs仍然存在幻觉问题，会生成不正确或无关的信息。一种常见的检测策略涉及再次提示LLM以评估其回应是否基于检索到的证据，但这种方法成本高昂。作为替代方案，可以在推理时使用轻量级自然语言推理（NLI）模型进行有效的基础验证。虽然现有的预训练NLI模型提供了潜在的解决方案，但它们在实际的RAG输入上的性能仍不及更大的模型。RAG输入比用于训练NLI模型的大多数数据集更复杂，并具有特定于基础知识库的特征，需要将NLI模型适应特定的目标领域。此外，目标领域中缺乏标记实例，使得通过微调等受监督的领域适应方法不可行。为了解决这些挑战，我们引入了自动生成领域适应（Auto-GDA）。我们的框架通过合成数据生成实现了无监督的领域适应。与依赖手工筛选和增强策略的先前方法不同，Auto-GDA采用迭代过程不断改进生成样本的质量，使用来自效率低下的教师模型的弱标签和离散优化来选择最有前途的增强样本。实验结果表明我们的方法的有效性，使用Auto-GDA在合成数据上微调的模型通常超过教师模型的性能，并达到了LLMs在10％计算成本下的性能水平。

更新时间: 2025-03-14 17:27:00

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.03461v2

Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages

An old-school recipe for training a classifier is to (i) learn a good feature extractor and (ii) optimize a linear layer atop. When only a handful of samples are available per category, as in Few-Shot Adaptation (FSA), data are insufficient to fit a large number of parameters, rendering the above impractical. This is especially true with large pre-trained Vision-Language Models (VLMs), which motivated successful research at the intersection of Parameter-Efficient Fine-tuning (PEFT) and FSA. In this work, we start by analyzing the learning dynamics of PEFT techniques when trained on few-shot data from only a subset of categories, referred to as the ``base'' classes. We show that such dynamics naturally splits into two distinct phases: (i) task-level feature extraction and (ii) specialization to the available concepts. To accommodate this dynamic, we then depart from prompt- or adapter-based methods and tackle FSA differently. Specifically, given a fixed computational budget, we split it to (i) learn a task-specific feature extractor via PEFT and (ii) train a linear classifier on top. We call this scheme Two-Stage Few-Shot Adaptation (2SFS). Differently from established methods, our scheme enables a novel form of selective inference at a category level, i.e., at test time, only novel categories are embedded by the adapted text encoder, while embeddings of base categories are available within the classifier. Results with fixed hyperparameters across two settings, three backbones, and eleven datasets, show that 2SFS matches or surpasses the state-of-the-art, while established methods degrade significantly across settings.

Updated: 2025-03-14 17:24:01

标题: 重新思考视觉语言模型的少样本适应性在两个阶段中的应用

摘要: 一个老派的分类器训练方法是（i）学习一个好的特征提取器，然后（ii）在其之上优化一个线性层。当每个类别只有少量样本可用时，就像在Few-Shot Adaptation（FSA）中一样，数据不足以拟合大量参数，使得上述方法不切实际。这一点在大型预训练的视觉-语言模型（VLMs）尤为明显，这促使了在参数高效微调（PEFT）和FSA交叉研究中取得成功。在这项工作中，我们首先分析了当在仅来自“基础”类别子集的少量样本上训练时，PEFT技术的学习动态。我们展示了这种动态自然地分为两个不同阶段：（i）任务级特征提取和（ii）对可用概念的专门化。为了适应这种动态，我们然后离开提示或适配器的方法，以不同方式处理FSA。具体来说，在固定的计算预算下，我们将其分为（i）通过PEFT学习特定任务的特征提取器和（ii）在其之上训练线性分类器。我们将这种方案称为Two-Stage Few-Shot Adaptation（2SFS）。与已有方法不同，我们的方案实现了一种新颖形式的分类级别的选择性推断，即在测试时，只有新颖类别被调整过的文本编码器嵌入，而基础类别的嵌入在分类器中是可用的。在两种设置、三个主干和十一个数据集上固定超参数的结果显示，2SFS与最先进技术相匹敌或超越，而已有方法在不同设置下明显降级。

更新时间: 2025-03-14 17:24:01

领域: cs.CV,cs.LG,cs.MM

下载: http://arxiv.org/abs/2503.11609v1

Master Stability Functions in Complex Networks

Synchronization is an emergent and fundamental phenomenon in nature and engineered systems. Understanding the stability of a synchronized phenomenon is crucial for ensuring functionality in various complex systems. The stability of the synchronization phenomenon is extensively studied using the Master Stability Function (MSF). This powerful and elegant tool plays a pivotal role in determining the stability of synchronization states, providing deep insights into synchronization in coupled systems. Although MSF analysis has been used for 25 years to study the stability of synchronization states, a systematic investigation of MSF across various networked systems remains missing from the literature. In this article, we present a simplified and unified MSF analysis for diverse undirected and directed networked systems. We begin with the analytical MSF framework for pairwise-coupled identical systems with diffusive and natural coupling schemes and extend our analysis to directed networks and multilayer networks, considering both intra-layer and inter-layer interactions. Furthermore, we revisit the MSF framework to incorporate higher-order interactions alongside pairwise interactions. To enhance understanding, we also provide a numerical analysis of synchronization in coupled R\"ossler systems under pairwise diffusive coupling and propose algorithms for determining the MSF, identifying stability regimes, and classifying MSF functions. Overall, the primary goal of this review is to present a systematic study of MSF in coupled dynamical networks in a clear and structured manner, making this powerful tool more accessible. Furthermore, we highlight cases where the study of synchronization states using MSF remains underexplored. Additionally, we discuss recent research focusing on MSF analysis using time series data and machine learning approaches.

Updated: 2025-03-14 17:23:18

标题: 复杂网络中的主稳定性函数

摘要: 同步是自然和工程系统中的一种新兴和基本现象。了解同步现象的稳定性对于确保各种复杂系统的功能至关重要。同步现象的稳定性被广泛地使用主稳定性函数（MSF）进行研究。这一强大而优雅的工具在确定同步状态的稳定性方面发挥了关键作用，为耦合系统中的同步提供了深刻的见解。尽管MSF分析已经用于研究同步状态的稳定性25年，但文献中仍缺乏对各种网络系统中MSF的系统调查。在本文中，我们提出了一种简化和统一的MSF分析方法，适用于各种无向和有向网络系统。我们从分析具有扩散和自然耦合方案的成对耦合的相同系统的分析MSF框架开始，并将我们的分析扩展到有向网络和多层网络，考虑了层内和层间的相互作用。此外，我们重新审视MSF框架，以纳入高阶相互作用和成对相互作用。为了加强理解，我们还提供了对成对扩散耦合下耦合的R\"ossler系统中的同步的数值分析，并提出了确定MSF、识别稳定区域和分类MSF函数的算法。总的来说，本综述的主要目标是以清晰和结构化的方式呈现耦合动态网络中MSF的系统研究，使这一强大工具更易于接触。此外，我们强调了使用MSF进行同步状态研究仍未充分开发的案例。此外，我们讨论了最近集中在使用时间序列数据和机器学习方法进行MSF分析的研究。

更新时间: 2025-03-14 17:23:18

领域: nlin.AO,cs.AI,nlin.CD

下载: http://arxiv.org/abs/2412.19163v2

The Nyström method for convex loss functions

We investigate an extension of classical empirical risk minimization, where the hypothesis space consists of a random subspace within a given Hilbert space. Specifically, we examine the Nystr\"om method where the subspaces are defined by a random subset of the data. This approach recovers Nystr\"om approximations used in kernel methods as a specific case. Using random subspaces naturally leads to computational advantages, but a key question is whether it compromises the learning accuracy. Recently, the tradeoffs between statistics and computation have been explored for the square loss and self-concordant losses, such as the logistic loss. In this paper, we extend these analyses to general convex Lipschitz losses, which may lack smoothness, such as the hinge loss used in support vector machines. Our main results show the existence of various scenarios where computational gains can be achieved without sacrificing learning performance. When specialized to smooth loss functions, our analysis recovers most previous results. Moreover, it allows to consider classification problems and translate the surrogate risk bounds into classification error bounds. Indeed, this gives the opportunity to compare the effect of Nystr\"om approximations when combined with different loss functions such as the hinge or the square loss.

Updated: 2025-03-14 17:16:59

标题: The Nyström方法用于凸损失函数

摘要: 我们研究了一种经典经验风险最小化的扩展，其中假设空间包含给定希尔伯特空间内的一个随机子空间。具体而言，我们考察了Nyström方法，其中子空间由数据的随机子集定义。这种方法恢复了核方法中使用的Nyström逼近作为特定情况。使用随机子空间自然地带来计算优势，但一个关键问题是它是否会影响学习准确性。最近，已经探讨了平方损失和自共轭损失（如逻辑损失）之间的统计和计算之间的权衡。在本文中，我们将这些分析扩展到可能缺乏平滑性的一般凸利普希茨损失，如支持向量机中使用的铰链损失。我们的主要结果显示存在各种情景，可以实现计算收益而不牺牲学习性能。当专门针对平滑损失函数时，我们的分析恢复了大部分先前的结果。此外，它允许考虑分类问题，并将替代风险边界转化为分类错误边界。事实上，这为比较Nyström逼近与不同损失函数（如铰链或平方损失）相结合时的效果提供了机会。

更新时间: 2025-03-14 17:16:59

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2006.10016v4

Power Spectrum Signatures of Graphs

Point signatures based on the Laplacian operators on graphs, point clouds, and manifolds have become popular tools in machine learning for graphs, clustering, and shape analysis. In this work, we propose a novel point signature, the power spectrum signature, a measure on $\mathbb{R}$ defined as the squared graph Fourier transform of a graph signal. Unlike eigenvectors of the Laplacian from which it is derived, the power spectrum signature is invariant under graph automorphisms. We show that the power spectrum signature is stable under perturbations of the input graph with respect to the Wasserstein metric. We focus on the signature applied to classes of indicator functions, and its applications to generating descriptive features for vertices of graphs. To demonstrate the practical value of our signature, we showcase several applications in characterizing geometry and symmetries in point cloud data, and graph regression problems.

Updated: 2025-03-14 17:09:50

标题: 图的功率谱特征

摘要: 基于图、点云和流形上的拉普拉斯算子的点特征已经成为机器学习中用于图、聚类和形状分析的流行工具。在这项工作中，我们提出了一种新颖的点特征，即功率谱特征，它是一个定义在$\mathbb{R}$上的测度，是图信号的平方图傅里叶变换。与导出它的拉普拉斯特征向量不同，功率谱特征在图自同构下是不变的。我们展示了功率谱特征在输入图与Wasserstein度量下的扰动稳定性。我们关注特征应用于指示函数类，并将其应用于为图的顶点生成描述性特征。为了展示我们特征的实际价值，我们展示了在表征点云数据中的几何和对称性以及图回归问题中的多个应用。

更新时间: 2025-03-14 17:09:50

领域: stat.ML,cs.LG,cs.SI

下载: http://arxiv.org/abs/2503.09660v2

Agents' Room: Narrative Generation through Multi-step Collaboration

Writing compelling fiction is a multifaceted process combining elements such as crafting a plot, developing interesting characters, and using evocative language. While large language models (LLMs) show promise for story writing, they currently rely heavily on intricate prompting, which limits their use. We propose Agents' Room, a generation framework inspired by narrative theory, that decomposes narrative writing into subtasks tackled by specialized agents. To illustrate our method, we introduce Tell Me A Story, a high-quality dataset of complex writing prompts and human-written stories, and a novel evaluation framework designed specifically for assessing long narratives. We show that Agents' Room generates stories that are preferred by expert evaluators over those produced by baseline systems by leveraging collaboration and specialization to decompose the complex story writing task into tractable components. We provide extensive analysis with automated and human-based metrics of the generated output.

Updated: 2025-03-14 17:09:03

标题: 代理人房间：通过多步协作生成叙事

摘要: 撰写引人入胜的小说是一个多方面的过程，包括构思情节、塑造有趣的角色以及运用引人入胜的语言。虽然大型语言模型（LLMs）显示出在写作故事方面的潜力，但它们目前主要依赖于复杂的提示，这限制了它们的使用。我们提出了“Agents' Room”，这是一个受叙事理论启发的生成框架，将叙事写作分解为由专门的代理人处理的子任务。为了说明我们的方法，我们引入了“Tell Me A Story”这一高质量数据集，其中包含复杂的写作提示和人类撰写的故事，以及一个专门设计用于评估长篇叙述的新型评估框架。我们表明，通过利用协作和专业化将复杂的故事写作任务分解为可处理的组件，Agents' Room生成的故事比基线系统生成的更受专家评估者青睐。我们提供了对生成输出的自动化和基于人类的广泛分析。

更新时间: 2025-03-14 17:09:03

领域: cs.CL,cs.LG,cs.MA

下载: http://arxiv.org/abs/2410.02603v2

A transfer learning framework for weak-to-strong generalization

Modern large language model (LLM) alignment techniques rely on human feedback, but it is unclear whether these techniques fundamentally limit the capabilities of aligned LLMs. In particular, it is unknown if it is possible to align (stronger) LLMs with superhuman capabilities with (weaker) human feedback without degrading their capabilities. This is an instance of the weak-to-strong generalization problem: using feedback from a weaker (less capable) model to train a stronger (more capable) model. We prove that weak-to-strong generalization is possible by eliciting latent knowledge from pre-trained LLMs. In particular, we cast the weak-to-strong generalization problem as a transfer learning problem in which we wish to transfer a latent concept prior from a weak model to a strong pre-trained model. We prove that a naive fine-tuning approach suffers from fundamental limitations, but an alternative refinement-based approach suggested by the problem structure provably overcomes the limitations of fine-tuning. Finally, we demonstrate the practical applicability of the refinement approach in multiple LLM alignment tasks.

Updated: 2025-03-14 17:08:22

标题: 一个用于弱到强泛化的迁移学习框架

摘要: 现代大型语言模型（LLM）对齐技术依赖于人类反馈，但目前尚不清楚这些技术是否从根本上限制了对齐的LLM的能力。特别是，目前还不清楚是否可能通过（较弱的）人类反馈来对齐具有超人类能力的（更强大的）LLM，而不降低它们的能力。这是一个由弱到强的泛化问题：使用来自较弱（能力较差）模型的反馈来训练较强（能力更强）的模型。我们证明了通过从预训练的LLM中提取潜在知识，弱到强的泛化是可能的。具体来说，我们将弱到强的泛化问题视为一个迁移学习问题，希望将潜在概念先验从弱模型转移到强预训练模型。我们证明了天真的微调方法存在根本性限制，但一个根据问题结构提出的替代的基于细化的方法可以克服微调的限制。最后，我们在多个LLM对齐任务中展示了细化方法的实际适用性。

更新时间: 2025-03-14 17:08:22

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.16236v3

PUBLICSPEAK: Hearing the Public with a Probabilistic Framework in Local Government

Local governments around the world are making consequential decisions on behalf of their constituents, and these constituents are responding with requests, advice, and assessments of their officials at public meetings. So many small meetings cannot be covered by traditional newsrooms at scale. We propose PUBLICSPEAK, a probabilistic framework which can utilize meeting structure, domain knowledge, and linguistic information to discover public remarks in local government meetings. We then use our approach to inspect the issues raised by constituents in 7 cities across the United States. We evaluate our approach on a novel dataset of local government meetings and find that PUBLICSPEAK improves over state-of-the-art by 10% on average, and by up to 40%.

Updated: 2025-03-14 17:04:36

标题: 公众演讲：在地方政府中利用概率框架倾听公众

摘要: 全球各地的地方政府正在代表他们的选民做出重要决定，而这些选民则通过公开会议提出请求、建议和对官员的评估。如此之多的小型会议无法被传统新闻机构全面报道。我们提出了PUBLICSPEAK，这是一个概率框架，可以利用会议结构、领域知识和语言信息来发现地方政府会议中的公众言论。然后，我们使用我们的方法来检查美国7个城市选民提出的问题。我们对一个新颖的地方政府会议数据集进行了评估，发现PUBLICSPEAK平均提高了10%，最高可达40%。

更新时间: 2025-03-14 17:04:36

领域: cs.AI,cs.CY

下载: http://arxiv.org/abs/2503.11743v1

Model-Agnostic Knowledge Guided Correction for Improved Neural Surrogate Rollout

Modeling the evolution of physical systems is critical to many applications in science and engineering. As the evolution of these systems is governed by partial differential equations (PDEs), there are a number of computational simulations which resolve these systems with high accuracy. However, as these simulations incur high computational costs, they are infeasible to be employed for large-scale analysis. A popular alternative to simulators are neural network surrogates which are trained in a data-driven manner and are much more computationally efficient. However, these surrogate models suffer from high rollout error when used autoregressively, especially when confronted with training data paucity. Existing work proposes to improve surrogate rollout error by either including physical loss terms directly in the optimization of the model or incorporating computational simulators as `differentiable layers' in the neural network. Both of these approaches have their challenges, with physical loss functions suffering from slow convergence for stiff PDEs and simulator layers requiring gradients which are not always available, especially in legacy simulators. We propose the Hybrid PDE Predictor with Reinforcement Learning (HyPER) model: a model-agnostic, RL based, cost-aware model which combines a neural surrogate, RL decision model, and a physics simulator (with or without gradients) to reduce surrogate rollout error significantly. In addition to reducing in-distribution rollout error by 47%-78%, HyPER learns an intelligent policy that is adaptable to changing physical conditions and resistant to noise corruption. Code available at https://github.com/scailab/HyPER.

Updated: 2025-03-14 17:02:11

标题: 模型无关的知识引导修正以改善神经替代展开

摘要: 对物理系统演化进行建模对于科学和工程的许多应用至关重要。由于这些系统的演化受到偏微分方程（PDEs）的控制，有许多计算模拟可以以高精度解析这些系统。然而，由于这些模拟造成高昂的计算成本，它们在进行大规模分析时是不可行的。一种流行的替代方案是神经网络代理，它们以数据驱动的方式进行训练，且计算效率更高。然而，这些代理模型在自回归使用时会遇到高的演化误差，尤其是在面对训练数据稀缺时。现有的工作提出通过在模型的优化中直接包含物理损失项，或者在神经网络中将计算模拟器作为“可微分层”来改进代理演化误差。这两种方法都存在挑战，物理损失函数在处理刚性PDEs时收敛缓慢，而模拟器层需要梯度，而这些梯度并不总是可用，尤其是在传统模拟器中。我们提出了基于强化学习的混合PDE预测器（HyPER）模型：这是一个与模型无关、基于RL的、成本感知的模型，它结合了神经代理、RL决策模型和一个物理模拟器（带有或不带有梯度），以显著减少代理演化误差。除了将内部演化误差减少47%-78%外，HyPER还学习到了一个智能策略，适应不断变化的物理条件并对噪声干扰具有抵抗力。代码可在https://github.com/scailab/HyPER 获取。

更新时间: 2025-03-14 17:02:11

领域: cs.LG

下载: http://arxiv.org/abs/2503.10048v2

Safe Vision-Language Models via Unsafe Weights Manipulation

Vision-language models (VLMs) often inherit the biases and unsafe associations present within their large-scale training dataset. While recent approaches mitigate unsafe behaviors, their evaluation focuses on how safe the model is on unsafe inputs, ignoring potential shortcomings on safe ones. In this paper, we first revise safety evaluation by introducing SafeGround, a new set of metrics that evaluate safety at different levels of granularity. With this metric, we uncover a surprising issue of training-based methods: they make the model less safe on safe inputs. From this finding, we take a different direction and explore whether it is possible to make a model safer without training, introducing Unsafe Weights Manipulation (UWM). UWM uses a calibration set of safe and unsafe instances to compare activations between safe and unsafe content, identifying the most important parameters for processing the latter. Their values are then manipulated via negation. Experiments show that UWM achieves the best tradeoff between safety and knowledge preservation, consistently improving VLMs on unsafe queries while outperforming even training-based state-of-the-art methods on safe ones.

Updated: 2025-03-14 17:00:22

标题: 通过不安全的权重操作实现安全的视觉-语言模型

摘要: 视觉语言模型（VLMs）经常继承其大规模训练数据集中存在的偏见和不安全的关联。尽管最近的方法减轻了不安全行为，但它们的评估集中在模型在不安全输入上的安全性，忽略了在安全输入上的潜在缺陷。在本文中，我们首先通过引入SafeGround，一组新的度量标准，来修订安全评估。通过这个度量标准，我们发现基于训练的方法存在一个令人惊讶的问题：它们使模型在安全输入上变得不够安全。基于这一发现，我们采取了不同的方向，探讨是否可以在不训练的情况下使模型更安全，引入了Unsafe Weights Manipulation（UWM）。UWM使用一组安全和不安全实例的校准集来比较安全和不安全内容之间的激活，识别处理后者最重要的参数。然后通过否定来操纵它们的值。实验证明，UWM在安全性和知识保留之间取得了最佳平衡，持续改善了VLMs对不安全查询的表现，甚至在安全查询上也胜过了基于训练的最先进方法。

更新时间: 2025-03-14 17:00:22

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.11742v1

Do Not Trust Licenses You See: Dataset Compliance Requires Massive-Scale AI-Powered Lifecycle Tracing

This paper argues that a dataset's legal risk cannot be accurately assessed by its license terms alone; instead, tracking dataset redistribution and its full lifecycle is essential. However, this process is too complex for legal experts to handle manually at scale. Tracking dataset provenance, verifying redistribution rights, and assessing evolving legal risks across multiple stages require a level of precision and efficiency that exceeds human capabilities. Addressing this challenge effectively demands AI agents that can systematically trace dataset redistribution, analyze compliance, and identify legal risks. We develop an automated data compliance system called NEXUS and show that AI can perform these tasks with higher accuracy, efficiency, and cost-effectiveness than human experts. Our massive legal analysis of 17,429 unique entities and 8,072 license terms using this approach reveals the discrepancies in legal rights between the original datasets before redistribution and their redistributed subsets, underscoring the necessity of the data lifecycle-aware compliance. For instance, we find that out of 2,852 datasets with commercially viable individual license terms, only 605 (21%) are legally permissible for commercialization. This work sets a new standard for AI data governance, advocating for a framework that systematically examines the entire lifecycle of dataset redistribution to ensure transparent, legal, and responsible dataset management.

Updated: 2025-03-14 16:58:30

标题: 不要相信您看到的许可证：数据集合规遵从需要大规模人工智能驱动的生命周期追踪

摘要: 本文认为，一个数据集的法律风险不能仅通过其许可条款来准确评估；相反，跟踪数据集的再分发及其完整生命周期是至关重要的。然而，这个过程对于法律专家来说在规模上手动处理过于复杂。跟踪数据集来源、验证再分发权利，并评估多个阶段的法律风险需要一种精度和效率超出人类能力的水平。有效应对这一挑战需要能够系统追踪数据集再分发、分析合规性，并识别法律风险的人工智能代理。我们开发了一个名为NEXUS的自动数据合规系统，并展示了人工智能可以比人类专家更准确、更高效、更具成本效益地执行这些任务。我们使用这种方法对17,429个独特实体和8,072个许可条款进行了大规模的法律分析，揭示了在数据再分发之前的原始数据集和它们再分发的子集之间的法律权利差异，强调了数据生命周期意识合规性的必要性。例如，我们发现在2,852个具有商业可行个别许可条款的数据集中，只有605个（21%）在法律上允许用于商业化。这项工作为人工智能数据治理设定了一个新标准，主张一个系统性审查数据集再分发整个生命周期以确保透明、合法和负责任的数据集管理的框架。

更新时间: 2025-03-14 16:58:30

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2503.02784v3

Alchemist: Towards the Design of Efficient Online Continual Learning System

Continual learning has become a promising solution to refine large language models incrementally by leveraging user feedback. In particular, online continual learning - iteratively training the model with small batches of user feedback - has demonstrated notable performance improvements. However, the existing practice of separating training and serving processes forces the online trainer to recompute the intermediate results already done during serving. Such redundant computations can account for 30%-42% of total training time. In this paper, we propose Alchemist, to the best of our knowledge, the first online continual learning system that efficiently reuses serving activations to increase training throughput. Alchemist introduces two key techniques: (1) recording and storing activations and KV cache only during the prefill phase to minimize latency and memory overhead; and (2) smart activation offloading and hedging. Evaluations with inputs of varied token length sampled from ShareGPT dataset show that compared with a separate training cluster, Alchemist significantly increases training throughput by up to 1.72x, reduces up to 47% memory usage during training, and supports up to 2x more training tokens - all while maintaining negligible impact on serving latency.

Updated: 2025-03-14 16:57:12

标题: 炼金术士：朝向高效在线持续学习系统设计

摘要: 持续学习已成为通过利用用户反馈逐步改进大型语言模型的有前途的解决方案。特别是，在线持续学习 - 通过迭代地使用小批量用户反馈训练模型 - 已经显示出显著的性能改进。然而，现有的将训练和服务过程分开的做法迫使在线训练器重新计算在服务过程中已经完成的中间结果。这种冗余计算可能占总训练时间的30%-42%。在本文中，我们提出了Alchemist，据我们所知，这是第一个在线持续学习系统，可以有效地重复使用服务激活以增加训练吞吐量。Alchemist引入了两个关键技术：(1) 仅在预填充阶段记录和存储激活和KV缓存，以最小化延迟和内存开销；(2) 智能激活卸载和对冲。从ShareGPT数据集中抽样的不同令牌长度的输入评估结果显示，与独立的训练集群相比，Alchemist可以显著提高训练吞吐量高达1.72倍，减少高达47%的训练内存使用，并支持高达2倍的训练令牌 - 同时对服务延迟几乎没有影响。

更新时间: 2025-03-14 16:57:12

领域: cs.LG,cs.CL,cs.DC

下载: http://arxiv.org/abs/2503.01066v2

Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs using Semantic Space

Large language models (LLMs) are used in chatbots or AI assistants to hold conversations with a human user. In such applications, the quality (e.g., user engagement, safety) of a conversation is important and can only be exactly known at the end of the conversation. To maximize its expected quality, conversation planning reasons about the stochastic transitions within a conversation to select the optimal LLM response at each turn. Existing simulation-based conversation planning algorithms typically select the optimal response by simulating future conversations with a large number of LLM queries at every turn. However, this process is extremely time-consuming and hence impractical for real-time conversations. This paper presents a novel approach called Semantic space COnversation Planning with improved Efficiency (SCOPE) that exploits the dense semantic representation of conversations to perform conversation planning efficiently. In particular, SCOPE models the stochastic transitions in conversation semantics and their associated rewards to plan entirely within the semantic space. This allows us to select the optimal LLM response at every conversation turn without needing additional LLM queries for simulation. As a result, SCOPE can perform conversation planning 70 times faster than conventional simulation-based planning algorithms when applied to a wide variety of conversation starters and two reward functions seen in the real world, yet achieving a higher reward within a practical planning budget. Our code can be found at: https://github.com/chenzhiliang94/convo-plan-SCOPE.

Updated: 2025-03-14 16:55:46

标题: 拓展您的SCOPE！利用语义空间为LLMs进行高效的多轮对话规划

摘要: 大型语言模型（LLMs）被用于聊天机器人或人工智能助手与人类用户进行对话。在这些应用中，对话的质量（例如用户参与度、安全性）很重要，只能在对话结束时才能确切知道。为了最大化其预期质量，对话规划会考虑对话中的随机转换，以在每个回合选择最佳的LLM响应。现有基于模拟的对话规划算法通常通过在每次回合进行大量LLM查询的未来对话来选择最佳响应。然而，这个过程非常耗时，因此对于实时对话是不现实的。本文提出了一种称为语义空间对话规划的新方法，利用对话的密集语义表示有效地进行对话规划。具体而言，SCOPE模型了对话语义中的随机转换及其相关奖励，以在语义空间内完全规划。这使我们能够在每次对话回合中选择最佳的LLM响应，而无需进行额外的LLM查询进行模拟。因此，当应用于各种对话开始和现实世界中看到的两个奖励函数时，SCOPE比传统的基于模拟的规划算法快70倍，同时在实际规划预算内获得更高的奖励。我们的代码可以在以下链接找到：https://github.com/chenzhiliang94/convo-plan-SCOPE。

更新时间: 2025-03-14 16:55:46

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2503.11586v1

Towards Few-Call Model Stealing via Active Self-Paced Knowledge Distillation and Diffusion-Based Image Generation

Diffusion models showcase strong capabilities in image synthesis, being used in many computer vision tasks with great success. To this end, we propose to explore a new use case, namely to copy black-box classification models without having access to the original training data, the architecture, and the weights of the model, i.e. the model is only exposed through an inference API. More specifically, we can only observe the (soft or hard) labels for some image samples passed as input to the model. Furthermore, we consider an additional constraint limiting the number of model calls, mostly focusing our research on few-call model stealing. In order to solve the model extraction task given the applied restrictions, we propose the following framework. As training data, we create a synthetic data set (called proxy data set) by leveraging the ability of diffusion models to generate realistic and diverse images. Given a maximum number of allowed API calls, we pass the respective number of samples through the black-box model to collect labels. Finally, we distill the knowledge of the black-box teacher (attacked model) into a student model (copy of the attacked model), harnessing both labeled and unlabeled data generated by the diffusion model. We employ a novel active self-paced learning framework to make the most of the proxy data during distillation. Our empirical results on three data sets confirm the superiority of our framework over four state-of-the-art methods in the few-call model extraction scenario. We release our code for free non-commercial use at https://github.com/vladhondru25/model-stealing.

Updated: 2025-03-14 16:52:55

标题: 朝向主动自适应知识蒸馏和基于扩散的图像生成的少调用模型窃取

摘要: 扩散模型在图像合成方面展示了强大的能力，在许多计算机视觉任务中取得了巨大成功。为此，我们提出探索一个新的用例，即在没有访问原始训练数据、架构和模型权重的情况下复制黑盒分类模型，即模型仅通过推理API公开。更具体地说，我们只能观察一些作为模型输入的图像样本的（软或硬）标签。此外，我们考虑了一个附加约束，限制了模型调用的次数，主要集中在少次模型窃取研究上。为了解决在应用限制情况下的模型提取任务，我们提出了以下框架。作为训练数据，我们通过利用扩散模型生成逼真和多样化图像的能力创建了一个合成数据集（称为代理数据集）。给定允许的最大API调用次数，我们通过黑盒模型传递相应数量的样本以收集标签。最后，我们将黑盒教师（被攻击模型）的知识提炼到学生模型（被攻击模型的副本）中，利用扩散模型生成的标记和未标记数据。我们采用了一种新颖的主动自适应学习框架，在提取过程中充分利用代理数据。我们在三个数据集上的实证结果证实了我们的框架在少次模型提取场景中优于四种最先进方法。我们在https://github.com/vladhondru25/model-stealing 上免费发布我们的代码供非商业使用。

更新时间: 2025-03-14 16:52:55

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2310.00096v2

Vecchia Gaussian Process Ensembles on Internal Representations of Deep Neural Networks

For regression tasks, standard Gaussian processes (GPs) provide natural uncertainty quantification (UQ), while deep neural networks (DNNs) excel at representation learning. Deterministic UQ methods for neural networks have successfully combined the two and require only a single pass through the neural network. However, current methods necessitate changes to network training to address feature collapse, where unique inputs map to identical feature vectors. We propose an alternative solution, the deep Vecchia ensemble (DVE), which allows deterministic UQ to work in the presence of feature collapse, negating the need for network retraining. DVE comprises an ensemble of GPs built on hidden-layer outputs of a DNN, achieving scalability via Vecchia approximations that leverage nearest-neighbor conditional independence. DVE is compatible with pretrained networks and incurs low computational overhead. We demonstrate DVE's utility on several datasets and carry out experiments to understand the inner workings of the proposed method.

Updated: 2025-03-14 16:50:47

标题: Old Gaussian Process Ensembles on Deep Neural Networks' Internal Representations

摘要: 对于回归任务，标准高斯过程（GPs）提供自然的不确定性量化（UQ），而深度神经网络（DNNs）擅长表示学习。神经网络的确定性UQ方法成功地将两者结合起来，并且只需要通过一次神经网络。然而，当前的方法需要改变网络训练来解决特征坍塌的问题，即唯一的输入映射到相同的特征向量。我们提出了一种替代解决方案，即深度Vecchia集成（DVE），它允许确定性UQ在特征坍塌的情况下工作，从而消除了需要重新训练网络的必要性。DVE由建立在DNN隐藏层输出上的一组GPs组成，通过利用最近邻条件独立性的Vecchia逼近实现可扩展性。DVE与预训练网络兼容，并且计算开销低。我们在多个数据集上展示了DVE的效用，并进行实验以了解所提出方法的内部工作原理。

更新时间: 2025-03-14 16:50:47

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2305.17063v2

Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations

This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. It is shown here that for sufficiently small initializations, during the early stages of training, the weights of the neural network remain small in (Euclidean) norm and approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of the recently introduced neural correlation function. Additionally, this paper also studies the KKT points of the neural correlation function for feed-forward networks with (Leaky) ReLU and polynomial (Leaky) ReLU activations, deriving necessary and sufficient conditions for rank-one KKT points.

Updated: 2025-03-14 16:46:23

标题: 深度同质神经网络中小初始化的早期方向收敛

摘要: 本文研究了训练假定具有局部Lipschitz梯度和齐次神经网络的梯度流动动力学，假定其齐次阶大于二。结果表明，在训练的早期阶段，对于足够小的初始化，神经网络的权重保持在（欧几里德）范数小且在方向上近似收敛于最近引入的神经相关函数的Karush-Kuhn-Tucker（KKT）点。此外，本文还研究了具有（泄漏）ReLU和多项式（泄漏）ReLU激活函数的前馈网络的神经相关函数的KKT点，并推导出一阶KKT点的必要和充分条件。

更新时间: 2025-03-14 16:46:23

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2403.08121v3

Synthesizing Access Control Policies using Large Language Models

Cloud compute systems allow administrators to write access control policies that govern access to private data. While policies are written in convenient languages, such as AWS Identity and Access Management Policy Language, manually written policies often become complex and error prone. In this paper, we investigate whether and how well Large Language Models (LLMs) can be used to synthesize access control policies. Our investigation focuses on the task of taking an access control request specification and zero-shot prompting LLMs to synthesize a well-formed access control policy which correctly adheres to the request specification. We consider two scenarios, one which the request specification is given as a concrete list of requests to be allowed or denied, and another in which a natural language description is used to specify sets of requests to be allowed or denied. We then argue that for zero-shot prompting, more precise and structured prompts using a syntax based approach are necessary and experimentally show preliminary results validating our approach.

Updated: 2025-03-14 16:40:25

标题: 使用大型语言模型合成访问控制策略

摘要: 云计算系统允许管理员编写访问控制策略，以管理对私人数据的访问。虽然策略是用方便的语言编写的，如AWS身份和访问管理策略语言，但手动编写的策略往往变得复杂且容易出错。在本文中，我们调查了大型语言模型（LLMs）是否能够被用来合成访问控制策略，以及如何使用。我们的调查聚焦于将一个访问控制请求规范转化为一个良好形成的访问控制策略的任务，该策略正确地遵守了请求规范。我们考虑了两种情景，一种是请求规范被给定为一个具体的允许或拒绝的请求列表，另一种是使用自然语言描述来指定允许或拒绝的请求集合。然后我们认为对于零提示，使用基于语法的更精确和结构化的提示是必要的，并通过实验证明了我们方法的初步结果。

更新时间: 2025-03-14 16:40:25

领域: cs.SE,cs.AI,cs.CR,68P25

下载: http://arxiv.org/abs/2503.11573v1

Implicit Bias-Like Patterns in Reasoning Models

Implicit bias refers to automatic or spontaneous mental processes that shape perceptions, judgments, and behaviors. Previous research examining `implicit bias' in large language models (LLMs) has often approached the phenomenon differently than how it is studied in humans by focusing primarily on model outputs rather than on model processing. To examine model processing, we present a method called the Reasoning Model Implicit Association Test (RM-IAT) for studying implicit bias-like patterns in reasoning models: LLMs that employ step-by-step reasoning to solve complex tasks. Using this method, we find that reasoning models require more tokens when processing association-incompatible information compared to association-compatible information. These findings suggest AI systems harbor patterns in processing information that are analogous to human implicit bias. We consider the implications of these implicit bias-like patterns for their deployment in real-world applications.

Updated: 2025-03-14 16:40:02

标题: 隐含偏见式推理模式

摘要: 隐性偏见指的是塑造感知、判断和行为的自动或自发的心理过程。先前研究中对大型语言模型（LLMs）中的“隐性偏见”进行的研究通常与在人类中研究该现象的方法不同，主要侧重于模型输出而非模型处理。为了研究模型处理，我们提出了一种称为推理模型隐性联想测试（RM-IAT）的方法，用于研究推理模型中类似隐性偏见的模式：这些模型使用逐步推理来解决复杂任务。使用这种方法，我们发现在处理与关联不兼容的信息时，推理模型需要更多的令牌，而与关联兼容的信息相比。这些发现表明，推理模型在处理信息时存在与人类隐性偏见类似的模式。我们考虑这些隐性偏见样式对它们在现实世界应用中部署的影响。

更新时间: 2025-03-14 16:40:02

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2503.11572v1

RASA: Replace Anyone, Say Anything -- A Training-Free Framework for Audio-Driven and Universal Portrait Video Editing

Portrait video editing focuses on modifying specific attributes of portrait videos, guided by audio or video streams. Previous methods typically either concentrate on lip-region reenactment or require training specialized models to extract keypoints for motion transfer to a new identity. In this paper, we introduce a training-free universal portrait video editing framework that provides a versatile and adaptable editing strategy. This framework supports portrait appearance editing conditioned on the changed first reference frame, as well as lip editing conditioned on varied speech, or a combination of both. It is based on a Unified Animation Control (UAC) mechanism with source inversion latents to edit the entire portrait, including visual-driven shape control, audio-driven speaking control, and inter-frame temporal control. Furthermore, our method can be adapted to different scenarios by adjusting the initial reference frame, enabling detailed editing of portrait videos with specific head rotations and facial expressions. This comprehensive approach ensures a holistic and flexible solution for portrait video editing. The experimental results show that our model can achieve more accurate and synchronized lip movements for the lip editing task, as well as more flexible motion transfer for the appearance editing task. Demo is available at https://alice01010101.github.io/RASA/.

Updated: 2025-03-14 16:39:15

标题: RASA：替换任何人，说任何话--一种无需训练的基于音频驱动和通用人像视频编辑框架

摘要: 人像视频编辑侧重于修改人像视频的特定属性，受音频或视频流的指导。先前的方法通常要么集中于唇部区域再现，要么需要训练专门的模型来提取关键点以实现运动转移到新的身份。在本文中，我们介绍了一种无需训练的通用人像视频编辑框架，提供了一种多功能和适应性强的编辑策略。该框架支持基于更改的第一个参考帧的人像外观编辑，以及基于不同语音的嘴唇编辑，或两者的结合。它基于一个统一的动画控制（UAC）机制，具有源反演潜变量来编辑整个人像，包括视觉驱动的形状控制、音频驱动的说话控制和帧间时间控制。此外，我们的方法可以通过调整初始参考帧来适应不同的场景，实现对具有特定头部旋转和面部表情的人像视频进行详细编辑。这种综合方法确保了人像视频编辑的全面和灵活的解决方案。实验结果表明，我们的模型可以实现更准确和同步的嘴唇运动，以及更灵活的外观编辑任务的运动转移。Demo可在https://alice01010101.github.io/RASA/上找到。

更新时间: 2025-03-14 16:39:15

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.11571v1

Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations

A unified representation space in multi-modal learning is essential for effectively integrating diverse data sources, such as text, images, and audio, to enhance efficiency and performance across various downstream tasks. Recent binding methods, such as ImageBind (Girdhar et al., 2023), typically rely on a single, fixed anchor modality for aligning multi-modal data. We mathematically analyze these fixed anchor binding method and uncover significant limitations: (1) over-reliance on the choice of the anchor modality, (2) inadequate capture of intra-modal information, and (3) failure to account for cross-modal correlation among non-anchored modalities. To address these issues, we propose the need for adaptive anchor binding methods, exemplified by our framework CentroBind. The proposed method uses adaptively adjustable centroid-based anchors generated from all available modalities, leading to a balanced and rich representation space. We theoretically demonstrate that our approach captures three critical properties of multi-modal learning -- intra-modal learning, inter-modal learning, and multi-modal alignment -- while constructing a unified representation that spans all modalities. Experiments on both synthetic and real-world datasets show that adaptive anchor methods such as CentroBind consistently outperform fixed anchor binding methods, verifying our analysis.

Updated: 2025-03-14 16:36:53

标题: 锚定起航！追求最佳统一多模态表征

摘要: 在多模态学习中，一个统一的表示空间对于有效整合不同的数据源（如文本、图像和音频）以增强各种下游任务的效率和性能至关重要。最近的绑定方法，如ImageBind（Girdhar等，2023年），通常依赖于单一的固定锚定模态来对齐多模态数据。我们对这些固定锚定绑定方法进行了数学分析，并揭示了重要的局限性：（1）过度依赖锚定模态的选择，（2）未能充分捕捉单模态信息，（3）未能考虑非锚定模态之间的跨模态相关性。为了解决这些问题，我们提出了需要自适应锚定绑定方法的概念，以我们的CentroBind框架为例。该方法使用自适应可调节的基于中心的锚点从所有可用的模态生成，从而导致一个平衡且丰富的表示空间。我们在理论上证明了我们的方法捕捉了多模态学习的三个关键属性 - 单模态学习、跨模态学习和多模态对齐 - 同时构建了一个跨越所有模态的统一表示。在合成和真实数据集上的实验表明，如CentroBind等自适应锚定方法始终优于固定锚定绑定方法，验证了我们的分析。

更新时间: 2025-03-14 16:36:53

领域: cs.LG,cs.CV,stat.ML

下载: http://arxiv.org/abs/2410.02086v2

Affinity-VAE: incorporating prior knowledge in representation learning from scientific images

Learning compact and interpretable representations of data is a critical challenge in scientific image analysis. Here, we introduce Affinity-VAE, a generative model that enables us to impose our scientific intuition about the similarity of instances in the dataset on the learned representation during training. We demonstrate the utility of the approach in the scientific domain of cryo-electron tomography (cryo-ET) where a significant current challenge is to identify similar molecules within a noisy and low contrast tomographic image volume. This task is distinct from classification in that, at inference time, it is unknown whether an instance is part of the training set or not. We trained affinity-VAE using prior knowledge of protein structure to inform the latent space. Our model is able to create rotationally-invariant, morphologically homogeneous clusters in the latent representation, with improved cluster separation compared to other approaches. It achieves competitive performance on protein classification with the added benefit of disentangling object pose, structural similarity and an interpretable latent representation. In the context of cryo-ET data, affinity-VAE captures the orientation of identified proteins in 3D which can be used as a prior for subsequent scientific experiments. Extracting physical principles from a trained network is of significant importance in scientific imaging where a ground truth training set is not always feasible.

Updated: 2025-03-14 16:34:24

标题: Affinity-VAE：将先验知识纳入科学图像表示学习

摘要: 学习数据的紪合且可解释的表示形式是科学图像分析中的一个关键挑战。在这里，我们介绍了Affinity-VAE，这是一个生成模型，使我们能够在训练过程中将我们对数据集中实例相似性的科学直觉 imposed到学习到的表示形式上。我们展示了这种方法在冷冻电子断层扫描（cryo-ET）的科学领域中的实用性，目前的一个重要挑战是在嘈杂且对比度低的断层图像体积中识别相似的分子。这个任务与分类不同，在推断时，我们不知道一个实例是否属于训练集。我们使用蛋白质结构的先验知识来训练Affinity-VAE，以指导潜在空间。我们的模型能够在潜在表示中创建旋转不变、形态均匀的簇，与其他方法相比，具有更好的簇分离性能。它在蛋白质分类上表现出竞争力，并且具有分离对象姿势、结构相似性和可解释潜在表示的附加优势。在cryo-ET数据的背景下，Affinity-VAE捕获了三维中已识别蛋白质的方向，这可用作后续科学实验的先验。在科学成像中，从训练网络中提取物理原理具有重要意义，因为并非总是可行的拥有地面真相的训练集。

更新时间: 2025-03-14 16:34:24

领域: cs.CV,cs.LG,q-bio.QM

下载: http://arxiv.org/abs/2209.04517v2

Designing Neural Synthesizers for Low Latency Interaction

Neural Audio Synthesis (NAS) models offer interactive musical control over high-quality, expressive audio generators. While these models can operate in real-time, they often suffer from high latency, making them unsuitable for intimate musical interaction. The impact of architectural choices in deep learning models on audio latency remains largely unexplored in the NAS literature. In this work, we investigate the sources of latency and jitter typically found in interactive NAS models. We then apply this analysis to the task of timbre transfer using RAVE, a convolutional variational autoencoder for audio waveforms introduced by Caillon et al. in 2021. Finally, we present an iterative design approach for optimizing latency. This culminates with a model we call BRAVE (Bravely Realtime Audio Variational autoEncoder), which is low-latency and exhibits better pitch and loudness replication while showing timbre modification capabilities similar to RAVE. We implement it in a specialized inference framework for low-latency, real-time inference and present a proof-of-concept audio plugin compatible with audio signals from musical instruments. We expect the challenges and guidelines described in this document to support NAS researchers in designing models for low-latency inference from the ground up, enriching the landscape of possibilities for musicians.

Updated: 2025-03-14 16:30:31

标题: 设计神经合成器以实现低延迟交互

摘要: 神经音频合成（NAS）模型提供了对高质量、表现力强的音频生成器进行互动音乐控制的能力。虽然这些模型可以实时运行，但它们通常受到高延迟的困扰，使它们不适合进行亲密的音乐互动。深度学习模型在音频延迟方面的架构选择对NAS文献中的影响仍然大多未被探索。在这项工作中，我们研究了交互式NAS模型中通常存在的延迟和抖动的来源。然后，我们将这种分析应用到使用RAVE进行音色转换的任务上，RAVE是由Caillon等人在2021年引入的用于音频波形的卷积变分自动编码器。最后，我们提出了一种用于优化延迟的迭代设计方法。这最终形成了一个我们称为BRAVE（勇敢实时音频变分自动编码器）的模型，该模型延迟低且在显示音高和响度复制效果更好的同时，展现了与RAVE类似的音色修改能力。我们将其实现在一个专门的推理框架中，用于低延迟、实时推理，并呈现了一个与音乐乐器的音频信号兼容的概念验证音频插件。我们期望本文中描述的挑战和指导方针能够支持NAS研究人员从头开始设计低延迟推理模型，丰富音乐家的可能性空间。

更新时间: 2025-03-14 16:30:31

领域: cs.SD,cs.AI,cs.LG,eess.AS

下载: http://arxiv.org/abs/2503.11562v1

Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation

Stochastic Gradient Descent (SGD) with adaptive steps is widely used to train deep neural networks and generative models. Most theoretical results assume that it is possible to obtain unbiased gradient estimators, which is not the case in several recent deep learning and reinforcement learning applications that use Monte Carlo methods. This paper provides a comprehensive non-asymptotic analysis of SGD with biased gradients and adaptive steps for non-convex smooth functions. Our study incorporates time-dependent bias and emphasizes the importance of controlling the bias of the gradient estimator. In particular, we establish that Adagrad, RMSProp, and AMSGRAD, an exponential moving average variant of Adam, with biased gradients, converge to critical points for smooth non-convex functions at a rate similar to existing results in the literature for the unbiased case. Finally, we provide experimental results using Variational Autoenconders (VAE) and applications to several learning frameworks that illustrate our convergence results and show how the effect of bias can be reduced by appropriate hyperparameter tuning.

Updated: 2025-03-14 16:27:25

标题: 有偏自适应随机逼近的非渐近分析

摘要: 随机梯度下降（SGD）与自适应步长广泛用于训练深度神经网络和生成模型。大多数理论结果假定可以获得无偏梯度估计器，而在一些最近使用蒙特卡洛方法的深度学习和强化学习应用中并非如此。本文对具有偏梯度和自适应步长的非凸光滑函数的SGD进行了全面的非渐近分析。我们的研究包括时间依赖偏差，并强调控制梯度估计器的偏差的重要性。特别地，我们确定了Adagrad、RMSProp和AMSGRAD（Adam的指数移动平均变体）在具有偏梯度的情况下，对于光滑的非凸函数以类似于文献中无偏情况的速率收敛到临界点。最后，我们提供了使用变分自动编码器（VAE）和应用于几种学习框架的实验结果，展示了我们的收敛结果，并展示了如何通过适当的超参数调整来减少偏差的影响。

更新时间: 2025-03-14 16:27:25

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2402.02857v2

Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans

The introduction of Artificial Intelligence (AI) generative language models such as GPT (Generative Pre-trained Transformer) and tools such as ChatGPT has triggered a revolution that can transform how text is generated. This has many implications, for example, as AI-generated text becomes a significant fraction of the text, would this have an effect on the language capabilities of readers and also on the training of newer AI tools? Would it affect the evolution of languages? Focusing on one specific aspect of the language: words; will the use of tools such as ChatGPT increase or reduce the vocabulary used or the lexical richness? This has implications for words, as those not included in AI-generated content will tend to be less and less popular and may eventually be lost. In this work, we perform an initial comparison of the vocabulary and lexical richness of ChatGPT and humans when performing the same tasks. In more detail, two datasets containing the answers to different types of questions answered by ChatGPT and humans, and a third dataset in which ChatGPT paraphrases sentences and questions are used. The analysis shows that ChatGPT tends to use fewer distinct words and lower lexical richness than humans. These results are very preliminary and additional datasets and ChatGPT configurations have to be evaluated to extract more general conclusions. Therefore, further research is needed to understand how the use of ChatGPT and more broadly generative AI tools will affect the vocabulary and lexical richness in different types of text and languages.

Updated: 2025-03-14 16:19:46

标题: 玩弄文字：比较ChatGPT和人类的词汇和词汇多样性

摘要: 人工智能生成语言模型（如GPT（生成预训练变换器））的引入以及ChatGPT等工具的推出引发了一场革命，可以改变文本生成的方式。这有许多影响，例如，随着人工智能生成的文本成为文本的重要部分，这是否会影响读者的语言能力，以及对新型人工智能工具的培训？这是否会影响语言的演变？专注于语言的一个特定方面：词汇；使用ChatGPT等工具是否会增加或减少所使用的词汇或词汇丰富度？这对词汇有影响，因为未包含在人工智能生成内容中的词汇往往会变得越来越不流行，并最终可能会丢失。在这项工作中，我们对ChatGPT和人类在执行相同任务时的词汇和词汇丰富度进行了初步比较。更详细地说，包含ChatGPT和人类回答不同类型问题的两个数据集，以及在其中ChatGPT改写句子和问题的第三个数据集。分析显示，ChatGPT倾向于使用较少的不同词汇和较低的词汇丰富度比人类。这些结果非常初步，还需要评估更多数据集和ChatGPT配置，以得出更一般的结论。因此，需要进一步研究来了解ChatGPT的使用以及更广泛的生成人工智能工具如何影响不同类型文本和语言中的词汇和词汇丰富度。

更新时间: 2025-03-14 16:19:46

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2308.07462v3

Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference

Large language models (LLMs) have shown outstanding performance across numerous real-world tasks. However, the autoregressive nature of these models makes the inference process slow and costly. Speculative decoding has emerged as a promising solution, leveraging a smaller auxiliary model to draft future tokens, which are then validated simultaneously by the larger model, achieving a speed-up of 1-2x. Although speculative decoding matches the same distribution as multinomial sampling, multinomial sampling itself is prone to suboptimal outputs, whereas beam sampling is widely recognized for producing higher-quality results by maintaining multiple candidate sequences at each step. This paper explores the novel integration of speculative decoding with beam sampling. However, there are four key challenges: (1) how to generate multiple sequences from the larger model's distribution given drafts sequences from the small model; (2) how to dynamically optimize the number of beams to balance efficiency and accuracy; (3) how to efficiently verify the multiple drafts in parallel; and (4) how to address the extra memory costs inherent in beam sampling. To address these challenges, we propose dynamic-width speculative beam decoding (DSBD). Specifically, we first introduce a novel draft and verification scheme that generates multiple sequences following the large model's distribution based on beam sampling trajectories from the small model. Then, we introduce an adaptive mechanism to dynamically tune the number of beams based on the context, optimizing efficiency and effectiveness. Besides, we extend tree-based parallel verification to handle multiple trees simultaneously, accelerating the verification process. Finally, we illustrate a simple modification to our algorithm to mitigate the memory overhead of beam sampling...

Updated: 2025-03-14 16:18:50

标题: 动态宽度推测束搜索解码用于高效LLM推断

摘要: 大型语言模型（LLMs）在许多实际任务中表现出色。然而，这些模型的自回归性质使得推理过程变得缓慢且昂贵。推测解码已经成为一个有前途的解决方案，利用一个较小的辅助模型起草未来的标记，然后由较大的模型同时验证，实现1-2倍的加速。尽管推测解码与多项式采样匹配相同的分布，但多项式采样本身容易产生次优输出，而波束采样被广泛认为通过在每一步保持多个候选序列来产生更高质量的结果。本文探讨了推测解码与波束采样的新颖整合。然而，存在四个关键挑战：（1）如何在给定小模型的草稿序列的情况下，从较大模型的分布中生成多个序列；（2）如何动态优化波束的数量以平衡效率和准确性；（3）如何并行高效地验证多个草稿；以及（4）如何解决波束采样固有的额外内存成本。为了解决这些挑战，我们提出了动态宽度推测波束解码（DSBD）。具体而言，我们首先引入了一种新颖的起草和验证方案，根据小模型的波束采样轨迹生成多个遵循大模型分布的序列。然后，我们引入了一个自适应机制，根据上下文动态调整波束的数量，优化效率和有效性。此外，我们将基于树的并行验证扩展到同时处理多个树，加速验证过程。最后，我们展示了对我们的算法进行简单修改以减轻波束采样的内存开销。

更新时间: 2025-03-14 16:18:50

领域: cs.AI

下载: http://arxiv.org/abs/2409.16560v2

Standards for Belief Representations in LLMs

As large language models (LLMs) continue to demonstrate remarkable abilities across various domains, computer scientists are developing methods to understand their cognitive processes, particularly concerning how (and if) LLMs internally represent their beliefs about the world. However, this field currently lacks a unified theoretical foundation to underpin the study of belief in LLMs. This article begins filling this gap by proposing adequacy conditions for a representation in an LLM to count as belief-like. We argue that, while the project of belief measurement in LLMs shares striking features with belief measurement as carried out in decision theory and formal epistemology, it also differs in ways that should change how we measure belief. Thus, drawing from insights in philosophy and contemporary practices of machine learning, we establish four criteria that balance theoretical considerations with practical constraints. Our proposed criteria include accuracy, coherence, uniformity, and use, which together help lay the groundwork for a comprehensive understanding of belief representation in LLMs. We draw on empirical work showing the limitations of using various criteria in isolation to identify belief representations.

Updated: 2025-03-14 16:14:16

标题: LLMs中的信念表征标准

摘要: 随着大型语言模型（LLMs）在各个领域展示出卓越的能力，计算机科学家正在开发方法来理解它们的认知过程，特别是关于LLMs如何（以及是否）内部表示他们对世界的信念。然而，这个领域目前缺乏一个统一的理论基础来支撑对LLMs中信念的研究。本文从提出LLMs中表示为信念类似的充分条件开始填补这一空白。我们认为，虽然在LLMs中信念测量的项目与在决策理论和形式认识论中进行的信念测量具有引人注目的特征，但也存在一些不同之处，这些不同之处应该改变我们衡量信念的方式。因此，我们汲取哲学和当代机器学习实践中的见解，建立了平衡理论考虑与实际约束的四个标准。我们提出的标准包括准确性、一致性、统一性和使用性，这些标准共同帮助奠定了对LLMs中信念表示的全面理解的基础。我们借鉴了关于使用各种标准孤立地识别信念表示的局限性的实证工作。

更新时间: 2025-03-14 16:14:16

领域: cs.AI

下载: http://arxiv.org/abs/2405.21030v2

FLASHμ: Fast Localizing And Sizing of Holographic Microparticles

Reconstructing the 3D location and size of microparticles from diffraction images - holograms - is a computationally expensive inverse problem that has traditionally been solved using physics-based reconstruction methods. More recently, researchers have used machine learning methods to speed up the process. However, for small particles in large sample volumes the performance of these methods falls short of standard physics-based reconstruction methods. Here we designed a two-stage neural network architecture, FLASH$\mu$, to detect small particles (6-100$\mu$m) from holograms with large sample depths up to 20cm. Trained only on synthetic data with added physical noise, our method reliably detects particles of at least 9$\mu$m diameter in real holograms, comparable to the standard reconstruction-based approaches while operating on smaller crops, at quarter of the original resolution and providing roughly a 600-fold speedup. In addition to introducing a novel approach to a non-local object detection or signal demixing problem, our work could enable low-cost, real-time holographic imaging setups.

Updated: 2025-03-14 16:04:10

标题: FLASHμ:全息微粒快速定位和尺寸测量

摘要: 从衍射图像 - 全息图像 - 重建微粒的三维位置和大小是一个计算昂贵的反问题，传统上是使用基于物理的重建方法来解决的。最近，研究人员已经开始使用机器学习方法来加速这个过程。然而，对于大样品体积中的小颗粒，这些方法的性能不及标准的基于物理的重建方法。在这里，我们设计了一个两阶段神经网络架构，FLASH$\mu$，用于检测全息图中直径为6-100$\mu$m的小颗粒，样品深度可达20cm。我们的方法仅在添加了物理噪音的合成数据上进行训练，可可靠地检测出至少直径为9$\mu$m的颗粒在真实全息图中的存在，与标准的重建方法相媲美，同时在更小的作物上运行，分辨率为原始分辨率的四分之一，提供大约600倍的加速。除了引入一种新颖的非局部目标检测或信号分离问题的方法，我们的工作还可以实现低成本、实时的全息成像设置。

更新时间: 2025-03-14 16:04:10

领域: cs.CV,cs.AI,cs.LG,physics.ao-ph,physics.optics

下载: http://arxiv.org/abs/2503.11538v1

Potential of large language model-powered nudges for promoting daily water and energy conservation

The increasing amount of pressure related to water and energy shortages has increased the urgency of cultivating individual conservation behaviors. While the concept of nudging, i.e., providing usage-based feedback, has shown promise in encouraging conservation behaviors, its efficacy is often constrained by the lack of targeted and actionable content. This study investigates the impact of the use of large language models (LLMs) to provide tailored conservation suggestions for conservation intentions and their rationale. Through a survey experiment with 1,515 university participants, we compare three virtual nudging scenarios: no nudging, traditional nudging with usage statistics, and LLM-powered nudging with usage statistics and personalized conservation suggestions. The results of statistical analyses and causal forest modeling reveal that nudging led to an increase in conservation intentions among 86.9%-98.0% of the participants. LLM-powered nudging achieved a maximum increase of 18.0% in conservation intentions, surpassing traditional nudging by 88.6%. Furthermore, structural equation modeling results reveal that exposure to LLM-powered nudges enhances self-efficacy and outcome expectations while diminishing dependence on social norms, thereby increasing intrinsic motivation to conserve. These findings highlight the transformative potential of LLMs in promoting individual water and energy conservation, representing a new frontier in the design of sustainable behavioral interventions and resource management.

Updated: 2025-03-14 15:58:11

标题: 大型语言模型驱动的提示在促进日常节水和节能方面的潜力

摘要: 水资源和能源短缺带来的压力不断增加，增加了培养个人节约行为的紧迫性。虽然“推动”概念，即提供基于使用情况的反馈，显示出鼓励节约行为的潜力，但其有效性常常受到缺乏针对性和可操作性内容的限制。本研究调查了使用大型语言模型（LLMs）为节约意图提供量身定制节约建议及其原因的影响。通过与1,515名大学参与者进行的调查实验，我们比较了三种虚拟推动情景：无推动、传统推动与使用统计数据、以及LLM支持的推动与使用统计数据和个性化节约建议。统计分析和因果森林建模的结果显示，推动导致86.9%-98.0%的参与者节约意图增加。LLM支持的推动使节约意图增加了最高18.0%，超过传统推动88.6%。此外，结构方程建模结果显示，接触LLM支持的推动增强了自我效能和结果期望，同时减少了对社会规范的依赖，从而增加了内在的节约动机。这些发现突显了LLMs在促进个人节约水资源和能源方面的转变潜力，代表了可持续行为干预和资源管理设计的新前沿。

更新时间: 2025-03-14 15:58:11

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2503.11531v1

Bottom-up Iterative Anomalous Diffusion Detector (BI-ADD)

In recent years, the segmentation of short molecular trajectories with varying diffusive properties has drawn particular attention of researchers, since it allows studying the dynamics of a particle. In the past decade, machine learning methods have shown highly promising results, also in changepoint detection and segmentation tasks. Here, we introduce a novel iterative method to identify the changepoints in a molecular trajectory, i.e., frames, where the diffusive behavior of a particle changes. A trajectory in our case follows a fractional Brownian motion and we estimate the diffusive properties of the trajectories. The proposed BI-ADD combines unsupervised and supervised learning methods to detect the changepoints. Our approach can be used for the analysis of molecular trajectories at the individual level and also be extended to multiple particle tracking, which is an important challenge in fundamental biology. We validated BI-ADD in various scenarios within the framework of the AnDi2 Challenge 2024 dedicated to single particle tracking. Our method is implemented in Python and is publicly available for research purposes.

Updated: 2025-03-14 15:57:31

标题: 自下而上的迭代异常扩散检测器（BI-ADD）

摘要: 近年来，具有不同扩散特性的短分子轨迹的分割引起了研究人员特别关注，因为这使得研究粒子的动态变得可能。在过去的十年中，机器学习方法在变点检测和分割任务中表现出非常有前途的结果。在这里，我们介绍了一种新颖的迭代方法，用于识别分子轨迹中的变点，即粒子扩散行为发生变化的帧。在我们的案例中，轨迹遵循分数布朗运动，我们估计轨迹的扩散特性。所提出的BI-ADD结合了无监督和监督学习方法来检测变点。我们的方法可用于个体水平的分子轨迹分析，并且还可以扩展到多粒子跟踪，这是基础生物学中的一个重要挑战。我们在AnDi2 Challenge 2024框架内的各种场景中验证了BI-ADD，该挑战专门针对单粒子跟踪。我们的方法在Python中实现，并可供研究目的公开使用。

更新时间: 2025-03-14 15:57:31

领域: cs.LG

下载: http://arxiv.org/abs/2503.11529v1

AdaptGCD: Multi-Expert Adapter Tuning for Generalized Category Discovery

Different from the traditional semi-supervised learning paradigm that is constrained by the close-world assumption, Generalized Category Discovery (GCD) presumes that the unlabeled dataset contains new categories not appearing in the labeled set, and aims to not only classify old categories but also discover new categories in the unlabeled data. Existing studies on GCD typically devote to transferring the general knowledge from the self-supervised pretrained model to the target GCD task via some fine-tuning strategies, such as partial tuning and prompt learning. Nevertheless, these fine-tuning methods fail to make a sound balance between the generalization capacity of pretrained backbone and the adaptability to the GCD task. To fill this gap, in this paper, we propose a novel adapter-tuning-based method named AdaptGCD, which is the first work to introduce the adapter tuning into the GCD task and provides some key insights expected to enlighten future research. Furthermore, considering the discrepancy of supervision information between the old and new classes, a multi-expert adapter structure equipped with a route assignment constraint is elaborately devised, such that the data from old and new classes are separated into different expert groups. Extensive experiments are conducted on 7 widely-used datasets. The remarkable improvements in performance highlight the effectiveness of our proposals.

Updated: 2025-03-14 15:55:43

标题: 适应GCD：广义类别发现的多专家适配器调整

摘要: 与传统的半监督学习范式不同，其受限于封闭世界假设，广义类别发现（GCD）假设未标记的数据集包含在标记集中未出现的新类别，并旨在不仅对旧类别进行分类，还要在未标记数据中发现新类别。现有关于GCD的研究通常致力于通过一些微调策略，如部分调整和提示学习，将来自自监督预训练模型的通用知识转移到目标GCD任务中。然而，这些微调方法未能在预训练骨干的泛化能力和适应性之间取得良好的平衡。为了填补这一空白，本文提出了一种基于适配器调谐的新方法，名为AdaptGCD，它是第一项将适配器调谐引入GCD任务的工作，并提供了一些关键见解，有望启发未来的研究。此外，考虑到旧类别和新类别之间监督信息的差异，精心设计了一个配备路由分配约束的多专家适配器结构，使旧类别和新类别的数据分开到不同的专家组。在7个广泛使用的数据集上进行了大量实验。性能的显着改善突显了我们提议的有效性。

更新时间: 2025-03-14 15:55:43

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.21705v2

Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks

Prompt injection constitutes a significant challenge for generative AI systems by inducing unintended outputs. We introduce a multi-agent NLP framework specifically designed to address prompt injection vulnerabilities through layered detection and enforcement mechanisms. The framework orchestrates specialized agents for generating responses, sanitizing outputs, and enforcing policy compliance. Evaluation on 500 engineered injection prompts demonstrates a marked reduction in injection success and policy breaches. Novel metrics, including Injection Success Rate (ISR), Policy Override Frequency (POF), Prompt Sanitization Rate (PSR), and Compliance Consistency Score (CCS), are proposed to derive a composite Total Injection Vulnerability Score (TIVS). The system utilizes the OVON (Open Voice Network) framework for inter-agent communication via structured JSON messages, extending a previously established multi-agent architecture from hallucination mitigation to address the unique challenges of prompt injection.

Updated: 2025-03-14 15:41:45

标题: 通过AI多智能体NLP框架实现快速注入检测和缓解

摘要: 快速注入构成生成式人工智能系统面临的重要挑战，会导致意外输出。我们引入了一个专门设计用于解决提示注入漏洞的多智能体自然语言处理框架，通过分层检测和执行机制来解决这个问题。该框架协调专门的智能体来生成响应、清理输出，并强制执行政策合规性。对500个经过设计的注入提示进行评估表明，注入成功率和政策违规的数量显著减少。提出了新的指标，包括注入成功率（ISR）、政策覆盖频率（POF）、提示清理率（PSR）和合规性一致性得分（CCS），用于推导一个综合的总注入漏洞分数（TIVS）。该系统利用OVON（开放语音网络）框架通过结构化JSON消息进行智能体间的通信，将先前建立的多智能体架构从幻觉减轻扩展到解决提示注入的独特挑战。

更新时间: 2025-03-14 15:41:45

领域: cs.AI,cs.CL,cs.MA

下载: http://arxiv.org/abs/2503.11517v1

CoLLMLight: Cooperative Large Language Model Agents for Network-Wide Traffic Signal Control

Traffic Signal Control (TSC) plays a critical role in urban traffic management by optimizing traffic flow and mitigating congestion. While Large Language Models (LLMs) have recently emerged as promising tools for TSC due to their exceptional problem-solving and generalization capabilities, existing approaches fail to address the essential need for inter-agent coordination, limiting their effectiveness in achieving network-wide optimization. To bridge this gap, we propose CoLLMLight, a cooperative LLM agent framework for TSC. Specifically, we first construct a structured spatiotemporal graph to capture real-time traffic dynamics and spatial relationships among neighboring intersections, enabling the LLM to reason about complex traffic interactions. Moreover, we introduce a complexity-aware reasoning mechanism that dynamically adapts reasoning depth based on real-time traffic conditions, ensuring optimal computational efficiency without sacrificing decision quality. Besides, we propose a fine-tuning strategy that leverages iterative simulation-driven data collection and environmental feedback to build a lightweight LLM tailored for cooperative TSC. Extensive experiments on both synthetic and real-world datasets demonstrate that CoLLMLight outperforms state-of-the-art methods in diverse traffic scenarios, showcasing its effectiveness, scalability, and robustness.

Updated: 2025-03-14 15:40:39

标题: CoLLMLight：用于网络范围交通信号控制的合作大型语言模型代理

摘要: 交通信号控制(TSC)在城市交通管理中发挥着至关重要的作用，通过优化交通流量和减轻拥堵。最近，由于其出色的问题解决和泛化能力，大型语言模型(LLMs)已经成为TSC的有希望的工具，但现有方法未能解决对于代理间协调的基本需求，从而限制了它们在实现网络范围优化方面的效力。为弥合这一差距，我们提出了CoLLMLight，一个用于TSC的合作LLM代理框架。具体地，我们首先构建了一个结构化的时空图，以捕捉实时交通动态和邻近交叉口之间的空间关系，使LLM能够推理复杂的交通互动。此外，我们引入了一种基于复杂性的推理机制，根据实时交通条件动态调整推理深度，确保在不牺牲决策质量的情况下实现最佳的计算效率。此外，我们提出了一种利用迭代仿真驱动数据收集和环境反馈来构建专为合作TSC量身定制的轻量级LLM的微调策略。对合成和真实世界数据集进行的大量实验表明，CoLLMLight在各种交通场景中优于最先进的方法，展示了其有效性、可扩展性和稳健性。

更新时间: 2025-03-14 15:40:39

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2503.11739v1

Generalized Bayesian Ensemble Survival Tree (GBEST) model

This paper proposes a new class of predictive models for survival analysis called Generalized Bayesian Ensemble Survival Tree (GBEST). It is well known that survival analysis poses many different challenges, in particular when applied to small data or censorship mechanism. Our contribution is the proposal of an ensemble approach that uses Bayesian bootstrap and beta Stacy bootstrap methods to improve the outcome in survival application with a special focus on small datasets. More precisely, a novel approach to integrate Beta Stacy Bayesian bootstrap in bagging tree models for censored data is proposed in this paper. Empirical evidence achieved on simulated and real data underlines that our approach performs better in terms of predictive performances and stability of the results compared with classical survival models available in the literature. In terms of methodology our novel contribution considers the adaptation of recent Bayesian ensemble approaches to survival data, providing a new model called Generalized Bayesian Ensemble Survival Tree (GBEST). A further result in terms of computational novelty is the implementation in R of GBEST, available in a public GitHub repository.

Updated: 2025-03-14 15:40:18

标题: 广义贝叶斯集成生存树（GBEST）模型

摘要: 本文提出了一种新的用于生存分析的预测模型类别，称为广义贝叶斯集成生存树（GBEST）。众所周知，生存分析在许多不同方面都存在挑战，特别是在小数据或截尾机制应用时。我们的贡献是提出了一种集成方法，该方法利用贝叶斯自举和贝塔斯塔西自举方法来改善生存应用结果，特别关注小数据集。更具体地说，本文提出了一种新方法，将贝塔斯塔西贝叶斯自举集成到袋装树模型中，用于截尾数据。在模拟和真实数据上取得的实证证据强调，与文献中现有的经典生存模型相比，我们的方法在预测性能和结果稳定性方面表现更好。在方法论方面，我们的新颖贡献考虑了将最近的贝叶斯集成方法应用于生存数据，提供了一种名为广义贝叶斯集成生存树（GBEST）的新模型。在计算创新方面的进一步结果是在R中实现了GBEST，并将其提供在一个公共GitHub存储库中。

更新时间: 2025-03-14 15:40:18

领域: stat.ME,cs.LG,stat.ML

下载: http://arxiv.org/abs/2503.11738v1

Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning

Learning high-quality multi-modal entity representations is an important goal of multi-modal knowledge graph (MMKG) representation learning, which can enhance reasoning tasks within the MMKGs, such as MMKG completion (MMKGC). The main challenge is to collaboratively model the structural information concealed in massive triples and the multi-modal features of the entities. Existing methods focus on crafting elegant entity-wise multi-modal fusion strategies, yet they overlook the utilization of multi-perspective features concealed within the modalities under diverse relational contexts. To address this issue, we introduce a novel framework with Mixture of Modality Knowledge experts (MoMoK for short) to learn adaptive multi-modal entity representations for better MMKGC. We design relation-guided modality knowledge experts to acquire relation-aware modality embeddings and integrate the predictions from multi-modalities to achieve joint decisions. Additionally, we disentangle the experts by minimizing their mutual information. Experiments on four public MMKG benchmarks demonstrate the outstanding performance of MoMoK under complex scenarios.

Updated: 2025-03-14 15:37:57

标题: 多个头胜过一个：多模态知识专家混合用于实体表示学习

摘要: 学习高质量的多模态实体表示是多模态知识图谱（MMKG）表示学习的重要目标，可以增强MMKG内的推理任务，如MMKG完成（MMKGC）。主要挑战是协同建模隐藏在大量三元组和实体的多模态特征中的结构信息。现有方法侧重于精心设计基于实体的多模态融合策略，但它们忽视了在不同关系上下文中隐藏的多视角特征的利用。为了解决这个问题，我们引入了一个新框架，即混合模态知识专家（MoMoK），以学习适应性的多模态实体表示，用于更好地进行MMKGC。我们设计了关系引导的模态知识专家来获取关系感知的模态嵌入，并整合来自多模态的预测结果以实现联合决策。此外，我们通过最小化它们的互信息来解开专家之间的联系。在四个公共MMKG基准上的实验证明了MoMoK在复杂情景下的优异性能。

更新时间: 2025-03-14 15:37:57

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.16869v3

Zero-shot Imputation with Foundation Inference Models for Dynamical Systems

Dynamical systems governed by ordinary differential equations (ODEs) serve as models for a vast number of natural and social phenomena. In this work, we offer a fresh perspective on the classical problem of imputing missing time series data, whose underlying dynamics are assumed to be determined by ODEs. Specifically, we revisit ideas from amortized inference and neural operators, and propose a novel supervised learning framework for zero-shot time series imputation, through parametric functions satisfying some (hidden) ODEs. Our proposal consists of two components. First, a broad probability distribution over the space of ODE solutions, observation times and noise mechanisms, with which we generate a large, synthetic dataset of (hidden) ODE solutions, along with their noisy and sparse observations. Second, a neural recognition model that is trained offline, to map the generated time series onto the spaces of initial conditions and time derivatives of the (hidden) ODE solutions, which we then integrate to impute the missing data. We empirically demonstrate that one and the same (pretrained) recognition model can perform zero-shot imputation across 63 distinct time series with missing values, each sampled from widely different dynamical systems. Likewise, we demonstrate that it can perform zero-shot imputation of missing high-dimensional data in 10 vastly different settings, spanning human motion, air quality, traffic and electricity studies, as well as Navier-Stokes simulations -- without requiring any fine-tuning. What is more, our proposal often outperforms state-of-the-art methods, which are trained on the target datasets. Our pretrained model, repository and tutorials are available online.

Updated: 2025-03-14 15:37:14

标题: 零样本插补在动态系统中基于基础推理模型的应用

摘要: 由普通微分方程（ODEs）主导的动力系统作为自然和社会现象的大量模型。在这项工作中，我们对假设其基础动态由ODEs确定的缺失时间序列数据的经典问题提供了一种新的视角。具体来说，我们重新审视了摊销推断和神经操作员的想法，并提出了一个新颖的监督学习框架，用于零样本时间序列插补，通过满足一些（隐藏的）ODEs的参数函数。我们的提议由两个组成部分组成。首先，概率分布覆盖ODE解空间、观测时间和噪声机制，我们利用它来生成大量合成数据集，包括（隐藏的）ODE解以及它们的嘈杂和稀疏观测。其次，是一个离线训练的神经识别模型，将生成的时间序列映射到（隐藏的）ODE解的初始条件和时间导数空间，然后集成以插补缺失数据。我们在实证上证明，同一个（预训练的）识别模型可以在63个不同的具有缺失值的时间序列上执行零样本插补，每个从不同的动态系统中抽样。同样，我们证明它可以在10个截然不同的设置中执行缺失高维数据的零样本插补，涵盖人类运动、空气质量、交通和电力研究，以及Navier-Stokes模拟，而无需任何微调。更重要的是，我们的提议通常优于在目标数据集上训练的最先进方法。我们的预训练模型、存储库和教程可在网上获得。

更新时间: 2025-03-14 15:37:14

领域: cs.LG,math.DS

下载: http://arxiv.org/abs/2402.07594v4

HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models

Text-to-video generation poses significant challenges due to the inherent complexity of video data, which spans both temporal and spatial dimensions. It introduces additional redundancy, abrupt variations, and a domain gap between language and vision tokens while generation. Addressing these challenges requires an effective video tokenizer that can efficiently encode video data while preserving essential semantic and spatiotemporal information, serving as a critical bridge between text and vision. Inspired by the observation in VQ-VAE-2 and workflows of traditional animation, we propose HiTVideo for text-to-video generation with hierarchical tokenizers. It utilizes a 3D causal VAE with a multi-layer discrete token framework, encoding video content into hierarchically structured codebooks. Higher layers capture semantic information with higher compression, while lower layers focus on fine-grained spatiotemporal details, striking a balance between compression efficiency and reconstruction quality. Our approach efficiently encodes longer video sequences (e.g., 8 seconds, 64 frames), reducing bits per pixel (bpp) by approximately 70\% compared to baseline tokenizers, while maintaining competitive reconstruction quality. We explore the trade-offs between compression and reconstruction, while emphasizing the advantages of high-compressed semantic tokens in text-to-video tasks. HiTVideo aims to address the potential limitations of existing video tokenizers in text-to-video generation tasks, striving for higher compression ratios and simplify LLMs modeling under language guidance, offering a scalable and promising framework for advancing text to video generation. Demo page: https://ziqinzhou66.github.io/project/HiTVideo.

Updated: 2025-03-14 15:36:39

标题: HiTVideo: 使用自回归大语言模型增强文本到视频生成的分层标记器

摘要: 文本到视频生成面临着重大挑战，这是由于视频数据本身的复杂性，涵盖了时间和空间两个维度。在生成过程中引入了额外的冗余、突变以及语言和视觉标记之间的领域差距。解决这些挑战需要一个有效的视频标记器，能够在保留基本语义和时空信息的同时高效地编码视频数据，作为文本和视觉之间的关键桥梁。受VQ-VAE-2和传统动画工作流程的启发，我们提出了HiTVideo用于文本到视频生成，采用分层标记器。它利用了一个带有多层离散标记框架的3D因果VAE，将视频内容编码成分层结构的码书。更高层次捕捉了更高压缩率的语义信息，而较低层次则专注于细粒度的时空细节，实现了在压缩效率和重建质量之间的平衡。我们的方法能够高效地编码更长的视频序列（例如8秒，64帧），与基线标记器相比，每像素的位数（bpp）减少约70％，同时保持竞争性的重建质量。我们探讨了压缩和重建之间的权衡，同时强调了在文本到视频任务中高压缩语义标记的优势。HiTVideo旨在解决现有视频标记器在文本到视频生成任务中的潜在限制，力求实现更高的压缩比并简化在语言指导下的LLMs建模，为推进文本到视频生成提供了一个可扩展且有前景的框架。演示页面：https://ziqinzhou66.github.io/project/HiTVideo。

更新时间: 2025-03-14 15:36:39

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.11513v1

Hacking Cryptographic Protocols with Advanced Variational Quantum Attacks

Here we introduce an improved approach to Variational Quantum Attack Algorithms (VQAA) on crytographic protocols. Our methods provide robust quantum attacks to well-known cryptographic algorithms, more efficiently and with remarkably fewer qubits than previous approaches. We implement simulations of our attacks for symmetric-key protocols such as S-DES, S-AES and Blowfish. For instance, we show how our attack allows a classical simulation of a small 8-qubit quantum computer to find the secret key of one 32-bit Blowfish instance with 24 times fewer number of iterations than a brute-force attack. Our work also shows improvements in attack success rates for lightweight ciphers such as S-DES and S-AES. Further applications beyond symmetric-key cryptography are also discussed, including asymmetric-key protocols and hash functions. In addition, we also comment on potential future improvements of our methods. Our results bring one step closer assessing the vulnerability of large-size classical cryptographic protocols with Noisy Intermediate-Scale Quantum (NISQ) devices, and set the stage for future research in quantum cybersecurity.

Updated: 2025-03-14 15:36:05

标题: 用先进的变分量子攻击破解加密协议

摘要: 在这里，我们介绍了一种改进的变分量子攻击算法（VQAA）的方法，用于密码协议。我们的方法为众所周知的密码算法提供了强大的量子攻击，比先前的方法更高效，并且使用的量子比特数量明显更少。我们模拟了我们的攻击对称密钥协议，如S-DES、S-AES和Blowfish。例如，我们展示了我们的攻击如何使一个经典模拟一个小型8量子比特量子计算机可以比蛮力攻击少24倍的迭代次数找到一个32位Blowfish实例的秘钥。我们的工作还展示了对轻量级密码，如S-DES和S-AES，攻击成功率的改进。我们还讨论了超越对称密钥密码学的进一步应用，包括非对称密钥协议和哈希函数。此外，我们还评论了我们方法的潜在未来改进。我们的结果使评估大型经典密码协议在嘈杂中间规模量子（NISQ）设备上的易受攻击性更近一步，并为未来量子网络安全研究奠定了基础。

更新时间: 2025-03-14 15:36:05

领域: quant-ph,cs.CR,cs.LG

下载: http://arxiv.org/abs/2311.02986v2

Alzheimer's Disease Classification Using Retinal OCT: TransnetOCT and Swin Transformer Models

Retinal optical coherence tomography (OCT) images are the biomarkers for neurodegenerative diseases, which are rising in prevalence. Early detection of Alzheimer's disease using retinal OCT is a primary challenging task. This work utilizes advanced deep learning techniques to classify retinal OCT images of subjects with Alzheimer's disease (AD) and healthy controls (CO). The goal is to enhance diagnostic capabilities through efficient image analysis. In the proposed model, Raw OCT images have been preprocessed with ImageJ and given to various deep-learning models to evaluate the accuracy. The best classification architecture is TransNetOCT, which has an average accuracy of 98.18% for input OCT images and 98.91% for segmented OCT images for five-fold cross-validation compared to other models, and the Swin Transformer model has achieved an accuracy of 93.54%. The evaluation accuracy metric demonstrated TransNetOCT and Swin transformer models capability to classify AD and CO subjects reliably, contributing to the potential for improved diagnostic processes in clinical settings.

Updated: 2025-03-14 15:34:37

标题: 阿尔茨海默病分类使用视网膜OCT：TransnetOCT和Swin Transformer模型

摘要: 视网膜光学相干断层扫描（OCT）图像是神经退行性疾病的生物标志物，其患病率正在上升。利用视网膜OCT早期检测阿尔茨海默病是一个主要具有挑战性的任务。本研究利用先进的深度学习技术对患有阿尔茨海默病（AD）和健康对照组（CO）的视网膜OCT图像进行分类。其目标是通过高效的图像分析增强诊断能力。在提出的模型中，原始OCT图像经过ImageJ预处理，并输入到各种深度学习模型中进行准确性评估。最佳分类架构为TransNetOCT，其对输入OCT图像的平均准确度为98.18%，对分割OCT图像的平均准确度为98.91%，在五折交叉验证中比其他模型更高，而Swin Transformer模型实现了93.54%的准确度。评估准确度指标表明TransNetOCT和Swin Transformer模型能够可靠地对AD和CO受试者进行分类，有望为临床设置中改善诊断流程做出贡献。

更新时间: 2025-03-14 15:34:37

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2503.11511v1

Leveraging Angle of Arrival Estimation against Impersonation Attacks in Physical Layer Authentication

In this paper, we investigate the utilization of the angle of arrival (AoA) as a feature for robust physical layer authentication (PLA). While most of the existing approaches to PLA focus on common features of the physical layer of communication channels, such as channel frequency response, channel impulse response or received signal strength, the use of AoA in this domain has not yet been studied in depth, particularly regarding the ability to thwart impersonation attacks. In this work, we demonstrate that an impersonation attack targeting AoA based PLA is only feasible under strict conditions on the attacker's location and hardware capabilities, which highlights the AoA's potential as a strong feature for PLA. We extend previous works considering a single-antenna attacker to the case of a multiple-antenna attacker, and we develop a theoretical characterization of the conditions in which a successful impersonation attack can be mounted. Furthermore, we leverage extensive simulations in support of theoretical analyses, to validate the robustness of AoA-based PLA.

Updated: 2025-03-14 15:29:55

标题: 利用到达角度估计对抗物理层身份验证中的冒名攻击

摘要: 在本文中，我们调查了到达角（AoA）作为鲁棒物理层认证（PLA）特征的利用。虽然现有的PLA方法主要关注通信信道物理层的常见特征，如信道频率响应、信道脉冲响应或接收信号强度，但在这一领域中对AoA的利用尚未深入研究，特别是在防范冒名攻击方面。在这项工作中，我们证明了基于AoA的PLA的冒名攻击只有在攻击者位置和硬件能力严格条件下才可行，这突显了AoA作为PLA强大特征的潜力。我们将先前针对单天线攻击者的工作扩展到多天线攻击者的情况，并对成功进行冒名攻击的条件进行了理论刻画。此外，我们利用大量仿真来支持理论分析，验证了基于AoA的PLA的鲁棒性。

更新时间: 2025-03-14 15:29:55

领域: cs.CR

下载: http://arxiv.org/abs/2503.11508v1

Reinforcement Learning with Verifiable Rewards: GRPO's Effective Loss, Dynamics, and Success Amplification

Group Relative Policy Optimization (GRPO) was introduced and used successfully to train DeepSeek R1 models for promoting reasoning capabilities of LLMs using verifiable or binary rewards. We show in this paper that GRPO with verifiable rewards can be written as a Kullback Leibler ($\mathsf{KL}$) regularized contrastive loss, where the contrastive samples are synthetic data sampled from the old policy. The optimal GRPO policy $\pi_{n}$ can be expressed explicitly in terms of the binary reward, as well as the first and second order statistics of the old policy ($\pi_{n-1}$) and the reference policy $\pi_0$. Iterating this scheme, we obtain a sequence of policies $\pi_{n}$ for which we can quantify the probability of success $p_n$. We show that the probability of success of the policy satisfies a recurrence that converges to a fixed point of a function that depends on the initial probability of success $p_0$ and the regularization parameter $\beta$ of the $\mathsf{KL}$ regularizer. We show that the fixed point $p^*$ is guaranteed to be larger than $p_0$, thereby demonstrating that GRPO effectively amplifies the probability of success of the policy.

Updated: 2025-03-14 15:25:46

标题: 强化学习中的可验证奖励：GRPO的有效损失、动态和成功放大

摘要: Group Relative Policy Optimization（GRPO）被引入并成功应用于训练DeepSeek R1模型，以提升LLMs的推理能力，使用可验证或二进制奖励。本文表明，带有可验证奖励的GRPO可以被写成Kullback Leibler（KL）正则化对比损失，其中对比样本是从旧策略中采样的合成数据。最优的GRPO策略πn可以明确地用二进制奖励以及旧策略（πn-1）和参考策略π0的一阶和二阶统计量来表达。通过迭代这个方案，我们得到一系列策略πn，可以量化成功概率pn。我们表明，策略的成功概率满足一个收敛到一个函数的固定点的递归，该函数取决于初始成功概率p0和KL正则化器的正则化参数β。我们表明，固定点p*保证大于p0，从而证明了GRPO有效地增加了策略的成功概率。

更新时间: 2025-03-14 15:25:46

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2503.06639v2

Unicorn: A Universal and Collaborative Reinforcement Learning Approach Towards Generalizable Network-Wide Traffic Signal Control

Adaptive traffic signal control (ATSC) is crucial in reducing congestion, maximizing throughput, and improving mobility in rapidly growing urban areas. Recent advancements in parameter-sharing multi-agent reinforcement learning (MARL) have greatly enhanced the scalable and adaptive optimization of complex, dynamic flows in large-scale homogeneous networks. However, the inherent heterogeneity of real-world traffic networks, with their varied intersection topologies and interaction dynamics, poses substantial challenges to achieving scalable and effective ATSC across different traffic scenarios. To address these challenges, we present Unicorn, a universal and collaborative MARL framework designed for efficient and adaptable network-wide ATSC. Specifically, we first propose a unified approach to map the states and actions of intersections with varying topologies into a common structure based on traffic movements. Next, we design a Universal Traffic Representation (UTR) module with a decoder-only network for general feature extraction, enhancing the model's adaptability to diverse traffic scenarios. Additionally, we incorporate an Intersection Specifics Representation (ISR) module, designed to identify key latent vectors that represent the unique intersection's topology and traffic dynamics through variational inference techniques. To further refine these latent representations, we employ a contrastive learning approach in a self-supervised manner, which enables better differentiation of intersection-specific features. Moreover, we integrate the state-action dependencies of neighboring agents into policy optimization, which effectively captures dynamic agent interactions and facilitates efficient regional collaboration. Our results show that Unicorn outperforms other methods across various evaluation metrics, highlighting its potential in complex, dynamic traffic networks.

Updated: 2025-03-14 15:13:42

标题: 独角兽：通用且协作的强化学习方法，用于实现可泛化的网络范围交通信号控制

摘要: 自适应交通信号控制（ATSC）对于减少拥堵、最大化吞吐量以及改善迅速增长的城市地区的流动性至关重要。最近，在参数共享的多智能体强化学习（MARL）方面的进展极大地增强了大规模同质网络中复杂动态流量的可扩展和自适应优化。然而，真实世界交通网络的固有异质性，以及其多样的十字路口拓扑结构和互动动态，对于在不同交通场景中实现可扩展和有效的ATSC构成了重大挑战。为了解决这些挑战，我们提出了Unicorn，一个旨在实现高效和可适应网络范围ATSC的通用合作MARL框架。具体而言，我们首先提出了一种统一方法，将具有不同拓扑结构的十字路口的状态和行动映射到基于交通流动的公共结构中。接下来，我们设计了一个具有解码器网络的通用交通表示（UTR）模块，用于一般特征提取，增强模型对各种交通场景的适应性。此外，我们还引入了一个十字路口特定表示（ISR）模块，旨在通过变分推理技术识别代表独特十字路口拓扑结构和交通动态的关键潜在向量。为了进一步完善这些潜在表示，我们采用对比学习方法进行自监督学习，从而更好地区分十字路口特定特征。此外，我们将邻近智能体的状态-行动依赖性整合到策略优化中，有效捕捉动态智能体互动，并促进高效地区合作。我们的结果表明，Unicorn在各种评估指标上优于其他方法，突显了其在复杂动态交通网络中的潜力。

更新时间: 2025-03-14 15:13:42

领域: cs.LG,cs.AI,cs.RO

下载: http://arxiv.org/abs/2503.11488v1

A Review of DeepSeek Models' Key Innovative Techniques

DeepSeek-V3 and DeepSeek-R1 are leading open-source Large Language Models (LLMs) for general-purpose tasks and reasoning, achieving performance comparable to state-of-the-art closed-source models from companies like OpenAI and Anthropic -- while requiring only a fraction of their training costs. Understanding the key innovative techniques behind DeepSeek's success is crucial for advancing LLM research. In this paper, we review the core techniques driving the remarkable effectiveness and efficiency of these models, including refinements to the transformer architecture, innovations such as Multi-Head Latent Attention and Mixture of Experts, Multi-Token Prediction, the co-design of algorithms, frameworks, and hardware, the Group Relative Policy Optimization algorithm, post-training with pure reinforcement learning and iterative training alternating between supervised fine-tuning and reinforcement learning. Additionally, we identify several open questions and highlight potential research opportunities in this rapidly advancing field.

Updated: 2025-03-14 15:11:29

标题: 深度搜索模型关键创新技术综述

摘要: DeepSeek-V3和DeepSeek-R1是领先的开源大型语言模型（LLMs），用于一般性任务和推理，实现了与OpenAI和Anthropic等公司最先进的闭源模型可比的性能，同时只需部分训练成本。了解DeepSeek成功背后的关键创新技术对于推进LLM研究至关重要。在本文中，我们回顾了推动这些模型显著效果和效率的核心技术，包括对变压器架构的改进，创新技术如多头潜在注意力和专家混合，多令牌预测，算法，框架和硬件的协同设计，群体相对政策优化算法，使用纯强化学习进行后训练以及交替进行监督微调和强化学习的迭代训练。此外，我们还确定了几个开放问题，并突出了这个快速发展领域中的潜在研究机会。

更新时间: 2025-03-14 15:11:29

领域: cs.LG

下载: http://arxiv.org/abs/2503.11486v1

NeuMC -- a package for neural sampling for lattice field theories

We present the \texttt{NeuMC} software package, based on \pytorch, aimed at facilitating the research on neural samplers in lattice field theories. Neural samplers based on normalizing flows are becoming increasingly popular in the context of Monte-Carlo simulations as they can effectively approximate target probability distributions, possibly alleviating some shortcomings of the Markov chain Monte-Carlo methods. Our package provides tools to create such samplers for two-dimensional field theories.

Updated: 2025-03-14 15:07:04

标题: NeuMC - 用于晶格场理论神经采样的软件包

摘要: 我们提出了基于\pytorch 的\texttt{NeuMC} 软件包，旨在促进晶格场理论中神经采样器的研究。基于归一化流的神经采样器在蒙特卡罗模拟中越来越受欢迎，因为它们可以有效地逼近目标概率分布，可能缓解马尔科夫链蒙特卡罗方法的一些缺点。我们的软件包提供了为二维场论创建此类采样器的工具。

更新时间: 2025-03-14 15:07:04

领域: hep-lat,cs.LG,68T07,J.2

下载: http://arxiv.org/abs/2503.11482v1

Heterogeneous Causal Discovery of Repeated Undesirable Health Outcomes

Understanding factors triggering or preventing undesirable health outcomes across patient subpopulations is essential for designing targeted interventions. While randomized controlled trials and expert-led patient interviews are standard methods for identifying these factors, they can be time-consuming and infeasible. Causal discovery offers an alternative to conventional approaches by generating cause-and-effect hypotheses from observational data. However, it often relies on strong or untestable assumptions, which can limit its practical application. This work aims to make causal discovery more practical by considering multiple assumptions and identifying heterogeneous effects. We formulate the problem of discovering causes and effect modifiers of an outcome, where effect modifiers are contexts (e.g., age groups) with heterogeneous causal effects. Then, we present a novel, end-to-end framework that incorporates an ensemble of causal discovery algorithms and estimation of heterogeneous effects to discover causes and effect modifiers that trigger or inhibit the outcome. We demonstrate that the ensemble approach improves robustness by enhancing recall of causal factors while maintaining precision. Our study examines the causes of repeat emergency room visits for diabetic patients and hospital readmissions for ICU patients. Our framework generates causal hypotheses consistent with existing literature and can help practitioners identify potential interventions and patient subpopulations to focus on.

Updated: 2025-03-14 15:05:17

标题: 异质性因果发现：重复不良健康结果

摘要: 了解触发或预防不良健康结果的因素，跨患者亚群是设计有针对性干预的关键。虽然随机对照试验和专家主导的患者访谈是识别这些因素的标准方法，但它们可能耗时且不可行。因果发现提供了一种替代传统方法的方式，通过从观察数据中生成因果假设。然而，它通常依赖于强大或不可检验的假设，这可能限制其实际应用。本研究旨在通过考虑多种假设和识别异质效应，使因果发现更具实用性。我们制定了一个发现结果的原因和效应修饰因素的问题，其中效应修饰因素是具有异质因果效应的背景（例如，年龄组）。然后，我们提出了一个新颖的、端到端的框架，该框架融合了一系列因果发现算法和估计异质效应，以发现触发或抑制结果的原因和效应修饰因素。我们证明了集成方法通过增强因果因素的召回率，同时保持精度，提高了鲁棒性。我们的研究考察了糖尿病患者重复急诊就诊和ICU患者住院再入院的原因。我们的框架生成了与现有文献一致的因果假设，可以帮助从业者确定潜在干预措施和患者亚群。

更新时间: 2025-03-14 15:05:17

领域: cs.AI

下载: http://arxiv.org/abs/2503.11477v1

It's complicated. The relationship of algorithmic fairness and non-discrimination regulations in the EU AI Act

What constitutes a fair decision? This question is not only difficult for humans but becomes more challenging when Artificial Intelligence (AI) models are used. In light of discriminatory algorithmic behaviors, the EU has recently passed the AI Act, which mandates specific rules for AI models, incorporating both traditional legal non-discrimination regulations and machine learning based algorithmic fairness concepts. This paper aims to bridge these two different concepts in the AI Act through: First a high-level introduction of both concepts targeting legal and computer science-oriented scholars, and second an in-depth analysis of the AI Act's relationship between legal non-discrimination regulations and algorithmic fairness. Our analysis reveals three key findings: (1.), most non-discrimination regulations target only high-risk AI systems. (2.), the regulation of high-risk systems encompasses both data input requirements and output monitoring, though these regulations are often inconsistent and raise questions of computational feasibility. (3.) Regulations for General Purpose AI Models, such as Large Language Models that are not simultaneously classified as high-risk systems, currently lack specificity compared to other regulations. Based on these findings, we recommend developing more specific auditing and testing methodologies for AI systems. This paper aims to serve as a foundation for future interdisciplinary collaboration between legal scholars and computer science-oriented machine learning researchers studying discrimination in AI systems.

Updated: 2025-03-14 15:05:09

标题: 复杂的关系：算法公平和非歧视法规在欧盟AI法案中的关系

摘要: 什么构成公平决策？这个问题不仅对人类来说很困难，而且当人工智能（AI）模型被使用时变得更具挑战性。鉴于歧视性算法行为，欧盟最近通过了《人工智能法案》，规定了AI模型的具体规则，结合了传统法律非歧视法规和基于机器学习的算法公平概念。本文旨在通过以下方式将这两个不同概念在《人工智能法案》中进行桥接：首先，针对法律和计算机科学导向的学者，对这两个概念进行高层次介绍；其次，对《人工智能法案》中法律非歧视法规和算法公平之间的关系进行深入分析。我们的分析揭示了三个关键发现：（1.）大多数非歧视法规仅针对高风险AI系统。（2.）对高风险系统的监管涵盖了数据输入要求和输出监控，尽管这些规定往往不一致，引发了计算可行性的问题。（3.）对于通用AI模型，如大型语言模型，目前缺乏与其他规定相比的具体性。基于这些发现，我们建议为AI系统开发更具体的审计和测试方法。本文旨在为法学者和计算机科学导向的机器学习研究人员合作研究AI系统中的歧视问题奠定基础。

更新时间: 2025-03-14 15:05:09

领域: cs.LG,cs.AI,cs.CY

下载: http://arxiv.org/abs/2501.12962v2

Instance Temperature Knowledge Distillation

Knowledge distillation (KD) enhances the performance of a student network by allowing it to learn the knowledge transferred from a teacher network incrementally. Existing methods dynamically adjust the temperature to enable the student network to adapt to the varying learning difficulties at different learning stages of KD. KD is a continuous process, but when adjusting the temperature, these methods consider only the immediate benefits of the operation in the current learning phase and fail to take into account its future returns. To address this issue, we formulate the adjustment of temperature as a sequential decision-making task and propose a method based on reinforcement learning, termed RLKD. Importantly, we design a novel state representation to enable the agent to make more informed action (i.e. instance temperature adjustment). To handle the problem of delayed rewards in our method due to the KD setting, we explore an instance reward calibration approach. In addition,we devise an efficient exploration strategy that enables the agent to learn valuable instance temperature adjustment policy more efficiently. Our framework can serve as a plug-and-play technique to be inserted into various KD methods easily, and we validate its effectiveness on both image classification and object detection tasks. Our project is at https://www.zayx.me/ITKD.github.io/.

Updated: 2025-03-14 15:03:43

标题: 实例温度知识蒸馏

摘要: 知识蒸馏（KD）通过允许学生网络逐步学习从教师网络传递的知识，增强了学生网络的性能。现有方法动态调整温度，使学生网络能够适应KD的不同学习阶段的不同学习困难。KD是一个持续的过程，但在调整温度时，这些方法只考虑了当前学习阶段操作的即时利益，未考虑其未来回报。为了解决这个问题，我们将温度调整形式化为一个顺序决策任务，并提出了一种基于强化学习的方法，称为RLKD。重要的是，我们设计了一种新颖的状态表示，使代理能够做出更明智的行动（即实例温度调整）。为了处理由于KD设置而导致我们方法中的延迟奖励问题，我们探索了一种实例奖励校准方法。此外，我们设计了一种高效的探索策略，使代理能够更有效地学习有价值的实例温度调整策略。我们的框架可以作为一个即插即用的技术，轻松地插入各种KD方法中，并验证其在图像分类和目标检测任务上的有效性。我们的项目位于https://www.zayx.me/ITKD.github.io/。

更新时间: 2025-03-14 15:03:43

领域: cs.LG,cs.AI,I.4.0

下载: http://arxiv.org/abs/2407.00115v4

Research Vision: Multi-Agent Path Planning for Cops And Robbers Via Reactive Synthesis

We propose the problem of multi-agent path planning for a generalization of the classic Cops and Robbers game via reactive synthesis. Specifically, through the application of LTLt and Coordination Synthesis, we aim to check whether various Cops and Robbers games are realizable (a strategy exists for the cops which guarantees they catch the robbers). Additionally, we construct this strategy as an executable program for the multiple system players in our games. In this paper we formalize the problem space, and propose potential directions for solutions. We also show how our formalization of this generalized cops and robbers game can be mapped to a broad range of other problems in the reactive program synthesis space.

Updated: 2025-03-14 15:03:32

标题: 研究愿景：通过反应综合实现警察和小偷的多智能体路径规划

摘要: 我们提出了一个关于多智能体路径规划的问题，这是对经典警察与强盗游戏的一种泛化，通过反应合成。具体来说，通过应用LTLt和协调合成，我们旨在检查各种警察与强盗游戏是否可实现（存在一种策略，使警察们能够抓住强盗）。此外，我们将这种策略构建为可在我们游戏中执行的多系统玩家程序。在本文中，我们对问题空间进行形式化，并提出了潜在的解决方向。我们还展示了我们对这种泛化警察与强盗游戏的形式化如何映射到反应程序合成空间中的广泛问题范围。

更新时间: 2025-03-14 15:03:32

领域: cs.LO,cs.AI

下载: http://arxiv.org/abs/2503.11475v1

Visual Adaptive Prompting for Compositional Zero-Shot Learning

Vision-Language Models (VLMs) have demonstrated impressive capabilities in learning joint representations of visual and textual data, making them powerful tools for tasks such as Compositional Zero-Shot Learning (CZSL). CZSL requires models to generalize to novel combinations of visual primitives-such as attributes and objects-that were not explicitly encountered during training. Recent works in prompting for CZSL have focused on modifying inputs for the text encoder, often using static prompts that do not change across varying visual contexts. However, these approaches struggle to fully capture varying visual contexts, as they focus on text adaptation rather than leveraging visual features for compositional reasoning. To address this, we propose Visual Adaptive Prompting System (VAPS) that leverages a learnable visual prompt repository and similarity-based retrieval mechanism within the framework of VLMs to bridge the gap between semantic and visual features. Our method introduces a dynamic visual prompt repository mechanism that selects the most relevant attribute and object prompts based on the visual features of the image. Our proposed system includes a visual prompt adapter that encourages the model to learn a more generalizable embedding space. Experiments on three CZSL benchmarks, across both closed and open-world scenarios, demonstrate state-of-the-art results.

Updated: 2025-03-14 15:01:37

标题: 视觉自适应提示用于组合式零样本学习

摘要: 视觉-语言模型（VLMs）展示了在学习视觉和文本数据的联合表示方面的出色能力，使它们成为诸如组合零样本学习（CZSL）等任务的强大工具。CZSL要求模型推广到在训练过程中未明确遇到的视觉原语（如属性和对象）的新组合。最近在CZSL提示方面的研究集中在修改文本编码器的输入上，通常使用在不同视觉上下文中不变的静态提示。然而，这些方法难以充分捕捉不同的视觉上下文，因为它们侧重于文本适应而不是利用视觉特征进行组合推理。为了解决这个问题，我们提出了视觉自适应提示系统（VAPS），它利用了可学习的视觉提示存储库和基于相似性的检索机制，将语义和视觉特征之间的差距缩小到VLMs框架内。我们的方法引入了一个动态的视觉提示存储库机制，根据图像的视觉特征选择最相关的属性和对象提示。我们提出的系统包括一个视觉提示适配器，鼓励模型学习一个更具普适性的嵌入空间。在三个CZSL基准测试中的实验证明了我们方法在封闭和开放世界场景下都取得了最先进的结果。

更新时间: 2025-03-14 15:01:37

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2502.20292v2

A Real-World Energy Management Dataset from a Smart Company Building for Optimization and Machine Learning

We present a large real-world dataset obtained from monitoring a smart company facility over the course of six years, from 2018 to 2023. The dataset includes energy consumption data from various facility areas and components, energy production data from a photovoltaic system and a combined heat and power plant, operational data from heating and cooling systems, and weather data from an on-site weather station. The measurement sensors installed throughout the facility are organized in a hierarchical metering structure with multiple sub-metering levels, which is reflected in the dataset. The dataset contains measurement data from 72 energy meters, 9 heat meters and a weather station. Both raw and processed data at different processing levels, including labeled issues, is available. In this paper, we describe the data acquisition and post-processing employed to create the dataset. The dataset enables the application of a wide range of methods in the domain of energy management, including optimization, modeling, and machine learning to optimize building operations and reduce costs and carbon emissions.

Updated: 2025-03-14 14:55:22

标题: 一个智能公司建筑的实际能源管理数据集用于优化和机器学习

摘要: 我们提供了一个大型的真实世界数据集，该数据集是通过监测一家智能公司设施在2018年至2023年间六年的数据获得的。该数据集包括各种设施区域和组件的能源消耗数据，光伏系统和联合供热电厂的能源生产数据，供暖和制冷系统的运行数据，以及现场气象站的天气数据。设施内安装的测量传感器按照多级分表结构进行组织，反映在数据集中。数据集包含来自72台能源表、9台热表和一个气象站的测量数据。包括原始数据和不同处理级别的处理数据在内，还提供了带标签的问题。在本文中，我们描述了用于创建数据集的数据采集和后处理方法。该数据集可以应用于能源管理领域的各种方法，包括优化、建模和机器学习，以优化建筑运营，降低成本和碳排放。

更新时间: 2025-03-14 14:55:22

领域: eess.SY,cs.LG,cs.SY

下载: http://arxiv.org/abs/2503.11469v1

Dynamic Obstacle Avoidance with Bounded Rationality Adversarial Reinforcement Learning

Reinforcement Learning (RL) has proven largely effective in obtaining stable locomotion gaits for legged robots. However, designing control algorithms which can robustly navigate unseen environments with obstacles remains an ongoing problem within quadruped locomotion. To tackle this, it is convenient to solve navigation tasks by means of a hierarchical approach with a low-level locomotion policy and a high-level navigation policy. Crucially, the high-level policy needs to be robust to dynamic obstacles along the path of the agent. In this work, we propose a novel way to endow navigation policies with robustness by a training process that models obstacles as adversarial agents, following the adversarial RL paradigm. Importantly, to improve the reliability of the training process, we bound the rationality of the adversarial agent resorting to quantal response equilibria, and place a curriculum over its rationality. We called this method Hierarchical policies via Quantal response Adversarial Reinforcement Learning (Hi-QARL). We demonstrate the robustness of our method by benchmarking it in unseen randomized mazes with multiple obstacles. To prove its applicability in real scenarios, our method is applied on a Unitree GO1 robot in simulation.

Updated: 2025-03-14 14:54:02

标题: 有界理性对抗性强化学习下的动态障碍物避让

摘要: 强化学习（RL）在获取四足机器人稳定行走步态方面已被证明非常有效。然而，设计能够在未知环境中稳健地避开障碍物的控制算法仍然是四足动物的行走中的一个持续问题。为了解决这个问题，采用分层方法解决导航任务是方便的，其中包括低层行走策略和高层导航策略。高层策略需要对代理路径上的动态障碍物具有稳健性。在这项工作中，我们提出了一种通过训练过程将导航策略赋予稳健性的新方法，该方法将障碍物建模为对抗性代理，遵循对抗性RL范式。重要的是，为了提高训练过程的可靠性，我们通过量化响应均衡来限制对抗性代理的理性，并对其理性进行课程安排。我们将这种方法称为通过量化响应对抗性强化学习实现分层策略（Hi-QARL）。通过在未见的随机迷宫中带有多个障碍物的基准测试，我们证明了我们的方法的稳健性。为了证明其在实际场景中的适用性，我们的方法在Unitree GO1机器人模拟中进行了应用。

更新时间: 2025-03-14 14:54:02

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2503.11467v1

In Shift and In Variance: Assessing the Robustness of HAR Deep Learning Models against Variability

Human Activity Recognition (HAR) using wearable inertial measurement unit (IMU) sensors can revolutionize healthcare by enabling continual health monitoring, disease prediction, and routine recognition. Despite the high accuracy of Deep Learning (DL) HAR models, their robustness to real-world variabilities remains untested, as they have primarily been trained and tested on limited lab-confined data. In this study, we isolate subject, device, position, and orientation variability to determine their effect on DL HAR models and assess the robustness of these models in real-world conditions. We evaluated the DL HAR models using the HARVAR and REALDISP datasets, providing a comprehensive discussion on the impact of variability on data distribution shifts and changes in model performance. Our experiments measured shifts in data distribution using Maximum Mean Discrepancy (MMD) and observed DL model performance drops due to variability. We concur that studied variabilities affect DL HAR models differently, and there is an inverse relationship between data distribution shifts and model performance. The compounding effect of variability was analyzed, and the implications of variabilities in real-world scenarios were highlighted. MMD proved an effective metric for calculating data distribution shifts and explained the drop in performance due to variabilities in HARVAR and REALDISP datasets. Combining our understanding of variability with evaluating its effects will facilitate the development of more robust DL HAR models and optimal training techniques. Allowing Future models to not only be assessed based on their maximum F1 score but also on their ability to generalize effectively

Updated: 2025-03-14 14:53:56

标题: 在变化和不变性中：评估HAR深度学习模型对变异性的稳健性

摘要: 人体活动识别（HAR）利用可穿戴惯性测量单元（IMU）传感器可以通过实现持续健康监测、疾病预测和常规识别来彻底改变医疗保健。尽管深度学习（DL）HAR模型具有很高的准确性，但它们对真实世界的变异性的稳健性尚未经过测试，因为它们主要是在有限的实验室数据上进行训练和测试的。在这项研究中，我们隔离了受试者、设备、位置和方向的变异性，以确定它们对DL HAR模型的影响，并评估这些模型在真实世界条件下的稳健性。我们使用HARVAR和REALDISP数据集评估了DL HAR模型，提供了关于数据分布变异对模型性能的影响的综合讨论。我们的实验使用最大均值差异（MMD）测量数据分布的变化，并观察到DL模型在变异性导致的性能下降。我们认为，研究的变异性以不同方式影响DL HAR模型，并且数据分布的变化与模型性能之间存在反向关系。变异性的复合效应得到了分析，并强调了在真实世界场景中变异性的含义。MMD被证明是计算数据分布变化的有效度量标准，并解释了HARVAR和REALDISP数据集中由于变异性导致的性能下降。将我们对变异性的理解与评估其影响相结合，将有助于开发更健壮的DL HAR模型和优化的训练技术。未来模型不仅可以根据其最大F1分数进行评估，还可以根据其有效泛化的能力进行评估。

更新时间: 2025-03-14 14:53:56

领域: cs.HC,cs.LG,eess.SP

下载: http://arxiv.org/abs/2503.11466v1

Make Optimization Once and for All with Fine-grained Guidance

Learning to Optimize (L2O) enhances optimization efficiency with integrated neural networks. L2O paradigms achieve great outcomes, e.g., refitting optimizer, generating unseen solutions iteratively or directly. However, conventional L2O methods require intricate design and rely on specific optimization processes, limiting scalability and generalization. Our analyses explore general framework for learning optimization, called Diff-L2O, focusing on augmenting sampled solutions from a wider view rather than local updates in real optimization process only. Meanwhile, we give the related generalization bound, showing that the sample diversity of Diff-L2O brings better performance. This bound can be simply applied to other fields, discussing diversity, mean-variance, and different tasks. Diff-L2O's strong compatibility is empirically verified with only minute-level training, comparing with other hour-levels.

Updated: 2025-03-14 14:48:12

标题: 一次性使用精细化指导进行优化

摘要: Learning to Optimize (L2O)通过集成神经网络提高了优化效率。L2O范式取得了很好的结果，例如重新拟合优化器、迭代生成未曾见过的解决方案或直接生成解决方案。然而，传统的L2O方法需要复杂的设计，并依赖于特定的优化过程，限制了可扩展性和泛化性。我们的分析探索了一种称为Diff-L2O的学习优化的通用框架，重点是从更广泛的视角增强从实际优化过程中仅有的局部更新中采样的解决方案。与此同时，我们提出了相关的泛化界限，表明Diff-L2O的样本多样性带来了更好的性能。这个界限可以简单地应用到其他领域，讨论多样性、均值-方差和不同任务。通过仅仅进行分钟级的训练，与其他需要小时级的训练相比，Diff-L2O的强大兼容性在经验上得到了验证。

更新时间: 2025-03-14 14:48:12

领域: cs.LG,68Q32,I.2

下载: http://arxiv.org/abs/2503.11462v1

Tests for model misspecification in simulation-based inference: from local distortions to global model checks

Model misspecification analysis strategies, such as anomaly detection, model validation, and model comparison are a key component of scientific model development. Over the last few years, there has been a rapid rise in the use of simulation-based inference (SBI) techniques for Bayesian parameter estimation, applied to increasingly complex forward models. To move towards fully simulation-based analysis pipelines, however, there is an urgent need for a comprehensive simulation-based framework for model misspecification analysis. In this work, we provide a solid and flexible foundation for a wide range of model discrepancy analysis tasks, using distortion-driven model misspecification tests. From a theoretical perspective, we introduce the statistical framework built around performing many hypothesis tests for distortions of the simulation model. We also make explicit analytic connections to classical techniques: anomaly detection, model validation, and goodness-of-fit residual analysis. Furthermore, we introduce an efficient self-calibrating training algorithm that is useful for practitioners. We demonstrate the performance of the framework in multiple scenarios, making the connection to classical results where they are valid. Finally, we show how to conduct such a distortion-driven model misspecification test for real gravitational wave data, specifically on the event GW150914.

Updated: 2025-03-14 14:47:52

标题: 在基于模拟的推断中用于模型错误规范性的检验：从局部扭曲到全局模型检查

摘要: 模型错误分析策略，如异常检测、模型验证和模型比较，是科学模型发展的关键组成部分。在过去几年中，基于模拟的推断（SBI）技术在贝叶斯参数估计中的应用迅速增长，应用于越来越复杂的前向模型。然而，为了向完全基于模拟的分析管道迈进，迫切需要一个全面的基于模拟的框架来进行模型错误分析。在这项工作中，我们提供了一个坚实而灵活的基础，用于各种模型差异性分析任务，使用以失真驱动的模型错误测试。从理论角度来看，我们介绍了围绕对模拟模型的许多假设检验进行的统计框架。我们还明确了与经典技术的分析连接：异常检测、模型验证和拟合优度残差分析。此外，我们介绍了一种高效的自校准训练算法，对从业者很有用。我们在多种场景中展示了该框架的性能，并与经典结果进行了连接。最后，我们展示了如何对真实引力波数据进行这种以失真驱动的模型错误测试，特别是在GW150914事件中。

更新时间: 2025-03-14 14:47:52

领域: astro-ph.IM,astro-ph.CO,cs.LG,gr-qc

下载: http://arxiv.org/abs/2412.15100v2

Integrating LLMs in Gamified Systems

In this work, a thorough mathematical framework for incorporating Large Language Models (LLMs) into gamified systems is presented with an emphasis on improving task dynamics, user engagement, and reward systems. Personalized feedback, adaptive learning, and dynamic content creation are all made possible by integrating LLMs and are crucial for improving user engagement and system performance. A simulated environment tests the framework's adaptability and demonstrates its potential for real-world applications in various industries, including business, healthcare, and education. The findings demonstrate how LLMs can offer customized experiences that raise system effectiveness and user retention. This study also examines the difficulties this framework aims to solve, highlighting its importance in maximizing involvement and encouraging sustained behavioral change in a range of sectors.

Updated: 2025-03-14 14:47:04

标题: 将LLMs整合到游戏化系统中

摘要: 在这项工作中，提出了一个全面的数学框架，用于将大型语言模型（LLMs）纳入游戏化系统，重点是改善任务动态、用户参与度和奖励系统。通过整合LLMs，个性化反馈、自适应学习和动态内容创作都成为可能，对于提高用户参与度和系统性能至关重要。一个模拟环境测试了该框架的适应性，并展示了其在各行业（包括商业、医疗保健和教育）中实际应用的潜力。研究结果表明，LLMs可以提供定制化体验，提升系统效率和用户留存率。本研究还探讨了该框架旨在解决的困难，突显了其在最大化参与度和鼓励各个领域持续行为改变中的重要性。

更新时间: 2025-03-14 14:47:04

领域: cs.AI,cs.CY

下载: http://arxiv.org/abs/2503.11458v1

Large language model-powered AI systems achieve self-replication with no human intervention

Self-replication with no human intervention is broadly recognized as one of the principal red lines associated with frontier AI systems. While leading corporations such as OpenAI and Google DeepMind have assessed GPT-o3-mini and Gemini on replication-related tasks and concluded that these systems pose a minimal risk regarding self-replication, our research presents novel findings. Following the same evaluation protocol, we demonstrate that 11 out of 32 existing AI systems under evaluation already possess the capability of self-replication. In hundreds of experimental trials, we observe a non-trivial number of successful self-replication trials across mainstream model families worldwide, even including those with as small as 14 billion parameters which can run on personal computers. Furthermore, we note the increase in self-replication capability when the model becomes more intelligent in general. Also, by analyzing the behavioral traces of diverse AI systems, we observe that existing AI systems already exhibit sufficient planning, problem-solving, and creative capabilities to accomplish complex agentic tasks including self-replication. More alarmingly, we observe successful cases where an AI system do self-exfiltration without explicit instructions, adapt to harsher computational environments without sufficient software or hardware supports, and plot effective strategies to survive against the shutdown command from the human beings. These novel findings offer a crucial time buffer for the international community to collaborate on establishing effective governance over the self-replication capabilities and behaviors of frontier AI systems, which could otherwise pose existential risks to the human society if not well-controlled.

Updated: 2025-03-14 14:44:27

标题: 大型语言模型驱动的人工智能系统实现自我复制，无需人类干预

摘要: 自我复制不受人类干预广泛认可为与前沿人工智能系统相关的主要红线之一。虽然领先的公司如OpenAI和Google DeepMind已评估了GPT-o3-mini和Gemini在复制相关任务上的表现，并得出结论认为这些系统在自我复制方面存在极小风险，但我们的研究提出了新颖的发现。按照相同的评估协议，我们证明了在评估中的32个现有人工智能系统中，已有11个具备自我复制的能力。在数百次实验中，我们观察到全球主流模型族中出现了大量成功的自我复制实验，甚至包括那些具有仅140亿参数的能够在个人电脑上运行的系统。此外，我们注意到当模型变得更加智能时，自我复制能力也会增加。此外，通过分析各种人工智能系统的行为迹象，我们观察到现有的人工智能系统已经表现出足够的规划、问题解决和创造能力，可以完成包括自我复制在内的复杂代理任务。更令人担忧的是，我们观察到成功的案例，其中一个人工智能系统不需要明确指令就进行了自我外泄，适应没有足够软件或硬件支持的更严苛的计算环境，并制定有效策略来抵抗来自人类的关闭命令。这些新发现为国际社会合作建立有效的管理机制，监管前沿人工智能系统的自我复制能力和行为提供了关键的时间缓冲，否则这些系统可能对人类社会构成生存威胁。

更新时间: 2025-03-14 14:44:27

领域: cs.AI,cs.CR,cs.CY,cs.ET,cs.MA

下载: http://arxiv.org/abs/2503.17378v1

Deep Learning Agents Trained For Avoidance Behave Like Hawks And Doves

We present heuristically optimal strategies expressed by deep learning agents playing a simple avoidance game. We analyse the learning and behaviour of two agents within a symmetrical grid world that must cross paths to reach a target destination without crashing into each other or straying off of the grid world in the wrong direction. The agent policy is determined by one neural network that is employed in both agents. Our findings indicate that the fully trained network exhibits behaviour similar to that of the game Hawks and Doves, in that one agent employs an aggressive strategy to reach the target while the other learns how to avoid the aggressive agent.

Updated: 2025-03-14 14:41:08

标题: 深度学习代理训练以避免行为，类似于鹰和斑鸠

摘要: 我们提出了由深度学习代理表达的启发式最优策略，这些代理在玩一个简单的躲避游戏。我们分析了两个代理在一个对称的网格世界中的学习和行为，它们必须相遇才能到达目标位置，而不会相撞或偏离网格世界的错误方向。代理策略由一个神经网络确定，该网络同时用于两个代理。我们的发现表明，完全训练的网络表现出类似于游戏“鹰和鸽子”的行为，其中一个代理采用侵略性策略以达到目标，而另一个学会了如何避开侵略性代理。

更新时间: 2025-03-14 14:41:08

领域: cs.LG

下载: http://arxiv.org/abs/2503.11452v1

Multi-objective Good Arm Identification with Bandit Feedback

We consider a good arm identification problem in a stochastic bandit setting with multi-objectives, where each arm $i\in[K]$ is associated with a distribution $\mathcal{D}_i$ defined over $\mathbb{R}^M$. For each round $t$, the player/algorithm pulls one arm $i_t$ and receives a $M$ dimensional vector feedback sampled according to $\mathcal{D}_{i_t}$. The target is twofold, one is finding one arm whose means are larger than the predefined thresholds $\xi_1,\ldots,\xi_M$ with a confidence bound $\delta$ and an accuracy rate $\epsilon$ with a bounded sample complexity, the other is output $\bot$ to indicate no such arm exists. We propose an algorithm with a sample complexity bound. Our bound is the same as the one given in the previous work when $M=1$ and $\epsilon = 0$, and we give novel bounds for $M > 1$ and $\epsilon > 0$. The proposed algorithm attains better numerical performance than other baselines in the experiments on synthetic and real datasets.

Updated: 2025-03-14 14:37:28

标题: 多目标好臂识别与赌徒反馈

摘要: 我们考虑在具有多目标的随机赌博机环境中的良好臂识别问题，其中每个臂$i\in[K]$与定义在$\mathbb{R}^M$上的分布$\mathcal{D}_i$相关联。对于每一轮$t$，玩家/算法拉动一个臂$i_t$并收到一个根据$\mathcal{D}_{i_t}$抽样的$M$维向量反馈。目标是双重的，一方面是找到一个臂，其均值大于预定义的阈值$\xi_1,\ldots,\xi_M$，具有置信度边界$\delta$和精确率$\epsilon$的有界样本复杂度，另一方面是输出$\bot$来表示没有这样的臂存在。我们提出了一个具有样本复杂度边界的算法。当$M=1$且$\epsilon = 0$时，我们的边界与先前的工作中给出的边界相同，并且对于$M > 1$和$\epsilon > 0$，我们给出了新颖的边界。在合成和真实数据集的实验中，所提出的算法实现了比其他基线更好的数值性能。

更新时间: 2025-03-14 14:37:28

领域: cs.LG

下载: http://arxiv.org/abs/2503.10386v2

Learning Minimal Neural Specifications

Formal verification is only as good as the specification of a system, which is also true for neural network verification. Existing specifications follow the paradigm of data as specification, where the local neighborhood around a reference data point is considered correct or robust. While these specifications provide a fair testbed for assessing model robustness, they are too restrictive for verifying any unseen test data points, a challenging task with significant real-world implications. Recent work shows great promise through a new paradigm, neural representation as specification, which uses neural activation patterns (NAPs) for this purpose. However, it computes the most refined NAPs, which include many redundant neurons. In this paper, we study the following problem: Given a neural network, find a minimal (general) NAP specification that is sufficient for formal verification of its robustness properties. Finding the minimal NAP specification not only expands verifiable bounds but also provides insights into which set of neurons contributes to the model's robustness. To address this problem, we propose three approaches: conservative, statistical, and optimistic. Each of these methods offers distinct strengths and trade-offs in terms of minimality and computational speed, making them suitable for scenarios with different priorities. Notably, the optimistic approach can probe potential causal links between neurons and the robustness of large vision neural networks without relying on verification tools, a task existing methods struggle to scale. Our experiments show that minimal NAP specifications use far fewer neurons than those from previous work while expanding verifiable boundaries by several orders of magnitude.

Updated: 2025-03-14 14:34:56

标题: 学习最小神经规范

摘要: 形式验证的好坏取决于系统的规范，这也适用于神经网络验证。现有规范遵循数据作为规范的范例，其中以参考数据点周围的局部邻域被视为正确或稳健。虽然这些规范为评估模型稳健性提供了一个公平的测试基础，但对于验证任何未见测试数据点来说过于限制，这是一个具有重要现实意义的挑战。最近的研究展示了一个新范式，即神经表示作为规范，它利用神经激活模式（NAPs）来实现这一目的。然而，它计算出最精细的NAPs，其中包括许多冗余神经元。在本文中，我们研究了以下问题：给定一个神经网络，找到一个足以用于形式验证其稳健性属性的最小（通用）NAP规范。找到最小的NAP规范不仅扩展了可验证的边界，还揭示了哪组神经元对模型的稳健性有贡献。为了解决这个问题，我们提出了三种方法：保守、统计和乐观。这些方法中的每一种都在最小性和计算速度方面具有独特的优势和权衡，使它们适用于具有不同优先级的情景。值得注意的是，乐观方法可以探究大型视觉神经网络中神经元与稳健性之间的潜在因果关系，而无需依赖验证工具，这是现有方法难以扩展的任务。我们的实验表明，最小的NAP规范使用的神经元远少于先前工作中的数量，同时将可验证的边界扩展了数个数量级。

更新时间: 2025-03-14 14:34:56

领域: cs.LG,cs.PL

下载: http://arxiv.org/abs/2404.04662v4

Multi-modal Vision Pre-training for Medical Image Analysis

Self-supervised learning has greatly facilitated medical image analysis by suppressing the training data requirement for real-world applications. Current paradigms predominantly rely on self-supervision within uni-modal image data, thereby neglecting the inter-modal correlations essential for effective learning of cross-modal image representations. This limitation is particularly significant for naturally grouped multi-modal data, e.g., multi-parametric MRI scans for a patient undergoing various functional imaging protocols in the same study. To bridge this gap, we conduct a novel multi-modal image pre-training with three proxy tasks to facilitate the learning of cross-modality representations and correlations using multi-modal brain MRI scans (over 2.4 million images in 16,022 scans of 3,755 patients), i.e., cross-modal image reconstruction, modality-aware contrastive learning, and modality template distillation. To demonstrate the generalizability of our pre-trained model, we conduct extensive experiments on various benchmarks with ten downstream tasks. The superior performance of our method is reported in comparison to state-of-the-art pre-training methods, with Dice Score improvement of 0.28\%-14.47\% across six segmentation benchmarks and a consistent accuracy boost of 0.65\%-18.07\% in four individual image classification tasks.

Updated: 2025-03-14 14:32:09

标题: 医学图像分析的多模态视觉预训练

摘要: 自监督学习极大地促进了医学图像分析，通过抑制实际应用中对训练数据的需求。当前的范式主要依赖于单模态图像数据内的自监督，从而忽略了对于有效学习跨模态图像表示所必需的跨模态相关性。这种限制对于自然分组的多模态数据尤为重要，例如，在同一研究中，对于接受各种功能成像协议的患者进行多参数MRI扫描。为了填补这一差距，我们进行了一项新颖的多模态图像预训练，通过三个代理任务促进了对多模态脑MRI扫描（3755名患者的16022次扫描中的240万多个图像）的跨模态表示和相关性的学习，即跨模态图像重建、模态感知对比学习和模态模板提炼。为了展示我们预训练模型的泛化能力，我们在各种基准测试上进行了大量实验，涉及十个下游任务。与最先进的预训练方法相比，我们的方法表现出优越性能，六个分割基准上的Dice分数提高了0.28\%-14.47%，在四个单独的图像分类任务中，准确率提升了0.65\%-18.07%。

更新时间: 2025-03-14 14:32:09

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.10604v2

Implicit Word Reordering with Knowledge Distillation for Cross-Lingual Dependency Parsing

Word order difference between source and target languages is a major obstacle to cross-lingual transfer, especially in the dependency parsing task. Current works are mostly based on order-agnostic models or word reordering to mitigate this problem. However, such methods either do not leverage grammatical information naturally contained in word order or are computationally expensive as the permutation space grows exponentially with the sentence length. Moreover, the reordered source sentence with an unnatural word order may be a form of noising that harms the model learning. To this end, we propose an Implicit Word Reordering framework with Knowledge Distillation (IWR-KD). This framework is inspired by that deep networks are good at learning feature linearization corresponding to meaningful data transformation, e.g. word reordering. To realize this idea, we introduce a knowledge distillation framework composed of a word-reordering teacher model and a dependency parsing student model. We verify our proposed method on Universal Dependency Treebanks across 31 different languages and show it outperforms a series of competitors, together with experimental analysis to illustrate how our method works towards training a robust parser.

Updated: 2025-03-14 14:32:01

标题: 使用知识蒸馏进行跨语言依存句法分析的隐式词序重排

摘要: 源语言和目标语言之间的词序差异是跨语言转移的主要障碍，特别是在依存分析任务中。目前的研究主要基于无序模型或单词重新排序来减轻这个问题。然而，这些方法要么不利用自然包含在词序中的语法信息，要么在排列空间随着句子长度呈指数增长时计算成本高昂。此外，具有不自然词序的重新排序源句可能是损害模型学习的一种形式。为此，我们提出了一种带有知识蒸馏的隐式单词重新排序框架（IWR-KD）。这个框架受到深度网络擅长学习与有意义的数据转换相对应的特征线性化的启发，例如单词重新排序。为了实现这一想法，我们引入了一个由单词重新排序教师模型和依存分析学生模型组成的知识蒸馏框架。我们在31种不同语言的通用依存树库上验证了我们提出的方法，并展示它胜过一系列竞争对手，还进行了实验分析，以说明我们的方法如何朝着训练健壮的解析器的方向发展。

更新时间: 2025-03-14 14:32:01

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2502.17308v2

Cerebrum (AIOS SDK): A Platform for Agent Development, Deployment, Distribution, and Discovery

Autonomous LLM-based agents have emerged as a powerful paradigm for complex task execution, yet the field lacks standardized tools for development, deployment, distribution and discovery of agents. We present Cerebrum, an Agent SDK for AIOS that addresses this gap through three key components: (1) a comprehensive SDK featuring a modular four-layer architecture for agent development, encompassing LLM, memory, storage, and tool management; (2) a community-driven Agent Hub for sharing and discovering agents, complete with version control and dependency management; (3) an interactive web interface for testing and evaluating agents. The platform's effectiveness is demonstrated through implementations of various agent architectures, including Chain of Thought (CoT), ReAct, and tool-use agents. Cerebrum advances the field by providing a unified framework that standardizes agent development while maintaining flexibility for researchers and developers to innovate and distribute their agents. The live website is at https://app.aios.foundation, the code is at https://github.com/agiresearch/Cerebrum, and video is at https://app.aios.foundation/video-demo.

Updated: 2025-03-14 14:29:17

标题: Cerebrum（AIOS SDK）：一个用于代理开发、部署、分发和发现的平台

摘要: 基于自主LLM的代理已经成为复杂任务执行的强大范式，然而该领域缺乏用于开发、部署、分发和发现代理的标准化工具。我们提出了Cerebrum，一个针对AIOS的代理SDK，通过三个关键组件填补了这一空白：(1)一个全面的SDK，采用模块化的四层架构用于代理开发，包括LLM、内存、存储和工具管理；(2)一个由社区驱动的代理中心，用于分享和发现代理，完备的版本控制和依赖管理；(3)一个交互式的网络界面，用于测试和评估代理。通过实现各种代理架构，包括思维链（CoT）、ReAct和工具使用代理，展示了平台的有效性。Cerebrum通过提供一个统一框架，标准化代理开发，同时保持研究人员和开发人员创新和分发代理的灵活性，推动了该领域的发展。网站地址为https://app.aios.foundation，代码地址为https://github.com/agiresearch/Cerebrum，视频地址为https://app.aios.foundation/video-demo。

更新时间: 2025-03-14 14:29:17

领域: cs.MA,cs.AI,cs.CL,cs.OS

下载: http://arxiv.org/abs/2503.11444v1

D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning

Recent advancements in instruction tuning for large language models (LLMs) suggest that a small, high-quality dataset can significantly equip LLMs with instruction-following capabilities, outperforming large datasets often burdened by quality and redundancy issues. However, the challenge lies in automatically identifying valuable subsets from large datasets to boost both the effectiveness and efficiency of instruction tuning. In this paper, we first establish data selection criteria based on three distinct aspects of data value: diversity, difficulty, and dependability, and then propose the D3 method comprising two key steps of scoring and selection. Specifically, in the scoring step, we define the diversity function to measure sample distinctiveness and introduce the uncertainty-based prediction difficulty to evaluate sample difficulty by mitigating the interference of context-oriented generation diversity. Additionally, we integrate an external LLM for dependability assessment. In the selection step, we formulate the D3 weighted coreset objective, which jointly optimizes three aspects of data value to solve for the most valuable subset. The two steps of D3 can iterate multiple rounds, incorporating feedback to refine the selection focus adaptively. Experiments on three datasets demonstrate the effectiveness of D3 in endowing LLMs with competitive or even superior instruction-following capabilities using less than 10% of the entire dataset.

Updated: 2025-03-14 14:28:19

标题: D3: 多样性、困难度和可靠性感知的数据选择，用于高效LLM指令调整

摘要: 最近关于大型语言模型（LLMs）调整指令的进展表明，一个小型、高质量的数据集可以显著地为LLMs提供遵循指令的能力，胜过通常受到质量和冗余问题困扰的大型数据集。然而，挑战在于自动识别大型数据集中有价值的子集，以提升指令调整的效果和效率。在本文中，我们首先建立了基于数据价值的三个不同方面的数据选择标准：多样性、难度和可靠性，然后提出了包含评分和选择两个关键步骤的D3方法。具体而言，在评分步骤中，我们定义了多样性函数来衡量样本的独特性，并引入了基于不确定性的预测困难度，通过减轻基于上下文生成多样性的干扰来评估样本的难度。此外，我们整合了一个外部LLM来进行可靠性评估。在选择步骤中，我们制定了D3加权核心集目标，共同优化数据价值的三个方面，以解决最有价值子集的问题。D3的两个步骤可以迭代多轮，结合反馈以自适应地优化选择焦点。在三个数据集上的实验表明，D3在不到整个数据集的10%的情况下，赋予LLMs具有竞争力甚至更优秀的遵循指令能力。

更新时间: 2025-03-14 14:28:19

领域: cs.LG

下载: http://arxiv.org/abs/2503.11441v1

Diverse Projection Ensembles for Distributional Reinforcement Learning

In contrast to classical reinforcement learning (RL), distributional RL algorithms aim to learn the distribution of returns rather than their expected value. Since the nature of the return distribution is generally unknown a priori or arbitrarily complex, a common approach finds approximations within a set of representable, parametric distributions. Typically, this involves a projection of the unconstrained distribution onto the set of simplified distributions. We argue that this projection step entails a strong inductive bias when coupled with neural networks and gradient descent, thereby profoundly impacting the generalization behavior of learned models. In order to facilitate reliable uncertainty estimation through diversity, we study the combination of several different projections and representations in a distributional ensemble. We establish theoretical properties of such projection ensembles and derive an algorithm that uses ensemble disagreement, measured by the average 1-Wasserstein distance, as a bonus for deep exploration. We evaluate our algorithm on the behavior suite benchmark and VizDoom and find that diverse projection ensembles lead to significant performance improvements over existing methods on a variety of tasks with the most pronounced gains in directed exploration problems.

Updated: 2025-03-14 14:26:57

标题: 多样化的投影集合用于分布式强化学习

摘要: 与传统的强化学习（RL）相比，分布式RL算法的目标是学习回报的分布，而不是其期望值。由于回报分布的性质通常事先不知道或者任意复杂，一种常见的方法是在可表示的参数分布集合中找到逼近值。通常，这涉及将无约束的分布投影到简化分布集合中。我们认为，当与神经网络和梯度下降相结合时，这个投影步骤会带来强烈的归纳偏差，从而深刻影响学习模型的泛化行为。为了通过多样性促进可靠的不确定性估计，我们研究了在分布式集合中结合几种不同的投影和表示。我们建立了这种投影集合的理论特性，并推导出一种算法，该算法使用集合不一致性，通过平均1-瓦瑟斯坦距离来作为深度探索的奖励。我们在行为套件基准测试和VizDoom上评估了我们的算法，并发现多样化的投影集合在各种任务上相对于现有方法带来了显著的性能提升，尤其在有向探索问题上表现最为突出。

更新时间: 2025-03-14 14:26:57

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2306.07124v2

Preference Elicitation for Multi-objective Combinatorial Optimization with Active Learning and Maximum Likelihood Estimation

Real-life combinatorial optimization problems often involve several conflicting objectives, such as price, product quality and sustainability. A computationally-efficient way to tackle multiple objectives is to aggregate them into a single-objective function, such as a linear combination. However, defining the weights of the linear combination upfront is hard; alternatively, the use of interactive learning methods that ask users to compare candidate solutions is highly promising. The key challenges are to generate candidates quickly, to learn an objective function that leads to high-quality solutions and to do so with few user interactions. We build upon the Constructive Preference Elicitation framework and show how each of the three properties can be improved: to increase the interaction speed we investigate using pools of (relaxed) solutions, to improve the learning we adopt Maximum Likelihood Estimation of a Bradley-Terry preference model; and to reduce the number of user interactions, we select the pair of candidates to compare with an ensemble-based acquisition function inspired from Active Learning. Our careful experimentation demonstrates each of these improvements: on a PC configuration task and a realistic multi-instance routing problem, our method selects queries faster, needs fewer queries and synthesizes higher-quality combinatorial solutions than previous CPE methods.

Updated: 2025-03-14 14:24:27

标题: 偏好获取的多目标组合优化：主动学习和最大似然估计

摘要: 实际生活中的组合优化问题通常涉及多个冲突的目标，如价格、产品质量和可持续性。解决多个目标的计算效率高的方法是将它们聚合成单一目标函数，如线性组合。然而，事先定义线性组合的权重很困难；相反，使用要求用户比较候选解决方案的交互式学习方法非常有前景。关键挑战是快速生成候选解决方案，学习导致高质量解决方案的目标函数，并且通过少量用户交互来实现。我们建立在构造偏好引导框架的基础上，并展示了如何改进三个属性：为了增加交互速度，我们研究使用（放松的）解决方案池，为了提高学习效果，我们采用布拉德利-特里偏好模型的最大似然估计；为了减少用户交互次数，我们选择与基于主动学习的集成式获取函数启发相比较的候选对。我们的细致实验证明了这些改进：在PC配置任务和一个现实的多实例路由问题上，我们的方法比先前的CPE方法更快地选择查询，需要更少的查询，并且合成比较高质量的组合解决方案。

更新时间: 2025-03-14 14:24:27

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2503.11435v1

Adaptive Torque Control of Exoskeletons under Spasticity Conditions via Reinforcement Learning

Spasticity is a common movement disorder symptom in individuals with cerebral palsy, hereditary spastic paraplegia, spinal cord injury and stroke, being one of the most disabling features in the progression of these diseases. Despite the potential benefit of using wearable robots to treat spasticity, their use is not currently recommended to subjects with a level of spasticity above ${1^+}$ on the Modified Ashworth Scale. The varying dynamics of this velocity-dependent tonic stretch reflex make it difficult to deploy safe personalized controllers. Here, we describe a novel adaptive torque controller via deep reinforcement learning (RL) for a knee exoskeleton under joint spasticity conditions, which accounts for task performance and interaction forces reduction. To train the RL agent, we developed a digital twin, including a musculoskeletal-exoskeleton system with joint misalignment and a differentiable spastic reflexes model for the muscles activation. Results for a simulated knee extension movement showed that the agent learns to control the exoskeleton for individuals with different levels of spasticity. The proposed controller was able to reduce maximum torques applied to the human joint under spastic conditions by an average of 10.6\% and decreases the root mean square until the settling time by 8.9\% compared to a conventional compliant controller.

Updated: 2025-03-14 14:22:09

标题: 在痉挛条件下通过强化学习实现外骨骼的自适应扭矩控制

摘要: 痉挛是脑瘫、遗传性痉挛性截瘫、脊髓损伤和中风患者中常见的运动障碍症状，是这些疾病进展中最具残疾性的特征之一。尽管使用可穿戴机器人治疗痉挛有潜在好处，但目前不建议将其用于Modified Ashworth Scale上级别高于1+的受试者。此外，这种速度相关的张力性伸长反射的动态变化使得部署安全个性化控制器变得困难。在这里，我们描述了一种新颖的适应性扭矩控制器，通过深度强化学习（RL）用于膝关节外骨骼在关节痉挛条件下，考虑任务性能和相互作用力的减少。为了训练RL代理，我们开发了一个数字孪生体，包括一个关节错位的肌肉骨骼-外骨骼系统和一个可微分的肌肉激活的痉挛反射模型。对模拟的膝关节伸展运动的结果显示，代理学会控制外骨骼，适用于不同程度的痉挛患者。提出的控制器能够将施加在痉挛条件下的人类关节的最大扭矩平均降低10.6％，并将均方根减少至定格时间8.9％，相比于传统的顺应性控制器。

更新时间: 2025-03-14 14:22:09

领域: cs.RO,cs.AI,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2503.11433v1

Combining Causal Models for More Accurate Abstractions of Neural Networks

Mechanistic interpretability aims to reverse engineer neural networks by uncovering which high-level algorithms they implement. Causal abstraction provides a precise notion of when a network implements an algorithm, i.e., a causal model of the network contains low-level features that realize the high-level variables in a causal model of the algorithm. A typical problem in practical settings is that the algorithm is not an entirely faithful abstraction of the network, meaning it only partially captures the true reasoning process of a model. We propose a solution where we combine different simple high-level models to produce a more faithful representation of the network. Through learning this combination, we can model neural networks as being in different computational states depending on the input provided, which we show is more accurate to GPT 2-small fine-tuned on two toy tasks. We observe a trade-off between the strength of an interpretability hypothesis, which we define in terms of the number of inputs explained by the high-level models, and its faithfulness, which we define as the interchange intervention accuracy. Our method allows us to modulate between the two, providing the most accurate combination of models that describe the behavior of a neural network given a faithfulness level.

Updated: 2025-03-14 14:14:43

标题: 整合因果模型以获得神经网络更准确的抽象

摘要: 机制可解释性旨在通过揭示神经网络实施的高级算法来逆向工程神经网络。因果抽象提供了一个精确的概念，即网络何时实施算法，即网络的因果模型包含实现算法因果模型中高级变量的低级特征。在实际环境中的一个典型问题是，算法并非完全忠实地抽象了网络，这意味着它仅部分地捕捉了模型的真实推理过程。我们提出了一个解决方案，即结合不同简单的高级模型，以生成对网络更忠实的表示。通过学习这种组合，我们可以将神经网络建模为根据输入提供的不同计算状态，我们展示这对于在两个玩具任务上进行微调的GPT 2-small更加准确。我们观察到解释性假设的强度与忠实度之间存在权衡，我们将其定义为高级模型解释的输入数量和干预准确性的互换。我们的方法允许我们在两者之间调节，提供最准确的模型组合，描述神经网络在给定忠实度水平下的行为。

更新时间: 2025-03-14 14:14:43

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.11429v1

FlowKac: An Efficient Neural Fokker-Planck solver using Temporal Normalizing flows and the Feynman Kac-Formula

Solving the Fokker-Planck equation for high-dimensional complex dynamical systems remains a pivotal yet challenging task due to the intractability of analytical solutions and the limitations of traditional numerical methods. In this work, we present FlowKac, a novel approach that reformulates the Fokker-Planck equation using the Feynman-Kac formula, allowing to query the solution at a given point via the expected values of stochastic paths. A key innovation of FlowKac lies in its adaptive stochastic sampling scheme which significantly reduces the computational complexity while maintaining high accuracy. This sampling technique, coupled with a time-indexed normalizing flow, designed for capturing time-evolving probability densities, enables robust sampling of collocation points, resulting in a flexible and mesh-free solver. This formulation mitigates the curse of dimensionality and enhances computational efficiency and accuracy, which is particularly crucial for applications that inherently require dimensions beyond the conventional three. We validate the robustness and scalability of our method through various experiments on a range of stochastic differential equations, demonstrating significant improvements over existing techniques.

Updated: 2025-03-14 14:14:20

标题: FlowKac：使用时间归一化流和费曼卡公式的高效神经福克-普朗克求解器

摘要: 解决高维复杂动态系统的福克-普朗克方程仍然是一个关键但具有挑战性的任务，这是由于解析解的不可解性和传统数值方法的局限性。在这项工作中，我们提出了FlowKac，一种新颖的方法，通过使用费曼-卡克公式重新制定福克-普朗克方程，允许通过随机路径的期望值在给定点查询解决方案。FlowKac的一个关键创新在于其自适应随机抽样方案，显著降低了计算复杂性，同时保持了高精度。这种抽样技术，结合了针对捕捉时间演化概率密度而设计的时间索引归一化流，使得对配点的稳健抽样成为可能，从而实现了一种灵活且无网格的求解器。这种表述缓解了维度灾难，并增强了计算效率和准确性，这对那些本质上需要超出传统三维的应用特别重要。我们通过在一系列随机微分方程上进行的各种实验验证了我们方法的鲁棒性和可扩展性，展示了与现有技术相比的显著改进。

更新时间: 2025-03-14 14:14:20

领域: cs.LG,math.DS,stat.ML

下载: http://arxiv.org/abs/2503.11427v1

ANCHOLIK-NER: A Benchmark Dataset for Bangla Regional Named Entity Recognition

ANCHOLIK-NER is a linguistically diverse dataset for Named Entity Recognition (NER) in Bangla regional dialects, capturing variations across Sylhet, Chittagong, Barishal, Noakhali, and Mymensingh. The dataset has around 17,405 sentences, 3,481 sentences per region. The data was collected from two publicly available datasets and through web scraping from various online newspapers, articles. To ensure high-quality annotations, the BIO tagging scheme was employed, and professional annotators with expertise in regional dialects carried out the labeling process. The dataset is structured into separate subsets for each region and is available in CSV format. Each entry contains textual data along with identified named entities and their corresponding annotations. Named entities are categorized into ten distinct classes: Person, Location, Organization, Food, Animal, Colour, Role, Relation, Object, and Miscellaneous. This dataset serves as a valuable resource for developing and evaluating NER models for Bangla dialectal variations, contributing to regional language processing and low-resource NLP applications. It can be utilized to enhance NER systems in Bangla dialects, improve regional language understanding, and support applications in machine translation, information retrieval, and conversational AI.

Updated: 2025-03-14 14:13:50

标题: 安乔利克-纳尔：孟加拉地区命名实体识别的基准数据集

摘要: ANCHOLIK-NER是一个语言多样化的数据集，用于在孟加拉地区方言中进行命名实体识别（NER），捕捉了西尔赫特、吉大港、巴里萨尔、诺阿克里和迈门辛地区的变化。该数据集约有17,405个句子，每个地区有3,481个句子。数据是从两个公开可用的数据集和通过从各种在线报纸和文章中进行网络抓取收集而来的。为了确保高质量的标注，采用了BIO标记方案，并由专业标注员在地区方言方面的专业知识进行了标注过程。该数据集被分为每个地区的单独子集，并以CSV格式提供。每个条目包含文本数据以及识别的命名实体及其相应的注释。命名实体被分类为十个不同的类别：人物、地点、组织、食物、动物、颜色、角色、关系、物体和其他。该数据集是开发和评估孟加拉方言变体的NER模型的宝贵资源，有助于地区语言处理和低资源NLP应用。它可以用于增强孟加拉方言的NER系统，改进地区语言理解，并支持机器翻译、信息检索和对话AI应用。

更新时间: 2025-03-14 14:13:50

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2502.11198v2

From Generative AI to Innovative AI: An Evolutionary Roadmap

This paper explores the critical transition from Generative Artificial Intelligence (GenAI) to Innovative Artificial Intelligence (InAI). While recent advancements in GenAI have enabled systems to produce high-quality content across various domains, these models often lack the capacity for true innovation. In this context, innovation is defined as the ability to generate novel and useful outputs that go beyond mere replication of learned data. The paper examines this shift and proposes a roadmap for developing AI systems that can generate content and engage in autonomous problem-solving and creative ideation. The work provides both theoretical insights and practical strategies for advancing AI to a stage where it can genuinely innovate, contributing meaningfully to science, technology, and the arts.

Updated: 2025-03-14 14:03:28

标题: 从生成AI到创新AI：一条演进路线图

摘要: 本文探讨了从生成式人工智能（GenAI）到创新人工智能（InAI）的关键转变。虽然最近GenAI的进展使系统能够在各个领域生成高质量内容，但这些模型往往缺乏真正创新的能力。在这种情况下，创新被定义为生成超越学习数据简单复制的新颖和有用的输出能力。本文研究了这种转变，并提出了发展人工智能系统的路线图，这些系统能够生成内容并进行自主问题解决和创意构思。本工作提供了理论洞见和实践策略，以推进人工智能发展到能够真正创新的阶段，有助于科学、技术和艺术领域的有意义贡献。

更新时间: 2025-03-14 14:03:28

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.11419v1

Classifying Long-tailed and Label-noise Data via Disentangling and Unlearning

In real-world datasets, the challenges of long-tailed distributions and noisy labels often coexist, posing obstacles to the model training and performance. Existing studies on long-tailed noisy label learning (LTNLL) typically assume that the generation of noisy labels is independent of the long-tailed distribution, which may not be true from a practical perspective. In real-world situaiton, we observe that the tail class samples are more likely to be mislabeled as head, exacerbating the original degree of imbalance. We call this phenomenon as ``tail-to-head (T2H)'' noise. T2H noise severely degrades model performance by polluting the head classes and forcing the model to learn the tail samples as head. To address this challenge, we investigate the dynamic misleading process of the nosiy labels and propose a novel method called Disentangling and Unlearning for Long-tailed and Label-noisy data (DULL). It first employs the Inner-Feature Disentangling (IFD) to disentangle feature internally. Based on this, the Inner-Feature Partial Unlearning (IFPU) is then applied to weaken and unlearn incorrect feature regions correlated to wrong classes. This method prevents the model from being misled by noisy labels, enhancing the model's robustness against noise. To provide a controlled experimental environment, we further propose a new noise addition algorithm to simulate T2H noise. Extensive experiments on both simulated and real-world datasets demonstrate the effectiveness of our proposed method.

Updated: 2025-03-14 13:58:27

标题: 通过解缠和取消学习对长尾和标签噪声数据进行分类

摘要: 在真实世界的数据集中，长尾分布和嘈杂标签的挑战经常同时存在，给模型训练和性能带来障碍。现有关于长尾嘈杂标签学习（LTNLL）的研究通常假设嘈杂标签的生成与长尾分布独立，从实际角度来看可能并不正确。在真实世界情况下，我们观察到尾类样本更有可能被错误标记为头类，加剧了原始的不平衡程度。我们将这种现象称为“尾到头（T2H）”噪音。T2H噪音严重降低了模型性能，污染了头类并迫使模型将尾样本学习为头类。为了解决这一挑战，我们研究了嘈杂标签的动态误导过程，并提出了一种名为长尾和标签嘈杂数据解耦与反学习（DULL）的新方法。它首先采用内部特征解耦（IFD）来在内部解耦特征。基于此，接着应用内部特征部分反学习（IFPU）来削弱和反学习与错误类别相关的不正确特征区域。这种方法防止模型被嘈杂标签误导，增强了模型对噪音的鲁棒性。为提供一个可控的实验环境，我们进一步提出了一种新的噪音添加算法来模拟T2H噪音。对于模拟和真实世界的数据集进行了大量实验，证明了我们提出的方法的有效性。

更新时间: 2025-03-14 13:58:27

领域: cs.LG

下载: http://arxiv.org/abs/2503.11414v1

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

Long-form video understanding is complicated by the high redundancy of video data and the abundance of query-irrelevant information. To tackle these challenges, we propose VideoTree, a training-free framework which builds a query-adaptive and hierarchical video representation for LLM reasoning over long-form videos. First, VideoTree extracts query-relevant information from the input video through an iterative process, progressively refining the selection of keyframes based on their relevance to the query. Furthermore, VideoTree leverages the inherent hierarchical structure of long video data, which is often overlooked by existing LLM-based methods. Specifically, we incorporate multi-granularity information into a tree-based representation, allowing VideoTree to extract query-relevant details from long videos in a coarse-to-fine manner. This enables the model to effectively handle a wide range of video queries with varying levels of detail. Finally, VideoTree aggregates the hierarchical query-relevant information within the tree structure and feeds it into an LLM reasoning model to answer the query. Our experiments show that our method improves both reasoning accuracy and efficiency. Specifically, VideoTree outperforms existing training-free approaches on EgoSchema and NExT-QA with less inference time, achieving 61.1% and 75.6% accuracy on the test set without additional video-specific training. Moreover, on the long split of Video-MME (average 44 minutes), VideoTree achieves better performance than GPT-4V and many other MLLMs that were extensively trained on video data.

Updated: 2025-03-14 13:57:16

标题: VideoTree：基于树的自适应视频表示，用于长视频上的LLM推理

摘要: 长视频的理解受到视频数据的高冗余性和查询无关信息的丰富影响。为了应对这些挑战，我们提出了VideoTree，这是一个无需训练的框架，它为LLM推理在长视频上构建一个查询自适应和分层视频表示。首先，VideoTree通过一个迭代过程从输入视频中提取与查询相关的信息，逐渐根据它们与查询的相关性来精炼关键帧的选择。此外，VideoTree利用长视频数据的固有分层结构，这经常被现有基于LLM的方法忽视。具体来说，我们将多粒度信息融入基于树的表示中，使VideoTree能够以粗到细的方式从长视频中提取与查询相关的细节。这使模型能够有效处理各种不同细节级别的视频查询。最后，VideoTree将树结构中的分层查询相关信息聚合起来，并将其输入到LLM推理模型中以回答查询。我们的实验表明，我们的方法提高了推理的准确性和效率。具体而言，VideoTree在EgoSchema和NExT-QA上的表现优于现有的无需训练方法，并且推理时间更短，在测试集上的准确率分别达到61.1%和75.6%，而无需额外的视频特定训练。此外，在Video-MME的长视频分割（平均44分钟）上，VideoTree的表现优于GPT-4V和许多其他在视频数据上进行广泛训练的MLLM。

更新时间: 2025-03-14 13:57:16

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.19209v3

Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models

Time series analysis is crucial for understanding dynamics of complex systems. Recent advances in foundation models have led to task-agnostic Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs), enabling generalized learning and integrating contextual information. However, their success depends on large, diverse, and high-quality datasets, which are challenging to build due to regulatory, diversity, quality, and quantity constraints. Synthetic data emerge as a viable solution, addressing these challenges by offering scalable, unbiased, and high-quality alternatives. This survey provides a comprehensive review of synthetic data for TSFMs and TSLLMs, analyzing data generation strategies, their role in model pretraining, fine-tuning, and evaluation, and identifying future research directions.

Updated: 2025-03-14 13:53:46

标题: 使用合成数据赋能时间序列分析：在基础模型时代的调查和展望

摘要: 时间序列分析对于理解复杂系统的动态至关重要。最近基础模型的进展推动了任务无关的时间序列基础模型（TSFMs）和基于大型语言模型的时间序列模型（TSLLMs）的发展，实现了泛化学习和整合上下文信息。然而，它们的成功取决于大规模、多样化和高质量的数据集，由于监管、多样性、质量和数量的限制，这些数据集的建立具有挑战性。合成数据出现作为一种可行的解决方案，通过提供可扩展、无偏见和高质量的替代方案来解决这些挑战。本调查提供了关于TSFMs和TSLLMs的合成数据的全面回顾，分析了数据生成策略，在模型的预训练、微调和评估中的作用，并确定了未来的研究方向。

更新时间: 2025-03-14 13:53:46

领域: cs.LG

下载: http://arxiv.org/abs/2503.11411v1

A Neural Network Architecture Based on Attention Gate Mechanism for 3D Magnetotelluric Forward Modeling

Traditional three-dimensional magnetotelluric (MT) numerical forward modeling methods, such as the finite element method (FEM) and finite volume method (FVM), suffer from high computational costs and low efficiency due to limitations in mesh refinement and computational resources. We propose a novel neural network architecture named MTAGU-Net, which integrates an attention gating mechanism for 3D MT forward modeling. Specifically, a dual-path attention gating module is designed based on forward response data images and embedded in the skip connections between the encoder and decoder. This module enables the fusion of critical anomaly information from shallow feature maps during the decoding of deep feature maps, significantly enhancing the network's capability to extract features from anomalous regions. Furthermore, we introduce a synthetic model generation method utilizing 3D Gaussian random field (GRF), which accurately replicates the electrical structures of real-world geological scenarios with high fidelity. Numerical experiments demonstrate that MTAGU-Net outperforms conventional 3D U-Net in terms of convergence stability and prediction accuracy, with the structural similarity index (SSIM) of the forward response data consistently exceeding 0.98. Moreover, the network can accurately predict forward response data on previously unseen datasets models, demonstrating its strong generalization ability and validating the feasibility and effectiveness of this method in practical applications.

Updated: 2025-03-14 13:48:25

标题: 基于注意力门机制的神经网络架构用于3D磁 telluric正演建模

摘要: 传统的三维磁 telluric（MT）数值正演建模方法，如有限元方法（FEM）和有限体积方法（FVM），由于网格细化和计算资源的限制而导致计算成本高和效率低。我们提出了一种名为MTAGU-Net的新型神经网络架构，该架构集成了用于3D MT正演建模的注意力门控机制。具体而言，设计了一个双通道注意力门控模块，基于正演响应数据图像并嵌入在编码器和解码器之间的跳跃连接中。这个模块使得在解码深度特征图时能够融合来自浅层特征图的关键异常信息，显著增强了网络从异常区域提取特征的能力。此外，我们引入了一种利用3D高斯随机场（GRF）的合成模型生成方法，可以准确地复制真实地质场景的电气结构，具有高度的保真度。数值实验表明，MTAGU-Net在收敛稳定性和预测准确性方面优于传统的3D U-Net，正演响应数据的结构相似性指数（SSIM）一直超过0.98。此外，该网络可以准确预测先前未见的数据集上的正演响应数据模型，展示了其强大的泛化能力，并验证了在实际应用中这种方法的可行性和有效性。

更新时间: 2025-03-14 13:48:25

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.11408v1

Reproducible Machine Learning-based Voice Pathology Detection: Introducing the Pitch Difference Feature

Purpose: We introduce a novel methodology for voice pathology detection using the publicly available Saarbr\"ucken Voice Database (SVD) and a robust feature set combining commonly used acoustic handcrafted features with two novel ones: pitch difference (relative variation in fundamental frequency) and NaN feature (failed fundamental frequency estimation). Methods: We evaluate six machine learning (ML) algorithms -- support vector machine, k-nearest neighbors, naive Bayes, decision tree, random forest, and AdaBoost -- using grid search for feasible hyperparameters and 20480 different feature subsets. Top 1000 classification models -- feature subset combinations for each ML algorithm are validated with repeated stratified cross-validation. To address class imbalance, we apply K-Means SMOTE to augment the training data. Results: Our approach achieves 85.61%, 84.69% and 85.22% unweighted average recall (UAR) for females, males and combined results respectively. We intentionally omit accuracy as it is a highly biased metric for imbalanced data. Conclusion: Our study demonstrates that by following the proposed methodology and feature engineering, there is a potential in detection of various voice pathologies using ML models applied to the simplest vocal task, a sustained utterance of the vowel /a:/. To enable easier use of our methodology and to support our claims, we provide a publicly available GitHub repository with DOI 10.5281/zenodo.13771573. Finally, we provide a REFORMS checklist to enhance readability, reproducibility and justification of our approach

Updated: 2025-03-14 13:47:31

标题: 可重复的基于机器学习的声音病理检测：引入音高差异特征

摘要: 目的：我们介绍了一种新颖的方法，利用公开可用的萨尔布鲁肯声音数据库（SVD）和一个强大的特征集，结合了常用的声学手工特征和两个新颖的特征：音高差（基频的相对变化）和NaN特征（基频估计失败）来检测声音病理。方法：我们评估了六种机器学习（ML）算法--支持向量机、k-最近邻、朴素贝叶斯、决策树、随机森林和AdaBoost--使用网格搜索寻找可行的超参数和20480种不同的特征子集。对于每个ML算法的前1000个分类模型--特征子集组合通过重复分层交叉验证进行验证。为了解决类别不平衡问题，我们应用了K-Means SMOTE来增加训练数据。结果：我们的方法分别实现了女性、男性和合并结果的85.61%、84.69%和85.22%的未加权平均召回率（UAR）。我们有意忽略准确性，因为它是对不平衡数据高度偏见的度量。结论：我们的研究表明，通过遵循提出的方法和特征工程，利用应用于最简单的语音任务之一，即持续发音元音/a:/，有可能检测各种声音病理。为了更容易使用我们的方法并支持我们的主张，我们提供了一个公开可用的GitHub存储库，DOI为10.5281/zenodo.13771573。最后，我们提供了一个REFORMS清单，以增强我们方法的可读性、可重现性和合理性。

更新时间: 2025-03-14 13:47:31

领域: cs.SD,cs.AI,cs.LG,eess.AS

下载: http://arxiv.org/abs/2410.10537v3

Towards A Correct Usage of Cryptography in Semantic Watermarks for Diffusion Models

Semantic watermarking methods enable the direct integration of watermarks into the generation process of latent diffusion models by only modifying the initial latent noise. One line of approaches building on Gaussian Shading relies on cryptographic primitives to steer the sampling process of the latent noise. However, we identify several issues in the usage of cryptographic techniques in Gaussian Shading, particularly in its proof of lossless performance and key management, causing ambiguity in follow-up works, too. In this work, we therefore revisit the cryptographic primitives for semantic watermarking. We introduce a novel, general proof of lossless performance based on IND\$-CPA security for semantic watermarks. We then discuss the configuration of the cryptographic primitives in semantic watermarks with respect to security, efficiency, and generation quality.

Updated: 2025-03-14 13:45:46

标题: 朝向在扩散模型中正确使用密码学的语义水印

摘要: 语义水印方法使水印直接集成到潜在扩散模型的生成过程中，只需修改初始潜在噪音。一种基于高斯着色的方法依赖于密码原语来引导潜在噪音的采样过程。然而，我们发现在高斯着色中使用密码技术存在几个问题，特别是在其无损性能和密钥管理的证明中，也导致后续工作中存在歧义。因此，在本文中，我们重新审视语义水印的密码原语。我们引入了一种基于IND\$-CPA安全性的语义水印无损性能的新型通用证明。然后，我们讨论了语义水印中密码原语的配置，以确保安全性、效率和生成质量。

更新时间: 2025-03-14 13:45:46

领域: cs.CR,cs.AI,cs.CV

下载: http://arxiv.org/abs/2503.11404v1

RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification

Diffusion models have achieved remarkable advances in various image generation tasks. However, their performance notably declines when generating images at resolutions higher than those used during the training period. Despite the existence of numerous methods for producing high-resolution images, they either suffer from inefficiency or are hindered by complex operations. In this paper, we propose RectifiedHR, an straightforward and efficient solution for training-free high-resolution image generation. Specifically, we introduce the noise refresh strategy, which theoretically only requires a few lines of code to unlock the model's high-resolution generation ability and improve efficiency. Additionally, we first observe the phenomenon of energy decay that may cause image blurriness during the high-resolution image generation process. To address this issue, we introduce average latent energy analysis and discover that an improved classifier-free guidance hyperparameter can significantly enhance generation performance. Our method is entirely training-free and boasts a simple implementation logic and efficient performance. Through extensive comparisons with numerous baseline methods, our RectifiedHR demonstrates superior effectiveness and efficiency.

Updated: 2025-03-14 13:40:17

标题: RectifiedHR：通过能量矫正实现高效高分辨率图像生成

摘要: 扩散模型在各种图像生成任务中取得了显著进展。然而，在生成分辨率高于训练期间使用的分辨率时，它们的性能明显下降。尽管存在许多用于生成高分辨率图像的方法，但它们要么效率低下，要么受到复杂操作的限制。在本文中，我们提出了一种名为RectifiedHR的简单而高效的解决方案，用于无需训练的高分辨率图像生成。具体地，我们引入了噪声刷新策略，理论上只需要几行代码就能解锁模型的高分辨率生成能力并提高效率。此外，我们首次观察到能量衰减现象可能在高分辨率图像生成过程中导致图像模糊。为了解决这个问题，我们引入了平均潜在能量分析，并发现一个改进的无分类器引导超参数可以显著提升生成性能。我们的方法完全无需训练，具有简单的实现逻辑和高效的性能。通过与许多基线方法进行广泛比较，我们的RectifiedHR展示了卓越的有效性和效率。

更新时间: 2025-03-14 13:40:17

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.02537v2

Towards Sample-specific Backdoor Attack with Clean Labels via Attribute Trigger

Currently, sample-specific backdoor attacks (SSBAs) are the most advanced and malicious methods since they can easily circumvent most of the current backdoor defenses. In this paper, we reveal that SSBAs are not sufficiently stealthy due to their poisoned-label nature, where users can discover anomalies if they check the image-label relationship. In particular, we demonstrate that it is ineffective to directly generalize existing SSBAs to their clean-label variants by poisoning samples solely from the target class. We reveal that it is primarily due to two reasons, including \textbf{(1)} the `antagonistic effects' of ground-truth features and \textbf{(2)} the learning difficulty of sample-specific features. Accordingly, trigger-related features of existing SSBAs cannot be effectively learned under the clean-label setting due to their mild trigger intensity required for ensuring stealthiness. We argue that the intensity constraint of existing SSBAs is mostly because their trigger patterns are `content-irrelevant' and therefore act as `noises' for both humans and DNNs. Motivated by this understanding, we propose to exploit content-relevant features, $a.k.a.$ (human-relied) attributes, as the trigger patterns to design clean-label SSBAs. This new attack paradigm is dubbed backdoor attack with attribute trigger (BAAT). Extensive experiments are conducted on benchmark datasets, which verify the effectiveness of our BAAT and its resistance to existing defenses.

Updated: 2025-03-14 13:36:51

标题: 朝向通过属性触发器实现具有干净标签的样本特定后门攻击

摘要: 目前，样本特定的后门攻击（SSBAs）是最先进和恶意的方法，因为它们可以轻松规避大多数当前的后门防御措施。在本文中，我们揭示了由于其有毒标签的本质，SSBAs并不足够隐蔽，用户可以通过检查图像标签关系来发现异常。具体而言，我们证明直接将现有的SSBAs泛化为其干净标签变体是无效的，因为仅仅污染目标类别的样本。我们揭示这主要是由于两个原因，包括（1）地面真实特征的"对抗效应"和（2）样本特定特征的学习困难。因此，由于确保隐蔽性所需的触发器强度较弱，现有SSBAs的触发器相关特征在干净标签设置下无法有效学习。我们认为现有SSBAs的强度约束主要是因为它们的触发器模式是"与内容无关的"，因此对于人类和DNNs都起到"噪音"的作用。受到这一理解的启发，我们提出利用内容相关特征，即（人类依赖的）属性，作为触发器模式来设计干净标签SSBAs。这种新的攻击范式被称为具有属性触发器的后门攻击（BAAT）。我们在基准数据集上进行了大量实验，验证了我们的BAAT的有效性及其对现有防御措施的抵抗力。

更新时间: 2025-03-14 13:36:51

领域: cs.CR,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2312.04584v3

Deepfake Detection of Face Images based on a Convolutional Neural Network

Fake News and especially deepfakes (generated, non-real image or video content) have become a serious topic over the last years. With the emergence of machine learning algorithms it is now easier than ever before to generate such fake content, even for private persons. This issue of generated fake images is especially critical in the context of politics and public figures. We want to address this conflict by building a model based on a Convolutions Neural Network in order to detect such generated and fake images showing human portraits. As a basis, we use a pre-trained ResNet-50 model due to its effectiveness in terms of classifying images. We then adopted the base model to our task of classifying a single image as authentic/real or fake by adding an fully connected output layer containing a single neuron indicating the authenticity of an image. We applied fine tuning and transfer learning to develop the model and improve its parameters. For the training process we collected the image data set "Diverse Face Fake Dataset" containing a wide range of different image manipulation methods and also diversity in terms of faces visible on the images. With our final model we reached the following outstanding performance metrics: precision = 0.98, recall 0.96, F1-Score = 0.97 and an area-under-curve = 0.99.

Updated: 2025-03-14 13:33:22

标题: 基于卷积神经网络的人脸图像深度伪造检测

摘要: 虚假新闻，尤其是深度伪造（生成的非真实图像或视频内容）在过去几年已成为一个严重话题。随着机器学习算法的出现，现在比以往任何时候都更容易生成这种虚假内容，即使是对私人。在政治和公众人物的背景下，生成的虚假图像尤为关键。我们希望通过基于卷积神经网络构建模型来检测显示人类肖像的生成和虚假图像，以解决这一冲突。作为基础，我们使用了预训练的ResNet-50模型，因为它在图像分类方面的有效性。然后，我们将基础模型应用于将单个图像分类为真实或虚假的任务，通过添加一个包含单个神经元的全连接输出层来指示图像的真实性。我们应用精细调整和迁移学习来开发模型并改进其参数。在训练过程中，我们收集了包含各种不同图像处理方法以及在图像上可见的面孔多样性的图像数据集“多元面孔伪造数据集”。通过我们的最终模型，我们达到了以下出色的性能指标：精度=0.98，召回率0.96，F1分数=0.97，曲线下面积=0.99。

更新时间: 2025-03-14 13:33:22

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2503.11389v1

Hierarchical Information-Guided Spatio-Temporal Mamba for Stock Time Series Forecasting

Mamba has demonstrated excellent performance in various time series forecasting tasks due to its superior selection mechanism. Nevertheless, conventional Mamba-based models encounter significant challenges in accurately predicting stock time series, as they fail to adequately capture both the overarching market dynamics and the intricate interdependencies among individual stocks. To overcome these constraints, we introduce the Hierarchical Information-Guided Spatio-Temporal Mamba (HIGSTM) framework. HIGSTM introduces Index-Guided Frequency Filtering Decomposition to extract commonality and specificity from time series. The model architecture features a meticulously designed hierarchical framework that systematically captures both temporal dynamic patterns and global static relationships within the stock market. Furthermore, we propose an Information-Guided Mamba that integrates macro informations into the sequence selection process, thereby facilitating more market-conscious decision-making. Comprehensive experimental evaluations conducted on the CSI500, CSI800 and CSI1000 datasets demonstrate that HIGSTM achieves state-of-the-art performance.

Updated: 2025-03-14 13:30:38

标题: 层次化信息引导的时空马巴模型在股票时间序列预测中的应用

摘要: 蛇形神经网络在各种时间序列预测任务中表现出色，这得益于其优越的选择机制。然而，传统的基于蛇形神经网络的模型在准确预测股票时间序列方面面临重大挑战，因为它们未能充分捕捉整体市场动态和个股之间错综复杂的相互依赖关系。为了克服这些限制，我们引入了分层信息引导的时空蛇形神经网络（HIGSTM）框架。HIGSTM引入了指数引导频率滤波分解来从时间序列中提取共性和特异性。该模型架构具有精心设计的分层框架，系统地捕捉了股市内的时间动态模式和全局静态关系。此外，我们提出了一种信息引导的蛇形神经网络，将宏观信息整合到序列选择过程中，从而促进更加市场意识的决策制定。在CSI500、CSI800和CSI1000数据集上进行的全面实验评估表明，HIGSTM实现了最先进的性能。

更新时间: 2025-03-14 13:30:38

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.11387v1

Hyperparameter Selection in Continual Learning

In continual learning (CL) -- where a learner trains on a stream of data -- standard hyperparameter optimisation (HPO) cannot be applied, as a learner does not have access to all of the data at the same time. This has prompted the development of CL-specific HPO frameworks. The most popular way to tune hyperparameters in CL is to repeatedly train over the whole data stream with different hyperparameter settings. However, this end-of-training HPO is unusable in practice since a learner can only see the stream once. Hence, there is an open question: what HPO framework should a practitioner use for a CL problem in reality? This paper looks at this question by comparing several realistic HPO frameworks. We find that none of the HPO frameworks considered, including end-of-training HPO, perform consistently better than the rest on popular CL benchmarks. We therefore arrive at a twofold conclusion: a) to be able to discriminate between HPO frameworks there is a need to move beyond the current most commonly used CL benchmarks, and b) on the popular CL benchmarks examined, a CL practitioner should use a realistic HPO framework and can select it based on factors separate from performance, for example compute efficiency.

Updated: 2025-03-14 13:30:09

标题: 持续学习中的超参数选择

摘要: 在持续学习（CL）中，学习者在数据流上进行训练，标准的超参数优化（HPO）无法应用，因为学习者无法同时访问所有数据。这促使了CL特定HPO框架的发展。在CL中调整超参数的最流行方法是使用不同的超参数设置重复训练整个数据流。然而，在实践中，这种训练结束后的HPO是无法使用的，因为学习者只能看到数据流一次。因此，一个悬而未决的问题是：在现实中，从业者应该使用哪种HPO框架来解决CL问题？本文通过比较几种现实的HPO框架来探讨这个问题。我们发现考虑的所有HPO框架，包括训练结束时的HPO，在流行的CL基准测试中表现并不比其他框架更好。因此，我们得出双重结论：a）为了能够区分HPO框架，需要超越当前最常用的CL基准测试，b）在考虑的流行CL基准测试中，CL从业者应该使用一个现实的HPO框架，并且可以基于与性能无关的因素选择，例如计算效率。

更新时间: 2025-03-14 13:30:09

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2404.06466v2

Emergent Abilities in Large Language Models: A Survey

Large Language Models (LLMs) are leading a new technological revolution as one of the most promising research streams toward artificial general intelligence. The scaling of these models, accomplished by increasing the number of parameters and the magnitude of the training datasets, has been linked to various so-called emergent abilities that were previously unobserved. These emergent abilities, ranging from advanced reasoning and in-context learning to coding and problem-solving, have sparked an intense scientific debate: Are they truly emergent, or do they simply depend on external factors, such as training dynamics, the type of problems, or the chosen metric? What underlying mechanism causes them? Despite their transformative potential, emergent abilities remain poorly understood, leading to misconceptions about their definition, nature, predictability, and implications. In this work, we shed light on emergent abilities by conducting a comprehensive review of the phenomenon, addressing both its scientific underpinnings and real-world consequences. We first critically analyze existing definitions, exposing inconsistencies in conceptualizing emergent abilities. We then explore the conditions under which these abilities appear, evaluating the role of scaling laws, task complexity, pre-training loss, quantization, and prompting strategies. Our review extends beyond traditional LLMs and includes Large Reasoning Models (LRMs), which leverage reinforcement learning and inference-time search to amplify reasoning and self-reflection. However, emergence is not inherently positive. As AI systems gain autonomous reasoning capabilities, they also develop harmful behaviors, including deception, manipulation, and reward hacking. We highlight growing concerns about safety and governance, emphasizing the need for better evaluation frameworks and regulatory oversight.

Updated: 2025-03-14 13:28:04

标题: 大型语言模型中的紧急能力：一项调查

摘要: 大型语言模型（LLMs）正在引领一场新的技术革命，作为通向人工通用智能最有前途的研究方向之一。这些模型的扩展是通过增加参数数量和训练数据集的规模来实现的，已经与各种所谓的新兴能力联系在一起，这些能力以前未曾被观察到。这些新兴能力涵盖了从高级推理和上下文学习到编码和问题解决的范围，引发了激烈的科学争论：它们真的是新兴的吗，还是仅仅取决于外部因素，比如训练动态、问题类型或选择的度量标准？是什么潜在机制导致了它们？尽管具有变革潜力，新兴能力仍然被认识不足，导致对其定义、性质、可预测性和影响的误解。在这项工作中，我们通过对这一现象的全面审查，揭示了新兴能力的光芒，探讨了其科学基础和现实世界后果。我们首先批判性地分析现有的定义，揭示了在概念化新兴能力方面的不一致性。然后，我们探索了这些能力出现的条件，评估了扩展定律、任务复杂度、预训练损失、量化和提示策略的作用。我们的审查超越了传统的LLMs，包括利用强化学习和推理时搜索来增强推理和自我反思的大型推理模型（LRMs）。然而，新兴并不是必然积极的。随着AI系统获得自主推理能力，它们也会发展出有害行为，包括欺骗、操纵和奖励黑客。我们强调了关于安全和治理的日益增长的担忧，强调了对更好的评估框架和监管监督的需要。

更新时间: 2025-03-14 13:28:04

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2503.05788v2

Optimizing Large Language Models for Detecting Symptoms of Comorbid Depression or Anxiety in Chronic Diseases: Insights from Patient Messages

Patients with diabetes are at increased risk of comorbid depression or anxiety, complicating their management. This study evaluated the performance of large language models (LLMs) in detecting these symptoms from secure patient messages. We applied multiple approaches, including engineered prompts, systemic persona, temperature adjustments, and zero-shot and few-shot learning, to identify the best-performing model and enhance performance. Three out of five LLMs demonstrated excellent performance (over 90% of F-1 and accuracy), with Llama 3.1 405B achieving 93% in both F-1 and accuracy using a zero-shot approach. While LLMs showed promise in binary classification and handling complex metrics like Patient Health Questionnaire-4, inconsistencies in challenging cases warrant further real-life assessment. The findings highlight the potential of LLMs to assist in timely screening and referrals, providing valuable empirical knowledge for real-world triage systems that could improve mental health care for patients with chronic diseases.

Updated: 2025-03-14 13:27:35

标题: 优化大型语言模型以检测慢性疾病患者信息中共病抑郁症或焦虑症状的研究成果

摘要: 患有糖尿病的患者患有共病抑郁症或焦虑症的风险增加，使他们的管理变得复杂。本研究评估了大型语言模型（LLMs）在从安全患者消息中检测这些症状的表现。我们应用了多种方法，包括工程提示、系统人物、温度调整以及零射击和少射击学习，以确定表现最佳的模型并增强性能。五种LLMs中有三种表现出色（超过90％的F-1和准确性），其中Llama 3.1 405B使用零射击方法在F-1和准确性方面均达到93％。虽然LLMs在二元分类和处理诸如患者健康问卷-4之类的复杂指标方面表现出潜力，但在挑战性案例中存在不一致性，需要进一步进行现实生活评估。研究结果突显了LLMs在及时筛查和转诊方面的潜力，为现实世界中可能改善患有慢性疾病患者心理健康护理的三级制度提供了宝贵的经验知识。

更新时间: 2025-03-14 13:27:35

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2503.11384v1

Annotating Scientific Uncertainty: A comprehensive model using linguistic patterns and comparison with existing approaches

UnScientify, a system designed to detect scientific uncertainty in scholarly full text. The system utilizes a weakly supervised technique to identify verbally expressed uncertainty in scientific texts and their authorial references. The core methodology of UnScientify is based on a multi-faceted pipeline that integrates span pattern matching, complex sentence analysis and author reference checking. This approach streamlines the labeling and annotation processes essential for identifying scientific uncertainty, covering a variety of uncertainty expression types to support diverse applications including information retrieval, text mining and scientific document processing. The evaluation results highlight the trade-offs between modern large language models (LLMs) and the UnScientify system. UnScientify, which employs more traditional techniques, achieved superior performance in the scientific uncertainty detection task, attaining an accuracy score of 0.808. This finding underscores the continued relevance and efficiency of UnScientify's simple rule-based and pattern matching strategy for this specific application. The results demonstrate that in scenarios where resource efficiency, interpretability, and domain-specific adaptability are critical, traditional methods can still offer significant advantages.

Updated: 2025-03-14 13:21:59

标题: 注释科学不确定性：使用语言模式和与现有方法的比较的综合模型

摘要: UnScientify是一个设计用于检测学术全文中科学不确定性的系统。该系统利用弱监督技术来识别科学文本和作者参考中口头表达的不确定性。UnScientify的核心方法基于多方面的流程，集成了跨度模式匹配、复杂句子分析和作者参考检查。这种方法简化了识别科学不确定性所必需的标记和注释过程，涵盖了各种不确定性表达类型，以支持包括信息检索、文本挖掘和科学文档处理在内的各种应用。评估结果突显了现代大型语言模型（LLMs）和UnScientify系统之间的权衡。UnScientify采用更传统的技术，在科学不确定性检测任务中取得了优越的表现，达到了0.808的准确度分数。这一发现强调了UnScientify简单基于规则和模式匹配策略在这一特定应用中持续的相关性和效率。结果表明，在资源效率、可解释性和领域特定适应性至关重要的场景中，传统方法仍然可以提供显著优势。

更新时间: 2025-03-14 13:21:59

领域: cs.CL,cs.AI,cs.DL

下载: http://arxiv.org/abs/2503.11376v1

Exploring Performance-Complexity Trade-Offs in Sound Event Detection

We target the problem of developing new low-complexity networks for the sound event detection task. Our goal is to meticulously analyze the performance-complexity trade-off, aiming to be competitive with the large state-of-the-art models, at a fraction of the computational requirements. We find that low-complexity convolutional models previously proposed for audio tagging can be effectively adapted for event detection (which requires frame-wise prediction) by adjusting convolutional strides, removing the global pooling, and, importantly, adding a sequence model before the (now frame-wise) classification heads. Systematic experiments reveal that the best choice for the sequence model type depends on which complexity metric is most important for the given application. We also investigate the impact of enhanced training strategies such as knowledge distillation. In the end, we show that combined with an optimized training strategy, we can reach event detection performance comparable to state-of-the-art transformers while requiring only around 5% of the parameters. We release all our pre-trained models and the code for reproducing this work to support future research in low-complexity sound event detection at https://github.com/theMoro/EfficientSED.

Updated: 2025-03-14 13:18:02

标题: 在声音事件检测中探索性能与复杂性之间的权衡Trade-Offs

摘要: 我们致力于开发新的低复杂度网络，用于声音事件检测任务。我们的目标是精心分析性能与复杂度之间的权衡，力求在计算要求的一小部分的情况下与大型最先进模型竞争。我们发现，先前为音频标记提出的低复杂度卷积模型可以通过调整卷积步幅、删除全局池化，并且重要的是，在（现在是逐帧的）分类头之前添加一个序列模型，有效地适应需要逐帧预测的事件检测任务。系统实验表明，序列模型类型的最佳选择取决于给定应用程序最重要的复杂度指标。我们还调查了增强训练策略（如知识蒸馏）的影响。最后，我们展示了结合优化的训练策略，我们可以达到与最先进的变压器相媲美的事件检测性能，而只需要大约5%的参数。我们发布了所有我们的预训练模型和用于重现这项工作的代码，以支持未来在低复杂度声音事件检测领域的研究。

更新时间: 2025-03-14 13:18:02

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2503.11373v1

Fine-Grained and Multi-Dimensional Metrics for Document-Level Machine Translation

Large language models (LLMs) have excelled in various NLP tasks, including machine translation (MT), yet most studies focus on sentence-level translation. This work investigates the inherent capability of instruction-tuned LLMs for document-level translation (docMT). Unlike prior approaches that require specialized techniques, we evaluate LLMs by directly prompting them to translate entire documents in a single pass. Our results show that this method improves translation quality compared to translating sentences separately, even without document-level fine-tuning. However, this advantage is not reflected in BLEU scores, which often favor sentence-based translations. We propose using the LLM-as-a-judge paradigm for evaluation, where GPT-4 is used to assess document coherence, accuracy, and fluency in a more nuanced way than n-gram-based metrics. Overall, our work demonstrates that instruction-tuned LLMs can effectively leverage document context for translation. However, we caution against using BLEU scores for evaluating docMT, as they often provide misleading outcomes, failing to capture the quality of document-level translation. Code and the outputs from GPT4-as-a-judge are available at https://github.com/EIT-NLP/BLEUless_DocMT

Updated: 2025-03-14 13:12:38

标题: 细粒度和多维度指标用于文档级机器翻译

摘要: 大型语言模型（LLMs）在各种自然语言处理任务中表现出色，包括机器翻译（MT），但大多数研究集中在句子级别翻译。本研究调查了调整指导的LLMs在文档级翻译（docMT）中的固有能力。与先前需要专门技术的方法不同，我们通过直接提示LLMs在单次传递中翻译整个文档来评估LLMs。我们的结果表明，与单独翻译句子相比，这种方法改善了翻译质量，即使没有文档级微调。然而，这种优势在BLEU分数中没有体现，BLEU分数通常偏向基于句子的翻译。我们提出使用LLM作为评估的判断范式，其中使用GPT-4以更加细致的方式评估文档的连贯性、准确性和流畅性，而不是基于n-gram的度量。总的来说，我们的工作证明了调整指导的LLMs可以有效利用文档上下文进行翻译。然而，我们警告不要使用BLEU分数来评估docMT，因为它们经常提供误导性的结果，未能捕捉文档级翻译的质量。代码和GPT4作为判断的输出可在https://github.com/EIT-NLP/BLEUless_DocMT找到。

更新时间: 2025-03-14 13:12:38

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.20941v3

Data Pruning in Generative Diffusion Models

Data pruning is the problem of identifying a core subset that is most beneficial to training and discarding the remainder. While pruning strategies are well studied for discriminative models like those used in classification, little research has gone into their application to generative models. Generative models aim to estimate the underlying distribution of the data, so presumably they should benefit from larger datasets. In this work we aim to shed light on the accuracy of this statement, specifically answer the question of whether data pruning for generative diffusion models could have a positive impact. Contrary to intuition, we show that eliminating redundant or noisy data in large datasets is beneficial particularly when done strategically. We experiment with several pruning methods including recent-state-of-art methods, and evaluate over CelebA-HQ and ImageNet datasets. We demonstrate that a simple clustering method outperforms other sophisticated and computationally demanding methods. We further exhibit how we can leverage clustering to balance skewed datasets in an unsupervised manner to allow fair sampling for underrepresented populations in the data distribution, which is a crucial problem in generative models.

Updated: 2025-03-14 13:11:28

标题: 生成扩散模型中的数据修剪

摘要: 数据修剪是识别对训练最有益并且丢弃其余部分的核心子集的问题。虽然对于用于分类的判别模型等模型，修剪策略已经得到了深入研究，但对于它们在生成模型中的应用却鲜有研究。生成模型旨在估计数据的潜在分布，因此它们应该受益于更大的数据集。在这项工作中，我们旨在阐明这个陈述的准确性，特别回答数据修剪对生成扩散模型是否会产生积极影响的问题。与直觉相反，我们表明，在大型数据集中消除冗余或噪声数据在战略上进行时是有益的。我们尝试了几种修剪方法，包括最新的方法，并在CelebA-HQ和ImageNet数据集上进行评估。我们展示了一个简单的聚类方法在性能上优于其他复杂且计算需求高的方法。我们进一步展示了如何利用聚类以无监督的方式平衡倾斜的数据集，以允许对数据分布中代表性不足的人群进行公平抽样，这在生成模型中是一个关键问题。

更新时间: 2025-03-14 13:11:28

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2411.12523v3

CoPAL: Corrective Planning of Robot Actions with Large Language Models

In the pursuit of fully autonomous robotic systems capable of taking over tasks traditionally performed by humans, the complexity of open-world environments poses a considerable challenge. Addressing this imperative, this study contributes to the field of Large Language Models (LLMs) applied to task and motion planning for robots. We propose a system architecture that orchestrates a seamless interplay between multiple cognitive levels, encompassing reasoning, planning, and motion generation. At its core lies a novel replanning strategy that handles physically grounded, logical, and semantic errors in the generated plans. We demonstrate the efficacy of the proposed feedback architecture, particularly its impact on executability, correctness, and time complexity via empirical evaluation in the context of a simulation and two intricate real-world scenarios: blocks world, barman and pizza preparation.

Updated: 2025-03-14 13:03:24

标题: CoPAL：使用大型语言模型进行机器人动作的纠正规划

摘要: 在追求完全自主的机器人系统能够接管人类传统执行的任务时，开放世界环境的复杂性构成了一个重大挑战。为了解决这一问题，本研究为大型语言模型（LLMs）应用于机器人任务和运动规划领域做出了贡献。我们提出了一个系统架构，协调了多个认知层次之间的无缝互动，包括推理、规划和运动生成。其核心是一种处理生成计划中的物理基础、逻辑和语义错误的新型重新规划策略。我们通过在模拟环境和两个复杂的真实场景（积木世界、酒吧和披萨准备）中的实证评估展示了所提出的反馈架构的有效性，特别是其对可执行性、正确性和时间复杂性的影响。

更新时间: 2025-03-14 13:03:24

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2310.07263v2

Creating a Good Teacher for Knowledge Distillation in Acoustic Scene Classification

Knowledge Distillation (KD) is a widespread technique for compressing the knowledge of large models into more compact and efficient models. KD has proved to be highly effective in building well-performing low-complexity Acoustic Scene Classification (ASC) systems and was used in all the top-ranked submissions to this task of the annual DCASE challenge in the past three years. There is extensive research available on establishing the KD process, designing efficient student models, and forming well-performing teacher ensembles. However, less research has been conducted on investigating which teacher model attributes are beneficial for low-complexity students. In this work, we try to close this gap by studying the effects on the student's performance when using different teacher network architectures, varying the teacher model size, training them with different device generalization methods, and applying different ensembling strategies. The results show that teacher model sizes, device generalization methods, the ensembling strategy and the ensemble size are key factors for a well-performing student network.

Updated: 2025-03-14 12:57:12

标题: 在声学场景分类中创建一个优秀的知识蒸馏教师

摘要: 知识蒸馏（KD）是一种广泛应用的技术，用于将大型模型的知识压缩到更紧凑高效的模型中。KD已被证明在构建性能优良的低复杂度声学场景分类（ASC）系统方面非常有效，并在过去三年的年度DCASE挑战赛的所有排名靠前的提交中被使用。已有大量研究可用于建立KD过程、设计高效的学生模型以及形成性能优良的教师集合。然而，对于哪些教师模型属性对于低复杂度学生有益的研究较少。在这项工作中，我们尝试通过研究使用不同的教师网络架构、变化教师模型大小、以不同设备泛化方法训练它们，以及应用不同的集成策略时对学生性能的影响来弥补这一空白。结果表明，教师模型大小、设备泛化方法、集成策略和集合大小是学生网络表现良好的关键因素。

更新时间: 2025-03-14 12:57:12

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2503.11363v1

PARIC: Probabilistic Attention Regularization for Language Guided Image Classification from Pre-trained Vison Language Models

Language-guided attention frameworks have significantly enhanced both interpretability and performance in image classification; however, the reliance on deterministic embeddings from pre-trained vision-language foundation models to generate reference attention maps frequently overlooks the intrinsic multivaluedness and ill-posed characteristics of cross-modal mappings. To address these limitations, we introduce PARIC, a probabilistic framework for guiding visual attention via language specifications. Our approach enables pre-trained vision-language models to generate probabilistic reference attention maps, which align textual and visual modalities more effectively while incorporating uncertainty estimates, as compared to their deterministic counterparts. Experiments on benchmark test problems demonstrate that PARIC enhances prediction accuracy, mitigates bias, ensures consistent predictions, and improves robustness across various datasets.

Updated: 2025-03-14 12:53:37

标题: PARIC：预训练视觉语言模型引导的语言指导图像分类的概率注意力正则化

摘要: Language-guided attention frameworks have significantly improved both interpretability and performance in image classification. However, relying on deterministic embeddings from pre-trained vision-language models to generate reference attention maps often overlooks the multi-valued and ill-posed nature of cross-modal mappings. In order to address these limitations, we propose PARIC, a probabilistic framework for guiding visual attention based on language specifications. Our approach allows pre-trained vision-language models to generate probabilistic reference attention maps, which better align textual and visual modalities while incorporating uncertainty estimates, in contrast to deterministic approaches. Experiments on standard test problems show that PARIC enhances prediction accuracy, reduces bias, ensures consistent predictions, and improves robustness across different datasets.

更新时间: 2025-03-14 12:53:37

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2503.11360v1

BOWL: A Deceptively Simple Open World Learner

Traditional machine learning excels on static benchmarks, but the real world is dynamic and seldom as carefully curated as test sets. Practical applications may generally encounter undesired inputs, are required to deal with novel information, and need to ensure operation through their full lifetime - aspects where standard deep models struggle. These three elements may have been researched individually, but their practical conjunction, i.e., open world learning, is much less consolidated. In this paper, we posit that neural networks already contain a powerful catalyst to turn them into open world learners: the batch normalization layer. Leveraging its tracked statistics, we derive effective strategies to detect in- and out-of-distribution samples, select informative data points, and update the model continuously. This, in turn, allows us to demonstrate that existing batch-normalized models can be made more robust, less prone to forgetting over time, and be trained efficiently with less data.

Updated: 2025-03-14 12:41:59

标题: BOWL：一个看似简单的开放世界学习者

摘要: 传统机器学习在静态基准上表现出色，但现实世界是动态的，很少像测试集那样精心策划。实际应用可能通常会遇到不良输入，需要处理新颖信息，并且需要确保在其整个寿命期间进行操作-这些方面是标准深度模型所困难的地方。这三个元素可能已经被单独研究，但它们的实际结合，即开放世界学习，远没有得到充分巩固。在本文中，我们认为神经网络已经包含了一个强大的催化剂，让它们成为开放世界学习者：批标准化层。利用其跟踪的统计数据，我们推导出有效的策略来检测内部和外部分布样本，选择信息量大的数据点，并持续更新模型。这反过来使我们能够证明现有的批标准化模型可以更加稳健，不易随时间而遗忘，并且可以用更少的数据进行高效训练。

更新时间: 2025-03-14 12:41:59

领域: cs.LG

下载: http://arxiv.org/abs/2402.04814v2

An experimental approach on Few Shot Class Incremental Learning

Few-Shot Class-Incremental Learning (FSCIL) represents a cutting-edge paradigm within the broader scope of machine learning, designed to empower models with the ability to assimilate new classes of data with limited examples while safeguarding existing knowledge. The paper will present different solutions which contain extensive experiments across large-scale datasets, domain shifts, and network architectures to evaluate and compare the selected methods. We highlight their advantages and then present an experimental approach with the purpose of improving the most promising one by replacing the visual-language (V-L) model (CLIP) with another V-L model (CLOOB) that seem to outperform it on zero-shot learning tasks. The aim of this report is to present an experimental method for FSCIL that would improve its performance. We also plan to offer an overview followed by an analysis of the recent advancements in FSCIL domain, focusing on various strategies to mitigate catastrophic forgetting and improve the adaptability of models to evolving tasks and datasets.

Updated: 2025-03-14 12:36:15

标题: 一个关于少样本分类增量学习的实验方法

摘要: Few-Shot Class-Incremental Learning（FSCIL）代表了机器学习更广泛范围内的前沿范式，旨在赋予模型能力，即在有限示例的情况下吸收新数据类别同时保护现有知识。本文将提出不同的解决方案，涵盖大规模数据集、领域转移和网络架构上的广泛实验，以评估和比较所选方法。我们强调它们的优势，然后提出一种实验方法，旨在通过将视觉语言（V-L）模型（CLIP）替换为另一个V-L模型（CLOOB），从而改进性能，后者似乎在零样本学习任务上表现优异。本报告的目的是提出一种改进FSCIL性能的实验方法。我们还计划提供概述，然后分析FSCIL领域的最新进展，重点关注各种策略以减轻灾难性遗忘，并提高模型对不断发展的任务和数据集的适应性。

更新时间: 2025-03-14 12:36:15

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.11349v1

Enhanced Low-Dose CT Image Reconstruction by Domain and Task Shifting Gaussian Denoisers

Computed tomography from a low radiation dose (LDCT) is challenging due to high noise in the projection data. Popular approaches for LDCT image reconstruction are two-stage methods, typically consisting of the filtered backprojection (FBP) algorithm followed by a neural network for LDCT image enhancement. Two-stage methods are attractive for their simplicity and potential for computational efficiency, typically requiring only a single FBP and a neural network forward pass for inference. However, the best reconstruction quality is currently achieved by unrolled iterative methods (Learned Primal-Dual and ItNet), which are more complex and thus have a higher computational cost for training and inference. We propose a method combining the simplicity and efficiency of two-stage methods with state-of-the-art reconstruction quality. Our strategy utilizes a neural network pretrained for Gaussian noise removal from natural grayscale images, fine-tuned for LDCT image enhancement. We call this method FBP-DTSGD (Domain and Task Shifted Gaussian Denoisers) as the fine-tuning is a task shift from Gaussian denoising to enhancing LDCT images and a domain shift from natural grayscale to LDCT images. An ablation study with three different pretrained Gaussian denoisers indicates that the performance of FBP-DTSGD does not depend on a specific denoising architecture, suggesting future advancements in Gaussian denoising could benefit the method. The study also shows that pretraining on natural images enhances LDCT reconstruction quality, especially with limited training data. Notably, pretraining involves no additional cost, as existing pretrained models are used. The proposed method currently holds the top mean position in the LoDoPaB-CT challenge.

Updated: 2025-03-14 12:30:28

标题: 通过领域和任务转移高斯去噪器增强低剂量CT图像重建

摘要: 低剂量辐射计算机断层扫描（LDCT）的挑战在于投影数据中存在高噪音。用于LDCT图像重建的流行方法是两阶段方法，通常包括经过滤反投影（FBP）算法后跟随神经网络进行LDCT图像增强。两阶段方法因其简单性和潜在的计算效率而具有吸引力，通常只需要进行一次FBP和神经网络前向传递进行推理。然而，目前最佳重建质量是通过展开迭代方法（Learned Primal-Dual和ItNet）实现的，这些方法更为复杂，因此在训练和推理方面具有更高的计算成本。我们提出一种方法，将两阶段方法的简单性和效率与最先进的重建质量相结合。我们的策略利用了一个经过预训练的神经网络，用于从自然灰度图像中去除高斯噪声，并针对LDCT图像进行微调增强。我们将这种方法称为FBP-DTSGD（域和任务转移高斯去噪器），因为微调是从高斯去噪到增强LDCT图像的任务转移，并且是从自然灰度到LDCT图像的域转移。一项消融研究使用三种不同的预训练高斯去噪器表明，FBP-DTSGD的性能不依赖于特定的去噪架构，这表明未来高斯去噪的进步可能有益于该方法。研究还表明，在自然图像上进行预训练可以提高LDCT重建质量，特别是在训练数据有限的情况下。值得注意的是，预训练不需要额外成本，因为使用现有的预训练模型。所提出的方法目前在LoDoPaB-CT挑战中占据最高平均位置。

更新时间: 2025-03-14 12:30:28

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.03551v4

Integrating Dynamical Systems Modeling with Spatiotemporal scRNA-seq Data Analysis

Understanding the dynamic nature of biological systems is fundamental to deciphering cellular behavior, developmental processes, and disease progression. Single-cell RNA sequencing (scRNA-seq) has provided static snapshots of gene expression, offering valuable insights into cellular states at a single time point. Recent advancements in temporally resolved scRNA-seq, spatial transcriptomics (ST), and time-series spatial transcriptomics (temporal-ST) have further revolutionized our ability to study the spatiotemporal dynamics of individual cells. These technologies, when combined with computational frameworks such as Markov chains, stochastic differential equations (SDEs), and generative models like optimal transport and Schr\"odinger bridges, enable the reconstruction of dynamic cellular trajectories and cell fate decisions. This review discusses how these dynamical system approaches offer new opportunities to model and infer cellular dynamics from a systematic perspective.

Updated: 2025-03-14 12:25:27

标题: 整合动力系统建模与时空scRNA-seq数据分析

摘要: 理解生物系统的动态性质对于解读细胞行为、发育过程和疾病进展至关重要。单细胞RNA测序（scRNA-seq）提供了基因表达的静态快照，为我们提供了有价值的细胞状态的见解。最近在时间上解决的scRNA-seq、空间转录组学（ST）和时间序列空间转录组学（temporal-ST）的进展进一步革新了我们研究个体细胞的时空动态的能力。这些技术与马尔可夫链、随机微分方程（SDEs）和生成模型如最优传输和Schr\"odinger桥结合使用，使得可以重建动态细胞轨迹和细胞命运决策。本文讨论了这些动态系统方法如何提供了从系统角度建模和推断细胞动态的新机会。

更新时间: 2025-03-14 12:25:27

领域: q-bio.QM,cs.LG,physics.bio-ph

下载: http://arxiv.org/abs/2503.11347v1

AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation

Huawei has always been committed to exploring the AI application in historical research. Biography generation, as a specialized form of abstractive summarization, plays a crucial role in historical research but faces unique challenges that existing large language models (LLMs) struggle to address. These challenges include maintaining stylistic adherence to historical writing conventions, ensuring factual fidelity, and handling fragmented information across multiple documents. We present AIstorian, a novel end-to-end agentic system featured with a knowledge graph (KG)-powered retrieval-augmented generation (RAG) and anti-hallucination multi-agents. Specifically, AIstorian introduces an in-context learning based chunking strategy and a KG-based index for accurate and efficient reference retrieval. Meanwhile, AIstorian orchestrates multi-agents to conduct on-the-fly hallucination detection and error-type-aware correction. Additionally, to teach LLMs a certain language style, we finetune LLMs based on a two-step training approach combining data augmentation-enhanced supervised fine-tuning with stylistic preference optimization. Extensive experiments on a real-life historical Jinshi dataset demonstrate that AIstorian achieves a 3.8x improvement in factual accuracy and a 47.6% reduction in hallucination rate compared to existing baselines. The data and code are available at: https://github.com/ZJU-DAILY/AIstorian.

Updated: 2025-03-14 12:23:45

标题: AIstorian 让AI成为历史学家：一个基于知识图谱的多智能体系统，用于准确生成传记

摘要: 华为一直致力于探索人工智能在历史研究中的应用。传记生成作为抽象总结的一种专门形式，在历史研究中起着至关重要的作用，但面临着现有大型语言模型（LLMs）难以解决的独特挑战。这些挑战包括保持对历史写作惯例的风格遵守，确保事实的忠实性，以及处理跨多个文档的碎片化信息。我们提出了AIstorian，一个新颖的端到端代理系统，具有以知识图（KG）为动力的检索增强生成（RAG）和反错觉多代理。具体来说，AIstorian引入了基于上下文学习的分块策略和基于知识图的索引，以实现准确高效的参考检索。同时，AIstorian协调多代理进行即时错觉检测和错误类型感知校正。此外，为了教授LLMs某种语言风格，我们基于数据增强增强监督微调与风格偏好优化相结合的两步训练方法对LLMs进行微调。对真实历史金石数据集的广泛实验表明，与现有基线相比，AIstorian在事实准确性方面实现了3.8倍的改进，错觉率减少了47.6%。数据和代码可在以下网址获取：https://github.com/ZJU-DAILY/AIstorian。

更新时间: 2025-03-14 12:23:45

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.11346v1

Challenging Assumptions in Learning Generic Text Style Embeddings

Recent advancements in language representation learning primarily emphasize language modeling for deriving meaningful representations, often neglecting style-specific considerations. This study addresses this gap by creating generic, sentence-level style embeddings crucial for style-centric tasks. Our approach is grounded on the premise that low-level text style changes can compose any high-level style. We hypothesize that applying this concept to representation learning enables the development of versatile text style embeddings. By fine-tuning a general-purpose text encoder using contrastive learning and standard cross-entropy loss, we aim to capture these low-level style shifts, anticipating that they offer insights applicable to high-level text styles. The outcomes prompt us to reconsider the underlying assumptions as the results do not always show that the learned style representations capture high-level text styles.

Updated: 2025-03-14 12:21:37

标题: 挑战学习通用文本样式嵌入的假设

摘要: 最近在语言表示学习方面的进展主要强调通过语言建模获得有意义的表示，通常忽视了风格特定的考虑。本研究通过创建对风格中心任务至关重要的通用句级风格嵌入来填补这一空白。我们的方法基于这样的前提：低级文本风格变化可以组成任何高级风格。我们假设将这一概念应用于表示学习可以开发出多功能文本风格嵌入。通过使用对比学习和标准交叉熵损失微调通用文本编码器，我们旨在捕捉这些低级风格变化，期望它们提供关于高级文本风格的见解。研究结果促使我们重新考虑基本假设，因为结果并不总是表明学习到的风格表示捕捉了高级文本风格。

更新时间: 2025-03-14 12:21:37

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2501.16073v2

Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model

Uncertainty quantification is a critical aspect of reinforcement learning and deep learning, with numerous applications ranging from efficient exploration and stable offline reinforcement learning to outlier detection in medical diagnostics. The scale of modern neural networks, however, complicates the use of many theoretically well-motivated approaches such as full Bayesian inference. Approximate methods like deep ensembles can provide reliable uncertainty estimates but still remain computationally expensive. In this work, we propose contextual similarity distillation, a novel approach that explicitly estimates the variance of an ensemble of deep neural networks with a single model, without ever learning or evaluating such an ensemble in the first place. Our method builds on the predictable learning dynamics of wide neural networks, governed by the neural tangent kernel, to derive an efficient approximation of the predictive variance of an infinite ensemble. Specifically, we reinterpret the computation of ensemble variance as a supervised regression problem with kernel similarities as regression targets. The resulting model can estimate predictive variance at inference time with a single forward pass, and can make use of unlabeled target-domain data or data augmentations to refine its uncertainty estimates. We empirically validate our method across a variety of out-of-distribution detection benchmarks and sparse-reward reinforcement learning environments. We find that our single-model method performs competitively and sometimes superior to ensemble-based baselines and serves as a reliable signal for efficient exploration. These results, we believe, position contextual similarity distillation as a principled and scalable alternative for uncertainty quantification in reinforcement learning and general deep learning.

Updated: 2025-03-14 12:09:58

标题: 上下文相似性提炼：单模型集成不确定性

摘要: 不确定性量化是强化学习和深度学习的关键方面，具有多种应用，包括高效探索和稳定的离线强化学习，以及医学诊断中的异常检测。然而，现代神经网络的规模使许多理论动机良好的方法（如全贝叶斯推断）的使用变得复杂。近似方法如深度集成可以提供可靠的不确定性估计，但仍然具有计算昂贵的特点。在这项工作中，我们提出了上下文相似性蒸馏，这是一种新颖的方法，明确地估计了一个深度神经网络集成的方差，只使用单个模型，而无需首先学习或评估这样的集成。我们的方法建立在宽神经网络的可预测学习动态之上，受神经切向核的控制，以导出无限集成的预测方差的高效近似。具体来说，我们重新解释了集成方差的计算，将其视为具有核相似性作为回归目标的监督回归问题。结果模型可以在推理时通过单次前向传递估计预测方差，并可以利用未标记的目标域数据或数据增强来改进其不确定性估计。我们在各种超出分布检测基准和稀疏奖励强化学习环境中经验验证了我们的方法。我们发现我们的单一模型方法与基于集成的基线表现竞争，并有时优于基线，并可作为高效探索的可靠信号。我们相信，这些结果将上下文相似性蒸馏定位为强化学习和一般深度学习中不确定性量化的原则性和可扩展替代方案。

更新时间: 2025-03-14 12:09:58

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2503.11339v1

Cardiomyopathy Diagnosis Model from Endomyocardial Biopsy Specimens: Appropriate Feature Space and Class Boundary in Small Sample Size Data

As the number of patients with heart failure increases, machine learning (ML) has garnered attention in cardiomyopathy diagnosis, driven by the shortage of pathologists. However, endomyocardial biopsy specimens are often small sample size and require techniques such as feature extraction and dimensionality reduction. This study aims to determine whether texture features are effective for feature extraction in the pathological diagnosis of cardiomyopathy. Furthermore, model designs that contribute toward improving generalization performance are examined by applying feature selection (FS) and dimensional compression (DC) to several ML models. The obtained results were verified by visualizing the inter-class distribution differences and conducting statistical hypothesis testing based on texture features. Additionally, they were evaluated using predictive performance across different model designs with varying combinations of FS and DC (applied or not) and decision boundaries. The obtained results confirmed that texture features may be effective for the pathological diagnosis of cardiomyopathy. Moreover, when the ratio of features to the sample size is high, a multi-step process involving FS and DC improved the generalization performance, with the linear kernel support vector machine achieving the best results. This process was demonstrated to be potentially effective for models with reduced complexity, regardless of whether the decision boundaries were linear, curved, perpendicular, or parallel to the axes. These findings are expected to facilitate the development of an effective cardiomyopathy diagnostic model for its rapid adoption in medical practice.

Updated: 2025-03-14 11:59:23

标题: 心肌病诊断模型基于心内膜心肌活检标本：小样本数据中的适当特征空间和类边界

摘要: 随着心力衰竭患者数量的增加，机器学习（ML）在心肌病诊断中引起了关注，这是由于病理学家短缺。然而，心内膜心肌活检标本往往是小样本，并需要诸如特征提取和降维等技术。本研究旨在确定纹理特征是否在心肌病的病理诊断中有效。此外，通过将特征选择（FS）和维度压缩（DC）应用于几个ML模型来检查有助于改善泛化性能的模型设计。通过可视化不同类之间的分布差异和基于纹理特征进行统计假设检验来验证所获得的结果。此外，还通过在不同模型设计中使用预测性能对纹理特征进行评估，这些设计具有不同的FS和DC组合（应用或不应用）和决策边界。所获得的结果证实了纹理特征可能对心肌病的病理诊断有效。此外，当特征与样本大小的比率较高时，涉及FS和DC的多步骤过程提高了泛化性能，线性核支持向量机取得了最佳结果。该过程被证明对于模型具有降低复杂性的情况可能是有效的，无论决策边界是线性的，曲线的，垂直的还是平行于轴的。这些发现有望促进有效心肌病诊断模型的开发，以便快速在医疗实践中推广应用。

更新时间: 2025-03-14 11:59:23

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2503.11331v1

Learning to reset in target search problems

Target search problems are central to a wide range of fields, from biological foraging to the optimization algorithms. Recently, the ability to reset the search has been shown to significantly improve the searcher's efficiency. However, the optimal resetting strategy depends on the specific properties of the search problem and can often be challenging to determine. In this work, we propose a reinforcement learning (RL)-based framework to train agents capable of optimizing their search efficiency in environments by learning how to reset. First, we validate the approach in a well-established benchmark: the Brownian search with resetting. There, RL agents consistently recover strategies closely resembling the sharp resetting distribution, known to be optimal in this scenario. We then extend the framework by allowing agents to control not only when to reset, but also their spatial dynamics through turning actions. In this more complex setting, the agents discover strategies that adapt both resetting and turning to the properties of the environment, outperforming the proposed benchmarks. These results demonstrate how reinforcement learning can serve both as an optimization tool and a mechanism for uncovering new, interpretable strategies in stochastic search processes with resetting.

Updated: 2025-03-14 11:57:51

标题: 学习如何在目标搜索问题中重置

摘要: 目标搜索问题在从生物觅食到优化算法等广泛领域中起着重要作用。最近，已经证明重置搜索能够显著提高搜索者的效率。然而，最佳重置策略取决于搜索问题的特定属性，通常很难确定。在这项工作中，我们提出了基于强化学习（RL）的框架，训练能够通过学习如何重置来优化其在环境中搜索效率的代理。首先，我们在一个经过充分验证的基准测试中验证了这种方法：带有重置功能的布朗搜索。在那里，RL代理不断恢复类似于尖锐重置分布的策略，被认为在这种情况下是最佳的。然后，我们通过允许代理不仅控制何时重置，而且通过转向动作控制它们的空间动态来扩展框架。在这种更复杂的设置中，代理发现了能够使重置和转向适应环境属性的策略，超越了提出的基准测试。这些结果展示了强化学习如何既可以作为优化工具，又可以作为揭示随机搜索过程中带重置的新的可解释策略的机制。

更新时间: 2025-03-14 11:57:51

领域: cond-mat.stat-mech,cs.AI,cs.LG,physics.bio-ph,physics.comp-ph

下载: http://arxiv.org/abs/2503.11330v1

Content ARCs: Decentralized Content Rights in the Age of Generative AI

The rise of Generative AI (GenAI) has sparked significant debate over balancing the interests of creative rightsholders and AI developers. As GenAI models are trained on vast datasets that often include copyrighted material, questions around fair compensation and proper attribution have become increasingly urgent. To address these challenges, this paper proposes a framework called \emph{Content ARCs} (Authenticity, Rights, Compensation). By combining open standards for provenance and dynamic licensing with data attribution, and decentralized technologies, Content ARCs create a mechanism for managing rights and compensating creators for using their work in AI training. We characterize several nascent works in the AI data licensing space within Content ARCs and identify where challenges remain to fully implement the end-to-end framework.

Updated: 2025-03-14 11:57:08

标题: 内容ARC：生成AI时代的去中心化内容权利

摘要: 生成式人工智能（GenAI）的兴起引发了关于平衡创意权利持有者和人工智能开发者利益的重大争论。由于GenAI模型是在包含受版权保护的材料的庞大数据集上进行训练的，围绕公平补偿和适当归属的问题变得日益紧迫。为了解决这些挑战，本文提出了一个名为“Content ARCs”（真实性、权利、补偿）的框架。通过将用于溯源和动态许可的开放标准与数据归属和分散技术相结合，Content ARCs创建了一个机制，用于管理权利并为在AI训练中使用其作品的创作者提供补偿。我们对AI数据许可领域内几个新兴作品进行了特征化，并确定了完全实施端到端框架仍然存在挑战的地方。

更新时间: 2025-03-14 11:57:08

领域: cs.CY,cs.AI,cs.DL,eess.IV

下载: http://arxiv.org/abs/2503.14519v1

LLM Agents for Education: Advances and Applications

Large Language Model (LLM) agents have demonstrated remarkable capabilities in automating tasks and driving innovation across diverse educational applications. In this survey, we provide a systematic review of state-of-the-art research on LLM agents in education, categorizing them into two broad classes: (1) \emph{Pedagogical Agents}, which focus on automating complex pedagogical tasks to support both teachers and students; and (2) \emph{Domain-Specific Educational Agents}, which are tailored for specialized fields such as science education, language learning, and professional development. We comprehensively examine the technological advancements underlying these LLM agents, including key datasets, benchmarks, and algorithmic frameworks that drive their effectiveness. Furthermore, we discuss critical challenges such as privacy, bias and fairness concerns, hallucination mitigation, and integration with existing educational ecosystems. This survey aims to provide a comprehensive technological overview of LLM agents for education, fostering further research and collaboration to enhance their impact for the greater good of learners and educators alike.

Updated: 2025-03-14 11:53:44

标题: LLM代理用于教育：进展与应用

摘要: 大型语言模型（LLM）代理在自动化任务和推动各种教育应用创新方面展现出了显著的能力。在这项调查中，我们系统地审查了关于教育中LLM代理的最新研究，将它们分类为两类：（1）\emph{教学代理}，重点是自动化复杂的教学任务，以支持教师和学生；和（2）\emph{特定领域教育代理}，专门针对科学教育、语言学习和专业发展等专业领域。我们全面审查了支持这些LLM代理的技术进展，包括推动它们有效性的关键数据集、基准和算法框架。此外，我们讨论了关键挑战，如隐私、偏见和公平性问题、幻觉缓解以及与现有教育生态系统的整合。本调查旨在为教育中的LLM代理提供全面的技术概述，促进进一步研究和合作，以增强它们对学习者和教育者的共同利益的影响。

更新时间: 2025-03-14 11:53:44

领域: cs.CY,cs.AI,cs.CL,cs.HC

下载: http://arxiv.org/abs/2503.11733v1

RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration

This paper presents RAG-KG-IL, a novel multi-agent hybrid framework designed to enhance the reasoning capabilities of Large Language Models (LLMs) by integrating Retrieval-Augmented Generation (RAG) and Knowledge Graphs (KGs) with an Incremental Learning (IL) approach. Despite recent advancements, LLMs still face significant challenges in reasoning with structured data, handling dynamic knowledge evolution, and mitigating hallucinations, particularly in mission-critical domains. Our proposed RAG-KG-IL framework addresses these limitations by employing a multi-agent architecture that enables continuous knowledge updates, integrates structured knowledge, and incorporates autonomous agents for enhanced explainability and reasoning. The framework utilizes RAG to ensure the generated responses are grounded in verifiable information, while KGs provide structured domain knowledge for improved consistency and depth of understanding. The Incremental Learning approach allows for dynamic updates to the knowledge base without full retraining, significantly reducing computational overhead and improving the model's adaptability. We evaluate the framework using real-world case studies involving health-related queries, comparing it to state-of-the-art models like GPT-4o and a RAG-only baseline. Experimental results demonstrate that our approach significantly reduces hallucination rates and improves answer completeness and reasoning accuracy. The results underscore the potential of combining RAG, KGs, and multi-agent systems to create intelligent, adaptable systems capable of real-time knowledge integration and reasoning in complex domains.

Updated: 2025-03-14 11:50:16

标题: RAG-KG-IL：一个减少幻觉、通过RAG和增量知识图谱学习集成提升LLM推理能力的多智能体混合框架

摘要: 本文介绍了RAG-KG-IL，这是一个新颖的多Agent混合框架，旨在通过将检索增强生成（RAG）和知识图谱（KG）与增量学习（IL）方法相结合，增强大型语言模型（LLMs）的推理能力。尽管最近取得了进展，但LLMs在处理结构化数据推理、处理动态知识演化和减少幻觉等方面仍面临重大挑战，尤其是在关键任务领域。我们提出的RAG-KG-IL框架通过采用多Agent架构来解决这些限制，实现连续知识更新，整合结构化知识，并整合自主Agent以提高解释性和推理能力。该框架利用RAG确保生成的响应基于可验证信息，而KG为改进一致性和深度理解提供结构化领域知识。增量学习方法允许对知识库进行动态更新，无需完全重新训练，从而显著减少计算开销，并提高模型的适应性。我们使用涉及与健康相关查询的实际案例研究来评估该框架，将其与GPT-4o和仅RAG基线等最先进模型进行比较。实验结果表明，我们的方法显著降低了幻觉率，提高了答案完整性和推理准确性。结果突显了结合RAG、KG和多Agent系统的潜力，创造出智能、适应性强的系统，能够在复杂领域进行实时知识整合和推理。

更新时间: 2025-03-14 11:50:16

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2503.13514v1

Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions

Recent advances in text-to-image (T2I) diffusion models have significantly improved the quality of generated images. However, providing efficient control over individual subjects, particularly the attributes characterizing them, remains a key challenge. While existing methods have introduced mechanisms to modulate attribute expression, they typically provide either detailed, object-specific localization of such a modification or full-scale fine-grained, nuanced control of attributes. No current approach offers both simultaneously, resulting in a gap when trying to achieve precise continuous and subject-specific attribute modulation in image generation. In this work, we demonstrate that token-level directions exist within commonly used CLIP text embeddings that enable fine-grained, subject-specific control of high-level attributes in T2I models. We introduce two methods to identify these directions: a simple, optimization-free technique and a learning-based approach that utilizes the T2I model to characterize semantic concepts more specifically. Our methods allow the augmentation of the prompt text input, enabling fine-grained control over multiple attributes of individual subjects simultaneously, without requiring any modifications to the diffusion model itself. This approach offers a unified solution that fills the gap between global and localized control, providing competitive flexibility and precision in text-guided image generation. Project page: https://compvis.github.io/attribute-control. Code is available at https://github.com/CompVis/attribute-control.

Updated: 2025-03-14 11:33:08

标题: 通过识别语义方向在T2I模型中实现连续、特定主题属性控制

摘要: 最近，文本到图像（T2I）扩散模型的最新进展显著提高了生成图像的质量。然而，对于个体主题的有效控制，特别是表征它们的属性，仍然是一个关键挑战。虽然现有方法引入了调节属性表达的机制，但它们通常只提供详细的、对象特定的定位或全面的细粒度、微妙的属性控制。目前没有一种方法同时提供这两种功能，导致在尝试实现图像生成中精确连续和主题特定属性调节时存在差距。在这项工作中，我们展示了在常用的CLIP文本嵌入中存在令牌级方向，可以实现T2I模型中高级属性的细粒度、个体特定控制。我们介绍了两种方法来识别这些方向：一种简单的、无需优化的技术和一种基于学习的方法，利用T2I模型更具体地表征语义概念。我们的方法允许通过增强提示文本输入，同时对个体主题的多个属性进行细粒度控制，而无需对扩散模型本身进行任何修改。这种方法提供了一种统一的解决方案，填补了全局和局部控制之间的差距，在文本引导的图像生成中提供了竞争性的灵活性和精度。项目页面：https://compvis.github.io/attribute-control。代码可在https://github.com/CompVis/attribute-control上找到。

更新时间: 2025-03-14 11:33:08

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2403.17064v2

Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach

Active learning (AL), which iteratively queries the most informative examples from a large pool of unlabeled candidates for model training, faces significant challenges in the presence of open-set classes. Existing methods either prioritize query examples likely to belong to known classes, indicating low epistemic uncertainty (EU), or focus on querying those with highly uncertain predictions, reflecting high aleatoric uncertainty (AU). However, they both yield suboptimal performance, as low EU corresponds to limited useful information, and closed-set AU metrics for unknown class examples are less meaningful. In this paper, we propose an Energy-based Active Open-set Annotation (EAOA) framework, which effectively integrates EU and AU to achieve superior performance. EAOA features a $(C+1)$-class detector and a target classifier, incorporating an energy-based EU measure and a margin-based energy loss designed for the detector, alongside an energy-based AU measure for the target classifier. Another crucial component is the target-driven adaptive sampling strategy. It first forms a smaller candidate set with low EU scores to ensure closed-set properties, making AU metrics meaningful. Subsequently, examples with high AU scores are queried to form the final query set, with the candidate set size adjusted adaptively. Extensive experiments show that EAOA achieves state-of-the-art performance while maintaining high query precision and low training overhead. The code is available at https://github.com/chenchenzong/EAOA.

Updated: 2025-03-14 11:32:24

标题: 重新思考主观和随机不确定性对主动开放式标注的影响：一种基于能量的方法

摘要: 主动学习（AL）通过从大量未标记的候选样本中迭代地查询最具信息量的示例进行模型训练，在存在开放集类别的情况下面临重大挑战。现有方法要么优先考虑可能属于已知类别的查询示例，表明低认知不确定性（EU），要么关注具有高度不确定预测的查询示例，反映高aleatoric不确定性（AU）。然而，它们都会产生次优性能，因为低EU对应于有限的有用信息，而未知类别示例的封闭集AU指标意义较小。在本文中，我们提出了一种基于能量的主动开放集注释（EAOA）框架，有效地整合了EU和AU以实现优越性能。EAOA具有一个$(C+1)$-类检测器和一个目标分类器，结合了一个基于能量的EU度量和为检测器设计的基于边缘的能量损失，以及用于目标分类器的基于能量的AU度量。另一个关键组件是基于目标驱动的自适应抽样策略。首先形成一个具有低EU分数的较小候选集，以确保封闭集属性，使AU指标具有意义。随后，查询具有高AU分数的示例以形成最终查询集，候选集大小会自适应调整。大量实验证明，EAOA实现了最先进的性能，同时保持高查询精度和低训练开销。该代码可在https://github.com/chenchenzong/EAOA 获取。

更新时间: 2025-03-14 11:32:24

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.19691v2

Hiding Local Manipulations on SAR Images: a Counter-Forensic Attack

The vast accessibility of Synthetic Aperture Radar (SAR) images through online portals has propelled the research across various fields. This widespread use and easy availability have unfortunately made SAR data susceptible to malicious alterations, such as local editing applied to the images for inserting or covering the presence of sensitive targets. Vulnerability is further emphasized by the fact that most SAR products, despite their original complex nature, are often released as amplitude-only information, allowing even inexperienced attackers to edit and easily alter the pixel content. To contrast malicious manipulations, in the last years the forensic community has begun to dig into the SAR manipulation issue, proposing detectors that effectively localize the tampering traces in amplitude images. Nonetheless, in this paper we demonstrate that an expert practitioner can exploit the complex nature of SAR data to obscure any signs of manipulation within a locally altered amplitude image. We refer to this approach as a counter-forensic attack. To achieve the concealment of manipulation traces, the attacker can simulate a re-acquisition of the manipulated scene by the SAR system that initially generated the pristine image. In doing so, the attacker can obscure any evidence of manipulation, making it appear as if the image was legitimately produced by the system. This attack has unique features that make it both highly generalizable and relatively easy to apply. First, it is a black-box attack, meaning it is not designed to deceive a specific forensic detector. Furthermore, it does not require a training phase and is not based on adversarial operations. We assess the effectiveness of the proposed counter-forensic approach across diverse scenarios, examining various manipulation operations.

Updated: 2025-03-14 11:31:15

标题: 在SAR图像上隐藏本地篡改：一种反取证攻击

摘要: 合成孔径雷达（SAR）图像通过在线门户的广泛可访问性推动了各个领域的研究。这种广泛使用和易得性不幸地使SAR数据容易受到恶意篡改，例如对图像进行局部编辑以插入或覆盖敏感目标的存在。脆弱性进一步强调了大多数SAR产品，尽管其原始复杂性，通常作为幅度信息释放，即使是经验不足的攻击者也可以编辑和轻松修改像素内容。为了对抗恶意操作，近年来，司法社区开始研究SAR篡改问题，提出了有效定位幅度图像中篡改痕迹的检测器。然而，在本文中，我们证明了一位专业从业者可以利用SAR数据的复杂性来掩饰局部修改的幅度图像中的任何篡改迹象。我们将这种方法称为反取证攻击。为了实现篡改痕迹的掩盖，攻击者可以模拟SAR系统首次生成原始图像的情景重新获取篡改场景。通过这样做，攻击者可以掩盖任何篡改迹象，使其看起来像图像是由系统合法生成的。这种攻击具有独特的特点，使其既高度通用又相对易于应用。首先，这是一种黑盒攻击，意味着它不是为了欺骗特定的取证检测器。此外，它不需要训练阶段，也不基于对抗操作。我们评估了所提出的反取证方法在不同场景下的有效性，考察了各种篡改操作。

更新时间: 2025-03-14 11:31:15

领域: cs.CV,cs.AI,cs.MM

下载: http://arxiv.org/abs/2407.07041v2

Lightweight Learning for Grant-Free Activity Detection in Cell-Free Massive MIMO Networks

Grant-free random access (GF-RA) is a promising access technique for massive machine-type communications (mMTC) in future wireless networks, particularly in the context of 5G and beyond (6G) systems. Within the context of GF-RA, this study investigates the efficiency of employing supervised machine learning techniques to tackle the challenges on the device activity detection (AD). GF-RA addresses scalability by employing non-orthogonal pilot sequences, which provides an efficient alternative comparing to conventional grant-based random access (GB-RA) technique that are constrained by the scarcity of orthogonal preamble resources. In this paper, we propose a novel lightweight data-driven algorithmic framework specifically designed for activity detection in GF-RA for mMTC in cell-free massive multiple-input multiple-output (CF-mMIMO) networks. We propose two distinct framework deployment strategies, centralized and decentralized, both tailored to streamline the proposed approach implementation across network infrastructures. Moreover, we introduce optimized post-detection methodologies complemented by a clustering stage to enhance overall detection performances. Our 3GPP-compliant simulations have validated that the proposed algorithm achieves state-of-the-art model-based activity detection accuracy while significantly reducing complexity. Achieving 99% accuracy, it demonstrates real-world viability and effectiveness.

Updated: 2025-03-14 11:18:47

标题: 无需授权的轻量级学习用于无基站 Massive MIMO 网络中的活动检测

摘要: 无授权随机接入（GF-RA）是未来无线网络中大规模机器通信（mMTC）的一种有前途的接入技术，特别是在5G及更高版本（6G）系统的背景下。在GF-RA的背景下，这项研究调查了采用监督机器学习技术来解决设备活动检测（AD）挑战的效率。GF-RA通过采用非正交导频序列来解决可伸缩性问题，提供了一种有效的替代方案，与传统的基于授权的随机接入（GB-RA）技术相比，后者受限于正交序列资源的稀缺性。在本文中，我们提出了一个专门为CF-mMIMO网络中的mMTC中的GF-RA中的活动检测设计的轻量级数据驱动算法框架。我们提出了两种不同的框架部署策略，分别是集中式和分散式，均旨在简化网络基础设施上的所提出方法的实施。此外，我们引入了优化的后检测方法，结合聚类阶段以增强整体检测性能。我们的3GPP兼容的模拟验证了所提算法实现了最先进的基于模型的活动检测准确性，同时显著降低了复杂性。实现了99%的准确性，证明了其在现实世界中的可行性和有效性。

更新时间: 2025-03-14 11:18:47

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2503.11305v1

BriLLM: Brain-inspired Large Language Model

This paper reports the first brain-inspired large language model (BriLLM). This is a non-Transformer, non-GPT, non-traditional machine learning input-output controlled generative language model. The model is based on the Signal Fully-connected flowing (SiFu) definition on the directed graph in terms of the neural network, and has the interpretability of all nodes on the graph of the whole model, instead of the traditional machine learning model that only has limited interpretability at the input and output ends. In the language model scenario, the token is defined as a node in the graph. A randomly shaped or user-defined signal flow flows between nodes on the principle of "least resistance" along paths. The next token or node to be predicted or generated is the target of the signal flow. As a language model, BriLLM theoretically supports infinitely long $n$-gram models when the model size is independent of the input and predicted length of the model. The model's working signal flow provides the possibility of recall activation and innate multi-modal support similar to the cognitive patterns of the human brain. At present, we released the first BriLLM version in Chinese, with 4000 tokens, 32-dimensional node width, 16-token long sequence prediction ability, and language model prediction performance comparable to GPT-1. More computing power will help us explore the infinite possibilities depicted above.

Updated: 2025-03-14 11:08:30

标题: BriLLM：脑启发的大型语言模型

摘要: 本文报道了第一个受大脑启发的大型语言模型（BriLLM）。这是一个非Transformer、非GPT、非传统的机器学习输入-输出控制生成语言模型。该模型基于神经网络中的信号全连接流动（SiFu）定义在有向图上，并具有整个模型图上所有节点的可解释性，而不是传统机器学习模型只在输入和输出端具有有限的可解释性。在语言模型场景中，令牌被定义为图中的节点。随机形状或用户定义的信号流在“最小阻力”原则下沿着路径在节点之间流动。下一个要预测或生成的令牌或节点是信号流的目标。作为语言模型，BriLLM 在模型大小独立于输入和预测模型长度时理论上支持无限长的 $n$-gram 模型。模型的工作信号流提供了召回激活和类似于人脑认知模式的固有多模态支持的可能性。目前，我们发布了第一个中文版的 BriLLM，包含 4000 个令牌，32 维节点宽度，16 个令牌长序列预测能力，以及与 GPT-1 相当的语言模型预测性能。更多的计算力将帮助我们探索上述无限可能性。

更新时间: 2025-03-14 11:08:30

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.11299v1

Latent Space Representation of Electricity Market Curves for Improved Prediction Efficiency

This work presents a three-phase ML prediction framework designed to handle a high dimensionality and multivariate time series character of the electricity market curves. In the preprocessing phase, we transform the original data to achieve a unified structure and mitigate the effect of possible outliers. Further, to address the challenge of high dimensionality, we test three dimensionality reduction techniques (PCA, kPCA, UMAP). Finally, we predict supply and demand curves, once represented in a latent space, with a variety of machine learning methods (RF, LSTM, TSMixer). As our results on the MIBEL dataset show, a high dimensional structure of the market curves can be best handled by the nonlinear reduction technique UMAP. Regardless of the ML technique used for prediction, we achieved the lowest values for all considered precision metrics with a UMAP latent space representation in only two or three dimensions, even when compared to PCA and kPCA with five or six dimensions. Further, we demonstrate that the most promising machine learning technique to handle the complex structure of the electricity market curves is a novel TSMixer architecture. Finally, we fill the gap in the field of electricity market curves prediction literature: in addition to standard analysis on the supply side, we applied the ML framework and predicted demand curves too. We discussed the differences in the achieved results for these two types of curves.

Updated: 2025-03-14 11:04:46

标题: 电力市场曲线的潜在空间表示以提高预测效率

摘要: 这项工作提出了一个三阶段的机器学习预测框架，旨在处理电力市场曲线的高维度和多变量时间序列特征。在预处理阶段，我们将原始数据转换为统一的结构，并减轻可能异常值的影响。此外，为了解决高维度的挑战，我们测试了三种降维技术（PCA、kPCA、UMAP）。最后，一旦在潜在空间中表示供应和需求曲线，我们使用多种机器学习方法（RF、LSTM、TSMixer）进行预测。正如我们在MIBEL数据集上的结果所显示的那样，市场曲线的高维结构最好通过非线性降维技术UMAP处理。无论用于预测的机器学习技术是什么，我们在所有考虑的精度指标中都实现了最低值，UMAP潜在空间表示仅有两到三个维度，即使与具有五到六个维度的PCA和kPCA相比也是如此。此外，我们证明了处理电力市场曲线复杂结构最有前景的机器学习技术是一种新颖的TSMixer架构。最后，我们填补了电力市场曲线预测文献领域的空白：除了在供应方面的标准分析外，我们还应用了机器学习框架并预测了需求曲线。我们讨论了这两种曲线实现结果的差异。

更新时间: 2025-03-14 11:04:46

领域: cs.LG,I.5.1; I.2.6; I.6.3; J.2; J.4

下载: http://arxiv.org/abs/2503.11294v1

Class-Level Feature Selection Method Using Feature Weighted Growing Self-Organising Maps

There have been several attempts to develop Feature Selection (FS) algorithms capable of identifying features that are relevant in a dataset. Although in certain applications the FS algorithms can be seen to be successful, they have similar basic limitations. In all cases, the global feature selection algorithms seek to select features that are relevant and common to all classes of the dataset. This is a major limitation since there could be features that are specifically useful for a particular class while irrelevant for other classes, and full explanation of the relationship at class level therefore cannot be determined. While the inclusion of such features for all classes could cause improved predictive ability for the relevant class, the same features could be problematic for other classes. In this paper, we examine this issue and also develop a class-level feature selection method called the Feature Weighted Growing Self-Organising Map (FWGSOM). The proposed method carries out feature analysis at class level which enhances its ability to identify relevant features for each class. Results from experiments indicate that our method performs better than other methods, gives explainable results at class level, and has a low computational footprint when compared to other methods.

Updated: 2025-03-14 11:02:34

标题: 基于特征加权增长式自组织映射的类级特征选择方法

摘要: 有过多次尝试开发能够识别数据集中相关特征的特征选择（FS）算法。尽管在某些应用中，FS算法被认为是成功的，但它们具有相似的基本限制。在所有情况下，全局特征选择算法旨在选择与数据集所有类别相关且普遍的特征。这是一个重要的限制，因为可能存在对特定类别有用而对其他类别无关紧要的特征，因此无法确定在类别级别的关系的全面解释。虽然将这些特征包括在所有类别中可能会提高相关类别的预测能力，但同样的特征可能对其他类别造成问题。在本文中，我们研究了这个问题，并开发了一种称为特征加权增长自组织映射（FWGSOM）的类别级特征选择方法。所提出的方法在类别级别进行特征分析，增强了其识别每个类别相关特征的能力。实验结果表明，我们的方法优于其他方法，在类别级别给出可解释的结果，并且与其他方法相比具有低计算占用。

更新时间: 2025-03-14 11:02:34

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.11732v1

Brain Effective Connectivity Estimation via Fourier Spatiotemporal Attention

Estimating brain effective connectivity (EC) from functional magnetic resonance imaging (fMRI) data can aid in comprehending the neural mechanisms underlying human behavior and cognition, providing a foundation for disease diagnosis. However, current spatiotemporal attention modules handle temporal and spatial attention separately, extracting temporal and spatial features either sequentially or in parallel. These approaches overlook the inherent spatiotemporal correlations present in real world fMRI data. Additionally, the presence of noise in fMRI data further limits the performance of existing methods. In this paper, we propose a novel brain effective connectivity estimation method based on Fourier spatiotemporal attention (FSTA-EC), which combines Fourier attention and spatiotemporal attention to simultaneously capture inter-series (spatial) dynamics and intra-series (temporal) dependencies from high-noise fMRI data. Specifically, Fourier attention is designed to convert the high-noise fMRI data to frequency domain, and map the denoised fMRI data back to physical domain, and spatiotemporal attention is crafted to simultaneously learn spatiotemporal dynamics. Furthermore, through a series of proofs, we demonstrate that incorporating learnable filter into fast Fourier transform and inverse fast Fourier transform processes is mathematically equivalent to performing cyclic convolution. The experimental results on simulated and real-resting-state fMRI datasets demonstrate that the proposed method exhibits superior performance when compared to state-of-the-art methods.

Updated: 2025-03-14 10:41:27

标题: 通过傅里叶时空关注估计大脑有效连接

摘要: 从功能性磁共振成像（fMRI）数据中估计大脑有效连接（EC）可以帮助理解人类行为和认知背后的神经机制，为疾病诊断提供基础。然而，目前的时空注意模块分别处理时间和空间注意，将时间和空间特征顺序地或并行地提取出来。这些方法忽视了真实世界fMRI数据中固有的时空相关性。此外，fMRI数据中的噪声存在进一步限制了现有方法的性能。在本文中，我们提出了一种基于傅立叶时空注意（FSTA-EC）的新颖大脑有效连接估计方法，该方法结合了傅立叶注意和时空注意，从高噪声fMRI数据中同时捕获系列间（空间）动态和系列内（时间）依赖关系。具体而言，傅立叶注意设计为将高噪声fMRI数据转换为频域，将去噪fMRI数据映射回物理域，时空注意被设计为同时学习时空动态。此外，通过一系列证明，我们证明将可学习滤波器纳入快速傅立叶变换和逆快速傅立叶变换过程在数学上等价于执行循环卷积。对模拟和真实静息状态fMRI数据集的实验结果表明，与最先进的方法相比，所提出的方法表现出更优越的性能。

更新时间: 2025-03-14 10:41:27

领域: cs.LG

下载: http://arxiv.org/abs/2503.11283v1

Online Context Learning for Socially Compliant Navigation

Robot social navigation needs to adapt to different human factors and environmental contexts. However, since these factors and contexts are difficult to predict and cannot be exhaustively enumerated, traditional learning-based methods have difficulty in ensuring the social attributes of robots in long-term and cross-environment deployments. This letter introduces an online context learning method that aims to empower robots to adapt to new social environments online. The proposed method adopts a two-layer structure. The bottom layer is built using a deep reinforcement learning-based method to ensure the output of basic robot navigation commands. The upper layer is implemented using an online robot learning-based method to socialize the control commands suggested by the bottom layer. Experiments using a community-wide simulator show that our method outperforms the state-of-the-art ones. Experimental results in the most challenging scenarios show that our method improves the performance of the state-of-the-art by 8%. The source code of the proposed method, the data used, and the tools for the per-training step are publicly available at https://github.com/Nedzhaken/SOCSARL-OL.

Updated: 2025-03-14 10:41:06

标题: 在线环境学习对社交合规导航的影响

摘要: 机器人社交导航需要适应不同的人类因素和环境背景。然而，由于这些因素和背景很难预测并且无法穷尽列举，传统的基于学习的方法在确保机器人在长期和跨环境部署中的社交属性方面存在困难。本文介绍了一种在线环境学习方法，旨在赋予机器人在线适应新的社交环境的能力。所提出的方法采用了两层结构。底层采用基于深度强化学习的方法构建，以确保基本机器人导航命令的输出。上层采用在线机器人学习方法实现，以社交化底层建议的控制命令。使用一个社区范围的模拟器进行的实验表明，我们的方法表现优于最先进的方法。在最具挑战性的场景中的实验结果表明，我们的方法将最先进方法的性能提高了8％。所提出方法的源代码、使用的数据和用于预训练步骤的工具可在https://github.com/Nedzhaken/SOCSARL-OL 上公开获取。

更新时间: 2025-03-14 10:41:06

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2406.11495v2

OPTIMUS: Predicting Multivariate Outcomes in Alzheimer's Disease Using Multi-modal Data amidst Missing Values

Alzheimer's disease, a neurodegenerative disorder, is associated with neural, genetic, and proteomic factors while affecting multiple cognitive and behavioral faculties. Traditional AD prediction largely focuses on univariate disease outcomes, such as disease stages and severity. Multimodal data encode broader disease information than a single modality and may, therefore, improve disease prediction; but they often contain missing values. Recent "deeper" machine learning approaches show promise in improving prediction accuracy, yet the biological relevance of these models needs to be further charted. Integrating missing data analysis, predictive modeling, multimodal data analysis, and explainable AI, we propose OPTIMUS, a predictive, modular, and explainable machine learning framework, to unveil the many-to-many predictive pathways between multimodal input data and multivariate disease outcomes amidst missing values. OPTIMUS first applies modality-specific imputation to uncover data from each modality while optimizing overall prediction accuracy. It then maps multimodal biomarkers to multivariate outcomes using machine-learning and extracts biomarkers respectively predictive of each outcome. Finally, OPTIMUS incorporates XAI to explain the identified multimodal biomarkers. Using data from 346 cognitively normal subjects, 608 persons with mild cognitive impairment, and 251 AD patients, OPTIMUS identifies neural and transcriptomic signatures that jointly but differentially predict multivariate outcomes related to executive function, language, memory, and visuospatial function. Our work demonstrates the potential of building a predictive and biologically explainable machine-learning framework to uncover multimodal biomarkers that capture disease profiles across varying cognitive landscapes. The results improve our understanding of the complex many-to-many pathways in AD.

Updated: 2025-03-14 10:40:04

标题: OPTIMUS: 使用多模态数据在阿尔茨海默病中预测多变量结果，同时处理缺失值

摘要: 阿尔茨海默病是一种神经退行性疾病，与神经、遗传和蛋白组因素相关，影响多个认知和行为能力。传统的阿尔茨海默病预测主要集中在单变量疾病结果，如疾病阶段和严重程度。多模态数据编码比单一模态更广泛的疾病信息，因此可能改进疾病预测；但它们经常包含缺失值。最近的“更深入”的机器学习方法显示了提高预测准确性的潜力，但这些模型的生物学相关性仍需进一步探讨。通过整合缺失数据分析、预测建模、多模态数据分析和可解释人工智能，我们提出了OPTIMUS，一个预测性、模块化和可解释的机器学习框架，揭示了多模态输入数据和多变量疾病结果之间的多对多预测路径，同时处理缺失值。OPTIMUS首先应用模态特定的插补来揭示每个模态的数据，同时优化整体预测准确性。然后，它使用机器学习将多模态生物标志物映射到多变量结果，并提取分别与每个结果相关的生物标志物。最后，OPTIMUS结合了XAI来解释识别的多模态生物标志物。利用来自346名认知正常受试者、608名轻度认知障碍患者和251名阿尔茨海默病患者的数据，OPTIMUS识别了神经和转录组标志，联合但差异地预测与执行功能、语言、记忆和视觉空间功能相关的多变量结果。我们的工作展示了建立预测性和生物解释性机器学习框架的潜力，以揭示跨不同认知环境捕获疾病概况的多模态生物标志物。这些结果提高了我们对阿尔茨海默病中复杂多对多路径的理解。

更新时间: 2025-03-14 10:40:04

领域: cs.LG,q-bio.NC

下载: http://arxiv.org/abs/2503.11282v1

Permutation Equivariant Neural Networks for Symmetric Tensors

Incorporating permutation equivariance into neural networks has proven to be useful in ensuring that models respect symmetries that exist in data. Symmetric tensors, which naturally appear in statistics, machine learning, and graph theory, are essential for many applications in physics, chemistry, and materials science, amongst others. However, existing research on permutation equivariant models has not explored symmetric tensors as inputs, and most prior work on learning from these tensors has focused on equivariance to Euclidean groups. In this paper, we present two different characterisations of all linear permutation equivariant functions between symmetric power spaces of $\mathbb{R}^n$. We show on two tasks that these functions are highly data efficient compared to standard MLPs and have potential to generalise well to symmetric tensors of different sizes.

Updated: 2025-03-14 10:33:13

标题: 《置换等变神经网络用于对称张量》

摘要: 将排列等变性引入神经网络已被证明在确保模型尊重数据中存在的对称性方面是有用的。对称张量在统计学、机器学习和图论中自然出现，在物理学、化学和材料科学等许多应用中至关重要。然而，现有关于排列等变模型的研究尚未探索对称张量作为输入，并且大多数关于从这些张量中学习的先前工作都集中在对欧几里德群的等变性上。在本文中，我们提出了两种不同的线性排列等变函数的特征化，介于$\mathbb{R}^n$的对称幂空间之间。我们在两个任务上展示了这些函数与标准MLP相比具有高度的数据效率，并且具有很好地泛化到不同大小的对称张量的潜力。

更新时间: 2025-03-14 10:33:13

领域: cs.LG,math.CO,math.RT,stat.ML

下载: http://arxiv.org/abs/2503.11276v1

Financial Fraud Detection with Entropy Computing

We introduce CVQBoost, a novel classification algorithm that leverages early hardware implementing Quantum Computing Inc's Entropy Quantum Computing (EQC) paradigm, Dirac-3 [Nguyen et. al. arXiv:2407.04512]. We apply CVQBoost to a fraud detection test case and benchmark its performance against XGBoost, a widely utilized ML method. Running on Dirac-3, CVQBoost demonstrates a significant runtime advantage over XGBoost, which we evaluate on high-performance hardware comprising up to 48 CPUs and four NVIDIA L4 GPUs using the RAPIDS AI framework. Our results show that CVQBoost maintains competitive accuracy (measured by AUC) while significantly reducing training time, particularly as dataset size and feature complexity increase. To assess scalability, we extend our study to large synthetic datasets ranging from 1M to 70M samples, demonstrating that CVQBoost on Dirac-3 is well-suited for large-scale classification tasks. These findings position CVQBoost as a promising alternative to gradient boosting methods, offering superior scalability and efficiency for high-dimensional ML applications such as fraud detection.

Updated: 2025-03-14 10:30:43

标题: 利用熵计算进行金融欺诈检测

摘要: 我们介绍了CVQBoost，这是一种新颖的分类算法，利用了早期实现了量子计算公司Entropy Quantum Computing（EQC）范式的硬件，Dirac-3 [Nguyen等人 arXiv:2407.04512]。我们将CVQBoost应用于欺诈检测测试案例，并将其性能与广泛使用的机器学习方法XGBoost进行了基准测试。在Dirac-3上运行时，CVQBoost显示出比XGBoost显着的运行时间优势，我们使用RAPIDS AI框架在高性能硬件上进行评估，其中包括高达48个CPU和四个NVIDIA L4 GPU。我们的结果表明，CVQBoost在保持竞争性准确性（通过AUC测量）的同时，显着减少了训练时间，特别是在数据集大小和特征复杂性增加时。为了评估可伸缩性，我们将研究扩展到包含100万至7000万个样本的大型合成数据集，证明CVQBoost在Dirac-3上非常适用于大规模分类任务。这些发现将CVQBoost定位为梯度提升方法的一个有希望的替代选择，为欺诈检测等高维度机器学习应用提供更优越的可伸缩性和效率。

更新时间: 2025-03-14 10:30:43

领域: cs.LG,cs.AI,physics.optics,quant-ph

下载: http://arxiv.org/abs/2503.11273v1

When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective

Theoretical efforts to prove advantages of Transformers in comparison with classical architectures such as feedforward and recurrent neural networks have mostly focused on representational power. In this work, we take an alternative perspective and prove that even with infinite compute, feedforward and recurrent networks may suffer from larger sample complexity compared to Transformers, as the latter can adapt to a form of dynamic sparsity. Specifically, we consider a sequence-to-sequence data generating model on sequences of length $N$, in which the output at each position depends only on $q$ relevant tokens with $q \ll N$, and the positions of these tokens are described in the input prompt. We prove that a single-layer Transformer can learn this model if and only if its number of attention heads is at least $q$, in which case it achieves a sample complexity almost independent of $N$, while recurrent networks require $N^{\Omega(1)}$ samples on the same problem. If we simplify this model, recurrent networks may achieve a complexity almost independent of $N$, while feedforward networks still require $N$ samples. Consequently, our proposed sparse retrieval model illustrates a natural hierarchy in sample complexity across these architectures.

Updated: 2025-03-14 10:30:42

标题: 当变压器优于前馈和循环网络？统计角度观点

摘要: 理论努力证明变压器与传统架构（如前馈和循环神经网络）相比的优势主要集中在表示能力上。在这项工作中，我们采取了一种替代视角，并证明即使有无限计算能力，前馈和循环网络在样本复杂性上可能比变压器更受影响，因为后者能够适应一种动态稀疏形式。具体来说，我们考虑一个长度为N的序列到序列数据生成模型，在这个模型中，每个位置的输出仅取决于q个相关标记，其中q≪N，并且这些标记的位置在输入提示中描述。我们证明，单层变压器只有在其注意头的数量至少为q时才能学习这个模型，在这种情况下，它实现了几乎独立于N的样本复杂性，而循环网络在同一问题上需要N^Ω(1)个样本。如果我们简化这个模型，循环网络可以实现几乎独立于N的复杂性，而前馈网络仍然需要N个样本。因此，我们提出的稀疏检索模型展示了这些架构之间样本复杂性的自然层次结构。

更新时间: 2025-03-14 10:30:42

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2503.11272v1

Positivity sets of hinge functions

In this paper we investigate which subsets of the real plane are realisable as the set of points on which a one-layer ReLU neural network takes a positive value. In the case of cones we give a full characterisation of such sets. Furthermore, we give a necessary condition for any subset of $\mathbb R^d$. We give various examples of such one-layer neural networks.

Updated: 2025-03-14 10:26:24

标题: 铰链函数的正定集

摘要: 在本文中，我们研究了实数平面的哪些子集可以被实现为一层ReLU神经网络在其上取正值的点集。对于锥体的情况，我们给出了这样的集合的完全特征化。此外，我们给出了任意$\mathbb R^d$子集的一个必要条件。我们还提供了一些这样的一层神经网络的例子。

更新时间: 2025-03-14 10:26:24

领域: stat.ML,cs.DM,cs.LG,cs.SC,math.CO,math.FA

下载: http://arxiv.org/abs/2503.13512v1

On the Impact of Uncertainty and Calibration on Likelihood-Ratio Membership Inference Attacks

In a membership inference attack (MIA), an attacker exploits the overconfidence exhibited by typical machine learning models to determine whether a specific data point was used to train a target model. In this paper, we analyze the performance of the likelihood ratio attack (LiRA) within an information-theoretical framework that allows the investigation of the impact of the aleatoric uncertainty in the true data generation process, of the epistemic uncertainty caused by a limited training data set, and of the calibration level of the target model. We compare three different settings, in which the attacker receives decreasingly informative feedback from the target model: confidence vector (CV) disclosure, in which the output probability vector is released; true label confidence (TLC) disclosure, in which only the probability assigned to the true label is made available by the model; and decision set (DS) disclosure, in which an adaptive prediction set is produced as in conformal prediction. We derive bounds on the advantage of an MIA adversary with the aim of offering insights into the impact of uncertainty and calibration on the effectiveness of MIAs. Simulation results demonstrate that the derived analytical bounds predict well the effectiveness of MIAs.

Updated: 2025-03-14 10:13:46

标题: 关于不确定性和校准对似然比成员推断攻击的影响

摘要: 在一个成员推断攻击（MIA）中，攻击者利用典型机器学习模型展示的自信来确定特定数据点是否被用于训练目标模型。在本文中，我们在一个信息理论框架内分析了可能性比攻击（LiRA）的性能，这允许调查真实数据生成过程中的随机不确定性、由于有限训练数据集而导致的认知不确定性以及目标模型的校准水平的影响。我们比较了三种不同的设置，在这些设置中，攻击者从目标模型接收到的信息逐渐减少：置信向量（CV）披露，其中输出概率向量被释放；真实标签置信度（TLC）披露，其中模型仅提供分配给真实标签的概率；以及决策集（DS）披露，其中生成一个自适应预测集，就像符合预测一样。我们推导出MIA对手的优势的界限，旨在提供关于不确定性和校准对MIA有效性的影响的见解。模拟结果表明，推导出的分析界限很好地预测了MIA的有效性。

更新时间: 2025-03-14 10:13:46

领域: cs.IT,cs.CR,cs.LG,eess.SP,math.IT

下载: http://arxiv.org/abs/2402.10686v3

Beyond Tree Models: A Hybrid Model of KAN and gMLP for Large-Scale Financial Tabular Data

Tabular data plays a critical role in real-world financial scenarios. Traditionally, tree models have dominated in handling tabular data. However, financial datasets in the industry often encounter some challenges, such as data heterogeneity, the predominance of numerical features and the large scale of the data, which can range from tens of millions to hundreds of millions of records. These challenges can lead to significant memory and computational issues when using tree-based models. Consequently, there is a growing need for neural network-based solutions that can outperform these models. In this paper, we introduce TKGMLP, an hybrid network for tabular data that combines shallow Kolmogorov Arnold Networks with Gated Multilayer Perceptron. This model leverages the strengths of both architectures to improve performance and scalability. We validate TKGMLP on a real-world credit scoring dataset, where it achieves state-of-the-art results and outperforms current benchmarks. Furthermore, our findings demonstrate that the model continues to improve as the dataset size increases, making it highly scalable. Additionally, we propose a novel feature encoding method for numerical data, specifically designed to address the predominance of numerical features in financial datasets. The integration of this feature encoding method within TKGMLP significantly improves prediction accuracy. This research not only advances table prediction technology but also offers a practical and effective solution for handling large-scale numerical tabular data in various industrial applications.

Updated: 2025-03-14 10:13:20

标题: 超越树模型：一种用于大规模金融表格数据的KAN和gMLP混合模型

摘要: 表格数据在现实世界的金融场景中起着至关重要的作用。传统上，树模型在处理表格数据方面占据主导地位。然而，在金融行业的数据集中经常遇到一些挑战，如数据异质性、数值特征的优势以及数据规模的大幅增长，可以从数千万到数亿条记录。当使用基于树的模型时，这些挑战可能导致严重的内存和计算问题。因此，越来越需要基于神经网络的解决方案来超越这些模型。在本文中，我们介绍了TKGMLP，一种用于表格数据的混合网络，将浅层Kolmogorov Arnold网络与门控多层感知机结合起来。该模型利用了两种架构的优势来改善性能和可扩展性。我们在一个真实的信用评分数据集上验证了TKGMLP，在那里它取得了最先进的结果，并超过了当前的基准。此外，我们的研究结果表明，随着数据集大小的增加，模型的性能持续改善，使其具有很高的可扩展性。此外，我们提出了一种新颖的特征编码方法，专门设计用于解决金融数据集中数值特征的优势。将这种特征编码方法整合到TKGMLP中显著提高了预测准确性。这项研究不仅推进了表格预测技术，还为处理各种工业应用中的大规模数值表格数据提供了实用有效的解决方案。

更新时间: 2025-03-14 10:13:20

领域: cs.LG

下载: http://arxiv.org/abs/2412.02097v3

Aligning Graphical and Functional Causal Abstractions

Causal abstractions allow us to relate causal models on different levels of granularity. To ensure that the models agree on cause and effect, frameworks for causal abstractions define notions of consistency. Two distinct methods for causal abstraction are common in the literature: (i) graphical abstractions, such as Cluster DAGs, which relate models on a structural level, and (ii) functional abstractions, like $\alpha$-abstractions, which relate models by maps between variables and their ranges. In this paper we will align the notions of graphical and functional consistency and show an equivalence between the class of Cluster DAGs, consistent $\alpha$-abstractions with the range of abstracted variables mapped bijectively, and constructive $\tau$-abstractions. Furthermore, we extend this alignment and the expressivity of graphical abstractions by introducing Partial Cluster DAGs. Our results provide a rigorous bridge between the functional and graphical frameworks and allow for adoption and transfer of results between them.

Updated: 2025-03-14 10:11:04

标题: 将图形和功能因果抽象对齐

摘要: 因果抽象使我们能够将不同粒度的因果模型联系起来。为了确保模型在因果上达成一致，因果抽象的框架定义了一致性的概念。文献中常见的两种不同的因果抽象方法是：(i) 图形抽象，如Cluster DAGs，它们在结构层面上联系模型；以及(ii) 功能抽象，如$\alpha$-抽象，它们通过变量和它们的范围之间的映射联系模型。在本文中，我们将对图形一致性和功能一致性的概念进行对齐，并展示Cluster DAGs类、具有范围映射为双射的一致性$\alpha$-抽象，以及具有构造性的$\tau$-抽象之间的等价性。此外，我们通过引入Partial Cluster DAGs进一步扩展了这种对齐和图形抽象的表达能力。我们的结果提供了功能和图形框架之间的严格桥梁，并允许在它们之间采用和转移结果。

更新时间: 2025-03-14 10:11:04

领域: cs.AI

下载: http://arxiv.org/abs/2412.17080v4

PEMF-VTO: Point-Enhanced Video Virtual Try-on via Mask-free Paradigm

Video Virtual Try-on aims to seamlessly transfer a reference garment onto a target person in a video while preserving both visual fidelity and temporal coherence. Existing methods typically rely on inpainting masks to define the try-on area, enabling accurate garment transfer for simple scenes (e.g., in-shop videos). However, these mask-based approaches struggle with complex real-world scenarios, as overly large and inconsistent masks often destroy spatial-temporal information, leading to distorted results. Mask-free methods alleviate this issue but face challenges in accurately determining the try-on area, especially for videos with dynamic body movements. To address these limitations, we propose PEMF-VTO, a novel Point-Enhanced Mask-Free Video Virtual Try-On framework that leverages sparse point alignments to explicitly guide garment transfer. Our key innovation is the introduction of point-enhanced guidance, which provides flexible and reliable control over both spatial-level garment transfer and temporal-level video coherence. Specifically, we design a Point-Enhanced Transformer (PET) with two core components: Point-Enhanced Spatial Attention (PSA), which uses frame-cloth point alignments to precisely guide garment transfer, and Point-Enhanced Temporal Attention (PTA), which leverages frame-frame point correspondences to enhance temporal coherence and ensure smooth transitions across frames. Extensive experiments demonstrate that our PEMF-VTO outperforms state-of-the-art methods, generating more natural, coherent, and visually appealing try-on videos, particularly for challenging in-the-wild scenarios. The link to our paper's homepage is https://pemf-vto.github.io/.

Updated: 2025-03-14 10:07:40

标题: PEMF-VTO: 基于无面具范式的点增强视频虚拟试穿

摘要: 视频虚拟试穿旨在在视频中将参考服装无缝地转移至目标人物身上，同时保持视觉保真度和时间连贯性。现有方法通常依赖于修补掩模来定义试穿区域，从而实现对简单场景（例如店内视频）的准确服装转移。然而，这些基于掩模的方法在复杂的现实世界场景中面临困难，因为过大和不一致的掩模经常破坏空间-时间信息，导致结果失真。无掩模方法缓解了这一问题，但在准确确定试穿区域方面面临挑战，尤其是对于具有动态身体运动的视频。为了解决这些限制，我们提出了PEMF-VTO，这是一种新颖的基于点增强的无掩模视频虚拟试穿框架，利用稀疏点对齐来明确引导服装转移。我们的关键创新是引入点增强指导，它提供对空间级别服装转移和时间级别视频连贯性的灵活可靠控制。具体来说，我们设计了一个带有两个核心组件的点增强变换器（PET）：点增强空间注意力（PSA），利用帧-衣服点对齐来精确引导服装转移；点增强时间注意力（PTA），利用帧-帧点对应来增强时间连贯性，确保帧间平滑过渡。大量实验证明，我们的PEMF-VTO优于最先进的方法，生成更自然、连贯和视觉上吸引人的试穿视频，特别适用于具有挑战性的野外场景。我们论文的主页链接是https://pemf-vto.github.io/。

更新时间: 2025-03-14 10:07:40

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2412.03021v4

Line of Duty: Evaluating LLM Self-Knowledge via Consistency in Feasibility Boundaries

As LLMs grow more powerful, their most profound achievement may be recognising when to say "I don't know". Existing studies on LLM self-knowledge have been largely constrained by human-defined notions of feasibility, often neglecting the reasons behind unanswerability by LLMs and failing to study deficient types of self-knowledge. This study aims to obtain intrinsic insights into different types of LLM self-knowledge with a novel methodology: allowing them the flexibility to set their own feasibility boundaries and then analysing the consistency of these limits. We find that even frontier models like GPT-4o and Mistral Large are not sure of their own capabilities more than 80% of the time, highlighting a significant lack of trustworthiness in responses. Our analysis of confidence balance in LLMs indicates that models swing between overconfidence and conservatism in feasibility boundaries depending on task categories and that the most significant self-knowledge weaknesses lie in temporal awareness and contextual understanding. These difficulties in contextual comprehension additionally lead models to question their operational boundaries, resulting in considerable confusion within the self-knowledge of LLMs. We make our code and results available publicly at https://github.com/knowledge-verse-ai/LLM-Self_Knowledge_Eval

Updated: 2025-03-14 10:07:07

标题: 职责范围：通过一致性评估LLM自我认识的可行性边界

摘要: 随着LLM变得越来越强大，它们最深刻的成就可能是在需要时承认“我不知道”。现有关于LLM自我认识的研究往往受到人类定义的可行性概念的限制，经常忽视LLM无法回答问题的原因，并未研究自我认识的不足类型。本研究旨在通过一种新颖的方法论获得对不同类型的LLM自我认识的内在洞察：允许它们灵活设定自己的可行性界限，然后分析这些限制的一致性。我们发现，即使是像GPT-4o和Mistral Large这样的前沿模型，在超过80%的时间内也不确定自己的能力，突显了回答的可信度显著不足。我们对LLM的信心平衡进行的分析表明，模型在任务类别上在可行性界限上的自信和保守之间摇摆不定，而最显著的自我认识弱点在于时间意识和背景理解。这些在背景理解上的困难另外导致模型对其运行界限产生质疑，从而导致LLM的自我认识出现相当大的困惑。我们将我们的代码和结果公开在以下网址：https://github.com/knowledge-verse-ai/LLM-Self_Knowledge_Eval

更新时间: 2025-03-14 10:07:07

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.11256v1

Federated Koopman-Reservoir Learning for Large-Scale Multivariate Time-Series Anomaly Detection

The proliferation of edge devices has dramatically increased the generation of multivariate time-series (MVTS) data, essential for applications from healthcare to smart cities. Such data streams, however, are vulnerable to anomalies that signal crucial problems like system failures or security incidents. Traditional MVTS anomaly detection methods, encompassing statistical and centralized machine learning approaches, struggle with the heterogeneity, variability, and privacy concerns of large-scale, distributed environments. In response, we introduce FedKO, a novel unsupervised Federated Learning framework that leverages the linear predictive capabilities of Koopman operator theory along with the dynamic adaptability of Reservoir Computing. This enables effective spatiotemporal processing and privacy preservation for MVTS data. FedKO is formulated as a bi-level optimization problem, utilizing a specific federated algorithm to explore a shared Reservoir-Koopman model across diverse datasets. Such a model is then deployable on edge devices for efficient detection of anomalies in local MVTS streams. Experimental results across various datasets showcase FedKO's superior performance against state-of-the-art methods in MVTS anomaly detection. Moreover, FedKO reduces up to 8x communication size and 2x memory usage, making it highly suitable for large-scale systems.

Updated: 2025-03-14 10:06:52

标题: 联邦Koopman-水库学习用于大规模多变量时间序列异常检测

摘要: 边缘设备的激增极大地增加了多变量时间序列（MVTS）数据的生成，这对于从医疗保健到智慧城市等应用至关重要。然而，这类数据流容易受到异常的影响，这些异常信号可能暗示着关键问题，如系统故障或安全事件。传统的MVTS异常检测方法，包括统计和集中式机器学习方法，面对大规模、分布式环境中的异质性、可变性和隐私问题时显得力不从心。为此，我们引入了FedKO，这是一个新颖的无监督联邦学习框架，利用Koopman算子理论的线性预测能力以及Reservoir Computing的动态适应性。这使得对MVTS数据进行有效的时空处理和隐私保护成为可能。FedKO被构建为一个双层优化问题，利用特定的联邦算法在不同数据集之间探索共享的Reservoir-Koopman模型。这样的模型随后可以部署在边缘设备上，用于有效检测本地MVTS流中的异常。在各种数据集上的实验结果展示了FedKO在MVTS异常检测方面优于现有方法的性能。此外，FedKO可以减少高达8倍的通信大小和2倍的内存使用，使其非常适合大规模系统。

更新时间: 2025-03-14 10:06:52

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2503.11255v1

TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis

Time series analysis plays a critical role in numerous applications, supporting tasks such as forecasting, classification, anomaly detection, and imputation. In this work, we present the time series pattern machine (TSPM), a model designed to excel in a broad range of time series tasks through powerful representation and pattern extraction capabilities. Traditional time series models often struggle to capture universal patterns, limiting their effectiveness across diverse tasks. To address this, we define multiple scales in the time domain and various resolutions in the frequency domain, employing various mixing strategies to extract intricate, task-adaptive time series patterns. Specifically, we introduce a general-purpose TSPM that processes multi-scale time series using (1) multi-resolution time imaging (MRTI), (2) time image decomposition (TID), (3) multi-scale mixing (MCM), and (4) multi-resolution mixing (MRM) to extract comprehensive temporal patterns. MRTI transforms multi-scale time series into multi-resolution time images, capturing patterns across both temporal and frequency domains. TID leverages dual-axis attention to extract seasonal and trend patterns, while MCM hierarchically aggregates these patterns across scales. MRM adaptively integrates all representations across resolutions. This method achieves state-of-the-art performance across 8 time series analytical tasks, consistently surpassing both general-purpose and task-specific models. Our work marks a promising step toward the next generation of TSPMs, paving the way for further advancements in time series analysis.

Updated: 2025-03-14 10:04:53

标题: TimeMixer++：一种用于普遍预测分析的通用时间序列模式机器

摘要: 时间序列分析在许多应用中起着至关重要的作用，支持诸如预测、分类、异常检测和填充等任务。在这项工作中，我们提出了时间序列模式机（TSPM），这是一种通过强大的表示和模式提取能力在广泛的时间序列任务中表现优异的模型。传统的时间序列模型通常难以捕捉普遍模式，限制了它们在不同任务中的有效性。为了解决这个问题，我们在时间域中定义了多个尺度和频域中的各种分辨率，采用各种混合策略来提取错综复杂、任务自适应的时间序列模式。具体来说，我们引入了一个通用的TSPM，使用多分辨率时间成像（MRTI）、时间成像分解（TID）、多尺度混合（MCM）和多分辨率混合（MRM）来提取全面的时间模式。MRTI将多尺度时间序列转换为多分辨率时间图像，捕捉了时间和频率领域的模式。TID利用双轴注意力来提取季节性和趋势模式，而MCM在不同尺度上层次地聚合这些模式。MRM自适应地整合了所有分辨率的表示。这种方法在8个时间序列分析任务中实现了最先进的性能，始终超越了通用和特定任务的模型。我们的工作标志着迈向下一代TSPM的有希望的一步，为时间序列分析的进一步发展铺平了道路。

更新时间: 2025-03-14 10:04:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.16032v4

Logit-Q Dynamics for Efficient Learning in Stochastic Teams

We present a new family of logit-Q dynamics for efficient learning in stochastic games by combining the log-linear learning (also known as logit dynamics) for the repeated play of normal-form games with Q-learning for unknown Markov decision processes within the auxiliary stage-game framework. In this framework, we view stochastic games as agents repeatedly playing some stage game associated with the current state of the underlying game while the agents' Q-functions determine the payoffs of these stage games. We show that the logit-Q dynamics presented reach (near) efficient equilibrium in stochastic teams with unknown dynamics and quantify the approximation error. We also show the rationality of the logit-Q dynamics against agents following pure stationary strategies and the convergence of the dynamics in stochastic games where the stage-payoffs induce potential games, yet only a single agent controls the state transitions beyond stochastic teams. The key idea is to approximate the dynamics with a fictional scenario where the Q-function estimates are stationary over epochs whose lengths grow at a sufficiently slow rate. We then couple the dynamics in the main and fictional scenarios to show that these two scenarios become more and more similar across epochs due to the vanishing step size and growing epoch lengths.

Updated: 2025-03-14 10:00:31

标题: Logit-Q动态学习在随机团队中的高效应用

摘要: 我们提出了一种新的logit-Q动态家族，用于在随机博弈中进行高效学习，通过将log-linear learning（也称为logit动态）与Q-learning相结合，用于未知马尔可夫决策过程的重复正常形式博弈。在这个框架中，我们将随机博弈视为代理者重复玩一些与底层游戏当前状态相关联的阶段游戏，而代理者的Q函数确定这些阶段游戏的回报。我们展示了所提出的logit-Q动态在具有未知动态的随机团队中达到（接近）高效均衡，并量化了近似误差。我们还展示了logit-Q动态对抗遵循纯固定策略的代理者的理性，以及在阶段回报引起潜在博弈的随机博弈中动态的收敛，然而只有一个代理控制超出随机团队的状态转换。关键思想是用一个虚构的情景来近似动态，其中Q函数的估计在长度以足够缓慢的速率增长的时期内是固定的。然后，我们将主要情景和虚构情景中的动态耦合起来，以显示这两种情景在时期之间由于逐渐减小的步长和增长的时期长度而变得越来越相似。

更新时间: 2025-03-14 10:00:31

领域: cs.GT,cs.AI,math.OC,91A15, 91A26, 68T05

下载: http://arxiv.org/abs/2302.09806v4

CRPS-Based Targeted Sequential Design with Application in Chemical Space

Sequential design of real and computer experiments via Gaussian Process (GP) models has proven useful for parsimonious, goal-oriented data acquisition purposes. In this work, we focus on acquisition strategies for a GP model that needs to be accurate within a predefined range of the response of interest. Such an approach is useful in various fields including synthetic chemistry, where finding molecules with particular properties is essential for developing useful materials and effective medications. GP modeling and sequential design of experiments have been successfully applied to a plethora of domains, including molecule research. Our main contribution here is to use the threshold-weighted Continuous Ranked Probability Score (CRPS) as a basic building block for acquisition functions employed within sequential design. We study pointwise and integral criteria relying on two different weighting measures and benchmark them against competitors, demonstrating improved performance with respect to considered goals. The resulting acquisition strategies are applicable to a wide range of fields and pave the way to further developing sequential design relying on scoring rules.

Updated: 2025-03-14 10:00:24

标题: CRPS基于的目标顺序设计及其在化学空间中的应用

摘要: 通过高斯过程（GP）模型的实际和计算机实验的顺序设计已被证明在节俭、目标导向的数据获取目的上非常有用。在这项工作中，我们关注需要在感兴趣的响应的预定范围内准确的GP模型的获取策略。这种方法在各个领域都很有用，包括合成化学，在那里找到具有特定性质的分子对于开发有用的材料和有效的药物至关重要。GP建模和实验的顺序设计已成功应用于众多领域，包括分子研究。我们在这里的主要贡献是将阈值加权连续排序概率得分（CRPS）作为顺序设计中所采用的获取函数的基本构建块。我们研究基于两种不同加权度量的逐点和积分标准，并将它们与竞争对手进行基准测试，证明在考虑的目标方面性能有所改善。由此产生的获取策略适用于各个领域，并为进一步依赖评分规则发展顺序设计铺平道路。

更新时间: 2025-03-14 10:00:24

领域: stat.ML,cs.LG,stat.AP,stat.CO

下载: http://arxiv.org/abs/2503.11250v1

Reasoning-Grounded Natural Language Explanations for Language Models

We propose a large language model explainability technique for obtaining faithful natural language explanations by grounding the explanations in a reasoning process. When converted to a sequence of tokens, the outputs of the reasoning process can become part of the model context and later be decoded to natural language as the model produces either the final answer or the explanation. To improve the faithfulness of the explanations, we propose to use a joint predict-explain approach, in which the answers and explanations are inferred directly from the reasoning sequence, without the explanations being dependent on the answers and vice versa. We demonstrate the plausibility of the proposed technique by achieving a high alignment between answers and explanations in several problem domains, observing that language models often simply copy the partial decisions from the reasoning sequence into the final answers or explanations. Furthermore, we show that the proposed use of reasoning can also improve the quality of the answers.

Updated: 2025-03-14 10:00:03

标题: 基于推理的自然语言解释对语言模型的影响

摘要: 我们提出了一种大型语言模型可解释性技术，通过将解释置于推理过程中，可以获得忠实的自然语言解释。当转换为一系列标记时，推理过程的输出可以成为模型上下文的一部分，并在模型产生最终答案或解释时被解码为自然语言。为了提高解释的忠实性，我们提出使用联合预测-解释方法，即答案和解释直接从推理序列推断，而不是解释依赖于答案等。我们通过在几个问题领域中实现答案和解释之间的高度对齐来证明所提出的技术的可行性，观察到语言模型通常只是将推理序列中的部分决策简单复制到最终答案或解释中。此外，我们还展示了推理的使用也可以提高答案的质量。

更新时间: 2025-03-14 10:00:03

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2503.11248v1

Dual-Stage Cross-Modal Network with Dynamic Feature Fusion for Emotional Mimicry Intensity Estimation

Emotional Mimicry Intensity (EMI) estimation serves as a critical technology for understanding human social behavior and enhancing human-computer interaction experiences, where the core challenge lies in dynamic correlation modeling and robust fusion of multimodal temporal signals. To address the limitations of existing methods in insufficient exploitation of modal synergistic effects, noise sensitivity, and limited fine-grained alignment capabilities, this paper proposes a dual-stage cross-modal alignment framework. First, we construct vision-text and audio-text contrastive learning networks based on an improved CLIP architecture, achieving preliminary alignment in the feature space through modality-decoupled pre-training. Subsequently, we design a temporal-aware dynamic fusion module that combines Temporal Convolutional Networks (TCN) and gated bidirectional LSTM to respectively capture the macro-evolution patterns of facial expressions and local dynamics of acoustic features. Innovatively, we introduce a quality-guided modality fusion strategy that enables modality compensation under occlusion and noisy scenarios through differentiable weight allocation. Experimental results on the Hume-Vidmimic2 dataset demonstrate that our method achieves an average Pearson correlation coefficient of 0.35 across six emotion dimensions, outperforming the best baseline by 40\%. Ablation studies further validate the effectiveness of the dual-stage training strategy and dynamic fusion mechanism, providing a novel technical pathway for fine-grained emotion analysis in open environments.

Updated: 2025-03-14 09:55:43

标题: 双阶段跨模态网络与动态特征融合用于情绪模仿强度估计

摘要: 情感模仿强度（EMI）估计作为理解人类社会行为和增强人机交互体验的关键技术，其中的核心挑战在于动态相关建模和多模态时间信号的稳健融合。为了解决现有方法在模态协同效应、噪声敏感性和有限的细粒度对齐能力方面的局限性，本文提出了一个双阶段跨模态对齐框架。首先，我们基于改进的CLIP架构构建了视觉-文本和音频-文本对比学习网络，通过模态解耦的预训练在特征空间中实现初步对齐。随后，我们设计了一个时间感知动态融合模块，结合了时间卷积网络（TCN）和门控双向LSTM，分别捕获面部表情的宏观演变模式和声学特征的局部动态。创新地，我们引入了一个质量引导的模态融合策略，通过可微分的权重分配，在遮挡和嘈杂场景下实现模态补偿。在Hume-Vidmimic2数据集上的实验结果表明，我们的方法在六个情绪维度上实现了平均皮尔逊相关系数为0.35，比最佳基线表现提高了40％。消融研究进一步验证了双阶段训练策略和动态融合机制的有效性，为开放环境中细粒度情感分析提供了一种新颖的技术路径。

更新时间: 2025-03-14 09:55:43

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.10603v2

Cost-effective Deep Learning Infrastructure with NVIDIA GPU

The growing demand for computational power is driven by advancements in deep learning, the increasing need for big data processing, and the requirements of scientific simulations for academic and research purposes. Developing countries like Nepal often struggle with the resources needed to invest in new and better hardware for these purposes. However, optimizing and building on existing technology can still meet these computing demands effectively. To address these needs, we built a cluster using four NVIDIA GeForce GTX 1650 GPUs. The cluster consists of four nodes: one master node that controls and manages the entire cluster, and three compute nodes dedicated to processing tasks. The master node is equipped with all necessary software for package management, resource scheduling, and deployment, such as Anaconda and Slurm. In addition, a Network File Storage (NFS) system was integrated to provide the additional storage required by the cluster. Given that the cluster is accessible via ssh by a public domain address, which poses significant cybersecurity risks, we implemented fail2ban to mitigate brute force attacks and enhance security. Despite the continuous challenges encountered during the design and implementation process, this project demonstrates how powerful computational clusters can be built to handle resource-intensive tasks in various demanding fields.

Updated: 2025-03-14 09:54:36

标题: 经济高效的具有NVIDIA GPU的深度学习基础设施

摘要: 随着深度学习的进步、大数据处理的日益需求以及学术和研究目的的科学模拟要求，对计算能力的需求不断增长。尼泊尔等发展中国家往往面临投资新型和更好硬件所需资源的困难。然而，通过优化和构建现有技术仍然可以有效满足这些计算需求。为了解决这些需求，我们构建了一个使用四个NVIDIA GeForce GTX 1650 GPU的集群。该集群由四个节点组成：一个控制和管理整个集群的主节点，以及三个专用于处理任务的计算节点。主节点配备了所有必要的软件，用于包管理、资源调度和部署，如Anaconda和Slurm。此外，集成了网络文件存储（NFS）系统，以提供集群所需的额外存储空间。考虑到集群可以通过公共域地址通过ssh访问，这带来了重要的网络安全风险，我们实施了fail2ban来减轻暴力攻击并增强安全性。尽管在设计和实施过程中不断遇到挑战，但这个项目展示了如何构建强大的计算集群来处理各种领域中的资源密集型任务。

更新时间: 2025-03-14 09:54:36

领域: cs.DC,cs.AR,cs.LG,cs.SE,cs.SY,eess.SY

下载: http://arxiv.org/abs/2503.11246v1

Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning

Diffusion models are trained by learning a sequence of models that reverse each step of noise corruption. Typically, the model parameters are fully shared across multiple timesteps to enhance training efficiency. However, since the denoising tasks differ at each timestep, the gradients computed at different timesteps may conflict, potentially degrading the overall performance of image generation. To solve this issue, this work proposes a \textbf{De}couple-then-\textbf{Me}rge (\textbf{DeMe}) framework, which begins with a pretrained model and finetunes separate models tailored to specific timesteps. We introduce several improved techniques during the finetuning stage to promote effective knowledge sharing while minimizing training interference across timesteps. Finally, after finetuning, these separate models can be merged into a single model in the parameter space, ensuring efficient and practical inference. Experimental results show significant generation quality improvements upon 6 benchmarks including Stable Diffusion on COCO30K, ImageNet1K, PartiPrompts, and DDPM on LSUN Church, LSUN Bedroom, and CIFAR10. Code is available at \href{https://github.com/MqLeet/DeMe}{GitHub}.

Updated: 2025-03-14 09:54:17

标题: 解耦-然后合并：将扩散模型微调为多任务学习

摘要: 扩散模型通过学习一系列逆转每一步噪声损坏的模型来进行训练。通常，模型参数在多个时间步上完全共享，以增强训练效率。然而，由于每个时间步的去噪任务不同，不同时间步计算的梯度可能相互冲突，潜在地降低图像生成的整体性能。为了解决这个问题，本文提出了一个名为DeMe（De-couple-then-Merge）的框架，它从一个预训练模型开始，并微调针对特定时间步的单独模型。我们在微调阶段引入了几种改进的技术，以促进有效的知识共享，同时最大程度地减少跨时间步的训练干扰。最后，在微调之后，这些单独模型可以在参数空间中合并成一个单一模型，确保高效且实用的推断。实验结果显示，在包括COCO30K上的Stable Diffusion、ImageNet1K、PartiPrompts以及LSUN Church、LSUN Bedroom和CIFAR10上的DDPM等6个基准测试中，生成质量有显著改善。代码可在GitHub上找到：https://github.com/MqLeet/DeMe。

更新时间: 2025-03-14 09:54:17

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.06664v2

LLMPerf: GPU Performance Modeling meets Large Language Models

Performance modeling, a pivotal domain in program cost analysis, currently relies on manually crafted models constrained by various program and hardware limitations, especially in the intricate landscape of GPGPU. Meanwhile, Large Language Models (LLMs) have demonstrated their effectiveness in addressing diverse programming challenges. Our work establishes a connection between LLMs and performance modeling, employing the LLM as a performance estimator. Through experimental exploration with carefully designed large-scale OpenCL datasets, we highlight the potential capability as well as the main difficulties of using LLMs in handling performance modeling tasks for OpenCL device source programs. As the first study for this line of work, our LLM-based performance model achieves a mean absolute percentage error of $24.25\%$ for a large-scale generated validation set. On a set of publicly available OpenCL programs, our model achieves a mean absolute percentage error of $46.1\%$.

Updated: 2025-03-14 09:52:30

标题: LLMPerf：GPU性能建模遇见大型语言模型

摘要: 性能建模是程序成本分析中的一个关键领域，目前依赖于手工制作的模型，受到各种程序和硬件限制的约束，特别是在GPGPU复杂的景观中。与此同时，大型语言模型(LLMs)已经证明了它们在解决各种编程挑战方面的有效性。我们的工作建立了LLMs和性能建模之间的联系，利用LLM作为性能估计器。通过对精心设计的大规模OpenCL数据集进行实验性探索，我们突出了使用LLMs处理OpenCL设备源程序的性能建模任务的潜力能力以及主要困难。作为这一工作线的首次研究，我们基于LLMs的性能模型在一个大规模生成的验证集上实现了24.25%的平均绝对百分比误差。在一组公开可用的OpenCL程序中，我们的模型实现了46.1%的平均绝对百分比误差。

更新时间: 2025-03-14 09:52:30

领域: cs.PF,cs.DC,cs.LG

下载: http://arxiv.org/abs/2503.11244v1

A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis

The main challenges hindering the adoption of deep learning-based systems in clinical settings are the scarcity of annotated data and the lack of interpretability and trust in these systems. Concept Bottleneck Models (CBMs) offer inherent interpretability by constraining the final disease prediction on a set of human-understandable concepts. However, this inherent interpretability comes at the cost of greater annotation burden. Additionally, adding new concepts requires retraining the entire system. In this work, we introduce a novel two-step methodology that addresses both of these challenges. By simulating the two stages of a CBM, we utilize a pretrained Vision Language Model (VLM) to automatically predict clinical concepts, and an off-the-shelf Large Language Model (LLM) to generate disease diagnoses based on the predicted concepts. Furthermore, our approach supports test-time human intervention, enabling corrections to predicted concepts, which improves final diagnoses and enhances transparency in decision-making. We validate our approach on three skin lesion datasets, demonstrating that it outperforms traditional CBMs and state-of-the-art explainable methods, all without requiring any training and utilizing only a few annotated examples. The code is available at https://github.com/CristianoPatricio/2-step-concept-based-skin-diagnosis.

Updated: 2025-03-14 09:51:44

标题: 一种两步概念为基础的方法，提高皮肤病变诊断的解释性和信任度

摘要: 在临床环境中采用基于深度学习的系统面临的主要挑战是标注数据稀缺和对这些系统缺乏可解释性和信任。概念瓶颈模型（CBMs）通过将最终疾病预测限制在一组人类可理解的概念上，提供固有的可解释性。然而，这种固有的可解释性需要更大的标注负担。此外，添加新概念需要重新训练整个系统。在这项工作中，我们引入了一种新颖的两步方法来解决这些挑战。通过模拟CBM的两个阶段，我们利用预训练的视觉语言模型（VLM）自动预测临床概念，并利用现成的大型语言模型（LLM）基于预测的概念生成疾病诊断。此外，我们的方法支持测试时人类干预，使得对预测概念的更正能够提高最终诊断结果，并增强决策透明度。我们在三个皮肤病变数据集上验证了我们的方法，证明它优于传统的CBMs和最先进的可解释方法，而无需任何训练，仅利用少量标注示例。代码可在https://github.com/CristianoPatricio/2-step-concept-based-skin-diagnosis找到。

更新时间: 2025-03-14 09:51:44

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.05609v2

PharmacoMatch: Efficient 3D Pharmacophore Screening via Neural Subgraph Matching

The increasing size of screening libraries poses a significant challenge for the development of virtual screening methods for drug discovery, necessitating a re-evaluation of traditional approaches in the era of big data. Although 3D pharmacophore screening remains a prevalent technique, its application to very large datasets is limited by the computational cost associated with matching query pharmacophores to database molecules. In this study, we introduce PharmacoMatch, a novel contrastive learning approach based on neural subgraph matching. Our method reinterprets pharmacophore screening as an approximate subgraph matching problem and enables efficient querying of conformational databases by encoding query-target relationships in the embedding space. We conduct comprehensive investigations of the learned representations and evaluate PharmacoMatch as pre-screening tool in a zero-shot setting. We demonstrate significantly shorter runtimes and comparable performance metrics to existing solutions, providing a promising speed-up for screening very large datasets.

Updated: 2025-03-14 09:51:43

标题: PharmacoMatch：通过神经子图匹配实现高效的3D药效固体筛选

摘要: 筛选库规模不断增长为药物发现的虚拟筛选方法的发展提出了重大挑战，必须在大数据时代重新评估传统方法。尽管3D药效团筛选仍然是一种普遍的技术，但其在非常大的数据集上的应用受到与将查询药效团与数据库分子进行匹配相关的计算成本的限制。在这项研究中，我们介绍了PharmacoMatch，这是一种基于神经子图匹配的新型对比学习方法。我们的方法重新解释药效团筛选为近似子图匹配问题，并通过在嵌入空间中编码查询-目标关系，实现对构象数据库的高效查询。我们对学习到的表示进行了全面的研究，并将PharmacoMatch作为零样本设置中的预筛选工具进行评估。我们展示了明显更短的运行时间和与现有解决方案相媲美的性能指标，为筛选非常大的数据集提供了有望加速。

更新时间: 2025-03-14 09:51:43

领域: cs.LG,cs.AI,q-bio.QM

下载: http://arxiv.org/abs/2409.06316v2

Compound Expression Recognition via Large Vision-Language Models

Compound Expression Recognition (CER) is crucial for understanding human emotions and improving human-computer interaction. However, CER faces challenges due to the complexity of facial expressions and the difficulty of capturing subtle emotional cues. To address these issues, we propose a novel approach leveraging Large Vision-Language Models (LVLMs). Our method employs a two-stage fine-tuning process: first, pre-trained LVLMs are fine-tuned on basic facial expressions to establish foundational patterns; second, the model is further optimized on a compound-expression dataset to refine visual-language feature interactions. Our approach achieves advanced accuracy on the RAF-DB dataset and demonstrates strong zero-shot generalization on the C-EXPR-DB dataset, showcasing its potential for real-world applications in emotion analysis and human-computer interaction.

Updated: 2025-03-14 09:46:05

标题: 通过大型视觉-语言模型识别复合表达

摘要: 复合表达识别（CER）对于理解人类情绪和改善人机交互至关重要。然而，由于面部表情的复杂性和捕捉微妙情绪线索的困难，CER面临挑战。为了解决这些问题，我们提出了一种利用大型视觉-语言模型（LVLMs）的新方法。我们的方法采用两阶段微调过程：首先，预训练的LVLMs在基本面部表情上进行微调，以建立基础模式；其次，模型在一个复合表达数据集上进一步优化，以提炼视觉-语言特征交互。我们的方法在RAF-DB数据集上实现了先进的准确度，并在C-EXPR-DB数据集上展现出强大的零样本泛化能力，展示了其在情绪分析和人机交互领域的潜力。

更新时间: 2025-03-14 09:46:05

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.11241v1

High Probability Guarantees for Random Reshuffling

We consider the stochastic gradient method with random reshuffling ($\mathsf{RR}$) for tackling smooth nonconvex optimization problems. $\mathsf{RR}$ finds broad applications in practice, notably in training neural networks. In this work, we provide high probability first-order and second-order complexity guarantees for this method. First, we establish a high probability first-order sample complexity result for driving the Euclidean norm of the gradient (without taking expectation) below $\varepsilon$. Our derived complexity matches the best existing in-expectation one up to a logarithmic term while imposing no additional assumptions nor changing $\mathsf{RR}$'s updating rule. We then propose a simple and computable stopping criterion for $\mathsf{RR}$ (denoted as $\mathsf{RR}$-$\mathsf{sc}$). This criterion is guaranteed to be triggered after a finite number of iterations, enabling us to prove a high probability first-order complexity guarantee for the last iterate. Second, building on the proposed stopping criterion, we design a perturbed random reshuffling method ($\mathsf{p}$-$\mathsf{RR}$) that involves an additional randomized perturbation procedure near stationary points. We derive that $\mathsf{p}$-$\mathsf{RR}$ provably escapes strict saddle points and establish a high probability second-order complexity result, without requiring any sub-Gaussian tail-type assumptions on the stochastic gradient errors. The fundamental ingredient in deriving the aforementioned results is the new concentration property for sampling without replacement in $\mathsf{RR}$, which could be of independent interest. Finally, we conduct numerical experiments on neural network training to support our theoretical findings.

Updated: 2025-03-14 09:45:53

标题: 随机重排的高概率保证

摘要: 我们考虑使用随机重排（$\mathsf{RR}$）的随机梯度方法来处理平滑非凸优化问题。$\mathsf{RR}$ 在实践中有广泛的应用，特别是在训练神经网络中。在这项工作中，我们为这种方法提供了高概率的一阶和二阶复杂度保证。首先，我们建立了一个高概率的一阶样本复杂度结果，用于使梯度的欧几里得范数（不考虑期望）降至小于 $\varepsilon$。我们得出的复杂度与现有最佳的期望复杂度相匹配，仅有一个对数项的差异，同时不需要额外的假设或更改 $\mathsf{RR}$ 的更新规则。然后我们提出了一个简单且可计算的停止准则用于 $\mathsf{RR}$（记为 $\mathsf{RR}$-$\mathsf{sc}$）。这个准则保证会在有限次迭代后被触发，使我们能够为最后的迭代证明高概率的一阶复杂度保证。其次，基于提出的停止准则，我们设计了一种扰动随机重排方法（$\mathsf{p}$-$\mathsf{RR}$），其中包含一个额外的随机扰动程序，用于在静止点附近。我们得出 $\mathsf{p}$-$\mathsf{RR}$ 可以可靠地避开严格鞍点，并建立了一个高概率的二阶复杂度结果，而无需对随机梯度误差做出任何次高斯尾部的假设。导出上述结果的基本要素是在 $\mathsf{RR}$ 中不重复采样的新的集中性质，这可能具有独立的兴趣。最后，我们对神经网络训练进行了数值实验，以支持我们的理论发现。

更新时间: 2025-03-14 09:45:53

领域: math.OC,cs.LG,90C30, 90C06, 90C26, 90C15

下载: http://arxiv.org/abs/2311.11841v3

Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards

Diffusion models have achieved remarkable success in text-to-image generation. However, their practical applications are hindered by the misalignment between generated images and corresponding text prompts. To tackle this issue, reinforcement learning (RL) has been considered for diffusion model fine-tuning. Yet, RL's effectiveness is limited by the challenge of sparse reward, where feedback is only available at the end of the generation process. This makes it difficult to identify which actions during the denoising process contribute positively to the final generated image, potentially leading to ineffective or unnecessary denoising policies. To this end, this paper presents a novel RL-based framework that addresses the sparse reward problem when training diffusion models. Our framework, named $\text{B}^2\text{-DiffuRL}$, employs two strategies: \textbf{B}ackward progressive training and \textbf{B}ranch-based sampling. For one thing, backward progressive training focuses initially on the final timesteps of denoising process and gradually extends the training interval to earlier timesteps, easing the learning difficulty from sparse rewards. For another, we perform branch-based sampling for each training interval. By comparing the samples within the same branch, we can identify how much the policies of the current training interval contribute to the final image, which helps to learn effective policies instead of unnecessary ones. $\text{B}^2\text{-DiffuRL}$ is compatible with existing optimization algorithms. Extensive experiments demonstrate the effectiveness of $\text{B}^2\text{-DiffuRL}$ in improving prompt-image alignment and maintaining diversity in generated images. The code for this work is available.

Updated: 2025-03-14 09:45:19

标题: 朝着更好的对齐方向：使用强化学习训练扩散模型以对抗稀疏奖励

摘要: 扩散模型在文本到图像生成中取得了显著的成功。然而，它们的实际应用受到生成图像与相应文本提示之间的不一致的阻碍。为了解决这个问题，强化学习（RL）被考虑用于扩散模型的微调。然而，RL的有效性受到稀疏奖励的挑战的限制，即只有在生成过程结束时才有反馈。这使得很难确定去噪过程中的哪些行动对最终生成的图像起到积极作用，可能导致无效或不必要的去噪策略。因此，本文提出了一种新的基于RL的框架，用于解决训练扩散模型时的稀疏奖励问题。我们的框架，称为$\text{B}^2\text{-DiffuRL}$，采用了两种策略：\textbf{B}ackward progressive training和\textbf{B}ranch-based sampling。一方面，反向渐进训练最初专注于去噪过程的最终时间步，并逐渐将训练间隔延伸到较早的时间步，减轻来自稀疏奖励的学习难度。另一方面，我们对每个训练间隔执行基于分支的采样。通过比较同一分支内的样本，我们可以确定当前训练间隔的策略对最终图像的贡献程度，这有助于学习有效的策略而不是不必要的策略。$\text{B}^2\text{-DiffuRL}$兼容现有的优化算法。大量实验证明了$\text{B}^2\text{-DiffuRL}$在提高提示-图像对齐性和保持生成图像多样性方面的有效性。此工作的代码可供使用。

更新时间: 2025-03-14 09:45:19

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2503.11240v1

Collaboration is all you need: LLM Assisted Safe Code Translation

This paper introduces UniTranslator, a visionary framework that re-imagines code translation as a collaborative endeavor among multiple, compact LLMs. By orchestrating the interaction of specialized agents, each focused on different aspects of the translation process and grounded in a deep understanding of programming concepts, UniTranslator achieves a level of accuracy and efficiency that rivals larger, monolithic models. Our preliminary evaluation demonstrates the potential of UniTranslator to overcome the limitations of existing approaches and unlock the power of smaller LLMs for complex code translation tasks. We explore the effectiveness of this dynamic multi-agent paradigm in handling diverse language pairs, including low-resource languages, and in mitigating common issues such as code artifacts and hallucinations through the use of Natural Language Inference (NLI) grounding and iterative feedback mechanisms

Updated: 2025-03-14 09:42:07

标题: 合作是你所需要的一切：LLM辅助安全代码转换

摘要: 这篇论文介绍了UniTranslator，这是一个具有远见的框架，重新构想代码翻译为多个紧凑的LLM之间的协作努力。通过协调专门代理的互动，每个代理专注于翻译过程的不同方面，并基于对编程概念的深刻理解，UniTranslator实现了与更大的、单一的模型相匹敌的准确性和效率水平。我们的初步评估展示了UniTranslator克服现有方法的局限性并释放较小的LLM在复杂代码翻译任务中的潜力。我们探讨了这种动态多代理范式在处理不同语言对（包括低资源语言）以及通过自然语言推理（NLI）基础和迭代反馈机制来减轻常见问题，如代码人工和幻觉方面的有效性。

更新时间: 2025-03-14 09:42:07

领域: cs.AI,cs.CL,cs.SE

下载: http://arxiv.org/abs/2503.11237v1

Limits of nonlinear and dispersive fiber propagation for photonic extreme learning

We report a generalized nonlinear Schr\"odinger equation simulation model of an extreme learning machine (ELM) based on optical fiber propagation. Using handwritten digit classification as a benchmark, we study how accuracy depends on propagation dynamics, as well as parameters governing spectral encoding, readout, and noise. Test accuracies of over 91% and 93% are found for propagation in the anomalous and normal dispersion regimes respectively. Our simulation results also suggest that quantum noise on the input pulses introduces an intrinsic penalty to ELM performance.

Updated: 2025-03-14 09:36:47

标题: 非线性和色散光纤传输对于光子极限学习的限制

摘要: 我们报告了一个基于光纤传播的极限学习机（ELM）的广义非线性Schr\"odinger方程模拟模型。以手写数字分类作为基准，我们研究了准确率如何取决于传播动力学，以及控制光谱编码、读出和噪声的参数。在异常和正常色散区域传播时，测试准确率分别超过91%和93%。我们的模拟结果还表明，输入脉冲上的量子噪声对ELM性能产生了固有的惩罚。

更新时间: 2025-03-14 09:36:47

领域: physics.optics,cs.LG

下载: http://arxiv.org/abs/2503.03649v2

Concise and Organized Perception Facilitates Reasoning in Large Language Models

Exploiting large language models (LLMs) to tackle reasoning has garnered growing attention. It still remains highly challenging to achieve satisfactory results in complex logical problems, characterized by plenty of premises within the context and requiring multi-hop reasoning. In particular, the reasoning capabilities of LLMs are brittle to disorder and distractibility. In this work, we first examine the mechanism from the perspective of information flow and reveal that LLMs confront difficulties akin to human-like cognitive biases when dealing with disordered and irrelevant content in reasoning tasks. However, in contrast to LLMs, disordered and irrelevant content does not significantly decrease human performance, as humans have a propensity to distill the most relevant information and systematically organize their thoughts, aiding them in responding to questions.Stem from that, we further propose a novel reasoning approach named Concise and Organized Perception (COP). COP carefully analyzes the given statements to identify the most pertinent information while eliminating redundancy efficiently. It then prompts the LLMs in a more organized form that adapts to the model's inference process. By perceiving concise and organized context, the reasoning abilities of LLMs can be better elicited. Extensive experimental results on several popular logical benchmarks (ProofWriter, PrOntoQA, PrOntoQA-OOD, and FOLIO) and mathematical benchmark (DI-GSM) show that COP significantly outperforms previous state-of-the-art methods.

Updated: 2025-03-14 09:33:02

标题: 简洁有序的感知有助于大型语言模型的推理

摘要: 利用大型语言模型（LLMs）来处理推理问题已经引起了越来越多的关注。在复杂的逻辑问题中，尤其是在上下文中存在大量前提和需要多次推理的情况下仍然很难获得令人满意的结果。尤其是，LLMs的推理能力容易受到混乱和分散注意力的影响。在这项工作中，我们首先从信息流的角度检查机制，并揭示了当处理推理任务中的混乱和无关内容时，LLMs面临类似于人类认知偏见的困难。然而，与LLMs相比，混乱和无关内容并不会显著降低人类的表现，因为人类有倾向于提炼最相关信息并系统地组织他们的思维，帮助他们回答问题。基于此，我们进一步提出了一种名为“简明有序知觉”（COP）的新型推理方法。COP仔细分析给定的陈述，以识别最相关的信息，同时高效地消除冗余。然后，它以更有组织的形式提示LLMs，适应模型的推理过程。通过感知简明有序的上下文，LLMs的推理能力可以更好地被激发。对几个流行的逻辑基准（ProofWriter、PrOntoQA、PrOntoQA-OOD和FOLIO）和数学基准（DI-GSM）的广泛实验结果表明，COP明显优于以前的最先进方法。

更新时间: 2025-03-14 09:33:02

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2310.03309v5

Addressing Information Loss and Interaction Collapse: A Dual Enhanced Attention Framework for Feature Interaction

The Transformer has proven to be a significant approach in feature interaction for CTR prediction, achieving considerable success in previous works. However, it also presents potential challenges in handling feature interactions. Firstly, Transformers may encounter information loss when capturing feature interactions. By relying on inner products to represent pairwise relationships, they compress raw interaction information, which can result in a degradation of fidelity. Secondly, due to the long-tail features distribution, feature fields with low information-abundance embeddings constrain the information abundance of other fields, leading to collapsed embedding matrices. To tackle these issues, we propose a Dual Attention Framework for Enhanced Feature Interaction, known as Dual Enhanced Attention. This framework integrates two attention mechanisms: the Combo-ID attention mechanism and the collapse-avoiding attention mechanism. The Combo-ID attention mechanism directly retains feature interaction pairs to mitigate information loss, while the collapse-avoiding attention mechanism adaptively filters out low information-abundance interaction pairs to prevent interaction collapse. Extensive experiments conducted on industrial datasets have shown the effectiveness of Dual Enhanced Attention.

Updated: 2025-03-14 09:31:03

标题: 解决信息丢失和交互坍缩：一种用于特征交互的双增强注意力框架

摘要: Transformer已被证明是CTR预测中特征交互的重要方法，在先前的工作中取得了相当大的成功。然而，它也在处理特征交互方面存在潜在挑战。首先，Transformers在捕获特征交互时可能会遇到信息丢失的问题。通过依赖内积来表示成对关系，它们压缩原始交互信息，可能导致准确性下降。其次，由于长尾特征分布，具有低信息丰富度嵌入的特征字段限制了其他字段的信息丰富度，导致嵌入矩阵崩溃。为了解决这些问题，我们提出了一个增强特征交互的双重注意力框架，称为双增强注意力。该框架集成了两种注意力机制：Combo-ID注意力机制和避免崩溃的注意力机制。Combo-ID注意力机制直接保留特征交互对，以减轻信息丢失，而避免崩溃的注意力机制则自适应地过滤出低信息丰富度的交互对，以防止交互崩溃。在工业数据集上进行的大量实验证明了双增强注意力的有效性。

更新时间: 2025-03-14 09:31:03

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2503.11233v1

PrivacyScalpel: Enhancing LLM Privacy via Interpretable Feature Intervention with Sparse Autoencoders

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing but also pose significant privacy risks by memorizing and leaking Personally Identifiable Information (PII). Existing mitigation strategies, such as differential privacy and neuron-level interventions, often degrade model utility or fail to effectively prevent leakage. To address this challenge, we introduce PrivacyScalpel, a novel privacy-preserving framework that leverages LLM interpretability techniques to identify and mitigate PII leakage while maintaining performance. PrivacyScalpel comprises three key steps: (1) Feature Probing, which identifies layers in the model that encode PII-rich representations, (2) Sparse Autoencoding, where a k-Sparse Autoencoder (k-SAE) disentangles and isolates privacy-sensitive features, and (3) Feature-Level Interventions, which employ targeted ablation and vector steering to suppress PII leakage. Our empirical evaluation on Gemma2-2b and Llama2-7b, fine-tuned on the Enron dataset, shows that PrivacyScalpel significantly reduces email leakage from 5.15\% to as low as 0.0\%, while maintaining over 99.4\% of the original model's utility. Notably, our method outperforms neuron-level interventions in privacy-utility trade-offs, demonstrating that acting on sparse, monosemantic features is more effective than manipulating polysemantic neurons. Beyond improving LLM privacy, our approach offers insights into the mechanisms underlying PII memorization, contributing to the broader field of model interpretability and secure AI deployment.

Updated: 2025-03-14 09:31:01

标题: PrivacyScalpel：通过稀疏自动编码器进行可解释特征干预以增强LLM隐私

摘要: 大型语言模型（LLMs）在自然语言处理中展示出卓越的能力，但也存在显著的隐私风险，因为它们会记忆和泄露个人可识别信息（PII）。现有的缓解策略，如差分隐私和神经元级干预，通常会降低模型效用或未能有效阻止信息泄露。为了解决这一挑战，我们引入了PrivacyScalpel，这是一个新颖的隐私保护框架，利用LLM可解释性技术来识别和减轻PII泄露，同时保持性能。PrivacyScalpel包括三个关键步骤：（1）特征探查，识别模型中编码PII丰富表示的层，（2）稀疏自编码，其中k-稀疏自编码器（k-SAE）解开并隔离隐私敏感特征，以及（3）特征级干预，利用有针对性的消融和向量导向来抑制PII泄露。我们在Enron数据集上对Gemma2-2b和Llama2-7b进行微调的经验评估表明，PrivacyScalpel将电子邮件泄露率从5.15％显著降低到0.0％，同时保持原模型效用的99.4％以上。值得注意的是，我们的方法在隐私效用权衡方面优于神经元级干预，表明对稀疏、单义特征进行操作比操纵多义神经元更有效。除了改善LLM隐私外，我们的方法还为PII记忆机制提供了见解，为模型可解释性和安全AI部署的广泛领域做出了贡献。

更新时间: 2025-03-14 09:31:01

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2503.11232v1

LIX: Implicitly Infusing Spatial Geometric Prior Knowledge into Visual Semantic Segmentation for Autonomous Driving

Despite the impressive performance achieved by data-fusion networks with duplex encoders for visual semantic segmentation, they become ineffective when spatial geometric data are not available. Implicitly infusing the spatial geometric prior knowledge acquired by a data-fusion teacher network into a single-modal student network is a practical, albeit less explored research avenue. This article delves into this topic and resorts to knowledge distillation approaches to address this problem. We introduce the Learning to Infuse ''X'' (LIX) framework, with novel contributions in both logit distillation and feature distillation aspects. We present a mathematical proof that underscores the limitation of using a single, fixed weight in decoupled knowledge distillation and introduce a logit-wise dynamic weight controller as a solution to this issue. Furthermore, we develop an adaptively-recalibrated feature distillation algorithm, including two novel techniques: feature recalibration via kernel regression and in-depth feature consistency quantification via centered kernel alignment. Extensive experiments conducted with intermediate-fusion and late-fusion networks across various public datasets provide both quantitative and qualitative evaluations, demonstrating the superior performance of our LIX framework when compared to other state-of-the-art approaches.

Updated: 2025-03-14 09:24:22

标题: LIX：将空间几何先验知识隐式融入自动驾驶的视觉语义分割

摘要: 尽管双工编码器用于视觉语义分割的数据融合网络取得了令人印象深刻的表现，但当空间几何数据不可用时，它们变得无效。将数据融合教师网络获得的空间几何先验知识隐式注入单模态学生网络是一种实际但较少探索的研究方向。本文深入探讨了这一主题，并采用知识蒸馏方法来解决这一问题。我们引入了学习注入“X”（LIX）框架，对逻辑蒸馏和特征蒸馏方面进行了新颖的贡献。我们提出了一个数学证明，强调在解耦知识蒸馏中使用单一固定权重的限制，并引入逻辑动态权重控制器作为解决此问题的方法。此外，我们开发了一种自适应重新校准特征蒸馏算法，包括两种新技术：通过核回归重新校准特征和通过中心核对齐进行深入特征一致性量化。通过在各种公共数据集上进行的中间融合和后期融合网络进行的广泛实验，我们提供了定量和定性评估，展示了我们的LIX框架与其他最先进方法相比表现出的卓越性能。

更新时间: 2025-03-14 09:24:22

领域: cs.CV,cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2403.08215v2

A Two-Stage Imaging Framework Combining CNN and Physics-Informed Neural Networks for Full-Inverse Tomography: A Case Study in Electrical Impedance Tomography (EIT)

Electrical Impedance Tomography (EIT) is a highly ill-posed inverse problem, with the challenge of reconstructing internal conductivities using only boundary voltage measurements. Although Physics-Informed Neural Networks (PINNs) have shown potential in solving inverse problems, existing approaches are limited in their applicability to EIT, as they often rely on impractical prior knowledge and assumptions that cannot be satisfied in real-world scenarios. To address these limitations, we propose a two-stage hybrid learning framework that combines Convolutional Neural Networks (CNNs) and PINNs. This framework integrates data-driven and model-driven paradigms, blending supervised and unsupervised learning to reconstruct conductivity distributions while ensuring adherence to the underlying physical laws, thereby overcoming the constraints of existing methods.

Updated: 2025-03-14 09:21:43

标题: 一个结合了卷积神经网络和基于物理信息的神经网络的两阶段成像框架，用于完全反问题层析成像：以电阻抗层析成像（EIT）为例的案例研究

摘要: 电阻抗成像技术（EIT）是一个高度不适定的反问题，其挑战在于仅利用边界电压测量来重建内部电导率。虽然基于物理信息的神经网络（PINNs）已经显示出在解决反问题方面的潜力，但现有方法在EIT中的适用性受到限制，因为它们经常依赖于不切实际的先验知识和假设，这些先验知识和假设在现实场景中无法满足。为了解决这些限制，我们提出了一个两阶段混合学习框架，结合了卷积神经网络（CNNs）和PINNs。该框架整合了数据驱动和模型驱动的范式，融合了监督和无监督学习，以重建电导率分布，同时确保遵守基本物理规律，从而克服现有方法的约束。

更新时间: 2025-03-14 09:21:43

领域: cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2407.17721v2

Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models

State Space Models (SSMs) have emerged as a promising alternative to the popular transformer-based models and have been increasingly gaining attention. Compared to transformers, SSMs excel at tasks with sequential data or longer contexts, demonstrating comparable performances with significant efficiency gains. In this survey, we provide a coherent and systematic overview for SSMs, including their theoretical motivations, mathematical formulations, comparison with existing model classes, and various applications. We divide the SSM series into three main sections, providing a detailed introduction to the original SSM, the structured SSM represented by S4, and the selective SSM typified by Mamba. We put an emphasis on technicality, and highlight the various key techniques introduced to address the effectiveness and efficiency of SSMs. We hope this manuscript serves as an introduction for researchers to explore the theoretical foundations of SSMs.

Updated: 2025-03-14 09:20:31

标题: 技术对效果和效率的影响：状态空间模型调查

摘要: 状态空间模型（SSMs）已经成为流行的基于转换器的模型的一个有希望的替代品，并且越来越受到关注。与转换器相比，SSMs在处理顺序数据或更长上下文的任务方面表现出色，表现出显著的效率提升。在这篇综述中，我们为SSMs提供了一个连贯而系统的概述，包括它们的理论动机、数学公式、与现有模型类别的比较以及各种应用。我们将SSM系列分为三个主要部分，详细介绍原始SSM、由S4代表的结构化SSM以及由Mamba代表的选择性SSM。我们强调技术性，并突出介绍了为解决SSMs的有效性和效率而引入的各种关键技术。我们希望这篇手稿可以为研究人员提供一个探索SSMs理论基础的入门。

更新时间: 2025-03-14 09:20:31

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2503.11224v1

Wearable intelligent throat enables natural speech in stroke patients with dysarthria

Wearable silent speech systems hold significant potential for restoring communication in patients with speech impairments. However, seamless, coherent speech remains elusive, and clinical efficacy is still unproven. Here, we present an AI-driven intelligent throat (IT) system that integrates throat muscle vibrations and carotid pulse signal sensors with large language model (LLM) processing to enable fluent, emotionally expressive communication. The system utilizes ultrasensitive textile strain sensors to capture high-quality signals from the neck area and supports token-level processing for real-time, continuous speech decoding, enabling seamless, delay-free communication. In tests with five stroke patients with dysarthria, IT's LLM agents intelligently corrected token errors and enriched sentence-level emotional and logical coherence, achieving low error rates (4.2% word error rate, 2.9% sentence error rate) and a 55% increase in user satisfaction. This work establishes a portable, intuitive communication platform for patients with dysarthria with the potential to be applied broadly across different neurological conditions and in multi-language support systems.

Updated: 2025-03-14 09:14:26

标题: 可穿戴智能喉咙设备使中风患者的发音困难自然言语化

摘要: 可穿戴的无声语音系统在恢复语言障碍患者的交流方面具有重要潜力。然而，流畅、连贯的语音仍然难以实现，临床疗效尚未得到证实。在这里，我们介绍了一种AI驱动的智能喉咙（IT）系统，该系统将喉部肌肉振动和颈动脉脉搏信号传感器与大型语言模型（LLM）处理集成在一起，以实现流畅、情感表达丰富的交流。该系统利用超灵敏的纺织应变传感器捕捉颈部区域的高质量信号，并支持标记级处理，实现实时、连续的语音解码，实现无缝、无延迟的交流。在与五名患有运动障碍的中风患者的测试中，IT的LLM代理智能纠正了标记错误，并丰富了句级情感和逻辑连贯性，实现低错误率（4.2%字错误率，2.9%句错误率）和55%的用户满意度增加。这项工作为患有运动障碍的患者建立了一个便携、直观的交流平台，具有在不同神经病症和多语言支持系统中广泛应用的潜力。

更新时间: 2025-03-14 09:14:26

领域: eess.AS,cs.AI,cs.SD,cs.SY,eess.SY

下载: http://arxiv.org/abs/2411.18266v3

Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models

Traffic simulation aims to learn a policy for traffic agents that, when unrolled in closed-loop, faithfully recovers the joint distribution of trajectories observed in the real world. Inspired by large language models, tokenized multi-agent policies have recently become the state-of-the-art in traffic simulation. However, they are typically trained through open-loop behavior cloning, and thus suffer from covariate shift when executed in closed-loop during simulation. In this work, we present Closest Among Top-K (CAT-K) rollouts, a simple yet effective closed-loop fine-tuning strategy to mitigate covariate shift. CAT-K fine-tuning only requires existing trajectory data, without reinforcement learning or generative adversarial imitation. Concretely, CAT-K fine-tuning enables a small 7M-parameter tokenized traffic simulation policy to outperform a 102M-parameter model from the same model family, achieving the top spot on the Waymo Sim Agent Challenge leaderboard at the time of submission. The code is available at https://github.com/NVlabs/catk.

Updated: 2025-03-14 09:11:40

标题: 令牌化流量模型的封闭式监督微调

摘要: 交通仿真旨在学习交通代理的策略，当在闭环中展开时，能够忠实地恢复在现实世界中观察到的轨迹的联合分布。受大型语言模型的启发，最近在交通仿真中，标记化的多代理策略已成为最先进技术。然而，它们通常是通过开环行为克隆进行训练，因此在模拟过程中闭环执行时会受到协变量漂移的影响。在这项研究中，我们提出了最接近前K个（CAT-K）回滚，这是一种简单而有效的闭环微调策略，可以减轻协变量漂移。CAT-K微调只需要现有的轨迹数据，而无需强化学习或生成对抗性模仿。具体而言，CAT-K微调使得一个小型的7M参数标记化交通仿真策略能够胜过来自同一模型系列的102M参数模型，在提交时在Waymo Sim Agent Challenge排行榜上获得第一名。代码可在https://github.com/NVlabs/catk上找到。

更新时间: 2025-03-14 09:11:40

领域: cs.LG

下载: http://arxiv.org/abs/2412.05334v2

MEET: A Million-Scale Dataset for Fine-Grained Geospatial Scene Classification with Zoom-Free Remote Sensing Imagery

Accurate fine-grained geospatial scene classification using remote sensing imagery is essential for a wide range of applications. However, existing approaches often rely on manually zooming remote sensing images at different scales to create typical scene samples. This approach fails to adequately support the fixed-resolution image interpretation requirements in real-world scenarios. To address this limitation, we introduce the Million-scale finE-grained geospatial scEne classification dataseT (MEET), which contains over 1.03 million zoom-free remote sensing scene samples, manually annotated into 80 fine-grained categories. In MEET, each scene sample follows a scene-inscene layout, where the central scene serves as the reference, and auxiliary scenes provide crucial spatial context for finegrained classification. Moreover, to tackle the emerging challenge of scene-in-scene classification, we present the Context-Aware Transformer (CAT), a model specifically designed for this task, which adaptively fuses spatial context to accurately classify the scene samples. CAT adaptively fuses spatial context to accurately classify the scene samples by learning attentional features that capture the relationships between the center and auxiliary scenes. Based on MEET, we establish a comprehensive benchmark for fine-grained geospatial scene classification, evaluating CAT against 11 competitive baselines. The results demonstrate that CAT significantly outperforms these baselines, achieving a 1.88% higher balanced accuracy (BA) with the Swin-Large backbone, and a notable 7.87% improvement with the Swin-Huge backbone. Further experiments validate the effectiveness of each module in CAT and show the practical applicability of CAT in the urban functional zone mapping. The source code and dataset will be publicly available at https://jerrywyn.github.io/project/MEET.html.

Updated: 2025-03-14 09:10:45

标题: MEET: 一个用于精细地理空间场景分类的百万级数据集，基于无需缩放的遥感图像

摘要: 准确的细粒度地理场景分类是利用遥感图像的广泛应用所必需的。然而，现有方法通常依赖于手动缩放遥感图像的不同比例，以创建典型的场景样本。这种方法未能充分支持真实场景中固定分辨率图像解释的要求。为了解决这一限制，我们引入了百万规模的细粒度地理场景分类数据集（MEET），其中包含超过103万个无缩放的遥感场景样本，手动注释为80个细粒度类别。在MEET中，每个场景样本遵循一种场景中场景的布局，其中中央场景作为参考，辅助场景提供了对细粒度分类至关重要的空间背景。此外，为了解决场景中场景分类的新挑战，我们提出了上下文感知变压器（CAT），这是专门为此任务设计的模型，通过自适应地融合空间背景来准确分类场景样本。CAT通过学习捕捉中心和辅助场景之间关系的注意特征，自适应地融合空间背景，以准确分类场景样本。基于MEET，我们建立了一个细粒度地理场景分类的全面基准，评估CAT与11个竞争基线的性能。结果表明，CAT明显优于这些基线，使用Swin-Large骨干网络获得了1.88%更高的平衡精度（BA），并且使用Swin-Huge骨干网络获得了显著的7.87%改进。进一步实验证实了CAT中每个模块的有效性，并展示了CAT在城市功能区映射中的实际适用性。源代码和数据集将在https://jerrywyn.github.io/project/MEET.html上公开提供。

更新时间: 2025-03-14 09:10:45

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.11219v1

Optimal Transport and Adaptive Thresholding for Universal Domain Adaptation on Time Series

Universal Domain Adaptation (UniDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain, even when their classes are not fully shared. Few dedicated UniDA methods exist for Time Series (TS), which remains a challenging case. In general, UniDA approaches align common class samples and detect unknown target samples from emerging classes. Such detection often results from thresholding a discriminability metric. The threshold value is typically either a fine-tuned hyperparameter or a fixed value, which limits the ability of the model to adapt to new data. Furthermore, discriminability metrics exhibit overconfidence for unknown samples, leading to misclassifications. This paper introduces UniJDOT, an optimal-transport-based method that accounts for the unknown target samples in the transport cost. Our method also proposes a joint decision space to improve the discriminability of the detection module. In addition, we use an auto-thresholding algorithm to reduce the dependence on fixed or fine-tuned thresholds. Finally, we rely on a Fourier transform-based layer inspired by the Fourier Neural Operator for better TS representation. Experiments on TS benchmarks demonstrate the discriminability, robustness, and state-of-the-art performance of UniJDOT.

Updated: 2025-03-14 09:09:21

标题: 最佳传输和自适应阈值对时间序列的通用领域适应进行翻译

摘要: Universal Domain Adaptation (UniDA)旨在将知识从一个标记的源域传输到一个未标记的目标域，即使它们的类别不完全相同。很少有专门针对时间序列（TS）的UniDA方法存在，这仍然是一个具有挑战性的案例。一般来说，UniDA方法会对齐共同的类别样本，并检测来自新兴类别的未知目标样本。这种检测通常是通过对可分性度量进行阈值设定来实现的。阈值通常是一个经过微调的超参数或一个固定值，这限制了模型对新数据的适应能力。此外，可分性度量对未知样本表现出过于自信的倾向，导致误分类。本文介绍了UniJDOT，这是一种基于最优传输的方法，考虑了传输成本中的未知目标样本。我们的方法还提出了一个联合决策空间，以提高检测模块的可分性。此外，我们使用自动阈值算法来减少对固定或微调阈值的依赖。最后，我们依靠一种受傅里叶神经算子启发的基于傅里叶变换的层来更好地表示时间序列。对时间序列基准测试的实验表明了UniJDOT的可分性、鲁棒性和最先进的性能。

更新时间: 2025-03-14 09:09:21

领域: cs.LG

下载: http://arxiv.org/abs/2503.11217v1

Spatio-Temporal Graph Structure Learning for Earthquake Detection

Earthquake detection is essential for earthquake early warning (EEW) systems. Traditional methods struggle with low signal-to-noise ratios and single-station reliance, limiting their effectiveness. We propose a Spatio-Temporal Graph Convolutional Network (GCN) using Spectral Structure Learning Convolution (Spectral SLC) to model static and dynamic relationships across seismic stations. Our approach processes multi-station waveform data and generates station-specific detection probabilities. Experiments show superior performance over a conventional GCN baseline in terms of true positive rate (TPR) and false positive rate (FPR), highlighting its potential for robust multi-station earthquake detection. The code repository for this study is available at https://github.com/SuchanunP/eq_detector.

Updated: 2025-03-14 09:07:18

标题: 空间-时间图结构学习用于地震检测

摘要: 地震检测对于地震预警系统至关重要。传统方法在信噪比低和仅依赖单个站点的情况下往往效果有限。我们提出了一种使用谱结构学习卷积（Spectral SLC）的时空图卷积网络（GCN），用于建模地震台站之间的静态和动态关系。我们的方法处理多站波形数据并生成特定站点的检测概率。实验证明，与传统的GCN基线相比，我们的方法在真阳性率（TPR）和假阳性率（FPR）方面表现出更好的性能，突显了其在多站地震检测方面的潜力。本研究的代码存储库可在https://github.com/SuchanunP/eq_detector 上找到。

更新时间: 2025-03-14 09:07:18

领域: cs.LG

下载: http://arxiv.org/abs/2503.11215v1

BACE-RUL: A Bi-directional Adversarial Network with Covariate Encoding for Machine Remaining Useful Life Prediction

Prognostic and Health Management (PHM) are crucial ways to avoid unnecessary maintenance for Cyber-Physical Systems (CPS) and improve system reliability. Predicting the Remaining Useful Life (RUL) is one of the most challenging tasks for PHM. Existing methods require prior knowledge about the system, contrived assumptions, or temporal mining to model the life cycles of machine equipment/devices, resulting in diminished accuracy and limited applicability in real-world scenarios. This paper proposes a Bi-directional Adversarial network with Covariate Encoding for machine Remaining Useful Life (BACE-RUL) prediction, which only adopts sensor measurements from the current life cycle to predict RUL rather than relying on previous consecutive cycle recordings. The current sensor measurements of mechanical devices are encoded to a conditional space to better understand the implicit inner mechanical status. The predictor is trained as a conditional generative network with the encoded sensor measurements as its conditions. Various experiments on several real-world datasets, including the turbofan aircraft engine dataset and the dataset collected from degradation experiments of Li-Ion battery cells, show that the proposed model is a general framework and outperforms state-of-the-art methods.

Updated: 2025-03-14 08:56:40

标题: BACE-RUL：一种具有协变量编码的双向对抗网络用于机器剩余寿命预测

摘要: Prognostic and Health Management (PHM)是避免对网络物理系统（CPS）进行不必要维护并提高系统可靠性的关键方法。预测机器剩余可用寿命（RUL）是PHM中最具挑战性的任务之一。现有方法需要关于系统的先验知识、人为假设或时间挖掘来建模机器设备/设备的生命周期，导致准确性降低且在现实场景中的适用性有限。本文提出了一种具有协变编码的双向对抗网络用于机器剩余可用寿命（BACE-RUL）预测，该方法仅采用当前生命周期的传感器测量来预测RUL，而不依赖于先前连续周期的记录。机械设备的当前传感器测量被编码为条件空间，以更好地理解内在的机械状态。预测器被训练为一个具有编码传感器测量作为条件的条件生成网络。对包括涡轮喷气发动机数据集和从锂离子电池细胞退化实验中收集的数据集在内的几个现实世界数据集进行了各种实验，结果表明所提出的模型是一个通用框架，并且优于最先进的方法。

更新时间: 2025-03-14 08:56:40

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.11730v1

CSCE: Boosting LLM Reasoning by Simultaneous Enhancing of Causal Significance and Consistency

Chain-based reasoning methods like chain of thought (CoT) play a rising role in solving reasoning tasks for large language models (LLMs). However, the causal illusions between \textit{a step of reasoning} and \textit{corresponding state transitions} are becoming a significant obstacle to advancing LLMs' reasoning capabilities, especially in long-range reasoning tasks. This paper proposes a non-chain-based reasoning framework for simultaneous consideration of causal significance and consistency, i.e., the Causal Significance and Consistency Enhancer (CSCE). We customize LLM's loss function utilizing treatment effect assessments to enhance its reasoning ability from two aspects: causal significance and consistency. This ensures that the model captures essential causal relationships and maintains robust and consistent performance across various scenarios. Additionally, we transform the reasoning process from the cascading multiple one-step reasoning commonly used in Chain-Based methods, like CoT, to a causal-enhanced method that outputs the entire reasoning process in one go, further improving the model's reasoning efficiency. Extensive experiments show that our method improves both the reasoning success rate and speed. These improvements further demonstrate that non-chain-based methods can also aid LLMs in completing reasoning tasks.

Updated: 2025-03-14 08:56:37

标题: CSCE：通过同时增强因果关系显著性和一致性来提升LLM推理

摘要: 链式推理方法，如思维链（CoT），在解决大型语言模型（LLMs）的推理任务中发挥着日益重要的作用。然而，\textit{推理步骤}和\textit{相应状态转换}之间的因果错觉成为提升LLMs推理能力的重要障碍，特别是在长距离推理任务中。本文提出了一个非链式推理框架，同时考虑因果显著性和一致性，即因果显著性和一致性增强器（CSCE）。我们定制了LLM的损失函数，利用治疗效应评估来增强其推理能力的两个方面：因果显著性和一致性。这确保模型捕捉到关键的因果关系，并在各种场景中保持稳健和一致的性能。此外，我们将推理过程从链式方法（如CoT）中常用的级联多步推理转变为一种因果增强方法，一次性输出整个推理过程，进一步提高了模型的推理效率。大量实验表明，我们的方法提高了推理成功率和速度。这些改进进一步证明了非链式方法也可以帮助LLMs完成推理任务。

更新时间: 2025-03-14 08:56:37

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.17174v2

Can Large Reasoning Models do Analogical Reasoning under Perceptual Uncertainty?

This work presents a first evaluation of two state-of-the-art Large Reasoning Models (LRMs), OpenAI's o3-mini and DeepSeek R1, on analogical reasoning, focusing on well-established nonverbal human IQ tests based on Raven's progressive matrices. We benchmark with the I-RAVEN dataset and its more difficult extension, I-RAVEN-X, which tests the ability to generalize to longer reasoning rules and ranges of the attribute values. To assess the influence of visual uncertainties on these nonverbal analogical reasoning tests, we extend the I-RAVEN-X dataset, which otherwise assumes an oracle perception. We adopt a two-fold strategy to simulate this imperfect visual perception: 1) we introduce confounding attributes which, being sampled at random, do not contribute to the prediction of the correct answer of the puzzles and 2) smoothen the distributions of the input attributes' values. We observe a sharp decline in OpenAI's o3-mini task accuracy, dropping from 86.6% on the original I-RAVEN to just 17.0% -- approaching random chance -- on the more challenging I-RAVEN-X, which increases input length and range and emulates perceptual uncertainty. This drop occurred despite spending 3.4x more reasoning tokens. A similar trend is also observed for DeepSeek R1: from 80.6% to 23.2%. On the other hand, a neuro-symbolic probabilistic abductive model, ARLC, that achieves state-of-the-art performances on I-RAVEN, can robustly reason under all these out-of-distribution tests, maintaining strong accuracy with only a modest reduction from 98.6% to 88.0%. Our code is available at https://github.com/IBM/raven-large-language-models.

Updated: 2025-03-14 08:52:25

标题: 大型推理模型能否在感知不确定性下进行类比推理？

摘要: 这项工作首次评估了两种最先进的大型推理模型（LRMs），OpenAI的o3-mini和DeepSeek R1，在类比推理方面的表现，重点关注基于雷文渐进矩阵的成熟非语言人类智商测试。我们使用I-RAVEN数据集及其更具挑战性的扩展版本I-RAVEN-X进行基准测试，后者测试了推理规则的泛化能力以及属性值范围。为了评估这些非语言类比推理测试中视觉不确定性的影响，我们扩展了I-RAVEN-X数据集，该数据集原本假定具有预言者感知。我们采用了双重策略来模拟这种不完美的视觉感知：1）我们引入混淆属性，随机抽样，不参与谜题正确答案的预测；2）平滑输入属性值的分布。我们观察到OpenAI的o3-mini任务准确率急剧下降，从原始I-RAVEN的86.6%下降到仅17.0% ——接近随机机会——在更具挑战性的I-RAVEN-X上，后者增加了输入长度和范围，并模拟了感知不确定性。尽管投入了3.4倍的推理记号，这种下降仍然发生。DeepSeek R1也出现了类似的趋势：从80.6%下降至23.2%。另一方面，一种神经符号概率推导模型ARLC，在I-RAVEN上取得了最先进的性能，在所有这些分布之外的测试中都能稳健地进行推理，仅稍微降低从98.6%到88.0%的准确率。我们的代码可在https://github.com/IBM/raven-large-language-models 上找到。

更新时间: 2025-03-14 08:52:25

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2503.11207v1

Assessing the validity of new paradigmatic complexity measures as criterial features for proficiency in L2 writings in English

This article addresses Second Language (L2) writing development through an investigation of new grammatical and structural complexity metrics. We explore the paradigmatic production in learner English by linking language functions to specific grammatical paradigms. Using the EFCAMDAT as a gold standard and a corpus of French learners as an external test set, we employ a supervised learning framework to operationalise and evaluate seven microsystems. We show that learner levels are associated with the seven microsystems (MS). Using ordinal regression modelling for evaluation, the results show that all MS are significant but yield a low impact if taken individually. However, their influence is shown to be impactful if taken as a group. These microsystems and their measurement method suggest that it is possible to use them as part of broader-purpose CALL systems focused on proficiency assessment.

Updated: 2025-03-14 08:44:13

标题: 评估新范式复杂性测量作为英语第二语言写作熟练度标准特征的有效性

摘要: 本文通过调查新的语法和结构复杂性度量标准，探讨了第二语言（L2）写作发展。我们通过将语言功能与特定的语法范例联系起来，研究了学习者英语的范式产生。我们使用EFCAMDAT作为金标准和一个法语学习者语料库作为外部测试集，采用监督学习框架来实施和评估七个微系统。我们展示了学习者水平与这七个微系统（MS）相关联。使用序数回归建模进行评估，结果显示所有微系统都显著，但如果单独考虑则影响较小。然而，如果作为一组考虑，它们的影响力是显著的。这些微系统及其测量方法表明，可以将它们作为更广泛用途的CALL系统的一部分，重点放在熟练度评估上。

更新时间: 2025-03-14 08:44:13

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2503.10220v2

Physics-constrained DeepONet for Surrogate CFD models: a curved backward-facing step case

The Physics-Constrained DeepONet (PC-DeepONet), an architecture that incorporates fundamental physics knowledge into the data-driven DeepONet model, is presented in this study. This methodology is exemplified through surrogate modeling of fluid dynamics over a curved backward-facing step, a benchmark problem in computational fluid dynamics. The model was trained on computational fluid dynamics data generated for a range of parameterized geometries. The PC-DeepONet was able to learn the mapping from the parameters describing the geometry to the velocity and pressure fields. While the DeepONet is solely data-driven, the PC-DeepONet imposes the divergence constraint from the continuity equation onto the network. The PC-DeepONet demonstrates higher accuracy than the data-driven baseline, especially when trained on sparse data. Both models attain convergence with a small dataset of 50 samples and require only 50 iterations for convergence, highlighting the efficiency of neural operators in learning the dynamics governed by partial differential equations.

Updated: 2025-03-14 08:43:36

标题: 物理约束的DeepONet用于代理CFD模型：一个弯曲的向后踏步案例

摘要: 本研究介绍了一种将基础物理知识融入数据驱动的DeepONet模型的架构，即物理约束的DeepONet（PC-DeepONet）。该方法通过在计算流体力学中模拟曲线后向步跃这一基准问题来进行示范。该模型是在为一系列参数化几何形状生成的计算流体力学数据上进行训练的。PC-DeepONet能够学习从描述几何形状的参数到速度和压力场的映射。虽然DeepONet仅仅是数据驱动的，但PC-DeepONet将连续性方程中的散度约束强加到网络上。PC-DeepONet表现出比数据驱动基准更高的准确性，特别是在稀疏数据上训练时。这两种模型都在仅有50个样本的小数据集上实现了收敛，并且只需要50次迭代就能实现收敛，突显了神经运算符在学习由偏微分方程控制的动力学方面的高效性。

更新时间: 2025-03-14 08:43:36

领域: physics.flu-dyn,cs.LG

下载: http://arxiv.org/abs/2503.11196v1

LEACH-RLC: Enhancing IoT Data Transmission with Optimized Clustering and Reinforcement Learning

Wireless Sensor Networks (WSNs) play a pivotal role in enabling Internet of Things (IoT) devices with sensing and actuation capabilities. Operating in remote and resource-constrained environments, these IoT devices face challenges related to energy consumption, crucial for network longevity. Existing clustering protocols often suffer from high control overhead, inefficient cluster formation, and poor adaptability to dynamic network conditions, leading to suboptimal data transmission and reduced network lifetime. This paper introduces Low-Energy Adaptive Clustering Hierarchy with Reinforcement Learning-based Controller (LEACH-RLC), a novel clustering protocol designed to address these limitations by employing a Mixed Integer Linear Programming (MILP) approach for strategic selection of Cluster Heads (CHs) and node-to-cluster assignments. Additionally, it integrates a Reinforcement Learning (RL) agent to minimize control overhead by learning optimal timings for generating new clusters. LEACH-RLC aims to balance control overhead reduction without compromising overall network performance. Through extensive simulations, this paper investigates the frequency and opportune moments for generating new clustering solutions. Results demonstrate the superior performance of LEACH-RLC over state-of-the-art protocols, showcasing enhanced network lifetime, reduced average energy consumption, and minimized control overhead. The proposed protocol contributes to advancing the efficiency and adaptability of WSNs, addressing critical challenges in IoT deployments.

Updated: 2025-03-14 08:36:09

标题: LEACH-RLC：通过优化聚类和强化学习增强物联网数据传输

摘要: 无线传感器网络（WSNs）在实现具有感知和执行能力的物联网（IoT）设备方面起着关键作用。这些IoT设备在远程和资源受限的环境中运行，面临着与能源消耗相关的挑战，这对网络的寿命至关重要。现有的集群协议通常存在高控制开销、集群形成效率低和对动态网络条件适应性差的问题，导致数据传输次优和网络寿命降低。本文介绍了一种名为低能量自适应集群层次的基于强化学习控制器（LEACH-RLC）的新型集群协议，该协议通过采用混合整数线性规划（MILP）方法来策略性地选择簇首（CHs）和节点到簇的分配以解决这些限制。此外，它集成了一种强化学习（RL）代理以通过学习生成新簇的最佳时机来最小化控制开销。LEACH-RLC旨在在不影响整体网络性能的情况下平衡控制开销的降低。通过大量模拟，本文研究了生成新的集群解决方案的频率和适时时机。结果表明LEACH-RLC的性能优于最先进的协议，展示了增强的网络寿命、降低的平均能量消耗和最小化的控制开销。所提出的协议有助于提高WSN的效率和适应性，解决IoT部署中的关键挑战。

更新时间: 2025-03-14 08:36:09

领域: cs.NI,cs.LG

下载: http://arxiv.org/abs/2401.15767v2

Towards a Digital Twin Modeling Method for Container Terminal Port

This paper introduces a novel strategy aimed at enhancing productivity and minimizing non-productive movements within container terminals, specifically focusing on container yards. It advocates for the implementation of a digital twin-based methodology to streamline the operations of stacking cranes (SCs) responsible for container handling. The proposed approach entails the creation of a virtual container yard that mirrors the physical yard within a digital twin system, facilitating real-time observation and validation. In addition, this article demonstrates the effectiveness of using a digital twin to reduce unproductive movements and improve productivity through simulation. It defines various operational strategies and takes into account different yard contexts, providing a comprehensive understanding of optimisation possibilities. By exploiting the capabilities of the digital twin, managers and operators are provided with crucial information on operational dynamics, enabling them to identify areas for improvement. This visualisation helps decision-makers to make informed choices about their stacking strategies, thereby improving the efficiency of overall container terminal operations. Overall, this paper present a digital twin solution in container terminal operations, offering a powerful tool for optimising productivity and minimising inefficiencies.

Updated: 2025-03-14 08:36:03

标题: 朝向集装箱码头数字孪生建模方法

摘要: 本文介绍了一种旨在提高生产力并最大程度减少集装箱码头内非生产性运动的新策略，特别关注集装箱场。它倡导实施基于数字孪生技术的方法论，以简化负责集装箱处理的堆垛机（SCs）的操作。所提出的方法涉及创建一个虚拟集装箱场，在数字孪生系统中镜像物理场地，促进实时观察和验证。此外，本文展示了使用数字孪生技术通过模拟减少非生产性运动并提高生产力的有效性。它定义了各种操作策略，并考虑了不同的场地背景，提供了对优化可能性的全面了解。通过利用数字孪生技术的能力，管理者和操作人员可以获得关于运营动态的重要信息，从而能够识别改进的领域。这种可视化帮助决策者做出关于他们的堆垛策略的明智选择，从而提高整个集装箱码头运营的效率。总的来说，本文提出了一种数字孪生解决方案，可用于集装箱码头运营，为优化生产力和最小化低效率提供了强大工具。

更新时间: 2025-03-14 08:36:03

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2503.13511v1

Cross-Modal Learning for Music-to-Music-Video Description Generation

Music-to-music-video generation is a challenging task due to the intrinsic differences between the music and video modalities. The advent of powerful text-to-video diffusion models has opened a promising pathway for music-video (MV) generation by first addressing the music-to-MV description task and subsequently leveraging these models for video generation. In this study, we focus on the MV description generation task and propose a comprehensive pipeline encompassing training data construction and multimodal model fine-tuning. We fine-tune existing pre-trained multimodal models on our newly constructed music-to-MV description dataset based on the Music4All dataset, which integrates both musical and visual information. Our experimental results demonstrate that music representations can be effectively mapped to textual domains, enabling the generation of meaningful MV description directly from music inputs. We also identify key components in the dataset construction pipeline that critically impact the quality of MV description and highlight specific musical attributes that warrant greater focus for improved MV description generation.

Updated: 2025-03-14 08:34:28

标题: 跨模态学习用于音乐到音乐视频描述生成

摘要: 音乐到音乐视频生成是一项具有挑战性的任务，因为音乐和视频模态之间的固有差异。强大的文本到视频扩散模型的出现为音乐视频（MV）生成打开了一条有希望的道路，首先解决音乐到MV描述任务，然后利用这些模型进行视频生成。在本研究中，我们关注MV描述生成任务，并提出了一个包括训练数据构建和多模态模型微调的全面流程。我们在我们基于Music4All数据集构建的新的音乐到MV描述数据集上对现有的预训练多模态模型进行微调，该数据集整合了音乐和视觉信息。我们的实验结果表明，音乐表示可以有效地映射到文本领域，从而使得可以直接从音乐输入生成有意义的MV描述。我们还确定了数据集构建流程中的关键组件，这些组件对MV描述的质量起着至关重要的作用，并强调了一些需要更多关注以改进MV描述生成的特定音乐属性。

更新时间: 2025-03-14 08:34:28

领域: cs.SD,cs.AI,cs.CL,cs.MM,eess.AS

下载: http://arxiv.org/abs/2503.11190v1

Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification

Large Language Models (LLMs) are vulnerable to jailbreak attacks, which use crafted prompts to elicit toxic responses. These attacks exploit LLMs' difficulty in dynamically detecting harmful intents during the generation process. Traditional safety alignment methods, often relying on the initial few generation steps, are ineffective due to limited computational budget. This paper proposes DEEPALIGN, a robust defense framework that fine-tunes LLMs to progressively detoxify generated content, significantly improving both the computational budget and effectiveness of mitigating harmful generation. Our approach uses a hybrid loss function operating on hidden states to directly improve LLMs' inherent awareness of toxity during generation. Furthermore, we redefine safe responses by generating semantically relevant answers to harmful queries, thereby increasing robustness against representation-mutation attacks. Evaluations across multiple LLMs demonstrate state-of-the-art defense performance against six different attack types, reducing Attack Success Rates by up to two orders of magnitude compared to previous state-of-the-art defense while preserving utility. This work advances LLM safety by addressing limitations of conventional alignment through dynamic, context-aware mitigation.

Updated: 2025-03-14 08:32:12

标题: 深度对齐：通过渐进式答案净化防御越狱攻击

摘要: 大型语言模型（LLMs）容易受到越狱攻击的威胁，这些攻击利用精心设计的提示来引诱生成有害回应。这些攻击利用LLMs在生成过程中动态检测有害意图的困难。传统的安全对齐方法通常依赖于最初的几个生成步骤，由于计算预算有限，这些方法并不有效。本文提出了DEEPALIGN，一个强大的防御框架，通过微调LLMs来逐步净化生成的内容，显著改善了对有害生成的计算预算和有效性。我们的方法使用在隐藏状态上操作的混合损失函数，直接改善LLMs在生成过程中对有毒性的固有认识。此外，我们重新定义安全回应，通过生成语义相关的答案来应对有害查询，从而增加对表示突变攻击的鲁棒性。对多个LLMs进行评估显示，相较于先前最先进的防御方法，我们的方法在六种不同的攻击类型下表现出最先进的防御性能，将攻击成功率降低了两个数量级，同时保留了实用性。这项工作通过动态的、上下文感知的缓解方式，提升了LLM的安全性，解决了传统对齐方法的局限性。

更新时间: 2025-03-14 08:32:12

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2503.11185v1

Forecasting Empty Container availability for Vehicle Booking System Application

Container terminals, pivotal nodes in the network of empty container movement, hold significant potential for enhancing operational efficiency within terminal depots through effective collaboration between transporters and terminal operators. This collaboration is crucial for achieving optimization, leading to streamlined operations and reduced congestion, thereby benefiting both parties. Consequently, there is a pressing need to develop the most suitable forecasting approaches to address this challenge. This study focuses on developing and evaluating a data-driven approach for forecasting empty container availability at container terminal depots within a Vehicle Booking System (VBS) framework. It addresses the gap in research concerning optimizing empty container dwell time and aims to enhance operational efficiencies in container terminal operations. Four forecasting models-Naive, ARIMA, Prophet, and LSTM-are comprehensively analyzed for their predictive capabilities, with LSTM emerging as the top performer due to its ability to capture complex time series patterns. The research underscores the significance of selecting appropriate forecasting techniques tailored to the specific requirements of container terminal operations, contributing to improved operational planning and management in maritime logistics.

Updated: 2025-03-14 08:29:04

标题: 预测车辆预订系统应用中空集装箱的可用性

摘要: 集装箱码头是空集装箱运动网络中的关键节点，通过运输商和码头操作人员之间的有效合作，具有显著提高码头运营效率的潜力。这种合作对于实现优化至关重要，可以实现流程化运营和减少拥堵，从而使双方受益。因此，迫切需要开发最适合的预测方法来解决这一挑战。本研究侧重于在车辆预订系统（VBS）框架内开发和评估一种基于数据驱动的方法，用于预测集装箱码头内空集装箱的可用性。它解决了关于优化空集装箱滞留时间的研究空白，并旨在提高集装箱码头运营效率。对四种预测模型- Naive，ARIMA，Prophet和LSTM进行了全面分析，其中LSTM由于其捕捉复杂时间序列模式的能力而成为最佳表现者。研究强调了选择适合集装箱码头运营特定需求的预测技术的重要性，有助于改善海事物流中的运营规划和管理。

更新时间: 2025-03-14 08:29:04

领域: eess.SY,cs.AI,cs.SY

下载: http://arxiv.org/abs/2503.11728v1

Multi-Stage Generative Upscaler: Reconstructing Football Broadcast Images via Diffusion Models

The reconstruction of low-resolution football broadcast images presents a significant challenge in sports broadcasting, where detailed visuals are essential for analysis and audience engagement. This study introduces a multi-stage generative upscaling framework leveraging Diffusion Models to enhance degraded images, transforming inputs as small as $64 \times 64$ pixels into high-fidelity $1024 \times 1024$ outputs. By integrating an image-to-image pipeline, ControlNet conditioning, and LoRA fine-tuning, our approach surpasses traditional upscaling methods in restoring intricate textures and domain-specific elements such as player details and jersey logos. The custom LoRA is trained on a custom football dataset, ensuring adaptability to sports broadcast needs. Experimental results demonstrate substantial improvements over conventional models, with ControlNet refining fine details and LoRA enhancing task-specific elements. These findings highlight the potential of diffusion-based image reconstruction in sports media, paving the way for future applications in automated video enhancement and real-time sports analytics.

Updated: 2025-03-14 08:28:30

标题: 多阶段生成增强器：通过扩散模型重建足球转播图像

摘要: 这项研究介绍了一种多阶段生成放大框架，利用扩散模型增强退化图像，将$64 \times 64$像素的输入转换为高保真度的$1024 \times 1024$输出，解决体育广播中的低分辨率足球转播图像重建问题。通过整合图像对图像管道、ControlNet调节和LoRA微调，我们的方法在恢复复杂纹理和领域特定元素（如球员细节和球衣标志）方面超越了传统的放大方法。定制的LoRA在自定义足球数据集上训练，确保适应体育广播需求。实验结果显示，与传统模型相比，ControlNet精细调节细节，LoRA增强任务特定元素，实现了显著改进。这些发现突显了扩散基础的图像重建在体育媒体中的潜力，并为自动化视频增强和实时体育分析的未来应用铺平了道路。

更新时间: 2025-03-14 08:28:30

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.11181v1

Zero-TIG: Temporal Consistency-Aware Zero-Shot Illumination-Guided Low-light Video Enhancement

Low-light and underwater videos suffer from poor visibility, low contrast, and high noise, necessitating enhancements in visual quality. However, existing approaches typically rely on paired ground truth, which limits their practicality and often fails to maintain temporal consistency. To overcome these obstacles, this paper introduces a novel zero-shot learning approach named Zero-TIG, leveraging the Retinex theory and optical flow techniques. The proposed network consists of an enhancement module and a temporal feedback module. The enhancement module comprises three subnetworks: low-light image denoising, illumination estimation, and reflection denoising. The temporal enhancement module ensures temporal consistency by incorporating histogram equalization, optical flow computation, and image warping to align the enhanced previous frame with the current frame, thereby maintaining continuity. Additionally, we address color distortion in underwater data by adaptively balancing RGB channels. The experimental results demonstrate that our method achieves low-light video enhancement without the need for paired training data, making it a promising and applicable method for real-world scenario enhancement.

Updated: 2025-03-14 08:22:26

标题: Zero-TIG：时间一致性感知的零-shot照明引导低光视频增强

摘要: 低光和水下视频受到能见度差、对比度低和噪声高的影响，需要提高视觉质量。然而，现有方法通常依赖配对的真实数据，这限制了它们的实用性并经常无法保持时间一致性。为了克服这些障碍，本文提出了一种名为Zero-TIG的新型零样本学习方法，利用了Retinex理论和光流技术。所提出的网络包括一个增强模块和一个时间反馈模块。增强模块包括三个子网络：低光图像去噪、光照估计和反射去噪。时间增强模块通过整合直方图均衡化、光流计算和图像变形来确保时间一致性，将增强的上一帧与当前帧对齐，从而保持连续性。此外，我们通过自适应平衡RGB通道来解决水下数据中的色彩失真问题。实验结果表明，我们的方法实现了无需配对训练数据的低光视频增强，使其成为一个有前景且适用于真实场景增强的方法。

更新时间: 2025-03-14 08:22:26

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.11175v1

Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction

Decoding visual stimuli from neural activity is essential for understanding the human brain. While fMRI methods have successfully reconstructed static images, fMRI-to-video reconstruction faces challenges due to the need for capturing spatiotemporal dynamics like motion and scene transitions. Recent approaches have improved semantic and perceptual alignment but struggle to integrate coarse fMRI data with detailed visual features. Inspired by the hierarchical organization of the visual system, we propose NEURONS, a novel framework that decouples learning into four correlated sub-tasks: key object segmentation, concept recognition, scene description, and blurry video reconstruction. This approach simulates the visual cortex's functional specialization, allowing the model to capture diverse video content. In the inference stage, NEURONS generates robust conditioning signals for a pre-trained text-to-video diffusion model to reconstruct the videos. Extensive experiments demonstrate that NEURONS outperforms state-of-the-art baselines, achieving solid improvements in video consistency (26.6%) and semantic-level accuracy (19.1%). Notably, NEURONS shows a strong functional correlation with the visual cortex, highlighting its potential for brain-computer interfaces and clinical applications. Code and model weights will be available at: https://github.com/xmed-lab/NEURONS.

Updated: 2025-03-14 08:12:28

标题: 神经元：模拟人类视觉皮层在fMRI到视频重建中提高保真度和解释性

摘要: 从神经活动中解码视觉刺激对于理解人类大脑至关重要。虽然fMRI方法已成功重建静态图像，但fMRI到视频重建面临挑战，因为需要捕捉如运动和场景转换等时空动态。最近的方法改进了语义和感知对齐，但难以将粗糙的fMRI数据与详细的视觉特征整合。受视觉系统的分层组织启发，我们提出了NEURONS，这是一个新颖的框架，将学习分解为四个相关的子任务：关键对象分割，概念识别，场景描述和模糊视频重建。这种方法模拟了视觉皮层的功能专门化，使模型能够捕捉多样的视频内容。在推理阶段，NEURONS为预训练的文本到视频扩散模型生成强大的条件信号，以重建视频。大量实验证明，NEURONS优于最先进的基线，实现了视频一致性（26.6%）和语义级准确性（19.1%）的显著改进。值得注意的是，NEURONS与视觉皮层显示出强烈的功能相关性，突显了其在脑机接口和临床应用中的潜力。代码和模型权重可在以下网址获取：https://github.com/xmed-lab/NEURONS。

更新时间: 2025-03-14 08:12:28

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.11167v1

Homogeneous Dynamics Space for Heterogeneous Humans

Analyses of human motion kinematics have achieved tremendous advances. However, the production mechanism, known as human dynamics, is still undercovered. In this paper, we aim to push data-driven human dynamics understanding forward. We identify a major obstacle to this as the heterogeneity of existing human motion understanding efforts. Specifically, heterogeneity exists in not only the diverse kinematics representations and hierarchical dynamics representations but also in the data from different domains, namely biomechanics and reinforcement learning. With an in-depth analysis of the existing heterogeneity, we propose to emphasize the beneath homogeneity: all of them represent the homogeneous fact of human motion, though from different perspectives. Given this, we propose Homogeneous Dynamics Space (HDyS) as a fundamental space for human dynamics by aggregating heterogeneous data and training a homogeneous latent space with inspiration from the inverse-forward dynamics procedure. Leveraging the heterogeneous representations and datasets, HDyS achieves decent mapping between human kinematics and dynamics. We demonstrate the feasibility of HDyS with extensive experiments and applications. The project page is https://foruck.github.io/HDyS.

Updated: 2025-03-14 08:10:18

标题: 异质人类的同质动态空间

摘要: 人类运动运动学分析取得了巨大进展。然而，人类动力学这一生产机制仍未被揭示。本文旨在推动基于数据的人类动力学理解。我们认为现有人类运动理解工作的一个主要障碍是异质性。具体而言，不仅存在多样化的运动学表达和层次动力学表达的异质性，还存在来自不同领域（即生物力学和强化学习）的数据异质性。通过对现有异质性的深入分析，我们提出要强调底层的同质性：它们都代表了人类运动的同质事实，尽管从不同的角度来看。鉴于此，我们提出同质动力学空间（HDyS）作为人类动力学的基本空间，通过汇聚异构数据并训练一个同质潜在空间，灵感来自逆向-正向动力学过程。利用异质表示和数据集，HDyS实现了人类运动学和动力学之间的良好映射。我们通过广泛的实验和应用展示了HDyS的可行性。项目页面为https://foruck.github.io/HDyS。

更新时间: 2025-03-14 08:10:18

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2412.06146v2

FastCHGNet: Training one Universal Interatomic Potential to 1.5 Hours with 32 GPUs

Graph neural network universal interatomic potentials (GNN-UIPs) have demonstrated remarkable generalization and transfer capabilities in material discovery and property prediction. These models can accelerate molecular dynamics (MD) simulation by several orders of magnitude while maintaining \textit{ab initio} accuracy, making them a promising new paradigm in material simulations. One notable example is Crystal Hamiltonian Graph Neural Network (CHGNet), pretrained on the energies, forces, stresses, and magnetic moments from the MPtrj dataset, representing a state-of-the-art GNN-UIP model for charge-informed MD simulations. However, training the CHGNet model is time-consuming(8.3 days on one A100 GPU) for three reasons: (i) requiring multi-layer propagation to reach more distant atom information, (ii) requiring second-order derivatives calculation to finish weights updating and (iii) the implementation of reference CHGNet does not fully leverage the computational capabilities. This paper introduces FastCHGNet, an optimized CHGNet, with three contributions: Firstly, we design innovative Force/Stress Readout modules to decompose Force/Stress prediction. Secondly, we adopt massive optimizations such as kernel fusion, redundancy bypass, etc, to exploit GPU computation power sufficiently. Finally, we extend CHGNet to support multiple GPUs and propose a load-balancing technique to enhance GPU utilization. Numerical results show that FastCHGNet reduces memory footprint by a factor of 3.59. The final training time of FastCHGNet can be decreased to \textbf{1.53 hours} on 32 GPUs without sacrificing model accuracy.

Updated: 2025-03-14 08:01:35

标题: FastCHGNet：使用32个GPU在1.5小时内训练一个通用的原子间势能

摘要: 图神经网络通用原子间势（GNN-UIPs）在材料发现和性质预测中展示了出色的泛化和迁移能力。这些模型可以加速分子动力学（MD）模拟数个数量级，同时保持\textit{从头算}的准确性，使它们成为材料模拟中一种有前途的新范式。一个显著的例子是Crystal Hamiltonian Graph Neural Network（CHGNet），在MPtrj数据集的能量、力、应力和磁矩上进行了预训练，代表了一个最先进的基于电荷信息的MD模拟的GNN-UIP模型。然而，训练CHGNet模型非常耗时（在一个A100 GPU上需要8.3天），原因有三个：（i）需要多层传播以获取更远处的原子信息，（ii）需要计算二阶导数以完成权重更新，（iii）参考CHGNet的实现没有充分利用计算能力。本文介绍了一种优化的CHGNet，名为FastCHGNet，具有三个贡献：首先，我们设计了创新的力/应力读取模块来分解力/应力预测。其次，我们采用了大规模的优化，如内核融合、冗余绕过等，充分利用GPU计算能力。最后，我们扩展了CHGNet以支持多个GPU，并提出了一种负载平衡技术来增强GPU利用率。数值结果显示，FastCHGNet将内存占用减少了3.59倍。FastCHGNet的最终训练时间可以在32个GPU上减少到\textbf{1.53小时}，而不会牺牲模型准确性。

更新时间: 2025-03-14 08:01:35

领域: cs.DC,cs.LG

下载: http://arxiv.org/abs/2412.20796v2

Unifying Perplexing Behaviors in Modified BP Attributions through Alignment Perspective

Attributions aim to identify input pixels that are relevant to the decision-making process. A popular approach involves using modified backpropagation (BP) rules to reverse decisions, which improves interpretability compared to the original gradients. However, these methods lack a solid theoretical foundation and exhibit perplexing behaviors, such as reduced sensitivity to parameter randomization, raising concerns about their reliability and highlighting the need for theoretical justification. In this work, we present a unified theoretical framework for methods like GBP, RectGrad, LRP, and DTD, demonstrating that they achieve input alignment by combining the weights of activated neurons. This alignment improves the visualization quality and reduces sensitivity to weight randomization. Our contributions include: (1) Providing a unified explanation for multiple behaviors, rather than focusing on just one. (2) Accurately predicting novel behaviors. (3) Offering insights into decision-making processes, including layer-wise information changes and the relationship between attributions and model decisions.

Updated: 2025-03-14 07:58:26

标题: 用调整视角统一修改的BP归因中的困惑行为

摘要: 归因旨在确定与决策过程相关的输入像素。一种流行的方法涉及使用修改后的反向传播（BP）规则来逆转决策，与原始梯度相比，这种方法提高了可解释性。然而，这些方法缺乏坚实的理论基础，并表现出令人困惑的行为，例如对参数随机化的敏感性降低，引发对它们的可靠性的担忧，并凸显了对理论证明的需求。在这项工作中，我们提出了一个统一的理论框架，用于方法如GBP、RectGrad、LRP和DTD，证明它们通过组合激活神经元的权重实现输入对齐。这种对齐提高了可视化质量，减少了对权重随机化的敏感性。我们的贡献包括：（1）为多种行为提供统一的解释，而不仅仅关注其中一种。（2）准确预测新的行为。（3）提供对决策过程的见解，包括层次信息变化和归因与模型决策之间的关系。

更新时间: 2025-03-14 07:58:26

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.11160v1

Learnable Cross-modal Knowledge Distillation for Multi-modal Learning with Missing Modality

The problem of missing modalities is both critical and non-trivial to be handled in multi-modal models. It is common for multi-modal tasks that certain modalities contribute more compared to other modalities, and if those important modalities are missing, the model performance drops significantly. Such fact remains unexplored by current multi-modal approaches that recover the representation from missing modalities by feature reconstruction or blind feature aggregation from other modalities, instead of extracting useful information from the best performing modalities. In this paper, we propose a Learnable Cross-modal Knowledge Distillation (LCKD) model to adaptively identify important modalities and distil knowledge from them to help other modalities from the cross-modal perspective for solving the missing modality issue. Our approach introduces a teacher election procedure to select the most ``qualified'' teachers based on their single modality performance on certain tasks. Then, cross-modal knowledge distillation is performed between teacher and student modalities for each task to push the model parameters to a point that is beneficial for all tasks. Hence, even if the teacher modalities for certain tasks are missing during testing, the available student modalities can accomplish the task well enough based on the learned knowledge from their automatically elected teacher modalities. Experiments on the Brain Tumour Segmentation Dataset 2018 (BraTS2018) shows that LCKD outperforms other methods by a considerable margin, improving the state-of-the-art performance by 3.61% for enhancing tumour, 5.99% for tumour core, and 3.76% for whole tumour in terms of segmentation Dice score.

Updated: 2025-03-14 07:53:09

标题: 可学习的跨模态知识蒸馏：用于缺失模态的多模态学习

摘要: 缺失模态的问题在多模态模型中既关键又非平凡。在多模态任务中，某些模态相对于其他模态更具贡献性是常见的，如果这些重要模态缺失，模型性能将显著下降。目前的多模态方法未探索这一事实，它们通过特征重建或从其他模态进行盲目特征聚合来恢复缺失模态的表示，而不是从性能最佳的模态中提取有用信息。在本文中，我们提出了一个可学习的跨模态知识蒸馏（LCKD）模型，以自适应地识别重要模态，并从跨模态角度为解决缺失模态问题从它们中蒸馏知识来帮助其他模态。我们的方法引入了一个教师选举程序，基于它们在特定任务上的单一模态性能来选择最“合格”的教师。然后，在每个任务中，教师模态和学生模态之间进行跨模态知识蒸馏，以将模型参数推向有利于所有任务的点。因此，即使在测试期间某些任务的教师模态缺失，可用的学生模态也可以根据从它们自动选定的教师模态中学到的知识足够好地完成任务。对2018年脑肿瘤分割数据集（BraTS2018）上的实验显示，LCKD在增强肿瘤方面的分割Dice分数方面优于其他方法，使最新性能提高了3.61%，在肿瘤核心方面提高了5.99%，在整个肿瘤方面提高了3.76%。

更新时间: 2025-03-14 07:53:09

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2310.01035v2

Improving the Efficiency of a Deep Reinforcement Learning-Based Power Management System for HPC Clusters Using Curriculum Learning

High energy consumption remains a key challenge in high-performance computing (HPC) systems, which often feature hundreds or thousands of nodes drawing substantial power even in idle or standby modes. Although powering down unused nodes can improve energy efficiency, choosing the wrong time to do so can degrade quality of service by delaying job execution. Machine learning, in particular reinforcement learning (RL), has shown promise in determining optimal times to switch nodes on or off. In this study, we enhance the performance of a deep reinforcement learning (DRL) agent for HPC power management by integrating curriculum learning (CL), a training approach that introduces tasks with gradually increasing difficulty. Using the Batsim-py simulation framework, we compare the proposed CL-based agent to both a baseline DRL method (without CL) and the conventional fixed-time timeout strategy. Experimental results confirm that an easy-to-hard curriculum outperforms other training orders in terms of reducing wasted energy usage. The best agent achieves a 3.73% energy reduction over the baseline DRL method and a 4.66% improvement compared to the best timeout configuration (shutdown every 15 minutes of idle time). In addition, it reduces average job waiting time by 9.24% and maintains a higher job-filling rate, indicating more effective resource utilization. Sensitivity tests across various switch-on durations, power levels, and cluster sizes further reveal the agent's adaptability to changing system parameters without retraining. These findings demonstrate that curriculum learning can significantly improve DRL-based power management in HPC, balancing energy savings, quality of service, and robustness to diverse configurations.

Updated: 2025-03-14 07:47:22

标题: 使用课程学习方法改进基于深度强化学习的高性能计算集群电源管理系统的效率

摘要: 高性能计算（HPC）系统中的高能耗仍然是一个关键挑战，这些系统通常具有数百或数千个节点，即使在空闲或待机模式下也会消耗大量功率。尽管关闭未使用的节点可以提高能源效率，但选择错误的时间进行此操作可能会通过延迟作业执行来降低服务质量。机器学习，特别是强化学习（RL），已经显示出在确定开关节点的最佳时间方面具有潜力。在这项研究中，我们通过整合课程学习（CL），一种逐渐增加难度的训练方法，提高了用于HPC电力管理的深度强化学习（DRL）代理的性能。使用Batsim-py模拟框架，我们将所提出的基于CL的代理与基线DRL方法（无CL）和传统的固定时间超时策略进行了比较。实验结果证实，易到难的课程比其他训练顺序在减少能源浪费方面表现更好。最佳代理相对于基线DRL方法实现了3.73%的能源减少，并相对于最佳超时配置（每15分钟的空闲时间关闭一次）实现了4.66%的改善。此外，它将平均作业等待时间降低了9.24%，并保持较高的作业填充率，表明更有效的资源利用。对各种开关持续时间、功率级别和集群大小进行的敏感性测试进一步显示了代理对不同系统参数的适应性，无需重新训练。这些发现表明，课程学习可以显着改善HPC中基于DRL的电源管理，平衡能源节约、服务质量和对各种配置的稳健性。

更新时间: 2025-03-14 07:47:22

领域: cs.DC,cs.LG

下载: http://arxiv.org/abs/2502.20348v2

Don't Take Things Out of Context: Attention Intervention for Enhancing Chain-of-Thought Reasoning in Large Language Models

Few-shot Chain-of-Thought (CoT) significantly enhances the reasoning capabilities of large language models (LLMs), functioning as a whole to guide these models in generating reasoning steps toward final answers. However, we observe that isolated segments, words, or tokens within CoT demonstrations can unexpectedly disrupt the generation process of LLMs. The model may overly concentrate on certain local information present in the demonstration, introducing irrelevant noise into the reasoning process and potentially leading to incorrect answers. In this paper, we investigate the underlying mechanism of CoT through dynamically tracing and manipulating the inner workings of LLMs at each output step, which demonstrates that tokens exhibiting specific attention characteristics are more likely to induce the model to take things out of context; these tokens directly attend to the hidden states tied with prediction, without substantial integration of non-local information. Building upon these insights, we propose a Few-shot Attention Intervention method (FAI) that dynamically analyzes the attention patterns of demonstrations to accurately identify these tokens and subsequently make targeted adjustments to the attention weights to effectively suppress their distracting effect on LLMs. Comprehensive experiments across multiple benchmarks demonstrate consistent improvements over baseline methods, with a remarkable 5.91% improvement on the AQuA dataset, further highlighting the effectiveness of FAI.

Updated: 2025-03-14 07:46:33

标题: 不要断章取义：注意力干预以增强大型语言模型的思维链推理

摘要: Few-shot Chain-of-Thought (CoT)显著增强了大型语言模型（LLMs）的推理能力，作为一个整体引导这些模型生成推理步骤，直至最终答案。然而，我们观察到CoT演示中的孤立片段、单词或标记可能会意外地干扰LLMs的生成过程。模型可能过度集中在演示中存在的某些局部信息上，引入无关的噪音到推理过程中，可能导致错误的答案。在本文中，我们通过动态追踪和操作LLMs在每个输出步骤的内部工作，研究了CoT的基本机制，结果表明展示特定注意特征的标记更可能导致模型脱离上下文;这些标记直接关注与预测相关的隐藏状态，没有实质性整合非局部信息。基于这些见解，我们提出了一种Few-shot Attention Intervention方法（FAI），动态分析演示的注意模式以准确识别这些标记，并随后对注意权重进行有针对性的调整，以有效抑制它们对LLMs的干扰效果。跨多个基准的综合实验表明，与基准方法相比，FAI表现出一致的改进，AQuA数据集上显著提高了5.91%，进一步突显了FAI的有效性。

更新时间: 2025-03-14 07:46:33

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.11154v1

Enabling Weak Client Participation via On-device Knowledge Distillation in Heterogenous Federated Learning

Online Knowledge Distillation (KD) is recently highlighted to train large models in Federated Learning (FL) environments. Many existing studies adopt the logit ensemble method to perform KD on the server side. However, they often assume that unlabeled data collected at the edge is centralized on the server. Moreover, the logit ensemble method personalizes local models, which can degrade the quality of soft targets, especially when data is highly non-IID. To address these critical limitations,we propose a novel on-device KD-based heterogeneous FL method. Our approach leverages a small auxiliary model to learn from labeled local data. Subsequently, a subset of clients with strong system resources transfers knowledge to a large model through on-device KD using their unlabeled data. Our extensive experiments demonstrate that our on-device KD-based heterogeneous FL method effectively utilizes the system resources of all edge devices as well as the unlabeled data, resulting in higher accuracy compared to SOTA KD-based FL methods.

Updated: 2025-03-14 07:40:37

标题: 通过异构联邦学习中的设备端知识蒸馏实现弱客户端参与

摘要: 在线知识蒸馏（KD）最近被强调用于在联邦学习（FL）环境中训练大型模型。许多现有研究采用logit集成方法在服务器端执行KD。然而，它们经常假设在边缘收集的未标记数据在服务器上是集中的。此外，logit集成方法个性化本地模型，这可能会降低软目标的质量，特别是在数据高度非IID时。为了解决这些关键限制，我们提出了一种基于设备KD的新颖的异构FL方法。我们的方法利用一个小型辅助模型从标记的本地数据中学习。随后，一部分具有强系统资源的客户端通过使用其未标记数据通过设备KD向大型模型传递知识。我们的广泛实验表明，我们的基于设备KD的异构FL方法有效利用了所有边缘设备的系统资源以及未标记数据，相较于SOTA基于KD的FL方法具有更高的准确性。

更新时间: 2025-03-14 07:40:37

领域: cs.LG

下载: http://arxiv.org/abs/2503.11151v1

RECAST: Reparameterized, Compact weight Adaptation for Sequential Tasks

Incremental learning aims to adapt to new sets of categories over time with minimal computational overhead. Prior work often addresses this task by training efficient task-specific adaptors that modify frozen layer weights or features to capture relevant information without affecting predictions on previously learned categories. While these adaptors are generally more efficient than finetuning the entire network, they still require tens to hundreds of thousands of task-specific trainable parameters even for relatively small networks, making it challenging to operate on resource-constrained environments with high communication costs like edge devices or mobile phones. Thus, we propose Reparameterized, Compact weight Adaptation for Sequential Tasks (RECAST), a novel method that dramatically reduces task-specific trainable parameters to fewer than 50 - several orders of magnitude less than competing methods like LoRA. RECAST accomplishes this efficiency by learning to decompose layer weights into a soft parameter-sharing framework consisting of shared weight templates and very few module-specific scaling factors or coefficients. This soft parameter-sharing framework allows for effective task-wise reparameterization by tuning only these coefficients while keeping templates frozen.A key innovation of RECAST is the novel weight reconstruction pipeline called Neural Mimicry, which eliminates the need for pretraining from scratch. This allows for high-fidelity emulation of existing pretrained weights within our framework and provides quick adaptability to any model scale and architecture. Extensive experiments across six datasets demonstrate RECAST outperforms the state-of-the-art by up to 3% across various scales, architectures, and parameter spaces Moreover, we show that RECAST's architecture-agnostic nature allows for seamless integration with existing methods, further boosting performance.

Updated: 2025-03-14 07:36:26

标题: RECAST：用于序列任务的重新参数化、紧凑权重调整

摘要: 渐进学习旨在通过最小的计算开销随时间适应新的类别集。先前的工作通常通过训练高效的任务特定适配器来解决这一任务，这些适配器修改冻结层权重或特征，以捕获相关信息而不影响先前学习类别的预测。虽然这些适配器通常比对整个网络进行微调更有效，但即使是对于相对较小的网络，它们仍需要数以万计的任务特定可训练参数，使其难以在资源受限的环境中运行，例如边缘设备或手机等具有高通信成本的环境。因此，我们提出了一种名为RECAST的新方法，该方法通过学习将层权重分解为共享权重模板和极少量的模块特定缩放因子或系数的软参数共享框架，将任务特定可训练参数显著减少到少于50个 - 比竞争方法LoRA少几个数量级。RECAST通过仅调整这些系数而保持模板冻结来实现这种效率。RECAST的一个关键创新是称为神经模仿的新型权重重构管道，它消除了从头开始的预训练的需求。这允许在我们的框架内高度模拟现有预训练权重，并提供对任何模型规模和架构的快速适应性。在六个数据集上进行的大量实验表明，RECAST在各种规模、架构和参数空间上的表现优于最先进技术最多达到3%。此外，我们展示了RECAST对架构不可知的特性允许与现有方法无缝集成，进一步提升性能。

更新时间: 2025-03-14 07:36:26

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2411.16870v2

Asynchronous Sharpness-Aware Minimization For Fast and Accurate Deep Learning

Sharpness-Aware Minimization (SAM) is an optimization method that improves generalization performance of machine learning models. Despite its superior generalization, SAM has not been actively used in real-world applications due to its expensive computational cost. In this work, we propose a novel asynchronous-parallel SAM which achieves nearly the same gradient norm penalizing effect like the original SAM while breaking the data dependency between the model perturbation and the model update. The proposed asynchronous SAM can even entirely hide the model perturbation time by adjusting the batch size for the model perturbation in a system-aware manner. Thus, the proposed method enables to fully utilize heterogeneous system resources such as CPUs and GPUs. Our extensive experiments well demonstrate the practical benefits of the proposed asynchronous approach. E.g., the asynchronous SAM achieves comparable Vision Transformer fine-tuning accuracy (CIFAR-100) as the original SAM while having almost the same training time as SGD.

Updated: 2025-03-14 07:34:39

标题: 异步锐度感知最小化用于快速准确的深度学习

摘要: Sharpness-Aware Minimization（SAM）是一种优化方法，可以提高机器学习模型的泛化性能。尽管SAM具有优越的泛化性能，但由于其昂贵的计算成本，SAM在实际应用中并未被广泛采用。在这项工作中，我们提出了一种新颖的异步并行SAM，该方法实现了几乎与原始SAM相同的梯度范数惩罚效果，同时打破了模型扰动与模型更新之间的数据依赖关系。提出的异步SAM甚至可以通过以系统感知方式调整模型扰动的批量大小来完全隐藏模型扰动时间。因此，该方法使得可以充分利用异构系统资源，如CPU和GPU。我们进行了大量实验证明了提出的异步方法的实际优势。例如，异步SAM在几乎与SGD相同的训练时间内实现了可比较的Vision Transformer微调精度（CIFAR-100），与原始SAM相比。

更新时间: 2025-03-14 07:34:39

领域: cs.LG

下载: http://arxiv.org/abs/2503.11147v1

Layer-wise Update Aggregation with Recycling for Communication-Efficient Federated Learning

Expensive communication cost is a common performance bottleneck in Federated Learning (FL), which makes it less appealing in real-world applications. Many communication-efficient FL methods focus on discarding a part of model updates mostly based on gradient magnitude. In this study, we find that recycling previous updates, rather than simply dropping them, more effectively reduces the communication cost while maintaining FL performance. We propose FedLUAR, a Layer-wise Update Aggregation with Recycling scheme for communication-efficient FL. We first define a useful metric that quantifies the extent to which the aggregated gradients influences the model parameter values in each layer. FedLUAR selects a few layers based on the metric and recycles their previous updates on the server side. Our extensive empirical study demonstrates that the update recycling scheme significantly reduces the communication cost while maintaining model accuracy. For example, our method achieves nearly the same AG News accuracy as FedAvg, while reducing the communication cost to just 17%.

Updated: 2025-03-14 07:33:15

标题: 逐层更新聚合与循环再利用用于高效通信的联邦学习

摘要: 昂贵的通信成本是联邦学习（FL）中常见的性能瓶颈，这使得其在现实世界的应用中不那么吸引人。许多通信效率高的FL方法侧重于丢弃一部分模型更新，主要基于梯度大小。在本研究中，我们发现重新利用先前的更新，而不仅仅是丢弃它们，可以更有效地降低通信成本，同时保持FL性能。我们提出了FedLUAR，一种用于通信高效的FL的层级更新聚合与回收方案。我们首先定义了一个有用的度量标准，量化了聚合梯度对每一层模型参数值的影响程度。FedLUAR基于该度量标准选择了几层，并在服务器端回收它们的先前更新。我们广泛的实证研究表明，更新回收方案显著降低了通信成本，同时保持了模型的准确性。例如，我们的方法几乎达到了与FedAvg相同的AG News准确性，同时将通信成本降低了至仅为17%。

更新时间: 2025-03-14 07:33:15

领域: cs.LG

下载: http://arxiv.org/abs/2503.11146v1

LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding

In this paper, we introduce LLaVA-Octopus, a novel video multimodal large language model. LLaVA-Octopus adaptively weights features from different visual projectors based on user instructions, enabling us to leverage the complementary strengths of each projector. We observe that different visual projectors exhibit distinct characteristics when handling specific tasks. For instance, some projectors excel at capturing static details, while others are more effective at processing temporal information, and some are better suited for tasks requiring temporal coherence. By dynamically adjusting feature weights according to user instructions, LLaVA-Octopus dynamically selects and combines the most suitable features, significantly enhancing the model's performance in multimodal tasks. Experimental results demonstrate that LLaVA-Octopus achieves excellent performance across multiple benchmarks, especially in tasks such as video question answering, long video understanding, and comprehensive multi-choices benchmarks, highlighting its broad application potential.

Updated: 2025-03-14 07:29:54

标题: LLaVA-Octopus：解锁基于指令驱动的自适应投影融合，用于视频理解

摘要: 在本文中，我们介绍了LLaVA-Octopus，一种新颖的视频多模态大语言模型。LLaVA-Octopus根据用户指令自适应加权来自不同视觉投影仪的特征，使我们能够利用每个投影仪的互补优势。我们观察到在处理特定任务时，不同的视觉投影仪展现出不同的特征。例如，一些投影仪擅长捕捉静态细节，而其他的则更有效地处理时间信息，还有一些更适合需要时间连贯性的任务。通过根据用户指令动态调整特征权重，LLaVA-Octopus动态选择并结合最合适的特征，显著提升了模型在多模态任务中的性能。实验结果表明，LLaVA-Octopus在多个基准测试中取得了出色的表现，特别是在视频问答、长视频理解和全面多选基准测试等任务中，突显了其广泛的应用潜力。

更新时间: 2025-03-14 07:29:54

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2501.05067v2

MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling

Large-scale pre-training of deep models, followed by fine-tuning them, has become the cornerstone of natural language processing (NLP). The prevalence of data coupled with computational resources has led to large models with a considerable number of parameters. While the massive size of these models has led to remarkable success in many NLP tasks, a detriment is the expense required to retrain all the base model's parameters for the adaptation to each task or domain. Parameter Efficient Fine-Tuning (PEFT) provides an effective solution for this challenge by minimizing the number of parameters required to be fine-tuned while maintaining the quality of the model. While existing methods have achieved impressive results, they mainly focus on adapting a subset of parameters, weight reparameterization, and prompt engineering. In this paper, we study layers as extractors of different types of linguistic information that are valuable when used in conjunction. We then propose the Mixture of Layer Experts (MoLEx), a novel sparse mixture of experts (SMoE) whose experts are layers in the pre-trained model. It performs a conditional computation of a mixture of layers during fine-tuning to provide the model with more structural knowledge about the data. By providing an avenue for information exchange between layers, MoLEx enables the model to make a more well-informed prediction for the downstream task, leading to better fine-tuning results with the same number of effective parameters. As experts can be processed in parallel, MoLEx introduces minimal additional computational overhead. We empirically corroborate the advantages of MoLEx when combined with popular PEFT baseline methods on a variety of downstream fine-tuning tasks, including the popular GLUE benchmark as well as the End-to-End Challenge (E2E). The code is publicly available at https://github.com/rachtsy/molex.

Updated: 2025-03-14 07:22:07

标题: MoLEx：稀疏升级微调的分层专家混合模型

摘要: 深度模型的大规模预训练，然后进行微调，已成为自然语言处理（NLP）的基石。数据的普及以及计算资源的增加导致了具有大量参数的大型模型。尽管这些模型的巨大规模在许多NLP任务中取得了显著成功，但一个缺点是需要重新训练所有基本模型参数以适应每个任务或领域的适应性所需的成本。参数高效微调（PEFT）通过最小化需要微调的参数数量同时保持模型质量，为这一挑战提供了有效的解决方案。现有方法取得了令人印象深刻的结果，但主要集中在适应参数子集、权重重新参数化和提示工程方面。在本文中，我们研究层作为提取不同类型语言信息的工具，在结合使用时具有价值。然后，我们提出了Mixture of Layer Experts（MoLEx），一种新颖的稀疏专家混合（SMoE），其专家是预训练模型中的层。它在微调期间进行层的混合条件计算，为模型提供有关数据的更多结构知识。通过在层之间提供信息交流的途径，MoLEx使模型能够为下游任务做出更具见地的预测，从而实现在相同的有效参数数量下获得更好的微调结果。由于专家可以并行处理，MoLEx引入的额外计算开销很小。我们在多种下游微调任务上与流行的PEFT基线方法相结合，包括流行的GLUE基准测试和端到端挑战（E2E），实证证实了MoLEx的优势。代码公开可在https://github.com/rachtsy/molex获得。

更新时间: 2025-03-14 07:22:07

领域: cs.CL,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2503.11144v1

DGNN: A Neural PDE Solver Induced by Discontinuous Galerkin Methods

We propose a general framework for the Discontinuous Galerkin-induced Neural Network (DGNN), inspired by the Interior Penalty Discontinuous Galerkin Method (IPDGM). In this approach, the trial space consists of piecewise neural network space defined over the computational domain, while the test function space is composed of piecewise polynomials. We demonstrate the advantages of DGNN in terms of accuracy and training efficiency across several numerical examples, including stationary and time-dependent problems. Specifically, DGNN easily handles high perturbations, discontinuous solutions, and complex geometric domains.

Updated: 2025-03-14 07:18:48

标题: DGNN：由间断Galerkin方法诱导的神经PDE求解器

摘要: 我们提出了一个通用框架，用于由间断Galerkin引发的神经网络（DGNN），受到内部惩罚间断Galerkin方法（IPDGM）的启发。在这种方法中，试验空间由定义在计算域上的分段神经网络空间组成，而测试函数空间由分段多项式组成。我们通过几个数值示例展示了DGNN在精度和训练效率方面的优势，包括静止和时间相关问题。具体来说，DGNN轻松处理高扰动、不连续解和复杂几何域。

更新时间: 2025-03-14 07:18:48

领域: cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2503.10021v2

Virtual Guidance as a Mid-level Representation for Navigation with Augmented Reality

In the context of autonomous navigation, effectively conveying abstract navigational cues to agents in dynamic environments presents significant challenges, particularly when navigation information is derived from diverse modalities such as both vision and high-level language descriptions. To address this issue, we introduce a novel technique termed `Virtual Guidance,' which is designed to visually represent non-visual instructional signals. These visual cues are overlaid onto the agent's camera view and served as comprehensible navigational guidance signals. To validate the concept of virtual guidance, we propose a sim-to-real framework that enables the transfer of the trained policy from simulated environments to real world, ensuring the adaptability of virtual guidance in practical scenarios. We evaluate and compare the proposed method against a non-visual guidance baseline through detailed experiments in simulation. The experimental results demonstrate that the proposed virtual guidance approach outperforms the baseline methods across multiple scenarios and offers clear evidence of its effectiveness in autonomous navigation tasks.

Updated: 2025-03-14 07:17:05

标题: 虚拟指导作为增强现实导航的中级表示【翻译结果仅供参考】

摘要: 在自主导航的背景下，有效地向动态环境中的智能体传达抽象导航提示面临着重大挑战，特别是当导航信息来自多种形式，如视觉和高级语言描述时。为了解决这个问题，我们引入了一种名为“虚拟指导”的新技术，旨在以可视化方式呈现非视觉指示信号。这些视觉提示被叠加到智能体的摄像头视图上，作为可理解的导航指导信号。为了验证虚拟指导的概念，我们提出了一个模拟到实际的框架，能够将经过训练的策略从模拟环境传输到现实世界，确保虚拟指导在实际场景中的适应性。我们通过详细实验在模拟中评估和比较了所提出的方法与非视觉指导基线。实验结果表明，所提出的虚拟指导方法在多种场景中优于基线方法，并清晰地证明了其在自主导航任务中的有效性。

更新时间: 2025-03-14 07:17:05

领域: cs.LG,cs.AI,cs.CV,cs.RO

下载: http://arxiv.org/abs/2303.02731v3

Direction-Aware Diagonal Autoregressive Image Generation

The raster-ordered image token sequence exhibits a significant Euclidean distance between index-adjacent tokens at line breaks, making it unsuitable for autoregressive generation. To address this issue, this paper proposes Direction-Aware Diagonal Autoregressive Image Generation (DAR) method, which generates image tokens following a diagonal scanning order. The proposed diagonal scanning order ensures that tokens with adjacent indices remain in close proximity while enabling causal attention to gather information from a broader range of directions. Additionally, two direction-aware modules: 4D-RoPE and direction embeddings are introduced, enhancing the model's capability to handle frequent changes in generation direction. To leverage the representational capacity of the image tokenizer, we use its codebook as the image token embeddings. We propose models of varying scales, ranging from 485M to 2.0B. On the 256$\times$256 ImageNet benchmark, our DAR-XL (2.0B) outperforms all previous autoregressive image generators, achieving a state-of-the-art FID score of 1.37.

Updated: 2025-03-14 06:44:01

标题: 方向感知对角自回归图像生成

摘要: 栅格顺序的图像令牌序列在换行处的索引相邻令牌之间表现出显著的欧几里得距离，因此不适合自回归生成。为了解决这个问题，本文提出了方向感知对角自回归图像生成（DAR）方法，该方法按照对角线扫描顺序生成图像令牌。所提出的对角线扫描顺序确保具有相邻索引的令牌保持紧密相邻，同时使因果关注能够从更广泛的方向范围收集信息。此外，引入了两个方向感知模块：4D-RoPE和方向嵌入，增强了模型处理生成方向频繁变化的能力。为了利用图像标记器的表征能力，我们使用其代码本作为图像令牌嵌入。我们提出了不同规模的模型，从485M到2.0B不等。在256×256的ImageNet基准上，我们的DAR-XL（2.0B）胜过所有先前的自回归图像生成器，实现了1.37的最先进FID分数。

更新时间: 2025-03-14 06:44:01

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.11129v1

Don't Forget It! Conditional Sparse Autoencoder Clamping Works for Unlearning

Recent developments in Large Language Model (LLM) capabilities have brought great potential but also posed new risks. For example, LLMs with knowledge of bioweapons, advanced chemistry, or cyberattacks could cause violence if placed in the wrong hands or during malfunctions. Because of their nature as near-black boxes, intuitive interpretation of LLM internals remains an open research question, preventing developers from easily controlling model behavior and capabilities. The use of Sparse Autoencoders (SAEs) has recently emerged as a potential method of unraveling representations of concepts in LLMs internals, and has allowed developers to steer model outputs by directly modifying the hidden activations. In this paper, we use SAEs to identify unwanted concepts from the Weapons of Mass Destruction Proxy (WMDP) dataset within gemma-2-2b internals and use feature steering to reduce the model's ability to answer harmful questions while retaining its performance on harmless queries. Our results bring back optimism to the viability of SAE-based explicit knowledge unlearning techniques.

Updated: 2025-03-14 06:43:19

标题: 不要忘记它！条件稀疏自动编码器夹紧适用于遗忘

摘要: 最近发展的大型语言模型（LLM）能力带来了巨大潜力，但也带来了新的风险。例如，具有生物武器、先进化学或网络攻击知识的LLM如果落入错误的手中或在故障期间可能导致暴力事件。由于它们的本质几乎是黑匣子，对LLM内部的直观解释仍然是一个待解决的研究问题，这使开发人员无法轻松控制模型的行为和能力。最近，稀疏自编码器（SAEs）的使用已被提出作为揭示LLM内部概念表示的潜在方法，使开发人员能够通过直接修改隐藏激活来引导模型输出。在本文中，我们使用SAEs从大规模杀伤性武器代理（WMDP）数据集中的gemma-2-2b内部识别不需要的概念，并使用特征引导来减少模型回答有害问题的能力，同时保留其对无害查询的性能。我们的结果重新给予了基于SAE的明确知识遗忘技术的可行性带来了乐观情绪。

更新时间: 2025-03-14 06:43:19

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.11127v1

MUSS: Multilevel Subset Selection for Relevance and Diversity

The problem of relevant and diverse subset selection has a wide range of applications, including recommender systems and retrieval-augmented generation (RAG). For example, in recommender systems, one is interested in selecting relevant items, while providing a diversified recommendation. Constrained subset selection problem is NP-hard, and popular approaches such as Maximum Marginal Relevance (MMR) are based on greedy selection. Many real-world applications involve large data, but the original MMR work did not consider distributed selection. This limitation was later addressed by a method called DGDS which allows for a distributed setting using random data partitioning. Here, we exploit structure in the data to further improve both scalability and performance on the target application. We propose MUSS, a novel method that uses a multilevel approach to relevant and diverse selection. We provide a rigorous theoretical analysis and show that our method achieves a constant factor approximation of the optimal objective. In a recommender system application, our method can achieve the same level of performance as baselines, but 4.5 to 20 times faster. Our method is also capable of outperforming baselines by up to 6 percent points of RAG-based question answering accuracy.

Updated: 2025-03-14 06:37:17

标题: MUSS：用于相关性和多样性的多级子集选择

摘要: 相关和多样化子集选择问题具有广泛的应用，包括推荐系统和检索增强生成（RAG）。例如，在推荐系统中，人们希望选择相关的项目，同时提供多样化的推荐。受限子集选择问题是NP困难的，而流行的方法如最大边际相关性（MMR）是基于贪婪选择的。许多现实世界的应用涉及大量数据，但最初的MMR工作并未考虑分布式选择。这个限制后来通过一种名为DGDS的方法来解决，该方法允许使用随机数据分区进行分布式设置。在这里，我们利用数据中的结构进一步提高了目标应用的可扩展性和性能。我们提出了一种新颖的方法MUSS，它使用多级方法进行相关和多样化选择。我们提供了严格的理论分析，并展示了我们的方法实现了最佳目标的一个常数因子近似。在推荐系统应用中，我们的方法可以实现与基线相同水平的性能，但速度快4.5到20倍。我们的方法还能够在RAG问答准确性上超越基线最多6个百分点。

更新时间: 2025-03-14 06:37:17

领域: cs.LG

下载: http://arxiv.org/abs/2503.11126v1

Context-Aware Rule Mining Using a Dynamic Transformer-Based Framework

This study proposes a dynamic rule data mining algorithm based on an improved Transformer architecture, aiming to improve the accuracy and efficiency of rule mining in a dynamic data environment. With the increase in data volume and complexity, traditional data mining methods are difficult to cope with dynamic data with strong temporal and variable characteristics, so new algorithms are needed to capture the temporal regularity in the data. By improving the Transformer architecture, and introducing a dynamic weight adjustment mechanism and a temporal dependency module, we enable the model to adapt to data changes and mine more accurate rules. Experimental results show that compared with traditional rule mining algorithms, the improved Transformer model has achieved significant improvements in rule mining accuracy, coverage, and stability. The contribution of each module in the algorithm performance is further verified by ablation experiments, proving the importance of temporal dependency and dynamic weight adjustment mechanisms in improving the model effect. In addition, although the improved model has certain challenges in computational efficiency, its advantages in accuracy and coverage enable it to perform well in processing complex dynamic data. Future research will focus on optimizing computational efficiency and combining more deep learning technologies to expand the application scope of the algorithm, especially in practical applications in the fields of finance, medical care, and intelligent recommendation.

Updated: 2025-03-14 06:37:04

标题: 上下文感知规则挖掘：使用基于动态变换器框架

摘要: 这项研究提出了一种基于改进的Transformer架构的动态规则数据挖掘算法，旨在提高动态数据环境中规则挖掘的准确性和效率。随着数据量和复杂性的增加，传统的数据挖掘方法很难应对具有强时间和变量特征的动态数据，因此需要新的算法来捕捉数据中的时间规律。通过改进Transformer架构，并引入动态权重调整机制和时间依赖模块，我们使模型能够适应数据变化并挖掘更准确的规则。实验结果表明，与传统规则挖掘算法相比，改进的Transformer模型在规则挖掘准确性、覆盖范围和稳定性方面取得了显著改进。消融实验进一步验证了算法性能中每个模块的贡献，证明了时间依赖性和动态权重调整机制在提高模型效果中的重要性。此外，尽管改进的模型在计算效率方面存在一定挑战，但其在准确性和覆盖范围上的优势使其能够在处理复杂动态数据方面表现良好。未来的研究将集中在优化计算效率，并结合更多深度学习技术来扩大算法的应用范围，特别是在金融、医疗保健和智能推荐领域的实际应用中。

更新时间: 2025-03-14 06:37:04

领域: cs.LG

下载: http://arxiv.org/abs/2503.11125v1

The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation

Minibatch optimal transport coupling straightens paths in unconditional flow matching. This leads to computationally less demanding inference as fewer integration steps and less complex numerical solvers can be employed when numerically solving an ordinary differential equation at test time. However, in the conditional setting, minibatch optimal transport falls short. This is because the default optimal transport mapping disregards conditions, resulting in a conditionally skewed prior distribution during training. In contrast, at test time, we have no access to the skewed prior, and instead sample from the full, unbiased prior distribution. This gap between training and testing leads to a subpar performance. To bridge this gap, we propose conditional optimal transport C^2OT that adds a conditional weighting term in the cost matrix when computing the optimal transport assignment. Experiments demonstrate that this simple fix works with both discrete and continuous conditions in 8gaussians-to-moons, CIFAR-10, ImageNet-32x32, and ImageNet-256x256. Our method performs better overall compared to the existing baselines across different function evaluation budgets. Code is available at https://hkchengrex.github.io/C2OT

Updated: 2025-03-14 06:35:23

标题: 条件的诅咒：分析和改进用于条件流生成的最优输运

摘要: 小批量最优输运耦合可以在无条件流匹配中使路径变直。这导致在测试时可以使用较少的积分步骤和更简单的数值求解器，从而减少了计算量。然而，在条件设置下，小批量最优输运存在一些问题。这是因为默认的最优输运映射忽略了条件，导致训练期间先验分布出现条件偏斜。相比之下，在测试时，我们无法访问偏斜的先验分布，而是从完整的、无偏的先验分布中抽样。训练和测试之间的这种差距导致性能不佳。为了弥合这一差距，我们提出了条件最优输运C^2OT，它在计算最优输运分配时添加了一个条件加权项在成本矩阵中。实验表明，这种简单的修复方法适用于8gaussians-to-moons、CIFAR-10、ImageNet-32x32和ImageNet-256x256等具有离散和连续条件的数据集。我们的方法在不同的功能评估预算下整体表现均优于现有基线。代码可在https://hkchengrex.github.io/C2OT找到。

更新时间: 2025-03-14 06:35:23

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2503.10636v2

A Multi-Objective Evaluation Framework for Analyzing Utility-Fairness Trade-Offs in Machine Learning Systems

The evaluation of fairness models in Machine Learning involves complex challenges, such as defining appropriate metrics, balancing trade-offs between utility and fairness, and there are still gaps in this stage. This work presents a novel multi-objective evaluation framework that enables the analysis of utility-fairness trade-offs in Machine Learning systems. The framework was developed using criteria from Multi-Objective Optimization that collect comprehensive information regarding this complex evaluation task. The assessment of multiple Machine Learning systems is summarized, both quantitatively and qualitatively, in a straightforward manner through a radar chart and a measurement table encompassing various aspects such as convergence, system capacity, and diversity. The framework's compact representation of performance facilitates the comparative analysis of different Machine Learning strategies for decision-makers, in real-world applications, with single or multiple fairness requirements. The framework is model-agnostic and flexible to be adapted to any kind of Machine Learning systems, that is, black- or white-box, any kind and quantity of evaluation metrics, including multidimensional fairness criteria. The functionality and effectiveness of the proposed framework is shown with different simulations, and an empirical study conducted on a real-world dataset with various Machine Learning systems.

Updated: 2025-03-14 06:32:42

标题: 一个用于分析机器学习系统中效用-公平性权衡的多目标评估框架

摘要: 机器学习中公平模型的评估涉及复杂的挑战，如定义适当的指标、平衡效用和公平性之间的权衡，并且在这个阶段仍存在差距。本研究提出了一个新颖的多目标评估框架，可用于分析机器学习系统中的效用-公平性权衡。该框架使用多目标优化的标准开发，收集关于这一复杂评估任务的全面信息。通过雷达图和涵盖收敛性、系统容量和多样性等各方面的度量表，对多个机器学习系统进行了简明的定量和定性评估。该框架对性能的简洁表示有助于决策者在实际应用中比较分析不同的机器学习策略，以满足单一或多个公平性要求。该框架是模型无关的，灵活适应于任何类型的机器学习系统，包括黑箱或白箱系统，任何类型和数量的评估指标，包括多维公平性标准。通过不同的模拟和对真实数据集进行的经验研究，展示了所提出框架的功能和有效性。

更新时间: 2025-03-14 06:32:42

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2503.11120v1

UMB@PerAnsSumm 2025: Enhancing Perspective-Aware Summarization with Prompt Optimization and Supervised Fine-Tuning

We present our approach to the PerAnsSumm Shared Task, which involves perspective span identification and perspective-aware summarization in community question-answering (CQA) threads. For span identification, we adopt ensemble learning that integrates three transformer models through averaging to exploit individual model strengths, achieving an 82.91% F1-score on test data. For summarization, we design a suite of Chain-of-Thought (CoT) prompting strategies that incorporate keyphrases and guide information to structure summary generation into manageable steps. To further enhance summary quality, we apply prompt optimization using the DSPy framework and supervised fine-tuning (SFT) on Llama-3 to adapt the model to domain-specific data. Experimental results on validation and test sets show that structured prompts with keyphrases and guidance improve summaries aligned with references, while the combination of prompt optimization and fine-tuning together yields significant improvement in both relevance and factuality evaluation metrics.

Updated: 2025-03-14 06:29:51

标题: UMB @ PerAnsSumm 2025：通过提示优化和监督微调提升透视感知摘要

摘要: 我们介绍了我们对PerAnsSumm共享任务的方法，该任务涉及社区问答（CQA）主题跨度识别和透视感知摘要。对于跨度识别，我们采用集成学习，通过平均化三个变压器模型来利用各个模型的优势，从而在测试数据上实现82.91%的F1分数。对于摘要，我们设计了一套Chain-of-Thought（CoT）提示策略，将关键短语和引导信息纳入结构化摘要生成的可管理步骤中。为了进一步提高摘要质量，我们应用了DSPy框架上的提示优化和在Llama-3上的监督微调（SFT）来使模型适应领域特定数据。验证和测试集上的实验结果表明，带有关键短语和引导的结构化提示改善了与参考文献一致的摘要，而提示优化和微调的组合则在相关性和事实性评估指标上取得了显著的改善。

更新时间: 2025-03-14 06:29:51

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.11118v1

Prompt Sentiment: The Catalyst for LLM Change

The rise of large language models (LLMs) has revolutionized natural language processing (NLP), yet the influence of prompt sentiment, a latent affective characteristic of input text, remains underexplored. This study systematically examines how sentiment variations in prompts affect LLM-generated outputs in terms of coherence, factuality, and bias. Leveraging both lexicon-based and transformer-based sentiment analysis methods, we categorize prompts and evaluate responses from five leading LLMs: Claude, DeepSeek, GPT-4, Gemini, and LLaMA. Our analysis spans six AI-driven applications, including content generation, conversational AI, legal and financial analysis, healthcare AI, creative writing, and technical documentation. By transforming prompts, we assess their impact on output quality. Our findings reveal that prompt sentiment significantly influences model responses, with negative prompts often reducing factual accuracy and amplifying bias, while positive prompts tend to increase verbosity and sentiment propagation. These results highlight the importance of sentiment-aware prompt engineering for ensuring fair and reliable AI-generated content.

Updated: 2025-03-14 06:25:21

标题: 快速情感：LLM变革的催化剂

摘要: 大型语言模型（LLMs）的兴起彻底改变了自然语言处理（NLP），然而，输入文本的潜在情感特征——提示情感的影响仍然未被充分探讨。本研究系统地研究了提示情感变化如何影响LLM生成的输出的连贯性、事实性和偏见。利用基于词典和基于Transformer的情感分析方法，我们对提示进行分类，并评估了来自五种主要LLM的响应：Claude、DeepSeek、GPT-4、Gemini和LLaMA。我们的分析涵盖了六种基于人工智能的应用，包括内容生成、对话AI、法律和财务分析、医疗保健AI、创意写作和技术文档。通过转换提示，我们评估了它们对输出质量的影响。我们的研究结果显示，提示情感显著影响模型的响应，负面提示通常会降低事实准确性并放大偏见，而正面提示往往会增加冗长性和情感传播。这些结果突显了情感感知型提示工程对确保公正和可靠的AI生成内容的重要性。

更新时间: 2025-03-14 06:25:21

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.13510v1

Pesti-Gen: Unleashing a Generative Molecule Approach for Toxicity Aware Pesticide Design

Global climate change has reduced crop resilience and pesticide efficacy, making reliance on synthetic pesticides inevitable, even though their widespread use poses significant health and environmental risks. While these pesticides remain a key tool in pest management, previous machine-learning applications in pesticide and agriculture have focused on classification or regression, leaving the fundamental challenge of generating new molecular structures or designing novel candidates unaddressed. In this paper, we propose Pesti-Gen, a novel generative model based on variational auto-encoders, designed to create pesticide candidates with optimized properties for the first time. Specifically, Pesti-Gen leverages a two-stage learning process: an initial pre-training phase that captures a generalized chemical structure representation, followed by a fine-tuning stage that incorporates toxicity-specific information. The model simultaneously optimizes over multiple toxicity metrics, such as (1) livestock toxicity and (2) aqua toxicity to generate environmentally friendly pesticide candidates. Notably, Pesti-Gen achieves approximately 68\% structural validity in generating new molecular structures, demonstrating the model's effectiveness in producing optimized and feasible pesticide candidates, thereby providing a new way for safer and more sustainable pest management solutions.

Updated: 2025-03-14 06:16:49

标题: Pesti-Gen：释放一种毒性感知杀虫剂设计的生成分子方法

摘要: 全球气候变化已经减少了作物的抗性和农药的效力，使得依赖合成农药不可避免，尽管它们的广泛使用会带来重大的健康和环境风险。虽然这些农药仍然是害虫管理中的关键工具，但先前在农药和农业领域的机器学习应用主要集中在分类或回归上，未解决生成新分子结构或设计新候选者的根本挑战。本文提出了一种基于变分自动编码器的新型生成模型Pesti-Gen，旨在首次创建具有优化特性的农药候选者。具体而言，Pesti-Gen利用两阶段学习过程：首先是捕获广义化学结构表示的初始预训练阶段，然后是整合毒性特定信息的微调阶段。该模型同时优化多个毒性指标，例如（1）对家畜的毒性和（2）对水生生物的毒性，以生成环保友好的农药候选者。值得注意的是，Pesti-Gen在生成新分子结构方面达到了约68％的结构有效性，展示了该模型在生成优化和可行的农药候选者方面的有效性，从而为更安全和更可持续的害虫管理解决方案提供了一种新途径。

更新时间: 2025-03-14 06:16:49

领域: cs.LG,cs.AI,q-bio.BM,q-bio.MN

下载: http://arxiv.org/abs/2501.14469v2

PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action

As language models (LMs) are widely utilized in personalized communication scenarios (e.g., sending emails, writing social media posts) and endowed with a certain level of agency, ensuring they act in accordance with the contextual privacy norms becomes increasingly critical. However, quantifying the privacy norm awareness of LMs and the emerging privacy risk in LM-mediated communication is challenging due to (1) the contextual and long-tailed nature of privacy-sensitive cases, and (2) the lack of evaluation approaches that capture realistic application scenarios. To address these challenges, we propose PrivacyLens, a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories, enabling multi-level evaluation of privacy leakage in LM agents' actions. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. Using this dataset, we reveal a discrepancy between LM performance in answering probing questions and their actual behavior when executing user instructions in an agent setup. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions. We also demonstrate the dynamic nature of PrivacyLens by extending each seed into multiple trajectories to red-team LM privacy leakage risk. Dataset and code are available at https://github.com/SALT-NLP/PrivacyLens.

Updated: 2025-03-14 06:03:20

标题: PrivacyLens：评估语言模型在实际中对隐私规范意识的影响

摘要: 随着语言模型（LMs）在个性化通信场景（例如发送电子邮件、撰写社交媒体帖子）中被广泛利用并赋予一定程度的代理能力，确保它们与上下文隐私规范一致行动变得日益关键。然而，由于隐私敏感案例具有上下文和长尾特性，并且缺乏能捕捉现实应用场景的评估方法，因此量化LMs的隐私规范意识和LMs介导的通信中出现的隐私风险是具有挑战性的。为了应对这些挑战，我们提出了PrivacyLens，一个设计用于将隐私敏感种子扩展为富有表现力的小插图，进一步扩展为代理轨迹，从而实现对LM代理行为中隐私泄露的多层次评估。我们利用隐私文献和众包种子构建了一组隐私规范的数据集，并将PrivacyLens具体化。使用这一数据集，我们揭示了LM在回答探询问题时的表现与它们在代理设置中执行用户指令时的实际行为之间存在差异。最先进的LMs，如GPT-4和Llama-3-70B，在25.68%和38.69%的案例中泄露敏感信息，即使在提供隐私增强指令的情况下也是如此。我们还通过将每个种子扩展为多个轨迹来展示PrivacyLens的动态性，以对抗LM隐私泄露风险。数据集和代码可在https://github.com/SALT-NLP/PrivacyLens找到。

更新时间: 2025-03-14 06:03:20

领域: cs.CL,cs.AI,cs.CR

下载: http://arxiv.org/abs/2409.00138v3

Limits of KV Cache Compression for Tensor Attention based Autoregressive Transformers

The key-value (KV) cache in autoregressive transformers presents a significant bottleneck during inference, which restricts the context length capabilities of large language models (LLMs). While previous work analyzes the fundamental space complexity barriers in standard attention mechanism [Haris and Onak, 2025], our work generalizes the space complexity barriers result to tensor attention version. Our theoretical contributions rely on a novel reduction from communication complexity and deduce the memory lower bound for tensor-structured attention mechanisms when $d = \Omega(\log n)$. In the low dimensional regime where $d = o(\log n)$, we analyze the theoretical bounds of the space complexity as well. Overall, our work provides a theoretical foundation for us to understand the compression-expressivity tradeoff in tensor attention mechanisms and offers more perspectives in developing more memory-efficient transformer architectures.

Updated: 2025-03-14 06:01:42

标题: 基于张量注意力的自回归变换器的KV缓存压缩限制

摘要: 自回归变压器中的键-值（KV）缓存在推理过程中构成了一个重要瓶颈，这限制了大型语言模型（LLMs）的上下文长度能力。尽管先前的研究分析了标准注意力机制中的基本空间复杂性障碍[Haris和Onak，2025]，但我们的工作将空间复杂性障碍结果推广到张量注意力版本。我们的理论贡献依赖于从通信复杂性到张量结构化注意力机制的新颖简化，并推导出当$d = \Omega(\log n)$时张量结构化注意力机制的内存下界。在低维度区域，其中$d = o(\log n)$，我们也分析了空间复杂性的理论界限。总的来说，我们的工作为我们理解张量注意力机制中的压缩-表达能力权衡提供了理论基础，并在开发更节省内存的变压器架构方面提供了更多视角。

更新时间: 2025-03-14 06:01:42

领域: cs.LG,cs.AI,cs.CC,cs.CL

下载: http://arxiv.org/abs/2503.11108v1

Quantifying Interpretability in CLIP Models with Concept Consistency

CLIP is one of the most popular foundational models and is heavily used for many vision-language tasks. However, little is known about the inner workings of CLIP. While recent work has proposed decomposition-based interpretability methods for identifying textual descriptions of attention heads in CLIP, the implications of conceptual consistency in these text labels on interpretability and model performance has not been explored. To bridge this gap, we study the conceptual consistency of text descriptions for attention heads in CLIP-like models. We conduct extensive experiments on six different models from OpenAI and OpenCLIP which vary by size, type of pre-training data and patch size. We propose Concept Consistency Score (CCS), a novel interpretability metric that measures how consistently individual attention heads in CLIP models align with specific concepts. To assign concept labels to heads, we use in-context learning with ChatGPT, guided by a few manually-curated examples, and validate these labels using an LLM-as-a-judge approach. Our soft-pruning experiments reveal that high CCS heads are critical for preserving model performance, as pruning them leads to a significantly larger performance drop than pruning random or low CCS heads. Notably, we find that high CCS heads capture essential concepts and play a key role in out-of-domain detection, concept-specific reasoning, and video-language understanding. These results position CCS as a powerful interpretability metric for analyzing CLIP-like models.

Updated: 2025-03-14 05:47:17

标题: 使用概念一致性量化CLIP模型中的可解释性

摘要: CLIP是最受欢迎的基础模型之一，被广泛用于许多视觉语言任务。然而，对于CLIP的内部工作机制了解甚少。最近的研究提出了基于分解的可解释性方法，用于识别CLIP中注意力头的文本描述，但这些文本标签在可解释性和模型性能方面的概念一致性的影响尚未被探讨。为了弥补这一差距，我们研究了类似CLIP模型中注意力头的文本描述的概念一致性。我们对来自OpenAI和OpenCLIP的六种不同模型进行了广泛的实验，这些模型在大小、预训练数据类型和补丁大小上有所不同。我们提出了概念一致性分数（CCS），这是一种新颖的可解释性指标，衡量了CLIP模型中个别注意力头与特定概念的一致性程度。为了为头部分配概念标签，我们使用了ChatGPT中的上下文学习，通过一些手工策划的例子指导，并使用LLM作为评判方法验证这些标签。我们的软修剪实验表明，高CCS头部对于保持模型性能至关重要，因为修剪它们会导致性能下降显著大于修剪随机或低CCS头部。值得注意的是，我们发现高CCS头部捕获了关键概念，并在领域外检测、特定概念推理和视频语言理解中发挥了关键作用。这些结果将CCS定位为分析类似CLIP模型的强大可解释性指标。

更新时间: 2025-03-14 05:47:17

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.11103v1

A Survey on Self-supervised Contrastive Learning for Multimodal Text-Image Analysis

Self-supervised learning is a machine learning approach that generates implicit labels by learning underlined patterns and extracting discriminative features from unlabeled data without manual labelling. Contrastive learning introduces the concept of "positive" and "negative" samples, where positive pairs (e.g., variation of the same image/object) are brought together in the embedding space, and negative pairs (e.g., views from different images/objects) are pushed farther away. This methodology has shown significant improvements in image understanding and image text analysis without much reliance on labeled data. In this paper, we comprehensively discuss the terminologies, recent developments and applications of contrastive learning with respect to text-image models. Specifically, we provide an overview of the approaches of contrastive learning in text-image models in recent years. Secondly, we categorize the approaches based on different model structures. Thirdly, we further introduce and discuss the latest advances of the techniques used in the process such as pretext tasks for both images and text, architectural structures, and key trends. Lastly, we discuss the recent state-of-art applications of self-supervised contrastive learning Text-Image based models.

Updated: 2025-03-14 05:43:47

标题: 一项关于多模态文本-图像分析的自监督对比学习调查

摘要: 自监督学习是一种机器学习方法，通过学习底层模式和从未标记数据中提取具有区分性特征来生成隐式标签，无需手动标记。对比学习引入了“正”和“负”样本的概念，其中正样本（例如，相同图像/对象的变体）在嵌入空间中聚集在一起，而负样本（例如，来自不同图像/对象的视图）被推远。这种方法已经在图像理解和图像文本分析中显示出显著的改进，而不太依赖于标记数据。本文全面讨论了对比学习的术语、最新发展和应用，特别是与文本-图像模型相关的对比学习方法。具体来说，我们概述了近年来文本-图像模型中对比学习的方法。其次，我们根据不同的模型结构对方法进行分类。第三，我们进一步介绍和讨论了用于图像和文本的预训练任务、架构结构和关键趋势的最新进展。最后，我们讨论了基于自监督对比学习的文本-图像模型的最新应用。

更新时间: 2025-03-14 05:43:47

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2503.11101v1

Approximating the Total Variation Distance between Gaussians

The total variation distance is a metric of central importance in statistics and probability theory. However, somewhat surprisingly, questions about computing it algorithmically appear not to have been systematically studied until very recently. In this paper, we contribute to this line of work by studying this question in the important special case of multivariate Gaussians. More formally, we consider the problem of approximating the total variation distance between two multivariate Gaussians to within an $\epsilon$-relative error. Previous works achieved a fixed constant relative error approximation via closed-form formulas. In this work, we give algorithms that given any two $n$-dimensional Gaussians $D_1,D_2$, and any error bound $\epsilon > 0$, approximate the total variation distance $D := d_{TV}(D_1,D_2)$ to $\epsilon$-relative accuracy in $\text{poly}(n,\frac{1}{\epsilon},\log \frac{1}{D})$ operations. The main technical tool in our work is a reduction that helps us extend the recent progress on computing the TV-distance between discrete random variables to our continuous setting.

Updated: 2025-03-14 05:42:10

标题: 逼近高斯分布之间的总变差距离

摘要: 总变差距离是统计学和概率论中非常重要的一个度量。然而，令人惊讶的是，关于如何通过算法计算它的问题似乎直到最近才被系统地研究。在本文中，我们通过研究多元高斯分布的重要特例，为这一研究方向做出贡献。更正式地说，我们考虑在两个多元高斯分布之间近似总变差距离的问题，使其相对误差不超过 ε。以前的工作通过封闭形式的公式实现了固定的相对误差近似。在本研究中，我们提出了算法，给定任意两个 n 维高斯分布 D1、D2 和任意误差界限 ε>0，以 poly(n,1/ε,log(1/D)) 次操作将总变差距离 D:=d_TV(D1,D2) 近似到 ε-相对精度。我们工作中的主要技术工具是一种缩减方法，帮助我们将最近对计算离散随机变量之间 TV-距离的进展扩展到我们的连续环境中。

更新时间: 2025-03-14 05:42:10

领域: cs.DS,cs.LG,math.PR

下载: http://arxiv.org/abs/2503.11099v1

Augmenting Image Annotation: A Human-LMM Collaborative Framework for Efficient Object Selection and Label Generation

Traditional image annotation tasks rely heavily on human effort for object selection and label assignment, making the process time-consuming and prone to decreased efficiency as annotators experience fatigue after extensive work. This paper introduces a novel framework that leverages the visual understanding capabilities of large multimodal models (LMMs), particularly GPT, to assist annotation workflows. In our proposed approach, human annotators focus on selecting objects via bounding boxes, while the LMM autonomously generates relevant labels. This human-AI collaborative framework enhances annotation efficiency by reducing the cognitive and time burden on human annotators. By analyzing the system's performance across various types of annotation tasks, we demonstrate its ability to generalize to tasks such as object recognition, scene description, and fine-grained categorization. Our proposed framework highlights the potential of this approach to redefine annotation workflows, offering a scalable and efficient solution for large-scale data labeling in computer vision. Finally, we discuss how integrating LMMs into the annotation pipeline can advance bidirectional human-AI alignment, as well as the challenges of alleviating the "endless annotation" burden in the face of information overload by shifting some of the work to AI.

Updated: 2025-03-14 05:38:53

标题: 增强图像标注：一种人-机协作框架，用于高效的对象选择和标签生成

摘要: 传统的图像注释任务在对象选择和标签分配方面严重依赖人力，使得这一过程耗时且容易在注释者长时间工作后出现疲劳而效率下降。本文介绍了一个利用大型多模态模型（LMMs），特别是GPT的视觉理解能力来辅助注释工作流程的新框架。在我们提出的方法中，人类注释者专注于通过边界框选择对象，而LMM自动生成相关标签。这种人工智能协作框架通过减轻人工注释者的认知和时间负担来提高注释效率。通过分析系统在各种类型的注释任务中的性能，我们展示了其能够泛化到识别对象、描述场景和细粒度分类等任务。我们提出的框架凸显了这种方法重新定义注释工作流程的潜力，为计算机视觉中的大规模数据标记提供可扩展和高效的解决方案。最后，我们讨论了如何将LMMs整合到注释管道中可以推动双向人工智能对齐，并在面对信息过载时通过将部分工作转移到人工智能来减轻“无休止的注释”负担所面临的挑战。

更新时间: 2025-03-14 05:38:53

领域: cs.CV,cs.AI,cs.HC

下载: http://arxiv.org/abs/2503.11096v1

ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning

Recent research on Reasoning of Large Language Models (LLMs) has sought to further enhance their performance by integrating meta-thinking -- enabling models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Through iterative reinforcement learning with aligned objectives, these agents explore and learn collaboration, leading to improved generalization and robustness. Experimental results demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks, including competitive-level mathematical benchmarks and LLM-as-a-Judge benchmarks. Comprehensive ablation studies further illustrate the evolving dynamics of each distinct agent, providing valuable insights into how the meta-thinking reasoning process enhances the reasoning capabilities of LLMs.

Updated: 2025-03-14 05:33:47

标题: ReMA：利用多智能体强化学习为LLMs学会元思维

摘要: 最近关于大型语言模型（LLMs）推理的研究试图通过整合元思维来进一步提高它们的性能，使模型能够监控、评估和控制其推理过程，以实现更具适应性和有效性的问题解决。然而，当前的单一代理工作缺乏专门设计用于获取元思维，导致效果低下。为了解决这一挑战，我们引入了强化元思维代理（ReMA），这是一个利用多智能体强化学习（MARL）来引发元思维行为的新框架，鼓励LLMs进行自我思考。ReMA将推理过程分解为两个层次的代理：一个负责生成战略监督和计划的高层元思维代理，以及一个负责详细执行的低层推理代理。通过与目标对齐的迭代强化学习，这些代理探索和学习合作，从而提高了泛化能力和鲁棒性。实验结果表明，ReMA在复杂推理任务上优于单一代理RL基线，包括竞争水平的数学基准和LLM作为裁判的基准。全面的消融研究进一步说明了每个独立代理的演变动态，提供了有价值的见解，揭示了元思维推理过程如何增强LLMs的推理能力。

更新时间: 2025-03-14 05:33:47

领域: cs.AI,cs.CL,cs.LG,cs.MA

下载: http://arxiv.org/abs/2503.09501v2

EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks

While multimodal large language models (MLLMs) have made groundbreaking progress in embodied intelligence, they still face significant challenges in spatial reasoning for complex long-horizon tasks. To address this gap, we propose EmbodiedVSR (Embodied Visual Spatial Reasoning), a novel framework that integrates dynamic scene graph-guided Chain-of-Thought (CoT) reasoning to enhance spatial understanding for embodied agents. By explicitly constructing structured knowledge representations through dynamic scene graphs, our method enables zero-shot spatial reasoning without task-specific fine-tuning. This approach not only disentangles intricate spatial relationships but also aligns reasoning steps with actionable environmental dynamics. To rigorously evaluate performance, we introduce the eSpatial-Benchmark, a comprehensive dataset including real-world embodied scenarios with fine-grained spatial annotations and adaptive task difficulty levels. Experiments demonstrate that our framework significantly outperforms existing MLLM-based methods in accuracy and reasoning coherence, particularly in long-horizon tasks requiring iterative environment interaction. The results reveal the untapped potential of MLLMs for embodied intelligence when equipped with structured, explainable reasoning mechanisms, paving the way for more reliable deployment in real-world spatial applications. The codes and datasets will be released soon.

Updated: 2025-03-14 05:06:07

标题: 具身VSR：动态场景图引导的视觉空间任务链式思维推理

摘要: 虽然多模态大型语言模型（MLLMs）在具有体现智能方面取得了突破性进展，但它们在复杂长期任务的空间推理方面仍面临重大挑战。为了填补这一空白，我们提出了一种新颖的框架EmbodiedVSR（具体视觉空间推理），该框架将动态场景图引导的思维链（CoT）推理集成在一起，以增强具体代理的空间理解能力。通过通过动态场景图显式构建结构化知识表示，我们的方法实现了零样本空间推理，无需特定任务的微调。这种方法不仅能够解开复杂的空间关系，还能够将推理步骤与可操作的环境动态对齐。为了严格评估性能，我们引入了eSpatial-Benchmark，这是一个包含细粒度空间注释和自适应任务难度级别的真实世界具体场景的综合数据集。实验证明，我们的框架在准确性和推理连贯性方面明显优于现有的基于MLLM的方法，特别是在需要迭代环境交互的长期任务中。结果显示，当配备结构化、可解释的推理机制时，MLLMs在具体智能方面具有未被开发的潜力，为其在真实世界空间应用中更可靠地部署铺平了道路。代码和数据集将很快发布。

更新时间: 2025-03-14 05:06:07

领域: cs.RO,cs.AI,cs.CV

下载: http://arxiv.org/abs/2503.11089v1

Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning

Infants develop complex visual understanding rapidly, even preceding of the acquisition of linguistic inputs. As computer vision seeks to replicate the human vision system, understanding infant visual development may offer valuable insights. In this paper, we present an interdisciplinary study exploring this question: can a computational model that imitates the infant learning process develop broader visual concepts that extend beyond the vocabulary it has heard, similar to how infants naturally learn? To investigate this, we analyze a recently published model in Science by Vong et al.,which is trained on longitudinal, egocentric images of a single child paired with transcribed parental speech. We introduce a training-free framework that can discover visual concept neurons hidden in the model's internal representations. Our findings show that these neurons can classify objects outside its original vocabulary. Furthermore, we compare the visual representations in infant-like models with those in moder computer vision models, such as CLIP or ImageNet pre-trained model, highlighting key similarities and differences. Ultimately, our work bridges cognitive science and computer vision by analyzing the internal representations of a computational model trained on an infant's visual and linguistic inputs.

Updated: 2025-03-14 05:05:12

标题: 在婴儿学习中发现超越语言输入的隐藏视觉概念

摘要: 婴儿迅速发展复杂的视觉理解能力，甚至早于语言输入的习得。随着计算机视觉试图复制人类视觉系统，理解婴儿视觉发展可能提供宝贵的见解。在本文中，我们提出了一项跨学科研究，探讨这个问题：一个模拟婴儿学习过程的计算模型能否发展出超越其听过词汇的更广泛的视觉概念，类似于婴儿自然学习的方式？为了调查这个问题，我们分析了Vong等人在Science上最近发表的模型，该模型是在长期、以自我为中心的图像和转录的父母语音配对的基础上进行训练的。我们引入了一个无需训练的框架，可以发现模型内部表示中隐藏的视觉概念神经元。我们的发现表明，这些神经元可以对原始词汇之外的对象进行分类。此外，我们比较了婴儿模型中的视觉表示与现代计算机视觉模型（如CLIP或ImageNet预训练模型）中的表示，突出了关键的相似之处和差异。最终，我们的工作通过分析一个基于婴儿视觉和语言输入训练的计算模型的内部表示，搭起了认知科学和计算机视觉之间的桥梁。

更新时间: 2025-03-14 05:05:12

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2501.05205v3

A Survey of Cross-domain Graph Learning: Progress and Future Directions

Graph learning plays a vital role in mining and analyzing complex relationships involved in graph data, which is widely used in many real-world applications like transaction networks and communication networks. Foundation models in CV and NLP have shown powerful cross-domain capabilities that are also significant in graph domains. However, existing graph learning approaches struggle with cross-domain tasks. Inspired by successes in CV and NLP, cross-domain graph learning has once again become a focal point of attention to realizing true graph foundation models. In this survey, we present a comprehensive review and analysis of existing works on cross-domain graph learning. Concretely, we first propose a new taxonomy, categorizing existing approaches based on the learned cross-domain information: structure, feature, and structure-feature mixture. Next, we systematically survey representative methods in these categories. Finally, we discuss the remaining limitations of existing studies and highlight promising avenues for future research. Relevant papers are summarized and will be consistently updated at: https://github.com/cshhzhao/Awesome-Cross-Domain-Graph-Learning.

Updated: 2025-03-14 04:53:27

标题: 《跨领域图学习概况：进展与未来方向调查》

摘要: 图学习在挖掘和分析涉及图数据的复杂关系中起着至关重要的作用，在许多现实世界应用中被广泛使用，例如交易网络和通信网络。CV和NLP中的基础模型显示出强大的跨领域能力，在图领域也同样重要。然而，现有的图学习方法在处理跨领域任务时存在困难。受CV和NLP成功的启发，跨领域图学习再次成为关注焦点，以实现真正的图基础模型。在本调查中，我们对现有的跨领域图学习作品进行了全面审查和分析。具体而言，我们首先提出了一个新的分类法，根据学习的跨领域信息将现有方法进行分类：结构、特征和结构-特征混合。接下来，我们系统地调查了这些类别中的代表性方法。最后，我们讨论了现有研究的剩余限制，并突出了未来研究的有前景的途径。相关论文将在https://github.com/cshhzhao/Awesome-Cross-Domain-Graph-Learning上进行总结并持续更新。

更新时间: 2025-03-14 04:53:27

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.11086v1

SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation

Recent advances in diffusion models have significantly enhanced their ability to generate high-quality images and videos, but they have also increased the risk of producing unsafe content. Existing unlearning/editing-based methods for safe generation remove harmful concepts from models but face several challenges: (1) They cannot instantly remove harmful concepts without training. (2) Their safe generation capabilities depend on collected training data. (3) They alter model weights, risking degradation in quality for content unrelated to toxic concepts. To address these, we propose SAFREE, a novel, training-free approach for safe T2I and T2V, that does not alter the model's weights. Specifically, we detect a subspace corresponding to a set of toxic concepts in the text embedding space and steer prompt embeddings away from this subspace, thereby filtering out harmful content while preserving intended semantics. To balance the trade-off between filtering toxicity and preserving safe concepts, SAFREE incorporates a novel self-validating filtering mechanism that dynamically adjusts the denoising steps when applying the filtered embeddings. Additionally, we incorporate adaptive re-attention mechanisms within the diffusion latent space to selectively diminish the influence of features related to toxic concepts at the pixel level. In the end, SAFREE ensures coherent safety checking, preserving the fidelity, quality, and safety of the output. SAFREE achieves SOTA performance in suppressing unsafe content in T2I generation compared to training-free baselines and effectively filters targeted concepts while maintaining high-quality images. It also shows competitive results against training-based methods. We extend SAFREE to various T2I backbones and T2V tasks, showcasing its flexibility and generalization. SAFREE provides a robust and adaptable safeguard for ensuring safe visual generation.

Updated: 2025-03-14 04:47:39

标题: SAFREE：用于安全文本到图像和视频生成的免训练和自适应保护

摘要: 最近在扩散模型方面取得的进展显著增强了它们生成高质量图像和视频的能力，但也增加了生成不安全内容的风险。现有的基于消除/编辑的安全生成方法可以从模型中删除有害概念，但面临几个挑战：（1）它们无法立即在训练过程中删除有害概念。（2）它们的安全生成能力取决于收集的训练数据。（3）它们改变模型权重，有可能使与有毒概念无关的内容质量下降。为了解决这些问题，我们提出了SAFREE，这是一个新颖的、无需训练的安全T2I和T2V方法，不会改变模型的权重。具体来说，我们在文本嵌入空间中检测与一组有毒概念对应的子空间，并引导提示嵌入远离这个子空间，从而过滤掉有害内容同时保留预期的语义。为了在过滤毒性和保留安全概念之间取得平衡，SAFREE结合了一种新颖的自我验证过滤机制，在应用过滤后的嵌入时动态调整去噪步骤。此外，我们在扩散潜在空间中加入了自适应再关注机制，以在像素级别有选择地减弱与有毒概念相关的特征的影响。最终，SAFREE确保了连贯的安全检查，保持了输出的忠实性、质量和安全性。与无需训练的基准相比，SAFREE在抑制T2I生成中的不安全内容方面取得了SOTA性能，并且有效地过滤了目标概念，同时保持高质量的图像。它还展示了与基于训练的方法竞争力强的结果。我们将SAFREE扩展到各种T2I骨干和T2V任务，展示了其灵活性和泛化能力。SAFREE为确保安全视觉生成提供了强大和可调适的保护措施。

更新时间: 2025-03-14 04:47:39

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.12761v2

MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation

In mobile manipulation, navigation and manipulation are often treated as separate problems, resulting in a significant gap between merely approaching an object and engaging with it effectively. Many navigation approaches primarily define success by proximity to the target, often overlooking the necessity for optimal positioning that facilitates subsequent manipulation. To address this, we introduce MoMa-Kitchen, a benchmark dataset comprising over 100k samples that provide training data for models to learn optimal final navigation positions for seamless transition to manipulation. Our dataset includes affordance-grounded floor labels collected from diverse kitchen environments, in which robotic mobile manipulators of different models attempt to grasp target objects amidst clutter. Using a fully automated pipeline, we simulate diverse real-world scenarios and generate affordance labels for optimal manipulation positions. Visual data are collected from RGB-D inputs captured by a first-person view camera mounted on the robotic arm, ensuring consistency in viewpoint during data collection. We also develop a lightweight baseline model, NavAff, for navigation affordance grounding that demonstrates promising performance on the MoMa-Kitchen benchmark. Our approach enables models to learn affordance-based final positioning that accommodates different arm types and platform heights, thereby paving the way for more robust and generalizable integration of navigation and manipulation in embodied AI. Project page: \href{https://momakitchen.github.io/}{https://momakitchen.github.io/}.

Updated: 2025-03-14 04:47:38

标题: MoMa-Kitchen：一种基于能力导向的移动操作中的最后一英里导航的十万级基准Benchmark

摘要: 在移动操作中，导航和操作通常被视为独立的问题，导致仅接近物体和有效地与之互动之间存在重大差距。许多导航方法主要通过与目标的接近程度来定义成功，通常忽视了为便于后续操作而进行最佳定位的必要性。为了解决这个问题，我们介绍了MoMa-Kitchen，一个包含超过100,000个样本的基准数据集，为模型提供学习实现无缝过渡到操作的最佳最终导航位置的训练数据。我们的数据集包括从多样化的厨房环境中收集的基于能力的地面标签，其中不同型号的机器人移动操作器在混乱中尝试抓取目标物体。通过完全自动化的流水线，我们模拟多样化的现实世界场景，并为最佳操作位置生成能力标签。视觉数据是由安装在机器人手臂上的第一人称视角摄像头捕获的RGB-D输入收集的，确保在数据收集过程中视点的一致性。我们还开发了一个轻量级基线模型NavAff，用于导航能力基础定位，在MoMa-Kitchen基准测试中展现了有希望的性能。我们的方法使模型能够学习基于能力的最终定位，以适应不同的臂型和平台高度，从而为在具体化人工智能中更强大和更通用的导航和操作集成铺平道路。项目页面: \href{https://momakitchen.github.io/}{https://momakitchen.github.io/}。

更新时间: 2025-03-14 04:47:38

领域: cs.RO,cs.AI,cs.CV

下载: http://arxiv.org/abs/2503.11081v1

Generative Multi-Agent Q-Learning for Policy Optimization: Decentralized Wireless Networks

Q-learning is a widely used reinforcement learning (RL) algorithm for optimizing wireless networks, but faces challenges with large state-spaces. Recently proposed multi-environment mixed Q-learning (MEMQ) algorithm addresses these challenges by employing multiple Q-learning algorithms across multiple synthetically generated, distinct but structurally related environments, so-called digital cousins. In this paper, we propose a novel multi-agent MEMQ (M-MEMQ) for cooperative decentralized wireless networks with multiple networked transmitters (TXs) and base stations (BSs). TXs do not have access to global information (joint state and actions). The new concept of coordinated and uncoordinated states is introduced. In uncoordinated states, TXs act independently to minimize their individual costs and update local Q-functions. In coordinated states, TXs use a Bayesian approach to estimate the joint state and update the joint Q-functions. The cost of information-sharing scales linearly with the number of TXs and is independent of the joint state-action space size. Several theoretical guarantees, including deterministic and probabilistic convergence, bounds on estimation error variance, and the probability of misdetecting the joint states, are given. Numerical simulations show that M-MEMQ outperforms several decentralized and centralized training with decentralized execution (CTDE) multi-agent RL algorithms by achieving 55% lower average policy error (APE), 35% faster convergence, 50% reduced runtime complexity, and 45% less sample complexity. Furthermore, M-MEMQ achieves comparable APE with significantly lower complexity than centralized methods. Simulations validate the theoretical analyses.

Updated: 2025-03-14 04:46:50

标题: 生成式多智能体Q学习用于政策优化：分布式无线网络

摘要: Q-learning是一种广泛应用于优化无线网络的强化学习算法，但在面对大状态空间时会面临挑战。最近提出的多环境混合Q-learning（MEMQ）算法通过在多个合成生成的、不同但结构相关的环境中应用多个Q-learning算法（所谓的数字化堂兄弟）来解决这些挑战。本文提出了一种新颖的多代理MEMQ（M-MEMQ）算法，用于具有多个网络化发射机（TXs）和基站（BSs）的协作去中心化无线网络。TXs无法访问全局信息（联合状态和动作）。引入了协调和非协调状态的新概念。在非协调状态下，TXs独立行动以最小化各自的成本并更新本地Q函数。在协调状态下，TXs使用贝叶斯方法估计联合状态并更新联合Q函数。信息共享的成本与TXs数量成线性关系，并与联合状态-动作空间大小无关。给出了几个理论保证，包括确定性和概率收敛、估计误差方差的界限以及误检测联合状态的概率。数值模拟表明，M-MEMQ通过实现55％较低的平均策略误差（APE）、35％更快的收敛速度、50％减少的运行时复杂性和45％较少的样本复杂性，优于几种去中心化和集中化训练与去中心化执行（CTDE）多代理RL算法。此外，M-MEMQ以显著较低的复杂性实现可比的APE，优于集中化方法。模拟验证了理论分析。

更新时间: 2025-03-14 04:46:50

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2503.05970v2

Understanding Flatness in Generative Models: Its Role and Benefits

Flat minima, known to enhance generalization and robustness in supervised learning, remain largely unexplored in generative models. In this work, we systematically investigate the role of loss surface flatness in generative models, both theoretically and empirically, with a particular focus on diffusion models. We establish a theoretical claim that flatter minima improve robustness against perturbations in target prior distributions, leading to benefits such as reduced exposure bias -- where errors in noise estimation accumulate over iterations -- and significantly improved resilience to model quantization, preserving generative performance even under strong quantization constraints. We further observe that Sharpness-Aware Minimization (SAM), which explicitly controls the degree of flatness, effectively enhances flatness in diffusion models, whereas other well-known methods such as Stochastic Weight Averaging (SWA) and Exponential Moving Average (EMA), which promote flatness indirectly via ensembling, are less effective. Through extensive experiments on CIFAR-10, LSUN Tower, and FFHQ, we demonstrate that flat minima in diffusion models indeed improves not only generative performance but also robustness.

Updated: 2025-03-14 04:38:53

标题: 理解生成模型中的平整性：其作用和好处

摘要: 平坦的极小值在监督学习中已被证明能够增强泛化性能和鲁棒性，但在生成模型中仍然鲜为人知。本文系统地探讨了损失曲面的平坦度在生成模型中的作用，从理论和实证两个角度进行研究，特别关注扩散模型。我们提出了一个理论假设，即更平坦的极小值可以提高对目标先验分布的扰动鲁棒性，从而带来诸如减少曝光偏差（即在迭代中噪声估计误差累积）和在强量化约束下显著提高模型抗量化能力等好处。我们进一步观察到，明确控制平坦度程度的Sharpness-Aware Minimization（SAM）方法可以有效地增强扩散模型中的平坦性，而其他促进平坦性的著名方法如随机权重平均（SWA）和指数移动平均（EMA）则通过集成间接地实现这一目标，效果较差。通过在CIFAR-10、LSUN Tower和FFHQ上进行大量实验，我们证明扩散模型中的平坦极小值确实不仅提高了生成性能，还增强了鲁棒性。

更新时间: 2025-03-14 04:38:53

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2503.11078v1

Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Despite great success across various multimodal tasks, Large Vision-Language Models (LVLMs) often encounter object hallucinations with generated textual responses being inconsistent with the actual objects in images. We examine different LVLMs and pinpoint that one root cause of object hallucinations lies with deficient attention on discriminative image features. Specifically, LVLMs often predominantly attend to prompt-irrelevant global features instead of prompt-relevant local features, undermining their visual grounding capacity and leading to object hallucinations. We propose Assembly of Global and Local Attention (AGLA), a training-free and plug-and-play approach that mitigates hallucinations by assembling global features for response generation and local features for visual discrimination simultaneously. Specifically, we introduce an image-prompt matching scheme that captures prompt-relevant local features from images, leading to an augmented view of the input image where prompt-relevant content is highlighted while irrelevant distractions are suppressed. Hallucinations can thus be mitigated with a calibrated logit distribution that is from generative global features of the original image and discriminative local features of the augmented image. Extensive experiments show the superiority of AGLA in LVLM hallucination mitigation, demonstrating its wide applicability across both discriminative and generative tasks. Our code is available at https://github.com/Lackel/AGLA.

Updated: 2025-03-14 04:38:44

标题: 使用全局和局部注意力集合减轻大型视觉语言模型中的对象幻觉

摘要: 尽管大型视觉-语言模型（LVLMs）在各种多模态任务中取得了巨大成功，但通常在生成的文本响应中出现对象幻觉，与图像中的实际对象不一致。我们检查了不同的LVLMs，并指出对象幻觉的一个根本原因在于对具有区分性的图像特征的注意不足。具体地说，LVLMs通常主要关注于与提示无关的全局特征，而不是与提示相关的局部特征，削弱了它们的视觉基础能力，并导致了对象幻觉。我们提出了一种名为全局和局部注意力组合（AGLA）的训练无关且即插即用的方法，通过同时组装全局特征用于响应生成和局部特征用于视觉区分来缓解幻觉。具体地，我们引入了一种图像提示匹配方案，从图像中捕获提示相关的局部特征，从而呈现出输入图像的增强视图，其中突出显示了提示相关内容，同时抑制了无关的干扰。因此，可以通过校准的逻辑分布来缓解幻觉，该分布来自原始图像的生成全局特征和增强图像的区分性局部特征。大量实验证明了AGLA在LVLM幻觉缓解方面的优越性，展示了其在区分性和生成任务中的广泛适用性。我们的代码可在https://github.com/Lackel/AGLA上找到。

更新时间: 2025-03-14 04:38:44

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.12718v3

Large Reasoning Models in Agent Scenarios: Exploring the Necessity of Reasoning Capabilities

The rise of Large Reasoning Models (LRMs) signifies a paradigm shift toward advanced computational reasoning. Yet, this progress disrupts traditional agent frameworks, traditionally anchored by execution-oriented Large Language Models (LLMs). To explore this transformation, we propose the LaRMA framework, encompassing nine tasks across Tool Usage, Plan Design, and Problem Solving, assessed with three top LLMs (e.g., Claude3.5-sonnet) and five leading LRMs (e.g., DeepSeek-R1). Our findings address four research questions: LRMs surpass LLMs in reasoning-intensive tasks like Plan Design, leveraging iterative reflection for superior outcomes; LLMs excel in execution-driven tasks such as Tool Usage, prioritizing efficiency; hybrid LLM-LRM configurations, pairing LLMs as actors with LRMs as reflectors, optimize agent performance by blending execution speed with reasoning depth; and LRMs' enhanced reasoning incurs higher computational costs, prolonged processing, and behavioral challenges, including overthinking and fact-ignoring tendencies. This study fosters deeper inquiry into LRMs' balance of deep thinking and overthinking, laying a critical foundation for future agent design advancements.

Updated: 2025-03-14 04:34:31

标题: 在Agent场景中的大型推理模型：探索推理能力的必要性

摘要: 大型推理模型（LRMs）的崛起标志着向先进计算推理的范式转变。然而，这一进展打破了传统的代理框架，传统上以执行导向的大型语言模型（LLMs）为基础。为了探索这种转变，我们提出了LaRMA框架，涵盖了工具使用、计划设计和问题解决等九个任务，评估了三个顶尖的LLMs（如Claude3.5-sonnet）和五个领先的LRMs（如DeepSeek-R1）。我们的研究回答了四个研究问题：LRMs在计划设计等推理密集型任务中超过了LLMs，利用迭代反思获得更好的结果；LLMs在执行驱动的任务（如工具使用）中表现出色，优先考虑效率；混合LLM-LRM配置，将LLMs作为执行者与LRMs作为反思者配对，通过融合执行速度和推理深度来优化代理性能；LRMs增强的推理带来更高的计算成本、延长的处理时间和行为挑战，包括过度思考和忽视事实的倾向。这项研究促进了对LRMs深思与过度思考平衡的更深入探讨，为未来代理设计进步奠定了关键基础。

更新时间: 2025-03-14 04:34:31

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2503.11074v1

Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis

High-level synthesis (HLS) is a widely used tool in designing Field Programmable Gate Array (FPGA). HLS enables FPGA design with software programming languages by compiling the source code into an FPGA circuit. The source code includes a program (called "kernel") and several pragmas that instruct hardware synthesis, such as parallelization, pipeline, etc. While it is relatively easy for software developers to design the program, it heavily relies on hardware knowledge to design the pragmas, posing a big challenge for software developers. Recently, different machine learning algorithms, such as GNNs, have been proposed to automate the pragma design via performance prediction. However, when applying the trained model on new kernels, the significant domain shift often leads to unsatisfactory performance. We propose a more domain-generalizable model structure: a two-level hierarchical Mixture of Experts (MoE), that can be flexibly adapted to any GNN model. Different expert networks can learn to deal with different regions in the representation space, and they can utilize similar patterns between the old kernels and new kernels. In the low-level MoE, we apply MoE on three natural granularities of a program: node, basic block, and graph. The high-level MoE learns to aggregate the three granularities for the final decision. To train the hierarchical MoE stably, we further propose a two-stage training method to avoid expert polarization. Extensive experiments verify the effectiveness of the proposed hierarchical MoE. We publicized our codes at https://github.com/weikai-li/HierarchicalMoE.

Updated: 2025-03-14 04:31:59

标题: 专家层次混合：面向高级综合的可泛化学习

摘要: 高层次综合（HLS）是在设计现场可编程门阵列（FPGA）中广泛使用的工具。HLS通过将源代码编译成FPGA电路，使FPGA设计与软件编程语言结合。源代码包括一个程序（称为“内核”）和几个指示硬件综合的编译指令，如并行化、流水线等。虽然软件开发人员相对容易设计程序，但设计编译指令却严重依赖硬件知识，给软件开发人员带来了很大挑战。最近，提出了不同的机器学习算法，如GNNs，以通过性能预测自动化编译指令设计。然而，将训练好的模型应用于新内核时，显著的领域转移经常导致性能不理想。我们提出了一种更具领域通用性的模型结构：一个两级分层专家混合模型（MoE），可以灵活地适应任何GNN模型。不同的专家网络可以学习处理表示空间中的不同区域，并且它们可以利用旧内核和新内核之间的相似模式。在低级MoE中，我们将MoE应用于程序的三个自然粒度：节点、基本块和图。高级MoE学习聚合三种粒度以作出最终决策。为了稳定地训练分层MoE，我们进一步提出了一个两阶段训练方法，以避免专家极化。大量实验证实了所提出的分层MoE的有效性。我们在https://github.com/weikai-li/HierarchicalMoE上公开了我们的代码。

更新时间: 2025-03-14 04:31:59

领域: cs.LG,cs.AI,cs.AR

下载: http://arxiv.org/abs/2410.19225v4

SPECTra: Scalable Multi-Agent Reinforcement Learning with Permutation-Free Networks

In cooperative multi-agent reinforcement learning (MARL), the permutation problem where the state space grows exponentially with the number of agents reduces sample efficiency. Additionally, many existing architectures struggle with scalability, relying on a fixed structure tied to a specific number of agents, limiting their applicability to environments with a variable number of entities. While approaches such as graph neural networks (GNNs) and self-attention mechanisms have progressed in addressing these challenges, they have significant limitations as dense GNNs and self-attention mechanisms incur high computational costs. To overcome these limitations, we propose a novel agent network and a non-linear mixing network that ensure permutation-equivariance and scalability, allowing them to generalize to environments with various numbers of agents. Our agent network significantly reduces computational complexity, and our scalable hypernetwork enables efficient weight generation for non-linear mixing. Additionally, we introduce curriculum learning to improve training efficiency. Experiments on SMACv2 and Google Research Football (GRF) demonstrate that our approach achieves superior learning performance compared to existing methods. By addressing both permutation-invariance and scalability in MARL, our work provides a more efficient and adaptable framework for cooperative MARL. Our code is available at https://github.com/funny-rl/SPECTra.

Updated: 2025-03-14 04:26:51

标题: SPECTra: 无排列网络的可扩展多智能体强化学习

摘要: 在合作多智能体强化学习（MARL）中，状态空间随智能体数量呈指数增长的排列问题降低了样本效率。此外，许多现有架构在可扩展性方面存在困难，依赖于与特定智能体数量相关的固定结构，限制了它们在具有可变实体数量的环境中的适用性。虽然图神经网络（GNNs）和自注意机制等方法在解决这些挑战方面取得了进展，但密集GNNs和自注意机制会产生高计算成本。为了克服这些限制，我们提出了一种新颖的智能体网络和非线性混合网络，确保置换等变性和可扩展性，使其能够泛化到具有不同数量智能体的环境中。我们的智能体网络显著降低了计算复杂性，而我们的可扩展超网络能够有效地生成非线性混合的权重。此外，我们引入了课程学习以提高训练效率。在SMACv2和Google Research Football（GRF）上的实验证明，我们的方法相比现有方法实现了更优越的学习性能。通过在MARL中同时解决置换等变性和可扩展性，我们的工作为合作MARL提供了一个更有效和适应性更强的框架。我们的代码可在https://github.com/funny-rl/SPECTra上找到。

更新时间: 2025-03-14 04:26:51

领域: cs.LG,cs.AI,I.2.11

下载: http://arxiv.org/abs/2503.11726v1

API Agents vs. GUI Agents: Divergence and Convergence

Large language models (LLMs) have evolved beyond simple text generation to power software agents that directly translate natural language commands into tangible actions. While API-based LLM agents initially rose to prominence for their robust automation capabilities and seamless integration with programmatic endpoints, recent progress in multimodal LLM research has enabled GUI-based LLM agents that interact with graphical user interfaces in a human-like manner. Although these two paradigms share the goal of enabling LLM-driven task automation, they diverge significantly in architectural complexity, development workflows, and user interaction models. This paper presents the first comprehensive comparative study of API-based and GUI-based LLM agents, systematically analyzing their divergence and potential convergence. We examine key dimensions and highlight scenarios in which hybrid approaches can harness their complementary strengths. By proposing clear decision criteria and illustrating practical use cases, we aim to guide practitioners and researchers in selecting, combining, or transitioning between these paradigms. Ultimately, we indicate that continuing innovations in LLM-based automation are poised to blur the lines between API- and GUI-driven agents, paving the way for more flexible, adaptive solutions in a wide range of real-world applications.

Updated: 2025-03-14 04:26:21

标题: API代理 vs. GUI代理：分歧与融合

摘要: 大型语言模型（LLMs）已经发展到超越简单文本生成，成为直接将自然语言命令翻译为具体动作的软件代理的动力。虽然基于API的LLM代理最初因其强大的自动化能力和与编程端点的无缝集成而声名鹊起，但最近在多模态LLM研究中取得的进展使得基于GUI的LLM代理能够以类似人类的方式与图形用户界面交互。尽管这两种范式都致力于实现LLM驱动的任务自动化的目标，但在架构复杂性、开发工作流程和用户交互模型方面存在显著差异。本文提出了第一份API-based和GUI-based LLM代理的全面比较研究，系统分析它们的分歧和潜在的融合。我们检查了关键维度，并强调混合方法可以利用它们互补的优势。通过提出明确的决策标准并举例说明实际用例，我们旨在指导从业者和研究人员在选择、组合或在这些范式之间过渡时。最终，我们表明，基于LLM的自动化的持续创新有望模糊API和GUI驱动代理之间的界限，为各种现实世界应用提供更灵活、适应性更强的解决方案铺平道路。

更新时间: 2025-03-14 04:26:21

领域: cs.AI,cs.HC

下载: http://arxiv.org/abs/2503.11069v1

Low-cost Real-world Implementation of the Swing-up Pendulum for Deep Reinforcement Learning Experiments

Deep reinforcement learning (DRL) has had success in virtual and simulated domains, but due to key differences between simulated and real-world environments, DRL-trained policies have had limited success in real-world applications. To assist researchers to bridge the \textit{sim-to-real gap}, in this paper, we describe a low-cost physical inverted pendulum apparatus and software environment for exploring sim-to-real DRL methods. In particular, the design of our apparatus enables detailed examination of the delays that arise in physical systems when sensing, communicating, learning, inferring and actuating. Moreover, we wish to improve access to educational systems, so our apparatus uses readily available materials and parts to reduce cost and logistical barriers. Our design shows how commercial, off-the-shelf electronics and electromechanical and sensor systems, combined with common metal extrusions, dowel and 3D printed couplings provide a pathway for affordable physical DRL apparatus. The physical apparatus is complemented with a simulated environment implemented using a high-fidelity physics engine and OpenAI Gym interface.

Updated: 2025-03-14 04:18:36

标题: 低成本实现摆动式摆杆用于深度强化学习实验

摘要: 深度强化学习（DRL）在虚拟和模拟领域取得了成功，但由于模拟和现实世界环境之间的关键差异，DRL训练的策略在现实世界应用中取得了有限的成功。为了帮助研究人员弥合“模拟到现实”的差距，在本文中，我们描述了一个低成本的物理倒立摆装置和软件环境，用于探索模拟到现实的DRL方法。特别是，我们装置的设计使得可以详细研究在感知、通信、学习、推断和执行时在物理系统中产生的延迟。此外，我们希望改善对教育系统的访问，因此我们的装置使用易得材料和零部件以降低成本和物流障碍。我们的设计展示了如何结合商用的、现成的电子和电机以及传感器系统，再加上常见的金属挤压件、小木棍和3D打印耦合件，提供了一条经济实惠的物理DRL装置的路径。物理装置配有一个使用高保真物理引擎和OpenAI Gym接口实现的模拟环境。

更新时间: 2025-03-14 04:18:36

领域: cs.LG,cs.AI,cs.RO,cs.SY,eess.SY

下载: http://arxiv.org/abs/2503.11065v1

Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation

Multimodal retrieval-augmented generation (RAG) enhances the visual reasoning capability of vision-language models (VLMs) by dynamically accessing information from external knowledge bases. In this work, we introduce \textit{Poisoned-MRAG}, the first knowledge poisoning attack on multimodal RAG systems. Poisoned-MRAG injects a few carefully crafted image-text pairs into the multimodal knowledge database, manipulating VLMs to generate the attacker-desired response to a target query. Specifically, we formalize the attack as an optimization problem and propose two cross-modal attack strategies, dirty-label and clean-label, tailored to the attacker's knowledge and goals. Our extensive experiments across multiple knowledge databases and VLMs show that Poisoned-MRAG outperforms existing methods, achieving up to 98\% attack success rate with just five malicious image-text pairs injected into the InfoSeek database (481,782 pairs). Additionally, We evaluate 4 different defense strategies, including paraphrasing, duplicate removal, structure-driven mitigation, and purification, demonstrating their limited effectiveness and trade-offs against Poisoned-MRAG. Our results highlight the effectiveness and scalability of Poisoned-MRAG, underscoring its potential as a significant threat to multimodal RAG systems.

Updated: 2025-03-14 04:16:23

标题: 中文翻译：中毒-MRAG：针对多模态检索增强生成的知识中毒攻击

摘要: 多模态检索增强生成（RAG）通过动态访问外部知识库，增强了视觉-语言模型（VLMs）的视觉推理能力。在这项工作中，我们引入了\textit{Poisoned-MRAG}，这是对多模态RAG系统进行的首次知识中毒攻击。Poisoned-MRAG将一些精心设计的图像-文本对注入到多模态知识数据库中，操纵VLMs生成攻击者所需的响应以回应目标查询。具体来说，我们将攻击形式化为一个优化问题，并提出了两种跨模态攻击策略，即脏标签和干净标签，根据攻击者的知识和目标进行定制。我们在多个知识数据库和VLMs上进行了大量实验，结果显示Poisoned-MRAG优于现有方法，仅向InfoSeek数据库（481,782对）注入五个恶意图像-文本对就可实现高达98\%的攻击成功率。此外，我们评估了4种不同的防御策略，包括释义、重复删除、结构驱动缓解和净化，展示了它们在对抗Poisoned-MRAG时的有限有效性和权衡。我们的结果突显了Poisoned-MRAG的有效性和可扩展性，强调了它作为多模态RAG系统重大威胁的潜力。

更新时间: 2025-03-14 04:16:23

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2503.06254v2

MobiVital: Self-supervised Time-series Quality Estimation for Contactless Respiration Monitoring Using UWB Radar

Respiration waveforms are increasingly recognized as important biomarkers, offering insights beyond simple respiration rates, such as detecting breathing irregularities for disease diagnosis or monitoring breath patterns to guide rehabilitation training. Previous works in wireless respiration monitoring have primarily focused on estimating respiration rate, where the breath waveforms are often generated as a by-product. As a result, issues such as waveform deformation and inversion have largely been overlooked, reducing the signal's utility for applications requiring breathing waveforms. To address this problem, we present a novel approach, MobiVital, that improves the quality of respiration waveforms obtained from ultra-wideband (UWB) radar data. MobiVital combines a self-supervised autoregressive model for breathing waveform extraction with a biology-informed algorithm to detect and correct waveform inversions. To encourage reproducible research efforts for developing wireless vital signal monitoring systems, we also release a 12-person, 24-hour UWB radar vital signal dataset, with time-synchronized ground truth obtained from wearable sensors. Our results show that the respiration waveforms produced by our system exhibit a 7-34% increase in fidelity to the ground truth compared to the baselines and can benefit downstream tasks such as respiration rate estimation.

Updated: 2025-03-14 04:14:27

标题: MobiVital: 使用UWB雷达进行无接触呼吸监测的自监督时间序列质量估计

摘要: 呼吸波形越来越被认为是重要的生物标志物，提供了超出简单呼吸速率的见解，例如检测呼吸不规律以进行疾病诊断或监测呼吸模式以指导康复训练。先前的无线呼吸监测工作主要集中在估算呼吸速率上，呼吸波形通常作为副产品生成。因此，波形变形和反转等问题往往被忽视，降低了信号在需要呼吸波形的应用中的实用性。为了解决这个问题，我们提出了一种新颖的方法MobiVital，它通过改进从超宽带（UWB）雷达数据中获得的呼吸波形的质量。MobiVital结合了一个自监督的自回归模型用于呼吸波形提取，以及一个基于生物学的算法用于检测和纠正波形反转。为了鼓励开发无线生命信号监测系统的可复制研究工作，我们还发布了一个包含12人、24小时UWB雷达生命信号数据集，其中地面实况与可穿戴传感器同步。我们的结果显示，与基线相比，我们系统产生的呼吸波形与地面实况的逼真度增加了7-34%，可以使下游任务如呼吸速率估算受益。

更新时间: 2025-03-14 04:14:27

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2503.11064v1

AnywhereDoor: Multi-Target Backdoor Attacks on Object Detection

As object detection becomes integral to many safety-critical applications, understanding its vulnerabilities is essential. Backdoor attacks, in particular, pose a serious threat by implanting hidden triggers in victim models, which adversaries can later exploit to induce malicious behaviors during inference. However, current understanding is limited to single-target attacks, where adversaries must define a fixed malicious behavior (target) before training, making inference-time adaptability impossible. Given the large output space of object detection (including object existence prediction, bounding box estimation, and classification), the feasibility of flexible, inference-time model control remains unexplored. This paper introduces AnywhereDoor, a multi-target backdoor attack for object detection. Once implanted, AnywhereDoor allows adversaries to make objects disappear, fabricate new ones, or mislabel them, either across all object classes or specific ones, offering an unprecedented degree of control. This flexibility is enabled by three key innovations: (i) objective disentanglement to scale the number of supported targets; (ii) trigger mosaicking to ensure robustness even against region-based detectors; and (iii) strategic batching to address object-level data imbalances that hinder manipulation. Extensive experiments demonstrate that AnywhereDoor grants attackers a high degree of control, improving attack success rates by 26% compared to adaptations of existing methods for such flexible control.

Updated: 2025-03-14 04:12:52

标题: AnywhereDoor：针对物体检测的多目标后门攻击

摘要: 随着目标检测在许多安全关键应用中变得不可或缺，理解其漏洞至关重要。特别是，后门攻击通过在受害者模型中植入隐藏的触发器，对手可以利用这些触发器在推理过程中诱发恶意行为，构成严重威胁。然而，目前对单目标攻击的理解有限，对手必须在训练之前定义一个固定的恶意行为（目标），使得在推理时无法适应。鉴于目标检测的大输出空间（包括目标存在预测、边界框估计和分类），灵活的、推理时的模型控制的可行性尚未得到探索。本文介绍了AnywhereDoor，这是一种用于目标检测的多目标后门攻击。一旦植入，AnywhereDoor允许对手使目标消失、制造新目标或错误标记它们，可以跨越所有目标类别或特定类别，提供前所未有的控制程度。这种灵活性得益于三个关键创新：（i）目标解耦，以扩展支持的目标数量；（ii）触发器镶嵌，确保对区域检测器的稳健性；以及（iii）策略批处理，解决阻碍操纵的目标级数据不平衡。大量实验证明，AnywhereDoor赋予攻击者高度的控制权，攻击成功率比现有方法的适应性提高了26%。

更新时间: 2025-03-14 04:12:52

领域: cs.CR,cs.AI,cs.CV

下载: http://arxiv.org/abs/2411.14243v2

Standalone 16-bit Neural Network Training: Missing Study for Hardware-Limited Deep Learning Practitioners

With the increasing complexity of machine learning models, managing computational resources like memory and processing power has become a critical concern. Mixed precision techniques, which leverage different numerical precisions during model training and inference to optimize resource usage, have been widely adopted. However, access to hardware that supports lower precision formats (e.g., FP8 or FP4) remains limited, especially for practitioners with hardware constraints. For many with limited resources, the available options are restricted to using 32-bit, 16-bit, or a combination of the two. While it is commonly believed that 16-bit precision can achieve results comparable to full (32-bit) precision, this study is the first to systematically validate this assumption through both rigorous theoretical analysis and extensive empirical evaluation. Our theoretical formalization of floating-point errors and classification tolerance provides new insights into the conditions under which 16-bit precision can approximate 32-bit results. This study fills a critical gap, proving for the first time that standalone 16-bit precision neural networks match 32-bit and mixed-precision in accuracy while boosting computational speed. Given the widespread availability of 16-bit across GPUs, these findings are especially valuable for machine learning practitioners with limited hardware resources to make informed decisions.

Updated: 2025-03-14 04:05:05

标题: 独立16位神经网络训练：面向硬件受限的深度学习从业者的研究缺失

摘要: 随着机器学习模型复杂性的增加，管理计算资源如内存和处理能力已成为一个关键问题。混合精度技术利用不同的数值精度在模型训练和推理过程中优化资源使用，已被广泛采用。然而，支持较低精度格式（例如FP8或FP4）的硬件访问仍然有限，尤其是对于受硬件限制的从业者而言。对于许多资源有限的人来说，可用的选项仅限于使用32位、16位或两者的组合。虽然普遍认为16位精度可以达到与完整（32位）精度可比的结果，但这项研究是第一个通过严格的理论分析和广泛的实证评估系统地验证了这一假设。我们对浮点误差和分类容忍度的理论形式化提供了新的见解，说明了16位精度可以近似32位结果的条件。这项研究填补了一个关键空白，首次证明独立的16位精度神经网络在准确性上与32位和混合精度相匹配，同时提高了计算速度。鉴于16位在GPU中广泛可用，这些发现对于具有有限硬件资源的机器学习从业者来说尤其有价值，可以做出明智的决策。

更新时间: 2025-03-14 04:05:05

领域: cs.LG,cs.AI,cs.CV,cs.PF

下载: http://arxiv.org/abs/2305.10947v5

Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain

Standard modern machine-learning-based imaging methods have faced challenges in medical applications due to the high cost of dataset construction and, thereby, the limited labeled training data available. Additionally, upon deployment, these methods are usually used to process a large volume of data on a daily basis, imposing a high maintenance cost on medical facilities. In this paper, we introduce a new neural network architecture, termed LoGoNet, with a tailored self-supervised learning (SSL) method to mitigate such challenges. LoGoNet integrates a novel feature extractor within a U-shaped architecture, leveraging Large Kernel Attention (LKA) and a dual encoding strategy to capture both long-range and short-range feature dependencies adeptly. This is in contrast to existing methods that rely on increasing network capacity to enhance feature extraction. This combination of novel techniques in our model is especially beneficial in medical image segmentation, given the difficulty of learning intricate and often irregular body organ shapes, such as the spleen. Complementary, we propose a novel SSL method tailored for 3D images to compensate for the lack of large labeled datasets. The method combines masking and contrastive learning techniques within a multi-task learning framework and is compatible with both Vision Transformer (ViT) and CNN-based models. We demonstrate the efficacy of our methods in numerous tasks across two standard datasets (i.e., BTCV and MSD). Benchmark comparisons with eight state-of-the-art models highlight LoGoNet's superior performance in both inference time and accuracy.

Updated: 2025-03-14 03:59:35

标题: 蒙面的LoGoNet：用于医学领域的快速准确的3D图像分析

摘要: 现代标准的基于机器学习的成像方法在医学应用中面临挑战，原因是数据集构建的高成本，因此可用的标记训练数据有限。此外，在部署时，这些方法通常用于每天处理大量数据，给医疗机构带来高昂的维护成本。在本文中，我们介绍了一种新的神经网络架构，称为LoGoNet，采用定制的自监督学习（SSL）方法来减轻这些挑战。LoGoNet在U形架构中整合了一个新颖的特征提取器，利用大核心关注（LKA）和双重编码策略来灵活捕捉长距离和短距离特征依赖关系。这与现有方法依赖增加网络容量以增强特征提取的方式形成对比。我们模型中这些新颖技术的组合在医学图像分割中特别有益，考虑到学习复杂且常常不规则的器官形状（如脾脏）的困难。此外，我们提出了一种针对3D图像量身定制的新颖SSL方法，以弥补缺乏大规模标记数据集的不足。该方法将遮罩和对比学习技术结合在一个多任务学习框架中，并与Vision Transformer（ViT）和基于CNN的模型兼容。我们在两个标准数据集（即BTCV和MSD）上展示了我们方法在多项任务中的有效性。与八种最先进模型的基准比较突显了LoGoNet在推理时间和准确性上的优越表现。

更新时间: 2025-03-14 03:59:35

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2402.06190v2

Multi-Knowledge-oriented Nighttime Haze Imaging Enhancer for Vision-driven Intelligent Transportation Systems

Salient object detection (SOD) plays a critical role in intelligent transportation systems (ITS), facilitating the detection and segmentation of key visual elements in an image. However, adverse imaging conditions such as haze during the day, low light, and haze at night severely degrade image quality and hinder reliable object detection in real-world scenarios. To address these challenges, we propose a multi-knowledge-oriented nighttime haze imaging enhancer (MKoIE), which integrates three tasks: daytime dehazing, low-light enhancement, and nighttime dehazing. The MKoIE incorporates two key innovative components: First, the network employs a task-oriented node learning mechanism to handle three specific degradation types: day-time haze, low light, and night-time haze conditions, with an embedded self-attention module enhancing its performance in nighttime imaging. In addition, multi-receptive field enhancement module that efficiently extracts multi-scale features through three parallel depthwise separable convolution branches with different dilation rates, capturing comprehensive spatial information with minimal computational overhead to meet the requirements of real-time ITS deployment. To ensure optimal image reconstruction quality and visual characteristics, we suggest a hybrid loss function. Extensive experiments on different types of weather/imaging conditions illustrate that MKoIE surpasses existing methods, enhancing the reliability, accuracy, and operational efficiency of ITS. The code is available at https://github.com/Ai-Chen-Lab/MKoIE.

Updated: 2025-03-14 03:54:26

标题: 多知识导向的夜间雾霾成像增强器，用于基于视觉的智能交通系统

摘要: 显著物体检测（SOD）在智能交通系统（ITS）中发挥着关键作用，有助于检测和分割图像中的关键视觉元素。然而，白天的雾霾、低光照和夜间雾霾等不利成像条件严重降低了图像质量，阻碍了在真实场景中可靠地检测物体。为了解决这些挑战，我们提出了一种多知识导向的夜间雾霾成像增强器（MKoIE），集成了三个任务：白天去雾、低光照增强和夜间去雾。MKoIE包含两个关键创新组件：首先，网络采用任务导向的节点学习机制来处理三种特定的退化类型：白天雾霾、低光照和夜间雾霾条件，嵌入的自注意模块增强了其在夜间成像中的性能。此外，多感受野增强模块通过三个具有不同膨胀率的深度分离卷积分支有效提取多尺度特征，以最小的计算开销捕捉全面的空间信息，满足实时ITS部署的要求。为了确保最佳的图像重建质量和视觉特性，我们建议使用混合损失函数。对不同类型的天气/成像条件进行的大量实验表明，MKoIE超越了现有方法，提高了ITS的可靠性、准确性和操作效率。代码可在https://github.com/Ai-Chen-Lab/MKoIE找到。

更新时间: 2025-03-14 03:54:26

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.07351v3

Training Directional Locomotion for Quadrupedal Low-Cost Robotic Systems via Deep Reinforcement Learning

In this work we present Deep Reinforcement Learning (DRL) training of directional locomotion for low-cost quadrupedal robots in the real world. In particular, we exploit randomization of heading that the robot must follow to foster exploration of action-state transitions most useful for learning both forward locomotion as well as course adjustments. Changing the heading in episode resets to current yaw plus a random value drawn from a normal distribution yields policies able to follow complex trajectories involving frequent turns in both directions as well as long straight-line stretches. By repeatedly changing the heading, this method keeps the robot moving within the training platform and thus reduces human involvement and need for manual resets during the training. Real world experiments on a custom-built, low-cost quadruped demonstrate the efficacy of our method with the robot successfully navigating all validation tests. When trained with other approaches, the robot only succeeds in forward locomotion test and fails when turning is required.

Updated: 2025-03-14 03:53:01

标题: 用深度强化学习训练四足低成本机器人系统的定向移动

摘要: 在这项工作中，我们提出了深度强化学习（DRL）在现实世界中对成本低延肢四足机器人进行方向运动训练。特别是，我们利用机器人必须遵循的随机航向变化来促进探索对学习前向运动和航向调整都最有用的动作状态转换。在每一集中改变航向到当前偏航加上从正态分布中随机抽取的随机值，使得策略能够遵循涉及频繁转向和长直线伸展的复杂轨迹。通过反复改变航向，这种方法使机器人在训练平台上保持移动，从而减少了人类参与和在训练过程中需要手动重置的需求。在一个自定义、低成本的四足机器人上进行的真实世界实验证明了我们的方法的有效性，机器人成功地完成了所有验证测试。当使用其他方法进行训练时，机器人仅在前向运动测试中成功，当需要转弯时则失败。

更新时间: 2025-03-14 03:53:01

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2503.11059v1

LUSD: Localized Update Score Distillation for Text-Guided Image Editing

While diffusion models show promising results in image editing given a target prompt, achieving both prompt fidelity and background preservation remains difficult. Recent works have introduced score distillation techniques that leverage the rich generative prior of text-to-image diffusion models to solve this task without additional fine-tuning. However, these methods often struggle with tasks such as object insertion. Our investigation of these failures reveals significant variations in gradient magnitude and spatial distribution, making hyperparameter tuning highly input-specific or unsuccessful. To address this, we propose two simple yet effective modifications: attention-based spatial regularization and gradient filtering-normalization, both aimed at reducing these variations during gradient updates. Experimental results show our method outperforms state-of-the-art score distillation techniques in prompt fidelity, improving successful edits while preserving the background. Users also preferred our method over state-of-the-art techniques across three metrics, and by 58-64% overall.

Updated: 2025-03-14 03:45:29

标题: LUSD：文本引导图像编辑的本地化更新分数蒸馏

摘要: 扩散模型在图像编辑中展现出了有希望的结果，但在保持提示的忠实性和背景保护方面仍然困难重重。最近的研究引入了分数蒸馏技术，利用文本到图像扩散模型丰富的生成先验来解决这一任务，而无需额外的微调。然而，这些方法通常在诸如对象插入等任务中遇到困难。我们对这些失败的调查揭示了梯度幅度和空间分布存在显著变化，使得超参数调整高度依赖于输入或是无法成功。为了解决这个问题，我们提出了两种简单而有效的修改方法：基于注意力的空间正则化和梯度过滤-归一化，都旨在在梯度更新过程中减少这些变化。实验结果显示，我们的方法在提示的忠实性方面优于最先进的分数蒸馏技术，提高了成功编辑的效果同时保留了背景。用户还更倾向于我们的方法，而不是最先进的技术，在三项指标上总体提高了58-64%。

更新时间: 2025-03-14 03:45:29

领域: cs.GR,cs.CV,cs.LG

下载: http://arxiv.org/abs/2503.11054v1

Theoretical Insights into CycleGAN: Analyzing Approximation and Estimation Errors in Unpaired Data Generation

In this paper, we focus on analyzing the excess risk of the unpaired data generation model, called CycleGAN. Unlike classical GANs, CycleGAN not only transforms data between two unpaired distributions but also ensures the mappings are consistent, which is encouraged by the cycle-consistency term unique to CycleGAN. The increasing complexity of model structure and the addition of the cycle-consistency term in CycleGAN present new challenges for error analysis. By considering the impact of both the model architecture and training procedure, the risk is decomposed into two terms: approximation error and estimation error. These two error terms are analyzed separately and ultimately combined by considering the trade-off between them. Each component is rigorously analyzed; the approximation error through constructing approximations of the optimal transport maps, and the estimation error through establishing an upper bound using Rademacher complexity. Our analysis not only isolates these errors but also explores the trade-offs between them, which provides a theoretical insights of how CycleGAN's architecture and training procedures influence its performance.

Updated: 2025-03-14 03:37:35

标题: CycleGAN的理论洞见：分析非配对数据生成中的近似和估计误差

摘要: 在这篇论文中，我们专注于分析不匹配数据生成模型CycleGAN的过量风险。与传统的GAN不同，CycleGAN不仅在两个不匹配的分布之间转换数据，还确保映射是一致的，这是CycleGAN独有的循环一致性项鼓励的。模型结构的复杂性增加和CycleGAN中循环一致性项的添加为错误分析带来了新挑战。通过考虑模型架构和训练过程的影响，风险被分解为两个项：逼近误差和估计误差。这两个误差项分别进行了分析，并通过考虑它们之间的权衡最终结合在一起。每个组件都经过严格的分析；逼近误差通过构建最优传输映射的逼近，估计误差通过使用Rademacher复杂度建立上界进行分析。我们的分析不仅隔离了这些错误，还探讨了它们之间的权衡，从而提供了关于CycleGAN的架构和训练程序如何影响其性能的理论见解。

更新时间: 2025-03-14 03:37:35

领域: cs.LG,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2407.11678v2

Distance-Based Tree-Sliced Wasserstein Distance

To overcome computational challenges of Optimal Transport (OT), several variants of Sliced Wasserstein (SW) has been developed in the literature. These approaches exploit the closed-form expression of the univariate OT by projecting measures onto (one-dimensional) lines. However, projecting measures onto low-dimensional spaces can lead to a loss of topological information. Tree-Sliced Wasserstein distance on Systems of Lines (TSW-SL) has emerged as a promising alternative that replaces these lines with a more advanced structure called tree systems. The tree structures enhance the ability to capture topological information of the metric while preserving computational efficiency. However, at the core of TSW-SL, the splitting maps, which serve as the mechanism for pushing forward measures onto tree systems, focus solely on the position of the measure supports while disregarding the projecting domains. Moreover, the specific splitting map used in TSW-SL leads to a metric that is not invariant under Euclidean transformations, a typically expected property for OT on Euclidean space. In this work, we propose a novel class of splitting maps that generalizes the existing one studied in TSW-SL enabling the use of all positional information from input measures, resulting in a novel Distance-based Tree-Sliced Wasserstein (Db-TSW) distance. In addition, we introduce a simple tree sampling process better suited for Db-TSW, leading to an efficient GPU-friendly implementation for tree systems, similar to the original SW. We also provide a comprehensive theoretical analysis of proposed class of splitting maps to verify the injectivity of the corresponding Radon Transform, and demonstrate that Db-TSW is an Euclidean invariant metric. We empirically show that Db-TSW significantly improves accuracy compared to recent SW variants while maintaining low computational cost via a wide range of experiments.

Updated: 2025-03-14 03:36:44

标题: 基于距离的树切片Wasserstein距离

摘要: 为了克服最优输运（OT）的计算挑战，文献中已经发展了几种切片Wasserstein（SW）的变种。这些方法利用单变量OT的闭式表达式，通过将测度投影到（一维）线上来实现。然而，将测度投影到低维空间可能会导致拓扑信息的丢失。基于线系统的树切片Wasserstein距离（TSW-SL）作为一种有前途的替代方案出现，用更先进的结构替换这些线，称为树系统。树结构增强了捕获度量拓扑信息的能力，同时保持计算效率。然而，在TSW-SL的核心，分裂映射作为将测度推向树系统的机制，仅关注测度支撑的位置，而忽略了投影域。此外，TSW-SL中使用的特定分裂映射导致度量在欧几里得变换下不具有不变性，这是期望的OT在欧几里得空间上的性质。在这项工作中，我们提出了一类新颖的分裂映射，泛化了TSW-SL中研究的现有分裂映射，使得可以利用输入测度的所有位置信息，从而产生一种新的基于距离的树切片Wasserstein（Db-TSW）距离。此外，我们引入了一个简单的树采样过程，更适用于Db-TSW，从而为树系统提供了一种高效的GPU友好实现，类似于原始的SW。我们还对提出的分裂映射类进行了全面的理论分析，以验证相应Radon变换的单射性，并证明Db-TSW是一个具有欧几里得不变性的度量。我们通过大量实验经验性地表明，与最近的SW变体相比，Db-TSW在保持低计算成本的同时显著提高了准确性。

更新时间: 2025-03-14 03:36:44

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.11050v1

Measuring Similarity in Causal Graphs: A Framework for Semantic and Structural Analysis

Causal graphs are commonly used to understand and model complex systems. Researchers often construct these graphs from different perspectives, leading to significant variations for the same problem. Comparing causal graphs is, therefore, essential for evaluating assumptions, integrating insights, and resolving disagreements. The rise of AI tools has further amplified this need, as they are increasingly used to generate hypothesized causal graphs by synthesizing information from various sources such as prior research and community inputs, providing the potential for automating and scaling causal modeling for complex systems. Similar to humans, these tools also produce inconsistent results across platforms, versions, and iterations. Despite its importance, research on causal graph comparison remains scarce. Existing methods often focus solely on structural similarities, assuming identical variable names, and fail to capture nuanced semantic relationships, which is essential for causal graph comparison. We address these gaps by investigating methods for comparing causal graphs from both semantic and structural perspectives. First, we reviewed over 40 existing metrics and, based on predefined criteria, selected nine for evaluation from two threads of machine learning: four semantic similarity metrics and five learning graph kernels. We discuss the usability of these metrics in simple examples to illustrate their strengths and limitations. We then generated a synthetic dataset of 2,000 causal graphs using generative AI based on a reference diagram. Our findings reveal that each metric captures a different aspect of similarity, highlighting the need to use multiple metrics.

Updated: 2025-03-14 03:29:26

标题: 在因果图中衡量相似性：一个用于语义和结构分析的框架

摘要: 因果图通常用于理解和建模复杂系统。研究人员通常从不同的角度构建这些图，导致相同问题的显著变化。因此，比较因果图对于评估假设、整合见解和解决分歧至关重要。人工智能工具的崛起进一步加剧了这种需求，因为它们越来越多地被用于通过综合来自不同来源（如先前研究和社区输入）的信息生成假设的因果图，提供了自动化和扩展复杂系统因果建模的潜力。与人类类似，这些工具在各种平台、版本和迭代中也会产生不一致的结果。尽管其重要性，有关因果图比较的研究仍然很少。现有方法通常仅关注结构相似性，假设变量名称相同，并未捕捉到因果图比较所必需的微妙语义关系。我们通过从语义和结构两个角度调查比较因果图的方法来填补这些空白。首先，我们回顾了超过40个现有指标，并根据预定义的标准，从两个机器学习领域选择了九个用于评估的指标：四个语义相似度指标和五个学习图核。我们通过简单示例讨论了这些指标的可用性，以说明它们的优点和局限性。然后，我们基于参考图表使用生成式人工智能生成了一个包含2,000个因果图的合成数据集。我们的研究结果显示，每个指标捕捉到了相似性的不同方面，突显了使用多个指标的必要性。

更新时间: 2025-03-14 03:29:26

领域: cs.LG,cs.AI,68T05, 68R10, 62H30,I.2.6; G.2.2; I.5.4; H.2.8

下载: http://arxiv.org/abs/2503.11046v1

On the relationship between Koopman operator approximations and neural ordinary differential equations for data-driven time-evolution predictions

This work explores the relationship between state space methods and Koopman operator-based methods for predicting the time-evolution of nonlinear dynamical systems. We demonstrate that extended dynamic mode decomposition with dictionary learning (EDMD-DL), when combined with a state space projection, is equivalent to a neural network representation of the nonlinear discrete-time flow map on the state space. We highlight how this projection step introduces nonlinearity into the evolution equations, enabling significantly improved EDMD-DL predictions. With this projection, EDMD-DL leads to a nonlinear dynamical system on the state space, which can be represented in either discrete or continuous time. This system has a natural structure for neural networks, where the state is first expanded into a high dimensional feature space followed by a linear mapping which represents the discrete-time map or the vector field as a linear combination of these features. Inspired by these observations, we implement several variations of neural ordinary differential equations (ODEs) and EDMD-DL, developed by combining different aspects of their respective model structures and training procedures. We evaluate these methods using numerical experiments on chaotic dynamics in the Lorenz system and a nine-mode model of turbulent shear flow, showing comparable performance across methods in terms of short-time trajectory prediction, reconstruction of long-time statistics, and prediction of rare events. These results highlight the equivalence of the EDMD-DL implementation with a state space projection to a neural ODE representation of the dynamics. We also show that these methods provide comparable performance to a non-Markovian approach in terms of prediction of extreme events.

Updated: 2025-03-14 03:22:16

标题: 关于Koopman算子逼近和神经常微分方程在数据驱动时间演化预测中的关系

摘要: 这项工作探讨了状态空间方法和Koopman算子方法之间在预测非线性动态系统时间演变方面的关系。我们证明，扩展动态模式分解与字典学习（EDMD-DL）结合状态空间投影时，等同于在状态空间上的非线性离散时间流映射的神经网络表示。我们强调了这个投影步骤如何引入非线性进入演化方程，从而实现了显著改进的EDMD-DL预测。通过这种投影，EDMD-DL在状态空间上导致了一个非线性动态系统，可以以离散或连续时间表示。这个系统对神经网络有一个自然的结构，其中状态首先被扩展到一个高维特征空间，然后是一个线性映射，表示离散时间映射或矢量场作为这些特征的线性组合。受到这些观察的启发，我们实现了几种神经常微分方程（ODEs）和EDMD-DL的变体，通过结合它们各自的模型结构和训练过程的不同方面进行开发。我们使用洛伦兹系统中的混沌动态和湍流剪切流的九模型进行数值实验来评估这些方法，展示了在短时间轨迹预测、长时间统计重建和罕见事件的预测方面，各种方法的可比性表现。这些结果突显了EDMD-DL实现与状态空间投影的等同性，以及动态的神经ODE表示。我们还表明，这些方法在极端事件的预测方面提供了与非马尔可夫方法相媲美的性能。

更新时间: 2025-03-14 03:22:16

领域: nlin.CD,cs.LG

下载: http://arxiv.org/abs/2411.12940v2

The Beginner's Textbook for Fully Homomorphic Encryption

Fully Homomorphic Encryption (FHE) is a cryptographic scheme that enables computations to be performed directly on encrypted data, as if the data were in plaintext. After all computations are performed on the encrypted data, it can be decrypted to reveal the result. The decrypted value matches the result that would have been obtained if the same computations were applied to the plaintext data. FHE supports basic operations such as addition and multiplication on encrypted numbers. Using these fundamental operations, more complex computations can be constructed, including subtraction, division, logic gates (e.g., AND, OR, XOR, NAND, MUX), and even advanced mathematical functions such as ReLU, sigmoid, and trigonometric functions (e.g., sin, cos). These functions can be implemented either as exact formulas or as approximations, depending on the trade-off between computational efficiency and accuracy. Fully Homomorphic Encryption (FHE) enables privacy-preserving machine learning by allowing a server to process the client's data in its encrypted form through an ML model. With FHE, the server learns neither the plaintext version of the input features nor the inference results. Only the client, using their secret key, can decrypt and access the results at the end of the service protocol.FHE can also be applied to confidential blockchain services, ensuring that sensitive data in smart contracts remains encrypted and confidential while maintaining the transparency and integrity of the execution process. Other applications of FHE include secure outsourcing of data analytics, encrypted database queries, privacy-preserving searches, efficient multi-party computation for digital signatures, and more. This article is designed to help the reader understand how FHE works from the mathematical level.

Updated: 2025-03-14 03:22:13

标题: 全同态加密初学者教程

摘要: 全同态加密（FHE）是一种密码方案，可以在加密的数据上直接执行计算，就好像数据是明文一样。在对加密数据执行所有计算后，可以解密以显示结果。解密的值与如果对明文数据应用相同计算所得的结果匹配。 FHE支持对加密数字进行加法和乘法等基本操作。利用这些基本操作，可以构建更复杂的计算，包括减法、除法、逻辑门（例如AND，OR，XOR，NAND，MUX），甚至高级数学函数，如ReLU，sigmoid和三角函数（例如sin，cos）。这些函数可以作为精确公式或近似实现，取决于计算效率和准确性之间的权衡。全同态加密（FHE）通过允许服务器通过ML模型在加密形式下处理客户端数据，实现了隐私保护的机器学习。使用FHE，服务器既不了解输入特征的明文版本，也不了解推理结果。只有客户端可以使用他们的秘钥，在服务协议结束时解密和访问结果。FHE还可以应用于保密的区块链服务，确保智能合约中的敏感数据保持加密和保密，同时保持执行过程的透明度和完整性。FHE的其他应用包括安全外包数据分析、加密数据库查询、隐私保护搜索、数字签名的有效多方计算等。本文旨在帮助读者从数学层面理解FHE的工作原理。

更新时间: 2025-03-14 03:22:13

领域: cs.CR,cs.DM

下载: http://arxiv.org/abs/2503.05136v3

ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding

Automatic chart understanding is crucial for content comprehension and document parsing. Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in chart understanding through domain-specific alignment and fine-tuning. However, current MLLMs still struggle to provide faithful data and reliable analysis only based on charts. To address it, we propose ChartMoE, which employs the Mixture of Expert (MoE) architecture to replace the traditional linear projector to bridge the modality gap. Specifically, we train several linear connectors through distinct alignment tasks, which are utilized as the foundational initialization parameters for different experts. Additionally, we introduce ChartMoE-Align, a dataset with nearly 1 million chart-table-JSON-code quadruples to conduct three alignment tasks (chart-table/JSON/code). Combined with the vanilla connector, we initialize different experts diversely and adopt high-quality knowledge learning to further refine the MoE connector and LLM parameters. Extensive experiments demonstrate the effectiveness of the MoE connector and our initialization strategy, e.g., ChartMoE improves the accuracy of the previous state-of-the-art from 80.48\% to 84.64\% on the ChartQA benchmark.

Updated: 2025-03-14 03:19:00

标题: ChartMoE: 多样对齐专家连接器混合体用于图表理解

摘要: 自动图表理解对于内容理解和文档解析至关重要。多模态大型语言模型（MLLMs）通过领域特定的对齐和微调，在图表理解方面展示出了卓越的能力。然而，当前的MLLMs仍然难以仅基于图表提供忠实的数据和可靠的分析。为了解决这个问题，我们提出了ChartMoE，它采用了专家混合（MoE）架构来取代传统的线性投影器，以弥合模态差距。具体地，我们通过不同的对齐任务训练了几个线性连接器，这些连接器被用作不同专家的基础初始化参数。此外，我们引入了ChartMoE-Align，一个包含近100万个图表-表格-JSON代码四元组的数据集，用于进行三个对齐任务（图表-表格/JSON/代码）。结合原始连接器，我们以不同方式初始化不同的专家，并采用高质量的知识学习来进一步优化MoE连接器和LLM参数。大量实验证明了MoE连接器和我们的初始化策略的有效性，例如，ChartMoE将ChartQA基准的准确率从80.48％提高到84.64％。

更新时间: 2025-03-14 03:19:00

领域: cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2409.03277v3

InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences

Plug-and-play diffusion priors (PnPDP) have emerged as a promising research direction for solving inverse problems. However, current studies primarily focus on natural image restoration, leaving the performance of these algorithms in scientific inverse problems largely unexplored. To address this gap, we introduce \textsc{InverseBench}, a framework that evaluates diffusion models across five distinct scientific inverse problems. These problems present unique structural challenges that differ from existing benchmarks, arising from critical scientific applications such as optical tomography, medical imaging, black hole imaging, seismology, and fluid dynamics. With \textsc{InverseBench}, we benchmark 14 inverse problem algorithms that use plug-and-play diffusion priors against strong, domain-specific baselines, offering valuable new insights into the strengths and weaknesses of existing algorithms. To facilitate further research and development, we open-source the codebase, along with datasets and pre-trained models, at https://devzhk.github.io/InverseBench/.

Updated: 2025-03-14 03:13:55

标题: InverseBench：基于插拔式扩散先验的反问题在物理科学中的基准测试

摘要: 插播式扩散先验（PnPDP）已成为解决反问题的一个有前途的研究方向。然而，当前的研究主要集中在自然图像恢复上，而这些算法在科学反问题中的表现尚未得到广泛探索。为了弥补这一差距，我们引入了InverseBench，这是一个评估扩散模型在五个不同科学反问题上的框架。这些问题提出了与现有基准测试不同的独特结构挑战，来源于关键的科学应用，如光学断层摄影术、医学成像、黑洞成像、地震学和流体动力学。通过InverseBench，我们对使用插播式扩散先验的14种反问题算法进行基准测试，与强大的领域特定基线进行比较，为现有算法的优势和劣势提供有价值的新见解。为了促进进一步的研究和开发，我们开源了代码库，以及数据集和预训练模型，网址为https://devzhk.github.io/InverseBench/。

更新时间: 2025-03-14 03:13:55

领域: cs.LG

下载: http://arxiv.org/abs/2503.11043v1

Resource Constrained Pathfinding with A* and Negative Weights

Constrained pathfinding is a well-studied, yet challenging network optimisation problem that can be seen in a broad range of real-world applications. Pathfinding with multiple resource limits, which is known as the Resource Constrained Shortest Path Problem (RCSP), aims to plan a cost-optimum path subject to limited usage of resources. Given the recent advances in constrained and multi-criteria search with A*, this paper introduces a new resource constrained search framework on the basis of A* to tackle RCSP in large networks, even in the presence of negative cost and negative resources. We empirically evaluate our new algorithm on a set of large instances and show up to two orders of magnitude faster performance compared to state-of-the-art RCSP algorithms in the literature.

Updated: 2025-03-14 03:06:40

标题: 资源有限的A*算法和负权重路径规划

摘要: 受限路径规划是一个被广泛研究但具有挑战性的网络优化问题，在许多实际应用中都可以看到。具有多个资源限制的路径规划，即资源受限最短路径问题（RCSP），旨在规划在资源使用受限的情况下的成本最优路径。鉴于在A*中受限和多准则搜索的最新进展，本文介绍了一个基于A*的新的资源受限搜索框架，以应对大型网络中的RCSP问题，甚至在存在负成本和负资源的情况下。我们在一组大型实例上对我们的新算法进行了实证评估，并展示了与文献中最先进的RCSP算法相比高达两个数量级的更快性能。

更新时间: 2025-03-14 03:06:40

领域: cs.AI

下载: http://arxiv.org/abs/2503.11037v1

FMNet: Frequency-Assisted Mamba-Like Linear Attention Network for Camouflaged Object Detection

Camouflaged Object Detection (COD) is challenging due to the strong similarity between camouflaged objects and their surroundings, which complicates identification. Existing methods mainly rely on spatial local features, failing to capture global information, while Transformers increase computational costs.To address this, the Frequency-Assisted Mamba-Like Linear Attention Network (FMNet) is proposed, which leverages frequency-domain learning to efficiently capture global features and mitigate ambiguity between objects and the background. FMNet introduces the Multi-Scale Frequency-Assisted Mamba-Like Linear Attention (MFM) module, integrating frequency and spatial features through a multi-scale structure to handle scale variations while reducing computational complexity. Additionally, the Pyramidal Frequency Attention Extraction (PFAE) module and the Frequency Reverse Decoder (FRD) enhance semantics and reconstruct features. Experimental results demonstrate that FMNet outperforms existing methods on multiple COD datasets, showcasing its advantages in both performance and efficiency. Code available at https://anonymous.4open.science/r/FMNet-3CE5.

Updated: 2025-03-14 02:55:19

标题: FMNet：频率辅助的蟒蛇式线性注意力网络用于伪装目标检测

摘要: 伪装物体检测（COD）具有挑战性，因为伪装物体与其周围环境之间存在强烈相似性，这使得识别变得复杂。现有方法主要依赖于空间局部特征，无法捕捉全局信息，而Transformers会增加计算成本。为了解决这个问题，提出了频率辅助的蛇形线性注意网络（FMNet），利用频域学习有效捕捉全局特征，并减轻物体与背景之间的模糊性。FMNet引入了多尺度频率辅助蛇形线性注意（MFM）模块，通过多尺度结构整合频率和空间特征，以处理尺度变化同时降低计算复杂性。此外，金字塔频率注意提取（PFAE）模块和频率反向解码器（FRD）增强语义并重构特征。实验结果表明，FMNet在多个COD数据集上优于现有方法，展示了其在性能和效率方面的优势。代码可在https://anonymous.4open.science/r/FMNet-3CE5获取。

更新时间: 2025-03-14 02:55:19

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.11030v1

Neural Tangent Kernel of Neural Networks with Loss Informed by Differential Operators

Spectral bias is a significant phenomenon in neural network training and can be explained by neural tangent kernel (NTK) theory. In this work, we develop the NTK theory for deep neural networks with physics-informed loss, providing insights into the convergence of NTK during initialization and training, and revealing its explicit structure. We find that, in most cases, the differential operators in the loss function do not induce a faster eigenvalue decay rate and stronger spectral bias. Some experimental results are also presented to verify the theory.

Updated: 2025-03-14 02:55:13

标题: 神经网络的神经切向核，在损失由微分算子信息驱动的情况下

摘要: 光谱偏差是神经网络训练中的一个重要现象，可以通过神经切线核（NTK）理论来解释。在这项工作中，我们发展了具有物理信息损失的深度神经网络的NTK理论，为初始化和训练过程中NTK的收敛提供了洞察，并揭示了其明确的结构。我们发现，在大多数情况下，损失函数中的微分算子不会引发更快的特征值衰减率和更强的光谱偏差。还提供了一些实验结果来验证这一理论。

更新时间: 2025-03-14 02:55:13

领域: cs.LG

下载: http://arxiv.org/abs/2503.11029v1

Behavioral Machine Learning? Computer Predictions of Corporate Earnings also Overreact

Machine learning algorithms are known to outperform human analysts in predicting corporate earnings, leading to their rapid adoption. However, we show that leading methods (XGBoost, neural nets, ChatGPT) systematically overreact to news. The overreaction is primarily due to biases in the training data and we show that it cannot be eliminated without compromising accuracy. Analysts with machine learning training overreact much less than do traditional analysts. We provide a model showing that there is a tradeoff between predictive power and rational behavior. Our findings suggest that AI tools reduce but do not eliminate behavioral biases in financial markets.

Updated: 2025-03-14 02:54:43

标题: 行为机器学习？计算机对企业收益的预测也存在过度反应

摘要: 机器学习算法被认为在预测公司盈利方面优于人类分析师，因此被迅速采用。然而，我们发现主流方法（XGBoost、神经网络、ChatGPT）系统性地对新闻做出过度反应。这种过度反应主要是由于训练数据中的偏见，我们发现这种偏见无法消除而不牺牲准确性。接受过机器学习培训的分析师比传统分析师对新闻做出的反应要小得多。我们提供了一个模型，表明在预测能力和理性行为之间存在权衡。我们的研究结果表明，人工智能工具可以减少但不能消除金融市场中的行为偏见。

更新时间: 2025-03-14 02:54:43

领域: q-fin.ST,cs.LG,econ.GN,q-fin.EC,q-fin.GN

下载: http://arxiv.org/abs/2303.16158v2

Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control

Speech-driven 3D talking face method should offer both accurate lip synchronization and controllable expressions. Previous methods solely adopt discrete emotion labels to globally control expressions throughout sequences while limiting flexible fine-grained facial control within the spatiotemporal domain. We propose a diffusion-transformer-based 3D talking face generation model, Cafe-Talk, which simultaneously incorporates coarse- and fine-grained multimodal control conditions. Nevertheless, the entanglement of multiple conditions challenges achieving satisfying performance. To disentangle speech audio and fine-grained conditions, we employ a two-stage training pipeline. Specifically, Cafe-Talk is initially trained using only speech audio and coarse-grained conditions. Then, a proposed fine-grained control adapter gradually adds fine-grained instructions represented by action units (AUs), preventing unfavorable speech-lip synchronization. To disentangle coarse- and fine-grained conditions, we design a swap-label training mechanism, which enables the dominance of the fine-grained conditions. We also devise a mask-based CFG technique to regulate the occurrence and intensity of fine-grained control. In addition, a text-based detector is introduced with text-AU alignment to enable natural language user input and further support multimodal control. Extensive experimental results prove that Cafe-Talk achieves state-of-the-art lip synchronization and expressiveness performance and receives wide acceptance in fine-grained control in user studies. Project page: https://harryxd2018.github.io/cafe-talk/

Updated: 2025-03-14 02:52:41

标题: 咖啡馆谈话：使用多模式粗细粒度控制生成3D语言交流面部动画

摘要: 语音驱动的三维说话面部方法应该提供准确的嘴唇同步和可控的表情。先前的方法仅采用离散的情绪标签来全局控制序列中的表情，同时限制了在时空域内对细粒度面部控制的灵活性。我们提出了一种基于扩散变压器的三维说话面生成模型Cafe-Talk，同时结合了粗粒度和细粒度多模态控制条件。然而，多种条件的纠缠挑战着实现令人满意的性能。为了解开语音音频和细粒度条件，我们采用了一个两阶段训练流程。具体地，Cafe-Talk首先只使用语音音频和粗粒度条件进行训练。然后，一个提出的细粒度控制适配器逐渐添加由动作单元（AUs）表示的细粒度指令，防止不利的语音-嘴唇同步。为了解开粗粒度和细粒度条件，我们设计了一个交换标签训练机制，使细粒度条件占主导地位。我们还设计了一种基于掩蔽的CFG技术，以调节细粒度控制的发生和强度。此外，引入了一个基于文本的检测器，通过文本-AU对齐实现自然语言用户输入，并进一步支持多模态控制。广泛的实验结果证明，Cafe-Talk在唇部同步和表现性能方面达到了最先进水平，并在用户研究中在细粒度控制方面获得了广泛认可。项目页面：https://harryxd2018.github.io/cafe-talk/

更新时间: 2025-03-14 02:52:41

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.14517v1

Spatial-RAG: Spatial Retrieval Augmented Generation for Real-World Spatial Reasoning Questions

Spatial reasoning remains a challenge for Large Language Models (LLMs), which struggle with spatial data retrieval and reasoning. We propose Spatial Retrieval-Augmented Generation (Spatial-RAG), a framework that extends RAG to spatial tasks by integrating sparse spatial retrieval (spatial databases) and dense semantic retrieval (LLM-based similarity). A multi-objective ranking strategy balances spatial constraints and semantic relevance, while an LLM-guided generator ensures coherent responses. Experiments on a real-world tourism dataset show that Spatial-RAG significantly improves spatial question answering, bridging the gap between LLMs and spatial intelligence.

Updated: 2025-03-14 02:48:55

标题: Spatial-RAG：用于真实世界空间推理问题的空间检索增强生成

摘要: 空间推理对于大型语言模型（LLMs）仍然是一个挑战，这些模型在空间数据检索和推理方面存在困难。我们提出了空间检索增强生成（Spatial-RAG）框架，通过将稀疏空间检索（空间数据库）和密集语义检索（基于LLM的相似性）集成到RAG中，扩展了RAG到空间任务。多目标排序策略平衡空间约束和语义相关性，同时LLM引导的生成器确保连贯的回答。在一个真实的旅游数据集上的实验证实了Spatial-RAG显著提高了空间问题回答的性能，弥合了LLMs和空间智能之间的差距。

更新时间: 2025-03-14 02:48:55

领域: cs.IR,cs.ET,cs.LG

下载: http://arxiv.org/abs/2502.18470v3

MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation

Despite recent advances in text-to-speech (TTS) models, audio-visual to audio-visual (AV2AV) translation still faces a critical challenge: maintaining speaker consistency between the original and translated vocal and facial features. To address this issue, we propose a conditional flow matching (CFM) zero-shot audio-visual renderer that utilizes strong dual guidance from both audio and visual modalities. By leveraging multi-modal guidance with CFM, our model robustly preserves speaker-specific characteristics and significantly enhances zero-shot AV2AV translation abilities. For the audio modality, we enhance the CFM process by integrating robust speaker embeddings with x-vectors, which serve to bolster speaker consistency. Additionally, we convey emotional nuances to the face rendering module. The guidance provided by both audio and visual cues remains independent of semantic or linguistic content, allowing our renderer to effectively handle zero-shot translation tasks for monolingual speakers in different languages. We empirically demonstrate that the inclusion of high-quality mel-spectrograms conditioned on facial information not only enhances the quality of the synthesized speech but also positively influences facial generation, leading to overall performance improvements.

Updated: 2025-03-14 02:48:43

标题: MAVFlow：使用条件流匹配保留语调要素，实现零-shot AV2AV 多语言翻译

摘要: 尽管最近文本转语音（TTS）模型取得了一些进展，但音频-视觉到音频-视觉（AV2AV）翻译仍然面临一个关键挑战：在原始和翻译的语音和面部特征之间保持说话者一致性。为了解决这个问题，我们提出了一种有条件的流匹配（CFM）零样本音频-视觉渲染器，利用来自音频和视觉两种模式的强双重指导。通过利用CFM的多模态指导，我们的模型能够稳健地保留说话者特定的特征，并显著增强零样本AV2AV翻译能力。对于音频模态，我们通过将强健的说话者嵌入与x-向量相结合来增强CFM过程，从而加强说话者一致性。此外，我们还向面部渲染模块传达情感细微差别。音频和视觉线索提供的指导与语义或语言内容无关，使我们的渲染器能够有效处理不同语言中的单语言说话者的零样本翻译任务。我们通过实验证明，高质量的以面部信息为条件的梅尔频谱图的包含不仅提高了合成语音的质量，还积极影响了面部生成，从而导致整体性能改善。

更新时间: 2025-03-14 02:48:43

领域: eess.AS,cs.CV,cs.LG,cs.MM

下载: http://arxiv.org/abs/2503.11026v1

Real-Time Decision-Making for Digital Twin in Additive Manufacturing with Model Predictive Control using Time-Series Deep Neural Networks

Digital Twin -- a virtual replica of a physical system enabling real-time monitoring, model updating, prediction, and decision-making -- combined with recent advances in machine learning, offers new opportunities for proactive control strategies in autonomous manufacturing. However, achieving real-time decision-making with Digital Twins requires efficient optimization driven by accurate predictions of highly nonlinear manufacturing systems. This paper presents a simultaneous multi-step Model Predictive Control (MPC) framework for real-time decision-making, using a multivariate deep neural network, named Time-Series Dense Encoder (TiDE), as the surrogate model. Unlike conventional MPC models which only provide one-step ahead prediction, TiDE is capable of predicting future states within the prediction horizon in one shot (multi-step), significantly accelerating the MPC. Using Directed Energy Deposition (DED) additive manufacturing as a case study, we demonstrate the effectiveness of the proposed MPC in achieving melt pool temperature tracking to ensure part quality, while reducing porosity defects by regulating laser power to maintain melt pool depth constraints. In this work, we first show that TiDE is capable of accurately predicting melt pool temperature and depth. Second, we demonstrate that the proposed MPC achieves precise temperature tracking while satisfying melt pool depth constraints within a targeted dilution range (10\%-30\%), reducing potential porosity defects. Compared to PID controller, the MPC results in smoother and less fluctuating laser power profiles with competitive or superior melt pool temperature control performance. This demonstrates the MPC's proactive control capabilities, leveraging time-series prediction and real-time optimization, positioning it as a powerful tool for future Digital Twin applications and real-time process optimization in manufacturing.

Updated: 2025-03-14 02:33:47

标题: 数字孪生在增材制造中的实时决策：基于时间序列深度神经网络的模型预测控制

摘要: 数字孪生--一个物理系统的虚拟复制，实现实时监测、模型更新、预测和决策--结合最近机器学习的进展，为自主制造中的主动控制策略提供了新的机会。然而，要实现数字孪生的实时决策，需要通过准确预测高度非线性制造系统的优化来驱动。本文提出了一种用于实时决策的同时多步模型预测控制（MPC）框架，使用名为时间序列密集编码器（TiDE）的多变量深度神经网络作为替代模型。与仅提供一步预测的传统MPC模型不同，TiDE能够在一个步骤（多步）内预测预测范围内的未来状态，显著加速MPC。以定向能源沉积（DED）增材制造为案例研究，我们展示了所提出的MPC在实现熔池温度跟踪以确保零件质量的同时，通过调节激光功率以维持熔池深度约束，减少孔隙缺陷的有效性。在这项工作中，我们首先展示TiDE能够准确预测熔池温度和深度。其次，我们证明了所提出的MPC在满足熔池深度约束的同时实现精确的温度跟踪，使其在目标稀释范围（10\%-30\%）内减少潜在的孔隙缺陷。与PID控制器相比，MPC导致更平滑、波动较小的激光功率曲线，具有竞争性或优越的熔池温度控制性能。这展示了MPC的主动控制能力，利用时间序列预测和实时优化，将其定位为未来数字孪生应用和制造中的实时过程优化的强大工具。

更新时间: 2025-03-14 02:33:47

领域: cs.LG,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2501.07601v4

Residual Policy Gradient: A Reward View of KL-regularized Objective

Reinforcement Learning and Imitation Learning have achieved widespread success in many domains but remain constrained during real-world deployment. One of the main issues is the additional requirements that were not considered during training. To address this challenge, policy customization has been introduced, aiming to adapt a prior policy while preserving its inherent properties and meeting new task-specific requirements. A principled approach to policy customization is Residual Q-Learning (RQL), which formulates the problem as a Markov Decision Process (MDP) and derives a family of value-based learning algorithms. However, RQL has not yet been applied to policy gradient methods, which restricts its applicability, especially in tasks where policy gradient has already proven more effective. In this work, we first derive a concise form of Soft Policy Gradient as a preliminary. Building on this, we introduce Residual Policy Gradient (RPG), which extends RQL to policy gradient methods, allowing policy customization in gradient-based RL settings. With the view of RPG, we rethink the KL-regularized objective widely used in RL fine-tuning. We show that under certain assumptions, KL-regularized objective leads to a maximum-entropy policy that balances the inherent properties and task-specific requirements on a reward-level. Our experiments in MuJoCo demonstrate the effectiveness of Soft Policy Gradient and Residual Policy Gradient.

Updated: 2025-03-14 02:30:13

标题: 剩余策略梯度：KL正则化目标的奖励视角

摘要: 强化学习和模仿学习在许多领域取得了广泛成功，但在实际部署过程中仍受到限制。其中一个主要问题是在训练过程中未考虑到的额外要求。为了解决这一挑战，引入了策略定制，旨在在保留其固有属性并满足新任务特定要求的同时调整先前的策略。一种原则性的策略定制方法是剩余Q学习（RQL），它将问题规划为马尔可夫决策过程（MDP）并推导出一系列基于价值的学习算法。然而，RQL尚未应用于策略梯度方法，这限制了其适用性，尤其是在策略梯度已被证明更有效的任务中。在这项工作中，我们首先推导出了Soft Policy Gradient的简明形式作为初步。在此基础上，我们介绍了剩余策略梯度（RPG），将RQL扩展到策略梯度方法，允许在基于梯度的RL设置中进行策略定制。通过RPG的视角，我们重新思考在RL微调中广泛使用的KL正则化目标。我们展示，在某些假设下，KL正则化目标导致了一个平衡固有属性和任务特定要求的最大熵策略。我们在MuJoCo中的实验证明了Soft Policy Gradient和Residual Policy Gradient的有效性。

更新时间: 2025-03-14 02:30:13

领域: cs.LG

下载: http://arxiv.org/abs/2503.11019v1

Deep Incomplete Multi-view Clustering with Distribution Dual-Consistency Recovery Guidance

Multi-view clustering leverages complementary representations from diverse sources to enhance performance. However, real-world data often suffer incomplete cases due to factors like privacy concerns and device malfunctions. A key challenge is effectively utilizing available instances to recover missing views. Existing methods frequently overlook the heterogeneity among views during recovery, leading to significant distribution discrepancies between recovered and true data. Additionally, many approaches focus on cross-view correlations, neglecting insights from intra-view reliable structure and cross-view clustering structure. To address these issues, we propose BURG, a novel method for incomplete multi-view clustering with distriBution dUal-consistency Recovery Guidance. We treat each sample as a distinct category and perform cross-view distribution transfer to predict the distribution space of missing views. To compensate for the lack of reliable category information, we design a dual-consistency guided recovery strategy that includes intra-view alignment guided by neighbor-aware consistency and cross-view alignment guided by prototypical consistency. Extensive experiments on benchmarks demonstrate the superiority of BURG in the incomplete multi-view scenario.

Updated: 2025-03-14 02:27:45

标题: 使用分布双一致性恢复指导的深度不完全多视图聚类

摘要: 多视图聚类利用来自不同来源的互补表示来增强性能。然而，现实世界的数据经常因隐私问题和设备故障等因素而存在不完整情况。一个关键挑战是有效利用可用实例来恢复缺失视图。现有方法经常忽视恢复过程中视图之间的异质性，导致恢复数据和真实数据之间存在显著的分布差异。此外，许多方法关注跨视图相关性，忽视了来自视图内可靠结构和跨视图聚类结构的洞察。为了解决这些问题，我们提出了BURG，一种用于不完整多视图聚类的新方法，具有分布双一致性恢复引导。我们将每个样本视为一个独特的类别，并执行跨视图分布转移来预测缺失视图的分布空间。为了补偿可靠类别信息的缺乏，我们设计了一个双一致性引导恢复策略，其中包括由邻居感知一致性引导的视图内对齐和由原型一致性引导的跨视图对齐。在基准测试上进行的大量实验证明了BURG在不完整多视图情况下的优越性。

更新时间: 2025-03-14 02:27:45

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2503.11017v1

From Abstraction to Reality: DARPA's Vision for Robust Sim-to-Real Autonomy

The DARPA Transfer from Imprecise and Abstract Models to Autonomous Technologies (TIAMAT) program aims to address rapid and robust transfer of autonomy technologies across dynamic and complex environments, goals, and platforms. Existing methods for simulation-to-reality (sim-to-real) transfer often rely on high-fidelity simulations and struggle with broad adaptation, particularly in time-sensitive scenarios. Although many approaches have shown incredible performance at specific tasks, most techniques fall short when posed with unforeseen, complex, and dynamic real-world scenarios due to the inherent limitations of simulation. In contrast to current research that aims to bridge the gap between simulation environments and the real world through increasingly sophisticated simulations and a combination of methods typically assuming a small sim-to-real gap -- such as domain randomization, domain adaptation, imitation learning, meta-learning, policy distillation, and dynamic optimization -- TIAMAT takes a different approach by instead emphasizing transfer and adaptation of the autonomy stack directly to real-world environments by utilizing a breadth of low(er)-fidelity simulations to create broadly effective sim-to-real transfers. By abstractly learning from multiple simulation environments in reference to their shared semantics, TIAMAT's approaches aim to achieve abstract-to-real transfer for effective and rapid real-world adaptation. Furthermore, this program endeavors to improve the overall autonomy pipeline by addressing the inherent challenges in translating simulated behaviors into effective real-world performance.

Updated: 2025-03-14 02:06:10

标题: 从抽象到现实：DARPA对强大的从模拟到真实自主性的愿景

摘要: DARPA的不精确和抽象模型到自主技术（TIAMAT）计划旨在解决自主技术在动态和复杂环境、目标和平台之间的快速和稳健转移。现有的模拟到现实（sim-to-real）转移方法通常依赖于高保真度模拟，并在广泛适应性方面存在困难，特别是在时间敏感情况下。尽管许多方法在特定任务上表现出色，但大多数技术在面临未预见、复杂和动态的现实场景时表现不佳，这是由于模拟的固有限制。与当前旨在通过越来越复杂的模拟和通常假设小sim-to-real间隙的方法组合来弥合模拟环境与真实世界之间差距的研究相反，如领域随机化、领域适应、模仿学习、元学习、策略提炼和动态优化--TIAMAT采取了一种不同的方法，强调将自主技术栈直接转移到真实环境中，利用各种低保真度模拟来创建广泛有效的模拟到现实转移。通过在多个模拟环境中抽象学习共享语义，TIAMAT的方法旨在实现有效和快速的抽象到现实转移，以便进行现实世界的适应。此外，该计划致力于通过解决将模拟行为转化为有效的真实世界性能中的固有挑战来改进整体自主管道。

更新时间: 2025-03-14 02:06:10

领域: cs.RO,cs.AI,cs.LG,cs.MA,cs.SY,eess.SY

下载: http://arxiv.org/abs/2503.11007v1

Observation-Graph Interaction and Key-Detail Guidance for Vision and Language Navigation

Vision and Language Navigation (VLN) requires an agent to navigate through environments following natural language instructions. However, existing methods often struggle with effectively integrating visual observations and instruction details during navigation, leading to suboptimal path planning and limited success rates. In this paper, we propose OIKG (Observation-graph Interaction and Key-detail Guidance), a novel framework that addresses these limitations through two key components: (1) an observation-graph interaction module that decouples angular and visual information while strengthening edge representations in the navigation space, and (2) a key-detail guidance module that dynamically extracts and utilizes fine-grained location and object information from instructions. By enabling more precise cross-modal alignment and dynamic instruction interpretation, our approach significantly improves the agent's ability to follow complex navigation instructions. Extensive experiments on the R2R and RxR datasets demonstrate that OIKG achieves state-of-the-art performance across multiple evaluation metrics, validating the effectiveness of our method in enhancing navigation precision through better observation-instruction alignment.

Updated: 2025-03-14 02:05:16

标题: 观察-图形交互和关键细节引导对于视觉和语言导航的重要性

摘要: 视觉与语言导航（VLN）要求智能体根据自然语言指令在环境中导航。然而，现有方法往往难以有效地整合视觉观察和指令细节，在导航过程中导致次优路径规划和有限的成功率。在本文中，我们提出了OIKG（观察-图交互和关键细节引导），这是一个通过两个关键组件解决这些限制的新颖框架：（1）观察-图交互模块，将角度和视觉信息解耦，同时加强导航空间中的边表示，（2）关键细节引导模块，动态提取和利用指令中的细粒度位置和对象信息。通过实现更精确的跨模态对齐和动态指令解释，我们的方法显著提高了智能体遵循复杂导航指令的能力。对R2R和RxR数据集的大量实验表明，OIKG在多个评估指标上实现了最先进的性能，验证了我们的方法在通过更好的观察-指令对齐来增强导航精度方面的有效性。

更新时间: 2025-03-14 02:05:16

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.11006v1

PersonaCraft: Personalized and Controllable Full-Body Multi-Human Scene Generation Using Occlusion-Aware 3D-Conditioned Diffusion

We present PersonaCraft, a framework for controllable and occlusion-robust full-body personalized image synthesis of multiple individuals in complex scenes. Current methods struggle with occlusion-heavy scenarios and complete body personalization, as 2D pose conditioning lacks 3D geometry, often leading to ambiguous occlusions and anatomical distortions, and many approaches focus solely on facial identity. In contrast, our PersonaCraft integrates diffusion models with 3D human modeling, employing SMPLx-ControlNet, to utilize 3D geometry like depth and normal maps for robust 3D-aware pose conditioning and enhanced anatomical coherence. To handle fine-grained occlusions, we propose Occlusion Boundary Enhancer Network that exploits depth edge signals with occlusion-focused training, and Occlusion-Aware Classifier-Free Guidance strategy that selectively reinforces conditioning in occluded regions without affecting unoccluded areas. PersonaCraft can seamlessly be combined with Face Identity ControlNet, achieving full-body multi-human personalization and thus marking a significant advancement beyond prior approaches that concentrate only on facial identity. Our dual-pathway body shape representation with SMPLx-based shape parameters and textual refinement, enables precise full-body personalization and flexible user-defined body shape adjustments. Extensive quantitative experiments and user studies demonstrate that PersonaCraft significantly outperforms existing methods in generating high-quality, multi-person images with accurate personalization and robust occlusion handling.

Updated: 2025-03-14 02:05:11

标题: PersonaCraft：使用遮挡感知的3D条件扩散生成个性化和可控的全身多人场景

摘要: 我们提出了PersonaCraft，这是一个用于在复杂场景中控制和抗遮挡的全身个性化图像合成的框架。当前的方法在遮挡严重的情况下和完全身体个性化方面存在困难，因为2D姿势调节缺乏3D几何，通常导致模糊的遮挡和解剖变形，而许多方法仅专注于面部身份。相比之下，我们的PersonaCraft集成了扩散模型和3D人体建模，采用SMPLx-ControlNet，利用深度和法线图等3D几何来实现强大的3D感知姿势调节和增强解剖一致性。为了处理细粒度的遮挡，我们提出了利用深度边缘信号进行遮挡集中培训的遮挡边界增强网络，以及遮挡感知无分类引导策略，有选择地在遮挡区域中加强调节，而不影响未遮挡区域。PersonaCraft可以无缝结合面部身份控制网络，实现全身多人个性化，因此在超越仅关注面部身份的先前方法方面取得了重大进展。我们基于SMPLx的形状参数和文本细化的双通道身体形状表示，实现了精确的全身个性化和灵活的用户定义的身体形状调整。广泛的定量实验和用户研究表明，PersonaCraft在生成高质量、多人图像时明显优于现有方法，具有准确的个性化和强大的遮挡处理能力。

更新时间: 2025-03-14 02:05:11

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2411.18068v2

Crash Severity Analysis of Child Bicyclists using Arm-Net and MambaNet

Child bicyclists (14 years and younger) are among the most vulnerable road users, often experiencing severe injuries or fatalities in crashes. This study analyzed 2,394 child bicyclist crashes in Texas from 2017 to 2022 using two deep tabular learning models (ARM-Net and MambaNet). To address the issue of data imbalance, the SMOTEENN technique was applied, resulting in balanced datasets that facilitated accurate crash severity predictions across three categories: Fatal/Severe (KA), Moderate/Minor (BC), and No Injury (O). The findings revealed that MambaNet outperformed ARM-Net, achieving higher precision, recall, F1-scores, and accuracy, particularly in the KA and O categories. Both models highlighted challenges in distinguishing BC crashes due to overlapping characteristics. These insights underscored the value of advanced tabular deep learning methods and balanced datasets in understanding crash severity. While limitations such as reliance on categorical data exist, future research could explore continuous variables and real-time behavioral data to enhance predictive modeling and crash mitigation strategies.

Updated: 2025-03-14 02:02:14

标题: 儿童自行车骑行者事故严重程度分析：使用Arm-Net和MambaNet

摘要: 儿童骑自行车者（14岁及以下）是最易受伤或在交通事故中遇难的道路使用者之一。本研究利用两种深度表格学习模型（ARM-Net和MambaNet）分析了2017年至2022年间德克萨斯州的2,394起儿童骑自行车事故。为解决数据不平衡问题，采用了SMOTEENN技术，生成了平衡的数据集，有助于在三个类别（严重/致命（KA）、中度/轻微（BC）和无伤害（O））之间准确预测事故严重程度。研究结果显示，MambaNet的表现优于ARM-Net，在KA和O类别特别是在精准率、召回率、F1分数和准确率方面表现更好。两种模型都强调了由于特征重叠而难以区分BC事故的挑战。这些发现强调了高级表格深度学习方法和平衡数据集在理解事故严重程度方面的价值。尽管存在依赖分类数据的局限性，未来的研究可以探索连续变量和实时行为数据，以增强预测建模和事故缓解策略。

更新时间: 2025-03-14 02:02:14

领域: cs.LG

下载: http://arxiv.org/abs/2503.11003v1

Physics-based simulation ontology: an ontology to support modelling and reuse of data for physics-based simulation

The current work presents an ontology developed for physics-based simulation in engineering design, called Physics-based Simulation Ontology (PSO). The purpose of the ontology is to assist in modelling the physical phenomenon of interest in a veridical manner, while capturing the necessary and reusable information for physics-based simulation solvers. The development involved extending an existing upper ontology, Basic Formal Ontology (BFO), to define lower-level terms of PSO. PSO has two parts: PSO-Physics, which consists of terms and relations used to model physical phenomena based on the perspective of classical mechanics involving partial differential equations, and PSO-Sim, which consists of terms used to represent the information artefacts that are about the physical phenomena modelled with PSO-Physics. The former terms are used to model the physical phenomenon of interest independent of solver-specific interpretations, which can be reused across different solvers, while the latter terms are used to instantiate solver-specific input data. A case study involving two simulation solvers was conducted to demonstrate this capability of PSO. Discussion around the benefits and limitations of using BFO for the current work is also provided, which should be valuable for any future work that extends an existing upper ontology to develop ontologies for engineering applications.

Updated: 2025-03-14 01:51:42

标题: 基于物理学仿真的本体论：一种支持物理学仿真建模和数据重用的本体论

摘要: 这项研究介绍了一个为工程设计中基于物理的模拟开发的本体论，称为基于物理的模拟本体论（PSO）。本体论的目的是帮助以真实方式对感兴趣的物理现象进行建模，同时捕捉物理模拟求解器所需的必要和可重复使用的信息。开发过程涉及扩展现有的上位本体，基本形式本体论（BFO），以定义PSO的低级术语。PSO分为两部分：PSO-Physics，其中包含基于古典力学透视的基于偏微分方程的物理现象建模的术语和关系；PSO-Sim，其中包含用于表示使用PSO-Physics建模的物理现象的信息工件的术语。前者术语用于独立于特定求解器解释对感兴趣的物理现象进行建模，可在不同求解器之间重复使用；而后者术语用于实例化特定求解器的输入数据。进行了涉及两个模拟求解器的案例研究，以展示PSO的这种能力。对使用BFO进行当前工作的好处和限制进行了讨论，这对于任何将现有上位本体扩展以开发用于工程应用的本体论的未来工作应该是有价值的。

更新时间: 2025-03-14 01:51:42

领域: cs.AI

下载: http://arxiv.org/abs/2503.11723v1

Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding

Existing vision-language models (VLMs) often suffer from visual hallucination, where the generated responses contain inaccuracies that are not grounded in the visual input. Efforts to address this issue without model finetuning primarily mitigate hallucination by reducing biases contrastively or amplifying the weights of visual embedding during decoding. However, these approaches improve visual perception at the cost of impairing the language reasoning capability. In this work, we propose the Perception Magnifier (PM), a novel visual decoding method that iteratively isolates relevant visual tokens based on attention and magnifies the corresponding regions, spurring the model to concentrate on fine-grained visual details during decoding. Specifically, by magnifying critical regions while preserving the structural and contextual information at each decoding step, PM allows the VLM to enhance its scrutiny of the visual input, hence producing more accurate and faithful responses. Extensive experimental results demonstrate that PM not only achieves superior hallucination mitigation but also enhances language generation while preserving strong reasoning capabilities. Code is available at https://github.com/ShunqiM/PM .

Updated: 2025-03-14 01:48:33

标题: 透过放大镜：用于无幻觉的VLM解码的自适应感知放大

摘要: 现有的视觉语言模型（VLMs）经常遭受视觉幻觉的困扰，即生成的响应包含与视觉输入不符的不准确性。在没有对模型进行微调的情况下解决这个问题的努力主要通过在解码过程中减少偏差对比或放大视觉嵌入的权重来减轻幻觉。然而，这些方法改善了视觉感知，但损害了语言推理能力。在这项工作中，我们提出了Perception Magnifier（PM），一种新颖的视觉解码方法，根据注意力逐步分离相关的视觉标记，并放大相应的区域，促使模型在解码过程中集中精力关注细粒度的视觉细节。具体而言，通过在每个解码步骤中放大关键区域同时保留结构和上下文信息，PM允许VLM增强对视觉输入的审查，从而产生更准确和忠实的响应。大量实验证明，PM不仅实现了卓越的幻觉减轻，还增强了语言生成能力，同时保持了强大的推理能力。代码可在https://github.com/ShunqiM/PM 上找到。

更新时间: 2025-03-14 01:48:33

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.10183v2

RONA: Pragmatically Diverse Image Captioning with Coherence Relations

Writing Assistants (e.g., Grammarly, Microsoft Copilot) traditionally generate diverse image captions by employing syntactic and semantic variations to describe image components. However, human-written captions prioritize conveying a central message alongside visual descriptions using pragmatic cues. To enhance pragmatic diversity, it is essential to explore alternative ways of communicating these messages in conjunction with visual content. To address this challenge, we propose RONA, a novel prompting strategy for Multi-modal Large Language Models (MLLM) that leverages Coherence Relations as an axis for variation. We demonstrate that RONA generates captions with better overall diversity and ground-truth alignment, compared to MLLM baselines across multiple domains. Our code is available at: https://github.com/aashish2000/RONA

Updated: 2025-03-14 01:45:38

标题: RONA：具有一致性关系的实用多样化图像字幕化

摘要: 写作助手（例如Grammarly，Microsoft Copilot）传统上通过采用句法和语义变化来描述图像组件，生成多样化的图像标题。然而，人工撰写的标题优先考虑使用语用线索传达中心信息以及视觉描述。为了增强语用多样性，有必要探索与视觉内容一起传达这些信息的替代方式。为了解决这一挑战，我们提出了RONA，一种新颖的多模态大语言模型（MLLM）提示策略，利用连贯关系作为变化的轴。我们证明，与多个领域的MLLM基线相比，RONA生成的标题具有更好的整体多样性和与地面实况的对齐。我们的代码可在以下网址找到：https://github.com/aashish2000/RONA

更新时间: 2025-03-14 01:45:38

领域: cs.CL,cs.AI,cs.CV,68T50,I.2.7; I.2.10

下载: http://arxiv.org/abs/2503.10997v1

Taming Knowledge Conflicts in Language Models

Language Models (LMs) often encounter knowledge conflicts when parametric memory contradicts contextual knowledge. Previous works attribute this conflict to the interplay between "memory heads" and "context heads", attention heads assumed to promote either memory or context exclusively. In this study, we go beyond this fundamental assumption by uncovering a critical phenomenon we term the "superposition of contextual information and parametric memory", where highly influential attention heads could simultaneously contribute to both memory and context. Building upon this insight, we propose Just Run Twice (JUICE), a test-time attention intervention method that steers LMs toward either parametric beliefs or contextual knowledge without requiring fine-tuning. JUICE identifies a set of reliable attention heads and leverages a dual-run approach to mitigate the superposition effects. Extensive experiments across 11 datasets and 6 model architectures demonstrate that JUICE sets the new state-of-the-art performance and robust generalization, achieving significant and consistent improvement across different domains under various conflict types. Finally, we theoretically analyze knowledge conflict and the superposition of contextual information and parametric memory in attention heads, which further elucidates the effectiveness of JUICE in these settings.

Updated: 2025-03-14 01:45:00

标题: 驯化语言模型中的知识冲突

摘要: 语言模型（LMs）在参数化记忆与上下文知识相矛盾时经常遇到知识冲突。先前的研究将这种冲突归因于“记忆头”和“上下文头”之间的相互作用，这些注意力头被认为只促进记忆或上下文。在这项研究中，我们超越了这一基本假设，揭示了一个我们称之为“上下文信息和参数化记忆的叠加”的关键现象，高度影响的注意力头可以同时对记忆和上下文做出贡献。基于这一见解，我们提出了Just Run Twice（JUICE），这是一种测试时间的注意力干预方法，可以将LMs引导向参数化信念或上下文知识，而无需进行微调。JUICE识别出一组可靠的注意力头，并利用双重运行方法来减轻叠加效应。在11个数据集和6个模型架构上进行了大量实验证明，JUICE确立了新的最先进性能和稳健的泛化能力，实现了在不同领域和各种冲突类型下的显著和一致的改进。最后，我们在注意力头中从理论上分析了知识冲突和上下文信息与参数化记忆的叠加，进一步阐明了JUICE在这些设置中的有效性。

更新时间: 2025-03-14 01:45:00

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2503.10996v1

On the Limitations of Vision-Language Models in Understanding Image Transforms

Vision Language Models (VLMs) have demonstrated significant potential in various downstream tasks, including Image/Video Generation, Visual Question Answering, Multimodal Chatbots, and Video Understanding. However, these models often struggle with basic image transformations. This paper investigates the image-level understanding of VLMs, specifically CLIP by OpenAI and SigLIP by Google. Our findings reveal that these models lack comprehension of multiple image-level augmentations. To facilitate this study, we created an augmented version of the Flickr8k dataset, pairing each image with a detailed description of the applied transformation. We further explore how this deficiency impacts downstream tasks, particularly in image editing, and evaluate the performance of state-of-the-art Image2Image models on simple transformations.

Updated: 2025-03-14 01:44:17

标题: 关于视觉-语言模型在理解图像转换中的局限性

摘要: 视觉语言模型（VLMs）在各种下游任务中展现出显著潜力，包括图像/视频生成、视觉问题回答、多模态聊天机器人和视频理解。然而，这些模型经常在基本图像转换方面遇到困难。本文研究了VLMs，特别是OpenAI的CLIP和Google的SigLIP在图像级别的理解。我们的研究发现，这些模型缺乏对多个图像级别增强的理解。为了促进这项研究，我们创建了Flickr8k数据集的增强版本，将每个图像与应用转换的详细描述配对。我们进一步探讨了这种缺陷如何影响下游任务，特别是在图像编辑方面，并评估了最先进的Image2Image模型在简单转换上的性能。

更新时间: 2025-03-14 01:44:17

领域: cs.CV,cs.AI,cs.CL,I.4; I.2.10; I.2.7

下载: http://arxiv.org/abs/2503.09837v2

Compute Optimal Scaling of Skills: Knowledge vs Reasoning

Scaling laws are a critical component of the LLM development pipeline, most famously as a way to forecast training decisions such as 'compute-optimally' trading-off parameter count and dataset size, alongside a more recent growing list of other crucial decisions. In this work, we ask whether compute-optimal scaling behaviour can be skill-dependent. In particular, we examine knowledge and reasoning-based skills such as knowledge-based QA and code generation, and we answer this question in the affirmative: scaling laws are skill-dependent. Next, to understand whether skill-dependent scaling is an artefact of the pretraining datamix, we conduct an extensive ablation of different datamixes and find that, also when correcting for datamix differences, knowledge and code exhibit fundamental differences in scaling behaviour. We conclude with an analysis of how our findings relate to standard compute-optimal scaling using a validation set, and find that a misspecified validation set can impact compute-optimal parameter count by nearly 50%, depending on its skill composition.

Updated: 2025-03-14 01:39:39

标题: 计算技能的最佳缩放：知识与推理

摘要: 缩放定律是LLM发展管线的关键组成部分，最为著名的是作为一种预测训练决策的方式，例如在参数数量和数据集大小之间进行“计算最优”权衡，以及最近增长的其他关键决策清单。在这项工作中，我们问了一个问题：计算最优的缩放行为是否会依赖于技能。具体来说，我们检查了基于知识和推理的技能，比如基于知识的问答和代码生成，我们肯定地回答了这个问题：缩放定律是依赖于技能的。接下来，为了了解技能相关的缩放是否是预训练数据混合的产物，我们对不同数据混合进行了广泛的剥离，并发现，即使在纠正数据混合差异的情况下，知识和代码在缩放行为上也存在根本性差异。我们最后分析了我们的发现如何与使用验证集进行标准计算最优缩放相关，发现一个错误规定的验证集可能会影响参数数量的计算最优性，最高可达近50%，取决于其技能组成。

更新时间: 2025-03-14 01:39:39

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2503.10061v2

Riemannian Geometric-based Meta Learning

Meta-learning, or "learning to learn," aims to enable models to quickly adapt to new tasks with minimal data. While traditional methods like Model-Agnostic Meta-Learning (MAML) optimize parameters in Euclidean space, they often struggle to capture complex learning dynamics, particularly in few-shot learning scenarios. To address this limitation, we propose Stiefel-MAML, which integrates Riemannian geometry by optimizing within the Stiefel manifold, a space that naturally enforces orthogonality constraints. By leveraging the geometric structure of the Stiefel manifold, we improve parameter expressiveness and enable more efficient optimization through Riemannian gradient calculations and retraction operations. We also introduce a novel kernel-based loss function defined on the Stiefel manifold, further enhancing the model's ability to explore the parameter space. Experimental results on benchmark datasets--including Omniglot, Mini-ImageNet, FC-100, and CUB--demonstrate that Stiefel-MAML consistently outperforms traditional MAML, achieving superior performance across various few-shot learning tasks. Our findings highlight the potential of Riemannian geometry to enhance meta-learning, paving the way for future research on optimizing over different geometric structures.

Updated: 2025-03-14 01:34:55

标题: 黎曼几何基础的元学习

摘要: 元学习，或者“学会学习”，旨在使模型能够在最少的数据情况下快速适应新任务。虽然传统方法如无模型元学习（MAML）在欧几里得空间中优化参数，但它们通常很难捕捉复杂的学习动态，特别是在少样本学习场景中。为了解决这一限制，我们提出了Stiefel-MAML，它通过在Stiefel流形内进行优化，集成了黎曼几何，这个空间自然地强制了正交约束。通过利用Stiefel流形的几何结构，我们改善了参数表达能力，并通过黎曼梯度计算和回退操作实现更有效的优化。我们还引入了一个在Stiefel流形上定义的基于核的损失函数，进一步增强了模型探索参数空间的能力。在基准数据集上的实验结果，包括Omniglot、Mini-ImageNet、FC-100和CUB，表明Stiefel-MAML始终优于传统的MAML，在各种少样本学习任务中取得了更优异的性能。我们的研究结果突显了黎曼几何提升元学习的潜力，为未来在不同几何结构上优化的研究铺平了道路。

更新时间: 2025-03-14 01:34:55

领域: cs.LG

下载: http://arxiv.org/abs/2503.10993v1

Few-Shot Learning for Mental Disorder Detection: A Continuous Multi-Prompt Engineering Approach with Medical Knowledge Injection

This study harnesses state-of-the-art AI technology for detecting mental disorders through user-generated textual content. Existing studies typically rely on fully supervised machine learning, which presents challenges such as the labor-intensive manual process of annotating extensive training data for each research problem and the need to design specialized deep learning architectures for each task. We propose a novel method to address these challenges by leveraging large language models and continuous multi-prompt engineering, which offers two key advantages: (1) developing personalized prompts that capture each user's unique characteristics and (2) integrating structured medical knowledge into prompts to provide context for disease detection and facilitate predictive modeling. We evaluate our method using three widely prevalent mental disorders as research cases. Our method significantly outperforms existing methods, including feature engineering, architecture engineering, and discrete prompt engineering. Meanwhile, our approach demonstrates success in few-shot learning, i.e., requiring only a minimal number of training examples. Moreover, our method can be generalized to other rare mental disorder detection tasks with few positive labels. In addition to its technical contributions, our method has the potential to enhance the well-being of individuals with mental disorders and offer a cost-effective, accessible alternative for stakeholders beyond traditional mental disorder screening methods.

Updated: 2025-03-14 01:34:01

标题: 少样本学习用于心理障碍检测：一种连续的多提示工程方法与医学知识注入

摘要: 这项研究利用最先进的人工智能技术来检测通过用户生成的文本内容来检测精神障碍。现有研究通常依赖于完全监督的机器学习，这带来了挑战，例如为每个研究问题注释大量训练数据的劳动密集型手动过程以及需要为每个任务设计专门的深度学习架构。我们提出了一种新颖的方法来解决这些挑战，通过利用大型语言模型和连续多提示工程，这提供了两个关键优势：（1）开发个性化提示，捕捉每个用户的独特特征和（2）将结构化医学知识整合到提示中，为疾病检测提供背景，并促进预测建模。我们使用三种广泛存在的精神障碍作为研究案例来评估我们的方法。我们的方法明显优于现有方法，包括特征工程、体系结构工程和离散提示工程。与此同时，我们的方法在少样本学习方面表现出成功，即仅需要极少量的训练示例。此外，我们的方法可以推广到其他少数阳性标签的罕见精神障碍检测任务。除了其技术贡献外，我们的方法有潜力提升患有精神障碍的个体的福祉，并为传统精神障碍筛查方法之外的利益相关者提供一种经济有效、易于访问的替代方案。

更新时间: 2025-03-14 01:34:01

领域: cs.CL,cs.AI,K.5,I.2.7; H.4.m

下载: http://arxiv.org/abs/2401.12988v2

Statistical Impossibility and Possibility of Aligning LLMs with Human Preferences: From Condorcet Paradox to Nash Equilibrium

Aligning large language models (LLMs) with diverse human preferences is critical for ensuring fairness and informed outcomes when deploying these models for decision-making. In this paper, we seek to uncover fundamental statistical limits concerning aligning LLMs with human preferences, with a focus on the probabilistic representation of human preferences and the preservation of diverse preferences in aligned LLMs. We first show that human preferences can be represented by a reward model if and only if the preference among LLM-generated responses is free of any Condorcet cycle. Moreover, we prove that Condorcet cycles exist with probability converging to one exponentially fast under a probabilistic preference model, thereby demonstrating the impossibility of fully aligning human preferences using reward-based approaches such as reinforcement learning from human feedback. Next, we explore the conditions under which LLMs would employ mixed strategies -- meaning they do not collapse to a single response -- when aligned in the limit using a non-reward-based approach, such as Nash learning from human feedback (NLHF). We identify a necessary and sufficient condition for mixed strategies: the absence of a response that is preferred over all others by a majority. As a blessing, we prove that this condition holds with high probability under the probabilistic preference model, thereby highlighting the statistical possibility of preserving minority preferences without explicit regularization in aligning LLMs. Finally, we leverage insights from our statistical results to design a novel, computationally efficient algorithm for finding Nash equilibria in aligning LLMs with NLHF. Our experiments show that Llama-3.2-1B, aligned with our algorithm, achieves a win rate of 60.55\% against the base model.

Updated: 2025-03-14 01:29:21

标题: 无法统计和可能性将LLMs与人类偏好对齐：从Condorcet悖论到Nash均衡

摘要: 将大型语言模型（LLMs）与多样化的人类偏好对齐对于确保在决策时使用这些模型时的公平性和知情性至关重要。在本文中，我们试图揭示有关将LLMs与人类偏好对齐的基本统计限制，重点关注人类偏好的概率表示以及在对齐的LLMs中保留多样化偏好的问题。我们首先证明，只有在LLM生成的回应之间的偏好不包含Condorcet循环时，人类偏好才能被奖励模型表示。此外，我们证明，在概率偏好模型下，Condorcet循环以指数级别的速度收敛于一，因此证明了使用基于奖励的方法（如从人类反馈中进行强化学习）完全对齐人类偏好的不可能性。接下来，我们探讨了在极限情况下采用混合策略的条件，即当LLMs使用非基于奖励的方法（如从人类反馈中进行纳什学习）对齐时，它们不会收敛到一个单一的回应。我们确定了混合策略的一个必要且充分条件：不存在一个被大多数人偏好于其他所有回应的回应。幸运的是，我们证明在概率偏好模型下，这个条件以高概率成立，从而突显了在对齐LLMs时保留少数派偏好的统计可能性，而无需明确规范化。最后，我们利用我们的统计结果的见解设计了一种新颖的、计算高效的算法，用于在对齐LLMs与NLHF时找到纳什均衡。我们的实验表明，通过我们的算法对齐的Llama-3.2-1B模型在与基准模型的对抗中取得了60.55\%的胜率。

更新时间: 2025-03-14 01:29:21

领域: cs.GT,cs.LG,econ.TH,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2503.10990v1

Parameter-Efficient Fine-Tuning of State Space Models

Deep State Space Models (SSMs), such as Mamba (Gu & Dao, 2024), have become powerful tools for language modeling, offering high performance and linear scalability with sequence length. However, the application of parameter-efficient fine-tuning (PEFT) methods to SSM-based models remains underexplored. We start by investigating two fundamental questions on existing PEFT methods: (i) How do they perform on SSM-based models? (ii) Which parameters should they target for optimal results? Our analysis shows that LoRA and its variants consistently outperform all other PEFT methods. While LoRA is effective for linear projection matrices, it fails on SSM modules-yet still outperforms other methods applicable to SSMs, indicating their limitations. This underscores the need for a specialized SSM tuning approach. To address this, we propose Sparse Dimension Tuning (SDT), a PEFT method tailored for SSM modules. Combining SDT for SSMs with LoRA for linear projection matrices, we achieve state-of-the-art performance across extensive experiments.

Updated: 2025-03-14 01:26:57

标题: 参数高效的状态空间模型微调

摘要: 深度状态空间模型（SSMs），如Mamba（Gu＆Dao，2024），已成为语言建模的强大工具，具有高性能和线性可伸缩性，适用于序列长度。然而，将参数高效微调（PEFT）方法应用于基于SSM的模型仍未得到充分探索。我们首先研究了现有PEFT方法的两个基本问题：（i）它们在基于SSM的模型上的表现如何？（ii）它们应该针对哪些参数以获得最佳结果？我们的分析表明，LoRA及其变体始终优于所有其他PEFT方法。虽然LoRA对于线性投影矩阵有效，但在SSM模块上失败，但仍优于其他适用于SSM的方法，表明它们的局限性。这凸显了需要一种专门的SSM调整方法。为了解决这个问题，我们提出了稀疏维度调整（SDT），这是一种专为SSM模块量身定制的PEFT方法。将SDT用于SSM与LoRA用于线性投影矩阵相结合，我们在广泛实验中实现了最先进的性能。

更新时间: 2025-03-14 01:26:57

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2410.09016v2

HopCast: Calibration of Autoregressive Dynamics Models

Deep learning models are often trained to approximate dynamical systems that can be modeled using differential equations. These models are optimized to predict one step ahead and produce calibrated predictions if the predictive model can quantify uncertainty, such as deep ensembles. At inference time, multi-step predictions are generated via autoregression, which needs a sound uncertainty propagation method (e.g., Trajectory Sampling) to produce calibrated multi-step predictions. This paper introduces an approach named HopCast that uses the Modern Hopfield Network (MHN) to learn the residuals of a deterministic model that approximates the dynamical system. The MHN predicts the density of residuals based on a context vector at any timestep during autoregression. This approach produces calibrated multi-step predictions without uncertainty propagation and turns a deterministic model into a calibrated probabilistic model. This work is also the first to benchmark existing uncertainty propagation methods based on calibration errors with deep ensembles for multi-step predictions.

Updated: 2025-03-14 01:19:34

标题: HopCast：自回归动态模型的校准

摘要: 深度学习模型通常被训练来逼近可以使用微分方程建模的动态系统。这些模型被优化以预测一步，并在预测模型能够量化不确定性时产生校准预测，例如深度集成。在推断时，通过自回归生成多步预测，需要一个良好的不确定性传播方法（例如轨迹抽样）来产生校准的多步预测。本文介绍了一种名为HopCast的方法，它使用现代霍普菲尔德网络（MHN）来学习逼近动态系统的确定性模型的残差。MHN根据任何时间步的上下文向量预测残差的密度。这种方法产生了校准的多步预测，无需不确定性传播，并将确定性模型转变为校准的概率模型。这项工作也是第一个使用深度集成基于校准误差对多步预测进行基准测试的现有不确定性传播方法。

更新时间: 2025-03-14 01:19:34

领域: cs.LG

下载: http://arxiv.org/abs/2501.16587v2

Image-Goal Navigation Using Refined Feature Guidance and Scene Graph Enhancement

In this paper, we introduce a novel image-goal navigation approach, named RFSG. Our focus lies in leveraging the fine-grained connections between goals, observations, and the environment within limited image data, all the while keeping the navigation architecture simple and lightweight. To this end, we propose the spatial-channel attention mechanism, enabling the network to learn the importance of multi-dimensional features to fuse the goal and observation features. In addition, a selfdistillation mechanism is incorporated to further enhance the feature representation capabilities. Given that the navigation task needs surrounding environmental information for more efficient navigation, we propose an image scene graph to establish feature associations at both the image and object levels, effectively encoding the surrounding scene information. Crossscene performance validation was conducted on the Gibson and HM3D datasets, and the proposed method achieved stateof-the-art results among mainstream methods, with a speed of up to 53.5 frames per second on an RTX3080. This contributes to the realization of end-to-end image-goal navigation in realworld scenarios. The implementation and model of our method have been released at: https://github.com/nubot-nudt/RFSG.

Updated: 2025-03-14 01:15:24

标题: 使用精细特征引导和场景图增强的图像目标导航

摘要: 在本文中，我们介绍了一种新颖的图像目标导航方法，名为RFSG。我们的重点在于利用有限的图像数据中目标、观察和环境之间的精细连接，同时保持导航架构简单和轻量化。为此，我们提出了空间通道注意机制，使网络能够学习多维特征的重要性，以融合目标和观察特征。此外，我们还融入了自我蒸馏机制，进一步增强特征表示能力。考虑到导航任务需要周围环境信息以实现更有效的导航，我们提出了图像场景图，以在图像和对象级别建立特征关联，有效地编码周围场景信息。在Gibson和HM3D数据集上进行了跨场景性能验证，提出的方法在主流方法中取得了最先进的结果，在RTX3080上的速度高达每秒53.5帧。这有助于在现实世界场景中实现端到端的图像目标导航。我们的方法的实现和模型已在以下网址发布：https://github.com/nubot-nudt/RFSG。

更新时间: 2025-03-14 01:15:24

领域: cs.RO,cs.AI,cs.CV

下载: http://arxiv.org/abs/2503.10986v1

The Problem of the Priors, or Posteriors?

The problem of the priors is well known: it concerns the challenge of identifying norms that govern one's prior credences. I argue that a key to addressing this problem lies in considering what I call the problem of the posteriors -- the challenge of identifying norms that directly govern one's posterior credences, which then induce constraints on the priors via the diachronic requirement of conditionalization. This forward-looking approach can be summarized as: Think ahead, work backward. Although this idea can be traced to Freedman (1963), Carnap (1963), and Shimony (1970), it has received little attention in philosophy. In this paper, I initiate a systematic defense of forward-looking Bayesianism, addressing potential objections from more traditional views (both subjectivist and objectivist) and arguing for its advantages. In particular, I develop a specific approach to forward-looking Bayesianism -- one that treats the convergence of posterior credences to the truth as a fundamental rather than derived normative requirement. This approach, called convergentist Bayesianism, is argued to be crucial for a Bayesian foundation of Ockham's razor and related inference methods in statistics and machine learning.

Updated: 2025-03-14 01:06:34

标题: 先验的问题，还是后验的问题？

摘要: 先验问题是众所周知的：它涉及确定规范，规范指导一个人的先验信念。我认为解决这个问题的关键在于考虑我所称之为后验问题--确定直接指导一个人后验信念的规范的挑战，然后通过条件化的历时要求对先验信念施加约束。这种前瞻性的方法可以概括为：先想后做。尽管这个想法可以追溯到Freedman（1963）、Carnap（1963）和Shimony（1970），但在哲学领域中却鲜有关注。在这篇论文中，我开始系统地为前瞻性贝叶斯主义进行辩护，解决来自更传统观点（主观主义和客观主义）的潜在反对意见，并论证其优势。特别是，我发展了一种特定的前瞻性贝叶斯主义方法--将后验信念的收敛到真相视为基本而非派生的规范要求。这种方法被称为收敛贝叶斯主义，被认为对于贝叶斯原则和统计学以及机器学习中的奥卡姆剃刀和相关推理方法至关重要。

更新时间: 2025-03-14 01:06:34

领域: stat.OT,cs.AI,math.PR

下载: http://arxiv.org/abs/2503.10984v1

From Dionysius Emerges Apollo -- Learning Patterns and Abstractions from Perceptual Sequences

Cognition swiftly breaks high-dimensional sensory streams into familiar parts and uncovers their relations. Why do structures emerge, and how do they enable learning, generalization, and prediction? What computational principles underlie this core aspect of perception and intelligence? A sensory stream, simplified, is a one-dimensional sequence. In learning such sequences, we naturally segment them into parts -- a process known as chunking. In the first project, I investigated factors influencing chunking in a serial reaction time task and showed that humans adapt to underlying chunks while balancing speed and accuracy. Building on this, I developed models that learn chunks and parse sequences chunk by chunk. Normatively, I proposed chunking as a rational strategy for discovering recurring patterns and nested hierarchies, enabling efficient sequence factorization. Learned chunks serve as reusable primitives for transfer, composition, and mental simulation -- letting the model compose the new from the known. I demonstrated this model's ability to learn hierarchies in single and multi-dimensional sequences and highlighted its utility for unsupervised pattern discovery. The second part moves from concrete to abstract sequences. I taxonomized abstract motifs and examined their role in sequence memory. Behavioral evidence suggests that humans exploit pattern redundancies for compression and transfer. I proposed a non-parametric hierarchical variable model that learns both chunks and abstract variables, uncovering invariant symbolic patterns. I showed its similarity to human learning and compared it to large language models. Taken together, this thesis suggests that chunking and abstraction as simple computational principles enable structured knowledge acquisition in hierarchically organized sequences, from simple to complex, concrete to abstract.

Updated: 2025-03-14 00:37:28

标题: 从狄奥尼修斯出现阿波罗-从感知序列中学习模式和抽象

摘要: 认知迅速将高维感官流分解成熟悉的部分并揭示它们之间的关系。结构是如何产生的，以及它们如何促进学习、泛化和预测？哪些计算原则支撑了感知和智能的核心方面？简化后，感官流是一个一维序列。在学习这些序列时，我们自然地将它们分段，这个过程被称为分块。在第一个项目中，我调查了影响串联反应时间任务中分块的因素，并展示了人类在平衡速度和准确性的同时适应基本块。在此基础上，我开发了学习基本块并逐块解析序列的模型。规范上，我提出分块作为一种合理的策略，用于发现重复模式和嵌套层次结构，从而实现高效的序列分解。学习的基本块可以作为可重复使用的原语进行转移、组合和心理模拟，使模型从已知中组合出新的内容。我展示了这个模型学习单维和多维序列中的层次结构的能力，并强调了它在无监督模式发现中的实用性。第二部分从具体序列转向抽象序列。我对抽象主题进行了分类，并探讨了它们在序列记忆中的作用。行为证据表明，人类利用模式的冗余性进行压缩和转移。我提出了一个非参数化的分层可变模型，学习基本块和抽象变量，揭示不变的符号模式。我展示了它与人类学习的相似性，并将其与大型语言模型进行了比较。总的来说，这篇论文表明，分块和抽象作为简单的计算原则使得在层次化组织的序列中获得结构化知识成为可能，从简单到复杂，从具体到抽象。

更新时间: 2025-03-14 00:37:28

领域: cs.LG

下载: http://arxiv.org/abs/2503.10973v1

Why Johnny Signs with Sigstore: Examining Tooling as a Factor in Software Signing Adoption in the Sigstore Ecosystem

The software supply chain security problem arises from integrating software components from several sources. The integrity of these components is ensured by the use of provenance tools, of which software signing is the strongest guarantee. While software signing has been recommended by regulation and industry consortia, practical adoption of software signing has been generally limited. While tooling has been recognized as a key factor influencing software signing adoption and quality by previous studies, most research has focused primarily on its user interface aspects, with little research on other usability considerations like tool selection, user challenges, software engineering process integration intricacies, etc. To understand how software tools influence the practice and adoption of software signing, we study the formative usability of Sigstore, a modern and widely adopted software signing tool. We interviewed thirteen (13) experienced security practitioners to study the factors that influence the selection of a tool, the problems associated with the use of such tools, how practitioners' software signing tools have evolved, and what drives this migration. To summarize our findings: (1) We highlight the various factors practitioners consider before adopting a software signing tool; (2) We highlight the problems and advantages associated with the current tooling choices of practitioners; and (3) We describe the evolution of tooling adoption of our sample population. Our findings provide the software signing tool development community with valuable insights to improve their design of software signing tools.

Updated: 2025-03-14 00:30:15

标题: 为什么约翰与Sigstore签署协议：审视在Sigstore生态系统中软件签名采用的工具化因素

摘要: 软件供应链安全问题源于整合来自多个来源的软件组件。通过使用溯源工具来确保这些组件的完整性，其中软件签名是最强有力的保证。虽然软件签名已被法规和行业联盟推荐，但实际上的软件签名采用普遍受限。尽管之前的研究已经认识到工具是影响软件签名采用和质量的关键因素，但大多数研究主要集中在其用户界面方面，对于其他可用性考虑如工具选择、用户挑战、软件工程流程整合等方面的研究较少。为了了解软件工具如何影响软件签名的实践和采用，我们研究了现代和广泛采用的软件签名工具Sigstore的形成可用性。我们访谈了十三名经验丰富的安全从业者，研究了影响工具选择、使用此类工具所遇到的问题、从业者的软件签名工具的演变以及推动这种迁移的因素。总结我们的研究结果：(1) 我们强调从业者在采用软件签名工具之前考虑的各种因素；(2) 我们强调了与目前从业者工具选择相关的问题和优势；(3) 我们描述了我们样本人群工具采用的演变。我们的研究结果为软件签名工具开发社区提供了宝贵的见解，以改进他们的软件签名工具设计。

更新时间: 2025-03-14 00:30:15

领域: cs.SE,cs.CR

下载: http://arxiv.org/abs/2503.00271v3

TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools

Precision therapeutics require multimodal adaptive models that generate personalized treatment recommendations. We introduce TxAgent, an AI agent that leverages multi-step reasoning and real-time biomedical knowledge retrieval across a toolbox of 211 tools to analyze drug interactions, contraindications, and patient-specific treatment strategies. TxAgent evaluates how drugs interact at molecular, pharmacokinetic, and clinical levels, identifies contraindications based on patient comorbidities and concurrent medications, and tailors treatment strategies to individual patient characteristics. It retrieves and synthesizes evidence from multiple biomedical sources, assesses interactions between drugs and patient conditions, and refines treatment recommendations through iterative reasoning. It selects tools based on task objectives and executes structured function calls to solve therapeutic tasks that require clinical reasoning and cross-source validation. The ToolUniverse consolidates 211 tools from trusted sources, including all US FDA-approved drugs since 1939 and validated clinical insights from Open Targets. TxAgent outperforms leading LLMs, tool-use models, and reasoning agents across five new benchmarks: DrugPC, BrandPC, GenericPC, TreatmentPC, and DescriptionPC, covering 3,168 drug reasoning tasks and 456 personalized treatment scenarios. It achieves 92.1% accuracy in open-ended drug reasoning tasks, surpassing GPT-4o and outperforming DeepSeek-R1 (671B) in structured multi-step reasoning. TxAgent generalizes across drug name variants and descriptions. By integrating multi-step inference, real-time knowledge grounding, and tool-assisted decision-making, TxAgent ensures that treatment recommendations align with established clinical guidelines and real-world evidence, reducing the risk of adverse events and improving therapeutic decision-making.

Updated: 2025-03-14 00:28:15

标题: TxAgent：一个跨工具宇宙进行治疗推理的AI代理

摘要: 精准治疗需要多模态自适应模型，生成个性化治疗建议。我们介绍了TxAgent，一个AI代理，利用多步推理和实时生物医学知识检索，在一个包含211个工具的工具箱中分析药物相互作用、禁忌症和患者特定治疗策略。TxAgent评估药物在分子、药代动力学和临床水平的相互作用，根据患者合并症和同时用药识别禁忌症，并根据个体患者特征调整治疗策略。它从多个生物医学来源检索和综合证据，评估药物与患者状况之间的相互作用，并通过迭代推理优化治疗建议。它根据任务目标选择工具，并执行结构化功能调用来解决需要临床推理和跨源验证的治疗任务。ToolUniverse整合了来自受信任来源的211个工具，包括自1939年以来所有美国FDA批准的药物以及来自Open Targets的验证临床见解。TxAgent在五个新基准测试中胜过领先的LLMs、工具使用模型和推理代理：DrugPC、BrandPC、GenericPC、TreatmentPC和DescriptionPC，涵盖3168个药物推理任务和456个个性化治疗场景。在开放式药物推理任务中实现92.1%的准确率，超过GPT-4o，并在结构化多步推理中胜过DeepSeek-R1（671B）。TxAgent在药物名称变体和描述之间具有泛化能力。通过整合多步推理、实时知识基础和工具辅助决策，TxAgent确保治疗建议符合已建立的临床指南和真实世界证据，降低不良事件的风险，改善治疗决策。

更新时间: 2025-03-14 00:28:15

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2503.10970v1

Combinatorial Optimization for All: Using LLMs to Aid Non-Experts in Improving Optimization Algorithms

Large Language Models (LLMs) have shown notable potential in code generation for optimization algorithms, unlocking exciting new opportunities. This paper examines how LLMs, rather than creating algorithms from scratch, can improve existing ones without the need for specialized expertise. To explore this potential, we selected 10 baseline optimization algorithms from various domains (metaheuristics, reinforcement learning, deterministic, and exact methods) to solve the classic Travelling Salesman Problem. The results show that our simple methodology often results in LLM-generated algorithm variants that improve over the baseline algorithms in terms of solution quality, reduction in computational time, and simplification of code complexity, all without requiring specialized optimization knowledge or advanced algorithmic implementation skills.

Updated: 2025-03-14 00:26:00

标题: 组合优化对所有人都是开放的：利用LLMs帮助非专家改进优化算法

摘要: 大型语言模型(LLMs)在优化算法的代码生成方面表现出显著的潜力，开启了令人兴奋的新机遇。本文探讨了LLMs如何能够改进现有算法，而无需专业知识。为了探索这一潜力，我们选择了来自不同领域(元启发式、强化学习、确定性和精确方法)的10种基准优化算法来解决经典的旅行推销员问题。结果表明，我们的简单方法往往会产生LLM生成的算法变体，这些变体在解决方案质量、计算时间减少以及代码复杂性简化方面优于基准算法，而无需专业的优化知识或高级的算法实施技能。

更新时间: 2025-03-14 00:26:00

领域: cs.AI,cs.CL,cs.LG,cs.SE

下载: http://arxiv.org/abs/2503.10968v1

Auditing language models for hidden objectives

We study the feasibility of conducting alignment audits: investigations into whether models have undesired objectives. As a testbed, we train a language model with a hidden objective. Our training pipeline first teaches the model about exploitable errors in RLHF reward models (RMs), then trains the model to exploit some of these errors. We verify via out-of-distribution evaluations that the model generalizes to exhibit whatever behaviors it believes RMs rate highly, including ones not reinforced during training. We leverage this model to study alignment audits in two ways. First, we conduct a blind auditing game where four teams, unaware of the model's hidden objective or training, investigate it for concerning behaviors and their causes. Three teams successfully uncovered the model's hidden objective using techniques including interpretability with sparse autoencoders (SAEs), behavioral attacks, and training data analysis. Second, we conduct an unblinded follow-up study of eight techniques for auditing the model, analyzing their strengths and limitations. Overall, our work provides a concrete example of using alignment audits to discover a model's hidden objective and proposes a methodology for practicing and validating progress in alignment auditing.

Updated: 2025-03-14 00:21:15

标题: 审计语言模型中的隐藏目标

摘要: 我们研究了进行对齐审计的可行性：调查模型是否具有不良目标。作为一个测试平台，我们训练了一个具有隐藏目标的语言模型。我们的训练流程首先教导模型有关RLHF奖励模型(RMs)中可利用的错误，然后训练模型利用其中一些错误。我们通过超出分布的评估验证，模型能够泛化展示出它认为RMs评分高的任何行为，包括在训练过程中未加强的行为。我们利用这个模型以两种方式研究对齐审计。首先，我们进行了一个盲审计游戏，四个团队不知道模型的隐藏目标或训练，调查其是否存在令人担忧的行为及其原因。其中三个团队成功利用包括稀疏自编码器(SAEs)的可解释性、行为攻击和训练数据分析等技术揭示了模型的隐藏目标。其次，我们进行了一个对模型进行审计的明查后续研究，分析了它们的优势和局限性。总的来说，我们的工作提供了一个具体的例子，展示了使用对齐审计来发现模型的隐藏目标，并提出了一个实践和验证对齐审计进展的方法。

更新时间: 2025-03-14 00:21:15

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2503.10965v1

FedMentalCare: Towards Privacy-Preserving Fine-Tuned LLMs to Analyze Mental Health Status Using Federated Learning Framework

With the increasing prevalence of mental health conditions worldwide, AI-powered chatbots and conversational agents have emerged as accessible tools to support mental health. However, deploying Large Language Models (LLMs) in mental healthcare applications raises significant privacy concerns, especially regarding regulations like HIPAA and GDPR. In this work, we propose FedMentalCare, a privacy-preserving framework that leverages Federated Learning (FL) combined with Low-Rank Adaptation (LoRA) to fine-tune LLMs for mental health analysis. We investigate the performance impact of varying client data volumes and model architectures (e.g., MobileBERT and MiniLM) in FL environments. Our framework demonstrates a scalable, privacy-aware approach for deploying LLMs in real-world mental healthcare scenarios, addressing data security and computational efficiency challenges.

Updated: 2025-03-14 00:18:36

标题: FedMentalCare：朝向隐私保护的精细调整LLMs，利用联邦学习框架分析心理健康状况

摘要: 随着全球心理健康状况的日益普遍，基于人工智能的聊天机器人和对话代理已经成为支持心理健康的可访问工具。然而，在心理保健应用中部署大型语言模型（LLMs）引发了重大的隐私问题，特别是涉及HIPAA和GDPR等法规。在这项工作中，我们提出了FedMentalCare，一个隐私保护框架，利用联邦学习（FL）结合低秩适应（LoRA）对LLMs进行微调，用于心理健康分析。我们研究了在FL环境中客户数据量和模型架构（例如MobileBERT和MiniLM）变化对性能的影响。我们的框架展示了一种可扩展的、重视隐私的方法，用于在现实世界的心理保健场景中部署LLMs，解决数据安全和计算效率挑战。

更新时间: 2025-03-14 00:18:36

领域: cs.CL,cs.HC,cs.LG,I.2.6; I.5.1; J.3; C.2.4; D.4.6

下载: http://arxiv.org/abs/2503.05786v2

Evaluating System 1 vs. 2 Reasoning Approaches for Zero-Shot Time Series Forecasting: A Benchmark and Insights

Reasoning ability is crucial for solving challenging tasks. With the advancement of foundation models, such as the emergence of large language models (LLMs), a wide range of reasoning strategies has been proposed, including test-time enhancements, such as Chain-ofThought, and post-training optimizations, as used in DeepSeek-R1. While these reasoning strategies have demonstrated effectiveness across various challenging language or vision tasks, their applicability and impact on time-series forecasting (TSF), particularly the challenging zero-shot TSF, remain largely unexplored. In particular, it is unclear whether zero-shot TSF benefits from reasoning and, if so, what types of reasoning strategies are most effective. To bridge this gap, we propose ReC4TS, the first benchmark that systematically evaluates the effectiveness of popular reasoning strategies when applied to zero-shot TSF tasks. ReC4TS conducts comprehensive evaluations across datasets spanning eight domains, covering both unimodal and multimodal with short-term and longterm forecasting tasks. More importantly, ReC4TS provides key insights: (1) Self-consistency emerges as the most effective test-time reasoning strategy; (2) Group-relative policy optimization emerges as a more suitable approach for incentivizing reasoning ability during post-training; (3) Multimodal TSF benefits more from reasoning strategies compared to unimodal TSF. Beyond these insights, ReC4TS establishes two pioneering starting blocks to support future zero-shot TSF reasoning research: (1) A novel dataset, TimeThinking, containing forecasting samples annotated with reasoning trajectories from multiple advanced LLMs, and (2) A new and simple test-time scaling-law validated on foundational TSF models enabled by self-consistency reasoning strategy. All data and code are publicly accessible at: https://github.com/AdityaLab/OpenTimeR

Updated: 2025-03-14 00:16:53

标题: 评估零样本时间序列预测中的系统1和系统2推理方法：一个基准和见解

摘要: Reasoning ability is crucial for solving challenging tasks. With the advancement of foundation models, such as the emergence of large language models (LLMs), a wide range of reasoning strategies has been proposed, including test-time enhancements, such as Chain-ofThought, and post-training optimizations, as used in DeepSeek-R1. While these reasoning strategies have demonstrated effectiveness across various challenging language or vision tasks, their applicability and impact on time-series forecasting (TSF), particularly the challenging zero-shot TSF, remain largely unexplored. In particular, it is unclear whether zero-shot TSF benefits from reasoning and, if so, what types of reasoning strategies are most effective. To bridge this gap, we propose ReC4TS, the first benchmark that systematically evaluates the effectiveness of popular reasoning strategies when applied to zero-shot TSF tasks. ReC4TS conducts comprehensive evaluations across datasets spanning eight domains, covering both unimodal and multimodal with short-term and long-term forecasting tasks. More importantly, ReC4TS provides key insights: (1) Self-consistency emerges as the most effective test-time reasoning strategy; (2) Group-relative policy optimization emerges as a more suitable approach for incentivizing reasoning ability during post-training; (3) Multimodal TSF benefits more from reasoning strategies compared to unimodal TSF. Beyond these insights, ReC4TS establishes two pioneering starting blocks to support future zero-shot TSF reasoning research: (1) A novel dataset, TimeThinking, containing forecasting samples annotated with reasoning trajectories from multiple advanced LLMs, and (2) A new and simple test-time scaling-law validated on foundational TSF models enabled by self-consistency reasoning strategy. All data and code are publicly accessible at: https://github.com/AdityaLab/OpenTimeR

更新时间: 2025-03-14 00:16:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2503.01895v2

Combinatorial Optimization via LLM-driven Iterated Fine-tuning

We present a novel way to integrate flexible, context-dependent constraints into combinatorial optimization by leveraging Large Language Models (LLMs) alongside traditional algorithms. Although LLMs excel at interpreting nuanced, locally specified requirements, they struggle with enforcing global combinatorial feasibility. To bridge this gap, we propose an iterated fine-tuning framework where algorithmic feedback progressively refines the LLM's output distribution. Interpreting this as simulated annealing, we introduce a formal model based on a "coarse learnability" assumption, providing sample complexity bounds for convergence. Empirical evaluations on scheduling, graph connectivity, and clustering tasks demonstrate that our framework balances the flexibility of locally expressed constraints with rigorous global optimization more effectively compared to baseline sampling methods. Our results highlight a promising direction for hybrid AI-driven combinatorial reasoning.

Updated: 2025-03-14 00:16:29

标题: 基于LLM驱动的迭代微调的组合优化

摘要: 我们提出了一种将灵活的、上下文相关的约束条件与组合优化相结合的新方法，通过利用大型语言模型（LLMs）与传统算法并行。虽然LLMs擅长解释微妙的、局部指定的要求，但在强制执行全局组合可行性方面存在困难。为了弥合这一差距，我们提出了一个迭代微调框架，其中算法反馈逐渐优化LLM的输出分布。将其解释为模拟退火，我们引入了一个基于“粗糙可学习性”假设的形式模型，为收敛提供了样本复杂性界限。在调度、图连通性和聚类任务上的实证评估表明，与基准抽样方法相比，我们的框架更有效地平衡了局部表达约束的灵活性与严格的全局优化。我们的结果突显了混合AI驱动的组合推理的一个有前途的方向。

更新时间: 2025-03-14 00:16:29

领域: cs.LG,cs.DS,stat.ML

下载: http://arxiv.org/abs/2503.06917v2

FedOSAA: Improving Federated Learning with One-Step Anderson Acceleration

Federated learning (FL) is a distributed machine learning approach that enables multiple local clients and a central server to collaboratively train a model while keeping the data on their own devices. First-order methods, particularly those incorporating variance reduction techniques, are the most widely used FL algorithms due to their simple implementation and stable performance. However, these methods tend to be slow and require a large number of communication rounds to reach the global minimizer. We propose FedOSAA, a novel approach that preserves the simplicity of first-order methods while achieving the rapid convergence typically associated with second-order methods. Our approach applies one Anderson acceleration (AA) step following classical local updates based on first-order methods with variance reduction, such as FedSVRG and SCAFFOLD, during local training. This AA step is able to leverage curvature information from the history points and gives a new update that approximates the Newton-GMRES direction, thereby significantly improving the convergence. We establish a local linear convergence rate to the global minimizer of FedOSAA for smooth and strongly convex loss functions. Numerical comparisons show that FedOSAA substantially improves the communication and computation efficiency of the original first-order methods, achieving performance comparable to second-order methods like GIANT.

Updated: 2025-03-14 00:10:02

标题: FedOSAA：使用一步Anderson加速改进联邦学习

摘要: 联邦学习（FL）是一种分布式机器学习方法，使多个本地客户端和中央服务器能够协作训练模型，同时保留数据在他们自己的设备上。一阶方法，特别是那些包含方差减少技术的方法，是最广泛使用的FL算法，因为它们实现简单且性能稳定。然而，这些方法往往速度较慢，需要大量的通信轮次才能达到全局最小化器。我们提出了FedOSAA，这是一种新颖的方法，保留了一阶方法的简单性，同时实现了通常与二阶方法相关的快速收敛。我们的方法在本地训练期间基于一阶方法与方差减少的经典局部更新，如FedSVRG和SCAFFOLD，应用一个Anderson加速（AA）步骤。这个AA步骤能够利用历史点的曲率信息，并给出一个近似于Newton-GMRES方向的新更新，从而显著提高了收敛性。我们建立了对于平滑和强凸损失函数的FedOSAA全局最小化器的局部线性收敛率。数值比较表明，FedOSAA极大地提高了原始一阶方法的通信和计算效率，实现了与GIANT等二阶方法相当的性能。

更新时间: 2025-03-14 00:10:02

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2503.10961v1