    _              _         ____
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        


A Deep Recurrent-Reinforcement Learning Method for Intelligent AutoScaling of Serverless Functions

FaaS introduces a lightweight, function-based cloud execution model that is relevant to a range of applications such as IoT-edge data processing and anomaly detection. While cloud service providers offer near-infinite function elasticity, these applications often experience fluctuating workloads and strict performance constraints. A typical CSP strategy is to empirically determine and adjust the desired function instances or resources, known as autoscaling, based on monitoring thresholds such as CPU or memory utilization, to cope with demand and performance. However, threshold configuration requires expert knowledge, historical data, or a complete view of the environment, making autoscaling a performance bottleneck that lacks an adaptable solution. RL algorithms have proven beneficial for analysing complex cloud environments, yielding adaptable policies that maximize the expected objectives. Most realistic cloud environments involve operational interference and offer limited visibility, making them partially observable. A general solution for tackling observability in highly dynamic settings is to integrate recurrent units with model-free RL algorithms and model the decision process as a POMDP. Therefore, in this paper, we investigate model-free recurrent RL agents for function autoscaling and compare them against the model-free PPO algorithm. We explore the integration of an LSTM network with the state-of-the-art PPO algorithm and find that, under our experimental and evaluation settings, recurrent policies capture the environment parameters and show promising results for function autoscaling. We further compare a PPO-based autoscaling agent with commercially used threshold-based function autoscaling and find that an LSTM-based autoscaling agent improves throughput by 18%, function execution by 13%, and accounts for 8.4% more function instances.
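
The threshold-based CSP strategy the paper compares against can be sketched as a simple proportional scaling rule (a minimal illustration in the spirit of Kubernetes' Horizontal Pod Autoscaler; the target utilization and replica bounds here are assumptions, not values from the paper):

```python
import math

def threshold_autoscale(current_replicas, cpu_util, target_util=0.5,
                        min_replicas=1, max_replicas=100):
    """Proportional threshold rule: scale so utilization approaches the target."""
    desired = math.ceil(current_replicas * cpu_util / target_util)
    return max(min_replicas, min(max_replicas, desired))

# At 75% CPU with a 50% target, 4 replicas scale out to 6.
```

An RL or recurrent-policy agent replaces this fixed rule with a learned mapping from observed (and, with an LSTM, remembered) state to scaling actions.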

Updated: 2024-11-11 23:54:51

Categories: cs.DC,cs.AI,cs.SY,eess.SY

Download: http://arxiv.org/abs/2308.05937v2

Automatically Detecting Online Deceptive Patterns in Real-time

Deceptive patterns (DPs) in digital interfaces manipulate users into making unintended decisions, exploiting cognitive biases and psychological vulnerabilities. These patterns have become ubiquitous across digital platforms. While efforts to mitigate DPs have emerged from legal and technical perspectives, a significant gap remains in usable solutions that empower users to identify DPs and make informed decisions about them in real-time. In this work, we introduce AutoBot, an automated deceptive-pattern detector that analyzes websites' visual appearance using machine learning techniques to identify DPs and notify users in real-time. AutoBot employs a two-stage pipeline that processes website screenshots, identifying interactable elements and extracting textual features without relying on the HTML structure. By leveraging a custom language model, AutoBot understands the context surrounding these elements to determine the presence of deceptive patterns. We implement AutoBot as a lightweight Chrome browser extension that performs all analyses locally, minimizing latency and preserving user privacy. Through extensive evaluation, we demonstrate AutoBot's effectiveness in enhancing users' ability to navigate digital environments safely, while providing regulators a valuable tool for assessing and enforcing compliance with DP regulations.

Updated: 2024-11-11 23:49:02

Categories: cs.HC,cs.AI,cs.CY

Download: http://arxiv.org/abs/2411.07441v1

SLOctolyzer: Fully automatic analysis toolkit for segmentation and feature extraction in scanning laser ophthalmoscopy images

Purpose: The purpose of this study was to introduce SLOctolyzer: an open-source analysis toolkit for en face retinal vessels in infrared reflectance scanning laser ophthalmoscopy (SLO) images. Methods: SLOctolyzer includes two main modules: segmentation and measurement. The segmentation module uses deep learning methods to delineate retinal anatomy, and detects the fovea and optic disc, whereas the measurement module quantifies the complexity, density, tortuosity, and calibre of the segmented retinal vessels. We evaluated the segmentation module using unseen data and measured its reproducibility. Results: SLOctolyzer's segmentation module performed well against unseen internal test data (Dice for all-vessels = 0.91; arteries = 0.84; veins = 0.85; optic disc = 0.94; and fovea = 0.88). External validation against severe retinal pathology showed decreased performance (Dice for arteries = 0.72; veins = 0.75; and optic disc = 0.90). SLOctolyzer had good reproducibility (mean difference for fractal dimension = -0.001; density = -0.0003; calibre = -0.32 microns; and tortuosity density = 0.001). SLOctolyzer can process a 768 x 768 pixel macula-centred SLO image in under 20 seconds and a disc-centred SLO image in under 30 seconds using a laptop CPU. Conclusions: To our knowledge, SLOctolyzer is the first open-source tool to convert raw SLO images into reproducible and clinically meaningful retinal vascular parameters. SLO images are captured simultaneously with optical coherence tomography (OCT), and we believe SLOctolyzer will be useful for extracting retinal vascular measurements from large OCT image sets and linking them to ocular or systemic diseases. It requires no specialist knowledge or proprietary software, and allows manual correction of segmentations and re-computing of vascular metrics. SLOctolyzer is freely available at https://github.com/jaburke166/SLOctolyzer.
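
The Dice scores reported above are the standard overlap metric between a predicted and a reference segmentation mask; a minimal NumPy version (illustrative, not SLOctolyzer's own code):

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2.0 * np.logical_and(pred, truth).sum() / denom
```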

Updated: 2024-11-11 23:25:00

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.16466v2

SDN-Based Smart Cyber Switching (SCS) for Cyber Restoration of a Digital Substation

In recent years, critical infrastructure and power grids have increasingly been targets of cyber-attacks, causing widespread and extended blackouts. Digital substations are particularly vulnerable to such cyber incursions, jeopardizing grid stability. This paper addresses these risks by proposing a cybersecurity framework that leverages software-defined networking (SDN) to bolster the resilience of substations based on the IEC-61850 standard. The research introduces a strategy involving smart cyber switching (SCS) for mitigation and concurrent intelligent electronic device (CIED) for restoration, ensuring ongoing operational integrity and cybersecurity within a substation. The SCS framework improves the physical network's behavior (i.e., leveraging commercial SDN capabilities) by incorporating an adaptive port controller (APC) module for dynamic port management and an intrusion detection system (IDS) to detect and counteract malicious IEC-61850-based sampled value (SV) and generic object-oriented system event (GOOSE) messages within the substation's communication network. The framework's effectiveness is validated through comprehensive simulations and a hardware-in-the-loop (HIL) testbed, demonstrating its ability to sustain substation operations during cyber-attacks and significantly improve the overall resilience of the power grid.

Updated: 2024-11-11 23:22:02

Categories: cs.CR,cs.ET

Download: http://arxiv.org/abs/2411.07433v1

Fast unsupervised ground metric learning with tree-Wasserstein distance

The performance of unsupervised methods such as clustering depends on the choice of distance metric between features, or ground metric. Commonly, ground metrics are decided with heuristics or learned via supervised algorithms. However, since many datasets are unlabelled, unsupervised ground metric learning approaches have been introduced. One recent, promising option uses Wasserstein singular vectors (WSV), which emerge when computing optimal transport distances between features and samples simultaneously. While WSV is effective, it has complexity $\mathcal{O}(n^5)$, which is prohibitively expensive in some applications. In this work, we propose to augment the WSV method by embedding samples and features on trees, on which we compute the tree-Wasserstein distance (TWD). We demonstrate theoretically and empirically that the algorithm converges to a better approximation of the full WSV approach than the best known alternatives, and does so with $\mathcal{O}(n^3)$ complexity. In addition, we prove that the initial tree structure can be chosen flexibly, since tree geometry does not constrain the richness of the approximation up to the number of edge weights. This proof suggests a fast, recursive algorithm for computing the tree parameter basis set, which we find crucial to realising the efficiency gains at scale. Finally, we employ the tree-WSV algorithm to several single-cell RNA sequencing genomics datasets, demonstrating its scalability and utility for unsupervised cell-type clustering problems. These results position unsupervised ground metric learning with TWD as a low-rank approximation of WSV with the potential for widespread low-compute application.
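
On a tree, the Wasserstein distance between two distributions has a closed form: each edge contributes its weight times the absolute net mass that must cross it. A small sketch of that formula (assuming nodes are indexed so every child has a larger index than its parent; this is not the authors' implementation):

```python
def tree_wasserstein(parent, edge_weight, mu, nu):
    """TWD(mu, nu) = sum over edges e of w_e * |net mass of mu - nu below e|."""
    n = len(parent)
    net = [mu[i] - nu[i] for i in range(n)]
    total = 0.0
    for i in range(n - 1, 0, -1):      # leaves first, root (index 0) last
        total += edge_weight[i] * abs(net[i])
        net[parent[i]] += net[i]       # push the subtree's net mass upward
    return total

# Root 0 with children 1 and 2 on unit-weight edges: moving all mass
# from node 1 to node 2 crosses both edges, costing 1 + 1 = 2.
```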

Updated: 2024-11-11 23:21:01

Categories: cs.LG

Download: http://arxiv.org/abs/2411.07432v1

DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers

The safety alignment of Large Language Models (LLMs) is vulnerable to both manual and automated jailbreak attacks, which adversarially trigger LLMs to output harmful content. However, current methods for jailbreaking LLMs, which nest entire harmful prompts, are not effective at concealing malicious intent and can be easily identified and rejected by well-aligned LLMs. This paper discovers that decomposing a malicious prompt into separated sub-prompts can effectively obscure its underlying malicious intent by presenting it in a fragmented, less detectable form, thereby addressing these limitations. We introduce an automatic prompt \textbf{D}ecomposition and \textbf{R}econstruction framework for jailbreak \textbf{Attack} (DrAttack). DrAttack includes three key components: (a) `Decomposition' of the original prompt into sub-prompts, (b) `Reconstruction' of these sub-prompts implicitly by in-context learning with semantically similar but harmless reassembling demos, and (c) a `Synonym Search' over sub-prompts, aiming to find synonyms that maintain the original intent while jailbreaking LLMs. An extensive empirical study across multiple open-source and closed-source LLMs demonstrates that, with a significantly reduced number of queries, DrAttack obtains a substantial gain in success rate over prior SOTA prompt-only attacks. Notably, a success rate of 78.0\% on GPT-4 with merely 15 queries surpasses the previous state of the art by 33.1\%. The project is available at https://github.com/xirui-li/DrAttack.

Updated: 2024-11-11 23:08:20

Categories: cs.CR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2402.16914v3

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Language models have outpaced our ability to evaluate them effectively, but for their future development it is essential to study the frontier of their capabilities. We find real-world software engineering to be a rich, sustainable, and challenging testbed for evaluating the next generation of language models. To this end, we introduce SWE-bench, an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories. Given a codebase along with a description of an issue to be resolved, a language model is tasked with editing the codebase to address the issue. Resolving issues in SWE-bench frequently requires understanding and coordinating changes across multiple functions, classes, and even files simultaneously, calling for models to interact with execution environments, process extremely long contexts, and perform complex reasoning that goes far beyond traditional code generation tasks. Our evaluations show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues. The best-performing model, Claude 2, is able to solve a mere 1.96% of the issues. Advances on SWE-bench represent steps towards LMs that are more practical, intelligent, and autonomous.

Updated: 2024-11-11 23:05:04

Categories: cs.CL,cs.AI,cs.SE

Download: http://arxiv.org/abs/2310.06770v3

Just Label the Repeats for In-The-Wild Audio-to-Score Alignment

We propose an efficient workflow for high-quality offline alignment of in-the-wild performance audio and corresponding sheet music scans (images). Recent work on audio-to-score alignment extends dynamic time warping (DTW) to be theoretically able to handle jumps in sheet music induced by repeat signs. This method requires no human annotations, but we show that it often yields low-quality alignments. As an alternative, we propose a workflow and interface that allows users to quickly annotate jumps (by clicking on repeat signs), requiring a small amount of human supervision but yielding much higher quality alignments on average. Additionally, we refine audio and score feature representations to improve alignment quality by: (1) integrating measure detection into the score feature representation, and (2) using raw onset prediction probabilities from a music transcription model instead of piano roll. We propose an evaluation protocol for audio-to-score alignment that computes the distance between the estimated and ground truth alignment in units of measures. Under this evaluation, we find that our proposed jump annotation workflow and improved feature representations together improve alignment accuracy by 150% relative to prior work (33% to 82%).
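
The baseline this work extends is classic dynamic time warping over a pairwise cost matrix between audio frames and score positions; a textbook sketch (the repeat-jump extension and the paper's feature representations are not shown):

```python
import numpy as np

def dtw_cost(cost):
    """Accumulated DTW cost between two sequences given their pairwise cost matrix."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Each cell extends the cheapest of the three monotone predecessors.
            step = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + step
    return acc[n, m]
```

A perfectly alignable pair (one sequence a "stretched" copy of the other) yields zero accumulated cost.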

Updated: 2024-11-11 23:05:02

Categories: cs.SD,cs.LG,cs.MM,eess.AS

Download: http://arxiv.org/abs/2411.07428v1

Evaluating Detection Thresholds: The Impact of False Positives and Negatives on Super-Resolution Ultrasound Localization Microscopy

Super-resolution ultrasound imaging with ultrasound localization microscopy (ULM) offers a high-resolution view of microvascular structures. Yet, ULM image quality heavily relies on precise microbubble (MB) detection. Despite the crucial role of localization algorithms, there has been limited focus on the practical pitfalls in MB detection tasks, such as setting the detection threshold. This study examines how False Positives (FPs) and False Negatives (FNs) affect ULM image quality by systematically adding controlled detection errors to simulated data. Results indicate that while FP and FN rates impact Peak Signal-to-Noise Ratio (PSNR) similarly, increasing FP rates from 0\% to 20\% decreases the Structural Similarity Index (SSIM) by 7\%, whereas the same FN rates cause a greater drop of around 45\%. Moreover, dense MB regions are more resilient to detection errors, while sparse regions show high sensitivity, showcasing the need for robust MB detection frameworks to enhance super-resolution imaging.
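
The controlled-error protocol can be imitated by perturbing a set of ground-truth localizations: drop each true detection with probability equal to the FN rate and add a proportional number of random false detections (a toy sketch; the coordinate bounds and rates are assumptions, not the study's simulation setup):

```python
import random

def perturb_detections(points, fp_rate, fn_rate, bounds=(0.0, 1.0), seed=0):
    """Simulate imperfect MB detection: random misses (FN) plus spurious hits (FP)."""
    rng = random.Random(seed)
    kept = [p for p in points if rng.random() >= fn_rate]      # false negatives
    n_fp = round(len(points) * fp_rate)                        # false positives
    lo, hi = bounds
    fakes = [(rng.uniform(lo, hi), rng.uniform(lo, hi)) for _ in range(n_fp)]
    return kept + fakes
```

Reconstructing ULM images from the perturbed sets and comparing PSNR/SSIM against the clean reconstruction reproduces the kind of sweep described above.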

Updated: 2024-11-11 22:58:56

Categories: cs.AI

Download: http://arxiv.org/abs/2411.07426v1

Towards Diverse Device Heterogeneous Federated Learning via Task Arithmetic Knowledge Integration

Federated Learning has emerged as a promising paradigm for collaborative machine learning, while preserving user data privacy. Despite its potential, standard FL lacks support for diverse heterogeneous device prototypes, which vary significantly in model and dataset sizes -- from small IoT devices to large workstations. This limitation is only partially addressed by existing knowledge distillation techniques, which often fail to transfer knowledge effectively across a broad spectrum of device prototypes with varied capabilities. This failure primarily stems from two issues: the dilution of informative logits from more capable devices by those from less capable ones, and the use of a single integrated logits as the distillation target across all devices, which neglects their individual learning capacities and the unique contributions of each. To address these challenges, we introduce TAKFL, a novel KD-based framework that treats the knowledge transfer from each device prototype's ensemble as a separate task, independently distilling each to preserve its unique contributions and avoid dilution. TAKFL also incorporates a KD-based self-regularization technique to mitigate the issues related to the noisy and unsupervised ensemble distillation process. To integrate the separately distilled knowledge, we introduce an adaptive task arithmetic knowledge integration process, allowing each student model to customize the knowledge integration for optimal performance. Additionally, we present theoretical results demonstrating the effectiveness of task arithmetic in transferring knowledge across heterogeneous devices with varying capacities. Comprehensive evaluations of our method across both CV and NLP tasks demonstrate that TAKFL achieves SOTA results in a variety of datasets and settings, significantly outperforming existing KD-based methods. Code is released at https://github.com/MMorafah/TAKFL

Updated: 2024-11-11 22:57:16

Categories: cs.LG,cs.AI,cs.CV,cs.DC

Download: http://arxiv.org/abs/2409.18461v2

Predicting BWR Criticality with Data-Driven Machine Learning Model

One of the challenges in operating nuclear power plants is to decide the amount of fuel needed in a cycle. Large-scale nuclear power plants are designed to operate at base load, meaning that they are expected to always operate at full power. Economically, a nuclear power plant should burn enough fuel to maintain criticality until the end of a cycle (EOC). If the reactor goes subcritical before the end of a cycle, it may result in early coastdown as the fuel in the core is already depleted. Conversely, if the reactor still has significant excess reactivity by the end of a cycle, the remaining fuel will go unused. In both cases, the plant may lose a significant amount of money. This work proposes an innovative method based on a data-driven deep learning model to estimate the excess criticality of a boiling water reactor.

Updated: 2024-11-11 22:57:11

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.07425v1

Comparing Targeting Strategies for Maximizing Social Welfare with Limited Resources

Machine learning is increasingly used to select which individuals receive limited-resource interventions in domains such as human services, education, development, and more. However, it is often not apparent what the right quantity is for models to predict. In particular, policymakers rarely have access to data from a randomized controlled trial (RCT) that would enable accurate estimates of treatment effects -- which individuals would benefit more from the intervention. Observational data is more likely to be available, creating a substantial risk of bias in treatment effect estimates. Practitioners instead commonly use a technique termed "risk-based targeting" where the model is just used to predict each individual's status quo outcome (an easier, non-causal task). Those with higher predicted risk are offered treatment. There is currently almost no empirical evidence to inform which choices lead to the most effective machine-learning-informed targeting strategies in social domains. In this work, we use data from 5 real-world RCTs in a variety of domains to empirically assess such choices. We find that risk-based targeting is almost always inferior to targeting based on even biased estimates of treatment effects. Moreover, these results hold even when the policymaker has strong normative preferences for assisting higher-risk individuals. Our results imply that, despite the widespread use of risk prediction models in applied settings, practitioners may be better off incorporating even weak evidence about heterogeneous causal effects to inform targeting.
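
The comparison has a simple synthetic analogue: when risk is uncorrelated with treatment effect, ranking by even a very noisy effect estimate beats ranking by risk. All distributions below are assumptions chosen for illustration, not the paper's RCT data:

```python
import random

def welfare(scores, effects, k):
    """Total realized treatment effect when treating the top-k individuals by score."""
    top = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    return sum(effects[i] for i in top)

rng = random.Random(0)
n, k = 1000, 100
risk = [rng.random() for _ in range(n)]             # status-quo outcome (risk)
effect = [rng.random() for _ in range(n)]           # true (unobserved) effect
noisy = [e + rng.gauss(0.0, 0.5) for e in effect]   # biased/noisy effect estimate

risk_welfare = welfare(risk, effect, k)      # "risk-based targeting"
effect_welfare = welfare(noisy, effect, k)   # targeting on noisy effect estimates
```

Because risk is independent of effect in this toy setup, risk-based targeting selects effects at random, while even the noisy estimate retains enough signal to do substantially better.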

Updated: 2024-11-11 22:36:50

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2411.07414v1

ODEStream: A Buffer-Free Online Learning Framework with ODE-based Adaptor for Streaming Time Series Forecasting

Addressing the challenges of irregularity and concept drift in streaming time series is crucial in real-world predictive modelling. Previous studies in time series continual learning often propose models that require buffering of long sequences, potentially restricting the responsiveness of the inference system. Moreover, these models are typically designed for regularly sampled data, an unrealistic assumption in real-world scenarios. This paper introduces ODEStream, a novel buffer-free continual learning framework that incorporates a temporal isolation layer that integrates temporal dependencies within the data. Simultaneously, it leverages the capability of neural ordinary differential equations to process irregular sequences and generate a continuous data representation, enabling seamless adaptation to changing dynamics in a data streaming scenario. Our approach focuses on learning how the dynamics and distribution of historical data change with time, facilitating the direct processing of streaming sequences. Evaluations on benchmark real-world datasets demonstrate that ODEStream outperforms the state-of-the-art online learning and streaming analysis baselines, providing accurate predictions over extended periods while minimising performance degradation over time by learning how the sequence dynamics change.
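
The neural-ODE ingredient lets the hidden state evolve over an arbitrary gap between irregular timestamps; a fixed-step Euler sketch, with a hand-written dynamics function standing in for the learned network:

```python
def evolve_hidden(h, t0, t1, dynamics, n_steps=20):
    """Integrate dh/dt = dynamics(h, t) from t0 to t1 with Euler steps."""
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        dh = dynamics(h, t)
        h = [hi + dt * di for hi, di in zip(h, dh)]
        t += dt
    return h

# Exponential decay toward zero: h(1) is roughly exp(-1) * h(0).
decay = lambda h, t: [-hi for hi in h]
```

In the ODEStream setting, the dynamics function would be a trained network and the (t0, t1) gaps would come directly from the irregular arrival times of the stream.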

Updated: 2024-11-11 22:36:33

Categories: cs.LG

Download: http://arxiv.org/abs/2411.07413v1

Controllable Context Sensitivity and the Knob Behind It

When making predictions, a language model must trade off how much it relies on its context vs. its prior knowledge. Choosing how sensitive the model is to its context is a fundamental functionality, as it enables the model to excel at tasks like retrieval-augmented generation and question-answering. In this paper, we search for a knob which controls this sensitivity, determining whether language models answer from the context or their prior knowledge. To guide this search, we design a task for controllable context sensitivity. In this task, we first feed the model a context (Paris is in England) and a question (Where is Paris?); we then instruct the model to either use its prior or contextual knowledge and evaluate whether it generates the correct answer for both intents (either France or England). When fine-tuned on this task, instruction-tuned versions of Llama-3.1, Mistral-v0.3, and Gemma-2 can solve it with high accuracy (85-95%). Analyzing these high-performing models, we narrow down which layers may be important to context sensitivity using a novel linear time algorithm. Then, in each model, we identify a 1-D subspace in a single layer that encodes whether the model follows context or prior knowledge. Interestingly, while we identify this subspace in a fine-tuned model, we find that the exact same subspace serves as an effective knob in not only that model but also non-fine-tuned instruct and base models of that model family. Finally, we show a strong correlation between a model's performance and how distinctly it separates context-agreeing from context-ignoring answers in this subspace. These results suggest a single subspace facilitates how the model chooses between context and prior knowledge, hinting at a simple fundamental mechanism that controls this behavior.
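
Reading and setting such a 1-D "knob" in a hidden state reduces to projecting onto a single learned direction; a small NumPy sketch (the direction would come from the paper's analysis; here it is an arbitrary vector):

```python
import numpy as np

def knob_value(hidden, direction):
    """Coordinate of a hidden state along the (normalized) knob direction."""
    d = direction / np.linalg.norm(direction)
    return float(hidden @ d)

def set_knob(hidden, direction, target):
    """Shift the hidden state along the direction so its coordinate equals target."""
    d = direction / np.linalg.norm(direction)
    return hidden + (target - hidden @ d) * d
```

Steering along one direction in one layer is all the intervention requires, which is what makes the "single subspace" finding striking.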

Updated: 2024-11-11 22:22:21

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.07404v1

Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation

Large Language Models (LLMs) demonstrate strong reasoning abilities but face limitations such as hallucinations and outdated knowledge. Knowledge Graph (KG)-based Retrieval-Augmented Generation (RAG) addresses these issues by grounding LLM outputs in structured external knowledge from KGs. However, current KG-based RAG frameworks still struggle to optimize the trade-off between retrieval effectiveness and efficiency in identifying a suitable amount of relevant graph information for the LLM to digest. We introduce SubgraphRAG, extending the KG-based RAG framework that retrieves subgraphs and leverages LLMs for reasoning and answer prediction. Our approach innovatively integrates a lightweight multilayer perceptron with a parallel triple-scoring mechanism for efficient and flexible subgraph retrieval while encoding directional structural distances to enhance retrieval effectiveness. The size of retrieved subgraphs can be flexibly adjusted to match the query's need and the downstream LLM's capabilities. This design strikes a balance between model complexity and reasoning power, enabling scalable and generalizable retrieval processes. Notably, based on our retrieved subgraphs, smaller LLMs like Llama3.1-8B-Instruct deliver competitive results with explainable reasoning, while larger models like GPT-4o achieve state-of-the-art accuracy compared with previous baselines -- all without fine-tuning. Extensive evaluations on the WebQSP and CWQ benchmarks highlight SubgraphRAG's strengths in efficiency, accuracy, and reliability by reducing hallucinations and improving response grounding.
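
The retrieval step can be pictured as scoring every candidate triple with a small MLP in parallel and keeping the top-k; a toy NumPy sketch with random weights and invented feature dimensions (the real features would encode the triple plus directional structural distances):

```python
import numpy as np

def score_triples(features, W1, b1, W2, b2):
    """Score each triple's feature vector with a one-hidden-layer MLP."""
    hidden = np.maximum(features @ W1 + b1, 0.0)   # ReLU hidden layer
    return (hidden @ W2 + b2).ravel()              # one scalar score per triple

rng = np.random.default_rng(0)
d, h, n_triples, k = 8, 16, 50, 10
W1, b1 = rng.normal(size=(d, h)), np.zeros(h)
W2, b2 = rng.normal(size=(h, 1)), np.zeros(1)
feats = rng.normal(size=(n_triples, d))
top_k = np.argsort(-score_triples(feats, W1, b1, W2, b2))[:k]
```

Adjusting k is how the retrieved subgraph size is matched to the query and the downstream LLM's capacity.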

Updated: 2024-11-11 22:18:14

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2410.20724v2

Extrinsically-Focused Evaluation of Omissions in Medical Summarization

Large language models (LLMs) have shown promise in safety-critical applications such as healthcare, yet the ability to quantify performance has lagged. An example of this challenge is in evaluating a summary of the patient's medical record. A resulting summary can enable the provider to get a high-level overview of the patient's health status quickly. Yet, a summary that omits important facts about the patient's record can produce a misleading picture. This can lead to negative consequences for medical decision-making. We propose MED-OMIT as a metric to explore this challenge. As a case study, we focus on using provider-patient history conversations to generate a subjective (a summary of the patient's history). We begin by discretizing facts from the dialogue and identifying which are omitted from the subjective. To determine which facts are clinically relevant, we measure the importance of each fact to a simulated differential diagnosis. We compare MED-OMIT's performance to that of clinical experts and find broad agreement. We use MED-OMIT to evaluate LLM performance on subjective generation and find that some LLMs (gpt-4 and llama-3.1-405b) work well with little effort, while others (e.g. Llama 2) perform worse.

Updated: 2024-11-11 22:17:17

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2311.08303v2

LoRA-BERT: a Natural Language Processing Model for Robust and Accurate Prediction of long non-coding RNAs

Long non-coding RNAs (lncRNAs) serve as crucial regulators in numerous biological processes. Although they share sequence similarities with messenger RNAs (mRNAs), lncRNAs perform entirely different roles, providing new avenues for biological research. The emergence of next-generation sequencing technologies has greatly advanced the detection and identification of lncRNA transcripts, and deep learning-based approaches have been introduced to classify long non-coding RNAs (lncRNAs). These advanced methods have significantly enhanced the efficiency of identifying lncRNAs. However, many of these methods lack robustness and accuracy due to the extended length of the sequences involved. To tackle this issue, we have introduced a novel pre-trained bidirectional encoder representation called LoRA-BERT. LoRA-BERT is designed to capture the importance of nucleotide-level information during sequence classification, leading to more robust and satisfactory outcomes. In a comprehensive comparison with commonly used sequence prediction tools, we have demonstrated that LoRA-BERT outperforms them in terms of accuracy and efficiency. Our results indicate that, when utilizing the transformer model, LoRA-BERT achieves state-of-the-art performance in predicting both lncRNAs and mRNAs for human and mouse species. Using LoRA-BERT, we acquire valuable insights into the traits of lncRNAs and mRNAs, offering the potential to aid in the comprehension and detection of diseases linked to lncRNAs in humans.

Updated: 2024-11-11 22:17:01

Categories: q-bio.GN,cs.LG

Download: http://arxiv.org/abs/2411.08073v1

Beyond Keywords: A Context-based Hybrid Approach to Mining Ethical Concern-related App Reviews

With the increasing proliferation of mobile applications in our everyday experiences, the concerns surrounding ethics have surged significantly. Users generally communicate their feedback, report issues, and suggest new functionalities in application (app) reviews, frequently emphasizing safety, privacy, and accountability concerns. Incorporating these reviews is essential to developing successful products. However, app reviews related to ethical concerns generally use domain-specific language and are expressed using a more varied vocabulary. This makes automated extraction of ethical concern-related app reviews a challenging and time-consuming effort. This study proposes a novel Natural Language Processing (NLP) based approach that combines Natural Language Inference (NLI), which provides a deep comprehension of language nuances, and a decoder-only (LLaMA-like) Large Language Model (LLM) to extract ethical concern-related app reviews at scale. Utilizing 43,647 app reviews from the mental health domain, the proposed methodology 1) Evaluates four NLI models to extract potential privacy reviews and compares the results of domain-specific privacy hypotheses with generic privacy hypotheses; 2) Evaluates four LLMs for classifying app reviews to privacy concerns; and 3) Uses the best NLI and LLM models further to extract new privacy reviews from the dataset. Results show that the DeBERTa-v3-base-mnli-fever-anli NLI model with domain-specific hypotheses yields the best performance, and Llama3.1-8B-Instruct LLM performs best in the classification of app reviews. Then, using NLI+LLM, an additional 1,008 new privacy-related reviews were extracted that were not identified through the keyword-based approach in previous research, thus demonstrating the effectiveness of the proposed approach.

Updated: 2024-11-11 22:08:48

Categories: cs.CL,cs.AI,cs.SE

Download: http://arxiv.org/abs/2411.07398v1

Data-Centric Learning Framework for Real-Time Detection of Aiming Beam in Fluorescence Lifetime Imaging Guided Surgery

This study introduces a novel data-centric approach to improve real-time surgical guidance using fiber-based fluorescence lifetime imaging (FLIm). A key aspect of the methodology is the accurate detection of the aiming beam, which is essential for localizing points used to map FLIm measurements onto the tissue region within the surgical field. The primary challenge arises from the complex and variable conditions encountered in the surgical environment, particularly in Transoral Robotic Surgery (TORS). Uneven illumination in the surgical field can cause reflections, reduce contrast, and result in inconsistent color representation, further complicating aiming beam detection. To overcome these challenges, an instance segmentation model was developed using a data-centric training strategy that improves accuracy by minimizing label noise and enhancing detection robustness. The model was evaluated on a dataset comprising 40 in vivo surgical videos, demonstrating a median detection rate of 85%. This performance was maintained when the model was integrated in a clinical system, achieving a similar detection rate of 85% during TORS procedures conducted in patients. The system's computational efficiency, measured at approximately 24 frames per second (FPS), was sufficient for real-time surgical guidance. This study enhances the reliability of FLIm-based aiming beam detection in complex surgical environments, advancing the feasibility of real-time, image-guided interventions for improved surgical precision.

Updated: 2024-11-11 22:04:32

Categories: cs.AI

Download: http://arxiv.org/abs/2411.07395v1

Respecting the limit: Bayesian optimization with a bound on the optimal value

In many real-world optimization problems, we have prior information about what objective function values are achievable. In this paper, we study the scenario in which we have either exact knowledge of the minimum value or a, possibly inexact, lower bound on its value. We propose bound-aware Bayesian optimization (BABO), a Bayesian optimization method that uses a new surrogate model and acquisition function to utilize such prior information. We present SlogGP, a new surrogate model that incorporates bound information and adapts the Expected Improvement (EI) acquisition function accordingly. Empirical results on a variety of benchmarks demonstrate the benefit of taking prior information about the optimal value into account and show that the proposed approach significantly outperforms existing techniques. Furthermore, we notice that even in the absence of prior information on the bound, the proposed SlogGP surrogate model still performs better than the standard GP model in most cases, which we explain by its larger expressiveness.
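As background on the acquisition function the paper adapts, a minimal sketch of the standard Expected Improvement for minimization might look like the following. This is the vanilla closed form, not the paper's bound-aware variant; how SlogGP reshapes the posterior to respect a known lower bound is specified in the paper itself.

```python
from math import erf, exp, pi, sqrt

def expected_improvement(mu, sigma, best):
    """Vanilla EI for minimization: E[max(best - f, 0)] with f ~ N(mu, sigma^2),
    where `best` is the incumbent (lowest observed) objective value."""
    if sigma == 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + erf(z / sqrt(2.0)))    # standard normal CDF at z
    pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)  # standard normal PDF at z
    return (best - mu) * cdf + sigma * pdf
```

At a candidate point whose posterior mean equals the incumbent, EI reduces to sigma times the normal density at zero, so higher posterior uncertainty alone can drive exploration.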

Updated: 2024-11-11 22:03:27

Categories: cs.LG

Download: http://arxiv.org/abs/2411.04744v2

Feature-Space Semantic Invariance: Enhanced OOD Detection for Open-Set Domain Generalization

Open-set domain generalization addresses a real-world challenge: training a model to generalize across unseen domains (domain generalization) while also detecting samples from unknown classes not encountered during training (open-set recognition). However, most existing approaches tackle these issues separately, limiting their practical applicability. To overcome this limitation, we propose a unified framework for open-set domain generalization by introducing Feature-space Semantic Invariance (FSI). FSI maintains semantic consistency across different domains within the feature space, enabling more accurate detection of OOD instances in unseen domains. Additionally, we adopt a generative model to produce synthetic data with novel domain styles or class labels, enhancing model robustness. Initial experiments show that our method improves AUROC by 9.1% to 18.9% on ColoredMNIST, while also significantly increasing in-distribution classification accuracy.

Updated: 2024-11-11 21:51:45

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.07392v1

FineTuneBench: How well do commercial fine-tuning APIs infuse knowledge into LLMs?

There is great interest in fine-tuning frontier large language models (LLMs) to inject new information and update existing knowledge. While commercial LLM fine-tuning APIs from providers such as OpenAI and Google promise flexible adaptation for various applications, the efficacy of fine-tuning remains unclear. In this study, we introduce FineTuneBench, an evaluation framework and dataset for understanding how well commercial fine-tuning APIs can successfully learn new and updated knowledge. We analyze five frontier LLMs with commercially available fine-tuning APIs, including GPT-4o and Gemini 1.5 Pro, on their effectiveness in two settings: (1) ingesting novel information, such as recent news events and new people profiles, and (2) updating existing knowledge, such as updated medical guidelines and code frameworks. Our results reveal substantial shortcomings in all the models' abilities to effectively learn new information through fine-tuning, with an average generalization accuracy of 37% across all models. When updating existing knowledge, such as incorporating medical guideline updates, commercial fine-tuning APIs show even more limited capability (average generalization accuracy of 19%). Overall, fine-tuning GPT-4o mini is the most effective for infusing new knowledge and updating knowledge, followed by GPT-3.5 Turbo and GPT-4o. The fine-tuning APIs for Gemini 1.5 Flash and Gemini 1.5 Pro are unable to learn new knowledge or update existing knowledge. These findings underscore a major shortcoming in using current commercial fine-tuning services to achieve reliable knowledge infusion in common scenarios. We open source the FineTuneBench dataset at https://github.com/kevinwu23/StanfordFineTuneBench.

Updated: 2024-11-11 21:48:52

Categories: cs.CL,cs.AI,cs.IR

Download: http://arxiv.org/abs/2411.05059v2

Federated Learning Client Pruning for Noisy Labels

Federated Learning (FL) enables collaborative model training across decentralized edge devices while preserving data privacy. However, existing FL methods often assume clean annotated datasets, impractical for resource-constrained edge devices. In reality, noisy labels are prevalent, posing significant challenges to FL performance. Prior approaches attempt label correction and robust training techniques but exhibit limited efficacy, particularly under high noise levels. This paper introduces ClipFL (Federated Learning Client Pruning), a novel framework addressing noisy labels from a fresh perspective. ClipFL identifies and excludes noisy clients based on their performance on a clean validation dataset, tracked using a Noise Candidacy Score (NCS). The framework comprises three phases: pre-client pruning to identify potential noisy clients and calculate their NCS, client pruning to exclude a percentage of clients with the highest NCS, and post-client pruning for fine-tuning the global model with standard FL on clean clients. Empirical evaluation demonstrates ClipFL's efficacy across diverse datasets and noise levels, achieving accurate noisy client identification, superior performance, faster convergence, and reduced communication costs compared to state-of-the-art FL methods. Our code is available at https://github.com/MMorafah/ClipFL.
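The pruning step in ClipFL's second phase reduces to ranking clients by score and excluding the top fraction. A minimal sketch, with hypothetical client ids and scores; computing the Noise Candidacy Score itself from clean-validation performance is the part the paper specifies and is not reproduced here:

```python
def prune_noisy_clients(ncs, prune_fraction):
    """Drop the prune_fraction of clients with the highest Noise Candidacy Score.

    ncs: dict mapping client id -> score, where a higher score marks a client
    whose updates look noisier when tracked against a clean validation set.
    Returns (clean_clients, pruned_clients).
    """
    n_prune = int(len(ncs) * prune_fraction)
    ranked = sorted(ncs, key=ncs.get, reverse=True)  # noisiest first
    pruned = set(ranked[:n_prune])
    clean = [c for c in ncs if c not in pruned]
    return clean, sorted(pruned)
```

In the post-pruning phase, standard FL aggregation would then run only over the returned `clean` clients.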

Updated: 2024-11-11 21:46:34

Categories: cs.LG,cs.AI,cs.CV,cs.DC

Download: http://arxiv.org/abs/2411.07391v1

Communication-Efficient Federated Group Distributionally Robust Optimization

Federated learning faces challenges due to the heterogeneity in data volumes and distributions at different clients, which can compromise model generalization ability to various distributions. Existing approaches to address this issue based on group distributionally robust optimization (GDRO) often lead to high communication and sample complexity. To this end, this work introduces algorithms tailored for communication-efficient Federated Group Distributionally Robust Optimization (FGDRO). Our contributions are threefold: Firstly, we introduce the FGDRO-CVaR algorithm, which optimizes the average top-K losses while reducing communication complexity to $O(1/\epsilon^4)$, where $\epsilon$ denotes the desired precision level. Secondly, our FGDRO-KL algorithm is crafted to optimize KL regularized FGDRO, cutting communication complexity to $O(1/\epsilon^3)$. Lastly, we propose FGDRO-KL-Adam to utilize Adam-type local updates in FGDRO-KL, which not only maintains a communication cost of $O(1/\epsilon^3)$ but also shows potential to surpass SGD-type local steps in practical applications. The effectiveness of our algorithms has been demonstrated on a variety of real-world tasks, including natural language processing and computer vision.
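The quantity FGDRO-CVaR optimizes, the average of the K worst per-group losses, can be sketched in isolation as follows; the distributed optimization machinery and complexity guarantees are the paper's contribution and are omitted:

```python
def average_top_k_loss(group_losses, k):
    """CVaR-style objective: the mean of the k largest per-group losses."""
    if not 1 <= k <= len(group_losses):
        raise ValueError("k must be between 1 and the number of groups")
    worst_k = sorted(group_losses, reverse=True)[:k]
    return sum(worst_k) / k
```

Note that k = 1 recovers worst-group optimization, while k equal to the number of groups recovers the plain average loss, so k interpolates between robustness and average-case performance.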

Updated: 2024-11-11 21:42:53

Categories: cs.LG,cs.DC,stat.ML

Download: http://arxiv.org/abs/2410.06369v2

Firing Rate Models as Associative Memory: Excitatory-Inhibitory Balance for Robust Retrieval

Firing rate models are dynamical systems widely used in applied and theoretical neuroscience to describe local cortical dynamics in neuronal populations. By providing a macroscopic perspective of neuronal activity, these models are essential for investigating oscillatory phenomena, chaotic behavior, and associative memory processes. Despite their widespread use, the application of firing rate models to associative memory networks has received limited mathematical exploration, and most existing studies are focused on specific models. Conversely, well-established associative memory designs, such as Hopfield networks, lack key biologically-relevant features intrinsic to firing rate models, including positivity and interpretable synaptic matrices that reflect excitatory and inhibitory interactions. To address this gap, we propose a general framework that ensures the emergence of re-scaled memory patterns as stable equilibria in the firing rate dynamics. Furthermore, we analyze the conditions under which the memories are locally and globally asymptotically stable, providing insights into constructing biologically-plausible and robust systems for associative memory retrieval.
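As background on the model class (not the paper's specific framework), a common firing rate system is tau * dx/dt = -x + phi(W x + b) with a nonnegative transfer function phi; memories then correspond to stable equilibria x* = phi(W x* + b). A minimal Euler integration, where the ReLU transfer function and the example weights are illustrative choices:

```python
def step(x, W, b, dt=0.1, tau=1.0):
    """One Euler step of tau * dx/dt = -x + relu(W @ x + b)."""
    relu = lambda v: v if v > 0.0 else 0.0
    drive = [relu(sum(wij * xj for wij, xj in zip(wi, x)) + bi)
             for wi, bi in zip(W, b)]
    return [xi + (dt / tau) * (di - xi) for xi, di in zip(x, drive)]

def simulate(x0, W, b, steps=500):
    """Iterate the dynamics from x0; rates stay nonnegative when dt <= tau."""
    x = list(x0)
    for _ in range(steps):
        x = step(x, W, b)
    return x
```

With W = [[0.5]] and b = [1.0], the unique fixed point solves x = 0.5x + 1, i.e. x* = 2, and the simulation converges to it; negative entries in W play the role of inhibitory interactions while positivity of the rates is preserved.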

Updated: 2024-11-11 21:40:57

Categories: q-bio.NC,cond-mat.dis-nn,cond-mat.stat-mech,cs.AI,math.DS,37N25 (Primary) 34D45, 34D23 (Secondary),I.2.11; I.5.1

Download: http://arxiv.org/abs/2411.07388v1

Know Your Neighborhood: General and Zero-Shot Capable Binary Function Search Powered by Call Graphlets

Binary code similarity detection is an important problem with applications in areas such as malware analysis, vulnerability research and license violation detection. This paper proposes a novel graph neural network architecture combined with a novel graph data representation called call graphlets. A call graphlet encodes the neighborhood around each function in a binary executable, capturing the local and global context through a series of statistical features. A specialized graph neural network model operates on this graph representation, learning to map it to a feature vector that encodes semantic binary code similarities using deep-metric learning. The proposed approach is evaluated across five distinct datasets covering different architectures, compiler tool chains, and optimization levels. Experimental results show that the combination of call graphlets and the novel graph neural network architecture achieves comparable or state-of-the-art performance compared to baseline techniques across cross-architecture, mono-architecture and zero shot tasks. In addition, our proposed approach also performs well when evaluated against an out-of-domain function inlining task. The work provides a general and effective graph neural network-based solution for conducting binary code similarity detection.
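At its core, a call graphlet is a bounded neighborhood around a function in the binary's call graph. A simplified extraction sketch follows; the function names are hypothetical, and the statistical node features the paper attaches to each function are omitted:

```python
from collections import deque

def call_graphlet(call_graph, fn, hops=1):
    """Return the set of functions within `hops` call edges of `fn`,
    following edges in both directions (callers and callees).

    call_graph: dict mapping caller name -> list of callee names.
    """
    # Build an undirected adjacency view over the caller -> callee edges.
    adj = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            adj.setdefault(caller, set()).add(callee)
            adj.setdefault(callee, set()).add(caller)
    seen = {fn}
    frontier = deque([(fn, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == hops:
            continue
        for neighbor in adj.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return seen
```

The GNN described in the paper would then operate on the subgraph induced by this node set, with per-function statistical features attached to each node.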

Updated: 2024-11-11 21:40:16

Categories: cs.CR,cs.AI,cs.IR,cs.LG

Download: http://arxiv.org/abs/2406.02606v2

Data-Driven Analysis of AI in Medical Device Software in China: Deep Learning and General AI Trends Based on Regulatory Data

Artificial intelligence (AI) in medical device software (MDSW) represents a transformative clinical technology, attracting increasing attention within both the medical community and the regulators. In this study, we leverage a data-driven approach to automatically extract and analyze AI-enabled medical devices (AIMD) from the National Medical Products Administration (NMPA) regulatory database. The continued increase in publicly available regulatory data requires scalable methods for analysis. Automation of regulatory information screening is essential to create reproducible insights that can be quickly updated in an ever changing medical device landscape. More than 4 million entries were assessed, identifying 2,174 MDSW registrations, including 531 standalone applications and 1,643 integrated within medical devices, of which 43 were AI-enabled. It was shown that the leading medical specialties utilizing AIMD include respiratory (20.5%), ophthalmology/endocrinology (12.8%), and orthopedics (10.3%). This approach greatly improves the speed of data extraction, providing a greater ability to compare and contrast. This study provides the first extensive, data-driven exploration of AIMD in China, showcasing the potential of automated regulatory data analysis in understanding and advancing the landscape of AI in medical technology.

Updated: 2024-11-11 21:28:50

Categories: cs.AI

Download: http://arxiv.org/abs/2411.07378v1

Ensemble Learning for Microbubble Localization in Super-Resolution Ultrasound

Super-resolution ultrasound (SR-US) is a powerful imaging technique for capturing microvasculature and blood flow at high spatial resolution. However, accurate microbubble (MB) localization remains a key challenge, as errors in localization can propagate through subsequent stages of the super-resolution process, affecting overall performance. In this paper, we explore the potential of ensemble learning techniques to enhance MB localization by increasing detection sensitivity and reducing false positives. Our study evaluates the effectiveness of ensemble methods on both in vivo and simulated outputs of a Deformable DEtection TRansformer (Deformable DETR) network. We demonstrate the advantages of these ensemble approaches by showing improved precision and recall in MB detection and offer insights into their application in SR-US.

Updated: 2024-11-11 21:26:36

Categories: eess.IV,cs.AI,physics.med-ph

Download: http://arxiv.org/abs/2411.07376v1

Identifying Differential Patient Care Through Inverse Intent Inference

Sepsis is a life-threatening condition defined by end-organ dysfunction due to a dysregulated host response to infection. Although the Surviving Sepsis Campaign has launched and has been releasing sepsis treatment guidelines to unify and normalize the care for sepsis patients, it has been reported in numerous studies that disparities in care exist across the trajectory of patient stay in the emergency department and intensive care unit. Here, we apply a number of reinforcement learning techniques including behavioral cloning, imitation learning, and inverse reinforcement learning, to learn the optimal policy in the management of septic patient subgroups using expert demonstrations. Then we estimate the counterfactual optimal policies by applying the model to another subset of unseen medical populations and identify the difference in care by comparing it to the real policy. Our data comes from the sepsis cohort of MIMIC-IV and the clinical data warehouses of the Mass General Brigham healthcare system. The ultimate objective of this work is to use the optimal learned policy function to estimate the counterfactual treatment policy and identify deviations across sub-populations of interest. We hope this approach would help us identify any disparities in care and also changes in care in response to the publication of national sepsis treatment guidelines.

Updated: 2024-11-11 21:21:32

Categories: cs.LG

Download: http://arxiv.org/abs/2411.07372v1

LLMs as Method Actors: A Model for Prompt Engineering and Architecture

We introduce "Method Actors" as a mental model for guiding LLM prompt engineering and prompt architecture. Under this mental model, LLMs should be thought of as actors; prompts as scripts and cues; and LLM responses as performances. We apply this mental model to the task of improving LLM performance at playing Connections, a New York Times word puzzle game that prior research identified as a challenging benchmark for evaluating LLM reasoning. Our experiments with GPT-4o show that a "Method Actors" approach can significantly improve LLM performance over both a vanilla and "Chain of Thoughts" approach. A vanilla approach solves 27% of Connections puzzles in our dataset and a "Chain of Thoughts" approach solves 41% of puzzles, whereas our strongest "Method Actor" approach solves 86% of puzzles. We also test OpenAI's newest model designed specifically for complex reasoning tasks, o1-preview. When asked to solve a puzzle all at once, o1-preview solves 79% of Connections puzzles in our dataset, and when allowed to build puzzle solutions one guess at a time over multiple API calls, o1-preview solves 100% of the puzzles. Incorporating a "Method Actor" prompt architecture increases the percentage of puzzles that o1-preview solves perfectly from 76% to 87%.

Updated: 2024-11-11 21:09:42

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2411.05778v2

LLM-Assisted Static Analysis for Detecting Security Vulnerabilities

Software is prone to security vulnerabilities. Program analysis tools to detect them have limited effectiveness in practice due to their reliance on human labeled specifications. Large language models (or LLMs) have shown impressive code generation capabilities but they cannot do complex reasoning over code to detect such vulnerabilities, especially since this task requires whole-repository analysis. We propose IRIS, a neuro-symbolic approach that systematically combines LLMs with static analysis to perform whole-repository reasoning for security vulnerability detection. Specifically, IRIS leverages LLMs to infer taint specifications and perform contextual analysis, alleviating needs for human specifications and inspection. For evaluation, we curate a new dataset, CWE-Bench-Java, comprising 120 manually validated security vulnerabilities in real-world Java projects. A state-of-the-art static analysis tool CodeQL detects only 27 of these vulnerabilities whereas IRIS with GPT-4 detects 55 (+28) and improves upon CodeQL's average false discovery rate by 5 percentage points. Furthermore, IRIS identifies 6 previously unknown vulnerabilities which cannot be found by existing tools.

Updated: 2024-11-11 21:05:43

Domains: cs.CR,cs.PL,cs.SE

Download: http://arxiv.org/abs/2405.17238v2

Factorised Active Inference for Strategic Multi-Agent Interactions

Understanding how individual agents make strategic decisions within collectives is important for advancing fields as diverse as economics, neuroscience, and multi-agent systems. Two complementary approaches can be integrated to this end. The Active Inference framework (AIF) describes how agents employ a generative model to adapt their beliefs about and behaviour within their environment. Game theory formalises strategic interactions between agents with potentially competing objectives. To bridge the gap between the two, we propose a factorisation of the generative model whereby each agent maintains explicit, individual-level beliefs about the internal states of other agents, and uses them for strategic planning in a joint context. We apply our model to iterated general-sum games with 2 and 3 players, and study the ensemble effects of game transitions, where the agents' preferences (game payoffs) change over time. This non-stationarity, beyond that caused by reciprocal adaptation, reflects a more naturalistic environment in which agents need to adapt to changing social contexts. Finally, we present a dynamical analysis of key AIF quantities: the variational free energy (VFE) and the expected free energy (EFE) from numerical simulation data. The ensemble-level EFE allows us to characterise the basins of attraction of games with multiple Nash Equilibria under different conditions, and we find that it is not necessarily minimised at the aggregate level. By integrating AIF and game theory, we can gain deeper insights into how intelligent collectives emerge, learn, and optimise their actions in dynamic environments, both cooperative and non-cooperative.

Updated: 2024-11-11 21:04:43

Domains: cs.MA,cs.GT,cs.LG

Download: http://arxiv.org/abs/2411.07362v1

Grounding Large Language Models In Embodied Environment With Imperfect World Models

Despite widespread success in various applications, large language models (LLMs) often stumble when tackling basic physical reasoning or executing robotics tasks, due to a lack of direct experience with the physical nuances of the real world. To address these issues, we propose Grounding Large language models with Imperfect world MOdels (GLIMO), which utilizes proxy world models such as simulators to collect and synthesize training data. GLIMO incorporates an LLM agent-based data generator to automatically create high-quality and diverse instruction datasets. The generator includes an iterative self-refining module for temporally consistent experience sampling, a diverse set of question-answering instruction seeds, and a retrieval-augmented generation module for reflecting on prior experiences. Comprehensive experiments show that our approach improves the performance of strong open-source LLMs like LLaMA-3, with performance boosts of 2.04 $\times$, 1.54 $\times$, and 1.82 $\times$ across three different benchmarks, respectively. The resulting performance competes with or surpasses that of larger counterparts such as GPT-4.

Updated: 2024-11-11 20:33:03

Domains: cs.CL,cs.LG,cs.RO

Download: http://arxiv.org/abs/2410.02742v2

Exchangeable Sequence Models Can Naturally Quantify Uncertainty Over Latent Concepts

Intelligent agents must be able to articulate their own uncertainty. In this work, we show that pre-trained sequence models are naturally capable of probabilistic reasoning over exchangeable data points -- forming informed beliefs and sharpening them as they gather more information. A sequence model learns the relationship between observations, which differs from typical Bayesian models that quantify uncertainty over latent parameters through priors and likelihoods (e.g., topic models). Despite the apparent difference, we illustrate how exchangeable sequence modeling provides a valid Bayesian model by going back to De Finetti's classical predictive view of probabilistic reasoning: uncertainty comes from data that has not been observed yet, rather than from latent parameters. From this perspective, pre-training autoregressive models is equivalent to formulating informed beliefs based on prior observations ("empirical Bayes"), and forward generation is equivalent to simulating instantiations of an environment ("posterior inference"). In particular, exchangeable sequence models can explicitly perform statistical inference; epistemic uncertainty over latent environments is captured by variation in predicted future observations. Formally, we show that the sequence prediction loss controls the quality of uncertainty quantification, and propose several approaches for encoding exchangeability in sequence model architectures: data augmentation, regularization, and causal masking.
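
De Finetti's predictive equivalence can be made concrete with the textbook Beta-Bernoulli example: sequentially predicting exchangeable coin flips from counts (Laplace's rule of succession) coincides exactly with the posterior predictive of a latent-bias Bayesian model. This is a standard illustration of the predictive view, not the paper's experiments:

```python
def next_prob(flips):
    # autoregressive "sequence model": predict the next flip from counts
    # (Laplace's rule of succession)
    return (1 + sum(flips)) / (2 + len(flips))

flips = [1, 1, 0, 1]
p_seq = next_prob(flips)

# explicit Bayesian view: Beta(1 + heads, 1 + tails) posterior predictive mean
heads = sum(flips)
tails = len(flips) - heads
p_bayes = (1 + heads) / (2 + heads + tails)

assert p_seq == p_bayes   # the predictive and latent-parameter views agree
print(p_seq)
```

Uncertainty over the latent bias never appears explicitly; it is carried entirely by how the predictions sharpen as more flips are observed.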

Updated: 2024-11-11 20:23:44

Domains: stat.ML,cs.LG

Download: http://arxiv.org/abs/2408.03307v2

Exploring Variational Autoencoders for Medical Image Generation: A Comprehensive Study

The variational autoencoder (VAE) is one of the most common techniques in the field of medical image generation, an architecture that has advanced considerably in recent years and has developed into numerous variants. VAEs can improve datasets by adding samples to small or class-imbalanced datasets, which is the basis of data augmentation. This paper provides a comprehensive review of studies on VAEs in medical imaging, with a special focus on their ability to create synthetic images close to real data so that they can be used for data augmentation. This study reviews important architectures and methods used to develop VAEs for medical images and compares them with other generative models such as GANs on issues such as image quality and the low diversity of generated samples. We discuss recent developments and applications in several medical fields, highlighting the ability of VAEs to improve segmentation and classification accuracy.

Updated: 2024-11-11 20:12:13

Domains: cs.LG,cs.CV,eess.IV

Download: http://arxiv.org/abs/2411.07348v1

Transformers represent belief state geometry in their residual stream

What computational structure are we building into large language models when we train them on next-token prediction? Here, we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. Leveraging the theory of optimal prediction, we anticipate and then find that belief states are linearly represented in the residual stream of transformers, even in cases where the predicted belief state geometry has highly nontrivial fractal structure. We investigate cases where the belief state geometry is represented in the final residual stream or distributed across the residual streams of multiple layers, providing a framework to explain these observations. Furthermore we demonstrate that the inferred belief states contain information about the entire future, beyond the local next-token prediction that the transformers are explicitly trained on. Our work provides a general framework connecting the structure of training data to the geometric structure of activations inside transformers.

Updated: 2024-11-11 20:09:51

Domains: cs.LG,cs.CL

Download: http://arxiv.org/abs/2405.15943v2

Attribute-to-Delete: Machine Unlearning via Datamodel Matching

Machine unlearning -- efficiently removing the effect of a small "forget set" of training data on a pre-trained machine learning model -- has recently attracted significant research interest. Despite this interest, however, recent work shows that existing machine unlearning techniques do not hold up to thorough evaluation in non-convex settings. In this work, we introduce a new machine unlearning technique that exhibits strong empirical performance even in such challenging settings. Our starting point is the perspective that the goal of unlearning is to produce a model whose outputs are statistically indistinguishable from those of a model re-trained on all but the forget set. This perspective naturally suggests a reduction from the unlearning problem to that of data attribution, where the goal is to predict the effect of changing the training set on a model's outputs. Thus motivated, we propose the following meta-algorithm, which we call Datamodel Matching (DMM): given a trained model, we (a) use data attribution to predict the output of the model if it were re-trained on all but the forget set points; then (b) fine-tune the pre-trained model to match these predicted outputs. In a simple convex setting, we show how this approach provably outperforms a variety of iterative unlearning algorithms. Empirically, we use a combination of existing evaluations and a new metric based on the KL-divergence to show that even in non-convex settings, DMM achieves strong unlearning performance relative to existing algorithms. An added benefit of DMM is that it is a meta-algorithm, in the sense that future advances in data attribution translate directly into better unlearning algorithms, pointing to a clear direction for future progress in unlearning.
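
The two-step DMM recipe can be illustrated in the simple convex setting the abstract mentions. In this sketch the data-attribution step is idealized as an oracle that predicts the retain-only model's outputs exactly (in practice a learned datamodel plays that role), so this is a minimal assumption-laden sketch rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

forget = np.arange(10)               # forget-set indices
retain = np.arange(10, 100)

def fit(A, b):
    # ordinary least squares fit
    return np.linalg.lstsq(A, b, rcond=None)[0]

w_full = fit(X, y)                                # pre-trained model
# (a) "datamodel" prediction of the retrained model's outputs
#     (idealized here as the exact retrain-on-retain outputs)
target = X @ fit(X[retain], y[retain])

# (b) fine-tune the pre-trained model to match the predicted outputs
w = w_full.copy()
for _ in range(500):
    w -= 0.1 * X.T @ (X @ w - target) / len(X)

w_retain = fit(X[retain], y[retain])              # reference: true retrain
print(np.allclose(w, w_retain, atol=1e-3))
```

In this convex case, matching the predicted outputs recovers the retrained model exactly; the paper's point is that better attribution oracles translate directly into better unlearning.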

Updated: 2024-11-11 20:02:41

Domains: cs.LG

Download: http://arxiv.org/abs/2410.23232v2

Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning

Constrained Reinforcement Learning (CRL) tackles sequential decision-making problems where agents are required to achieve goals by maximizing the expected return while meeting domain-specific constraints, which are often formulated as expected costs. In this setting, policy-based methods are widely used since they come with several advantages when dealing with continuous-control problems. These methods search in the policy space with an action-based or parameter-based exploration strategy, depending on whether they learn directly the parameters of a stochastic policy or those of a stochastic hyperpolicy. In this paper, we propose a general framework for addressing CRL problems via gradient-based primal-dual algorithms, relying on an alternate ascent/descent scheme with dual-variable regularization. We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions, improving and generalizing existing results. Then, we design C-PGAE and C-PGPE, the action-based and the parameter-based versions of C-PG, respectively, and we illustrate how they naturally extend to constraints defined in terms of risk measures over the costs, as it is often requested in safety-critical scenarios. Finally, we numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines, demonstrating their effectiveness.
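
The alternating ascent/descent structure with dual-variable regularization can be sketched on a scalar toy problem; the step sizes, the regularizer tau, and the problem itself are illustrative assumptions, not the authors' C-PG algorithm:

```python
# Toy primal-dual scheme: maximize r(x) = -(x - 2)^2
# subject to cost c(x) = x - 1 <= 0 (constrained optimum x* = 1).

def grad_r(x):
    return -2.0 * (x - 2.0)

def c(x):
    return x - 1.0

x, lam = 0.0, 0.0
eta_x, eta_lam, tau = 0.05, 0.05, 1e-3   # tau: dual-variable regularization
for _ in range(5000):
    x += eta_x * (grad_r(x) - lam)                       # primal ascent on the Lagrangian
    lam = max(0.0, lam + eta_lam * (c(x) - tau * lam))   # regularized dual ascent
print(round(x, 2), round(lam, 2))
```

The primal variable settles near the constrained optimum while the multiplier converges to the price of the constraint; the regularizer slightly biases the fixed point but stabilizes the dual dynamics.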

Updated: 2024-11-11 20:02:38

Domains: cs.LG

Download: http://arxiv.org/abs/2407.10775v2

Warmstarting for Scaling Language Models

Scaling model sizes to scale performance has worked remarkably well for the current large language model paradigm. The research and empirical findings of various scaling studies led to novel scaling results and laws that guide subsequent research. High training costs for contemporary scales of data and models result in a lack of thorough understanding of how to tune and arrive at such training setups. One direction to ameliorate the cost of pretraining large models is to warmstart the large-scale training from smaller models that are cheaper to tune. In this work, we attempt to understand if the behavior of optimal hyperparameters can be retained under warmstarting for scaling. We explore simple operations that allow the application of theoretically motivated methods of zero-shot transfer of optimal hyperparameters using {\mu}Transfer. We investigate the aspects that contribute to the speedup in convergence and the preservation of stable training dynamics under warmstarting with {\mu}Transfer. We find that shrinking the smaller model's weights, zero-padding, and perturbing the resulting larger model with a scaled initialization from {\mu}P enables effective warmstarting with {\mu}Transfer.
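
The three weight-space operations named in the last sentence (shrink, zero-pad, perturb with a scaled initialization) can be sketched as follows; the shrink factor and initialization scale are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def warmstart(w_small, d_large, shrink=0.5, init_scale=0.01):
    """Shrink the small model's weights, zero-pad to the large shape,
    then perturb with a muP-style fan-in-scaled initialization."""
    d_small = w_small.shape[0]
    w_large = np.zeros((d_large, d_large))
    w_large[:d_small, :d_small] = shrink * w_small      # shrink + zero-pad
    w_large += rng.normal(0.0, init_scale / np.sqrt(d_large),
                          size=(d_large, d_large))      # scaled perturbation
    return w_large

w_small = rng.normal(0.0, 1.0 / np.sqrt(4), size=(4, 4))
w_large = warmstart(w_small, 8)
print(w_large.shape)
```

The small model's weights survive (scaled) in the top-left block, while the padded region starts near the muP initialization scale rather than at exactly zero.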

Updated: 2024-11-11 20:02:29

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.07340v1

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

Language model (LM) agents are increasingly being used to automate complicated tasks in digital environments. Just as humans benefit from powerful software applications, such as integrated development environments, for complex tasks like software engineering, we posit that LM agents represent a new category of end users with their own needs and abilities, and would benefit from specially-built interfaces to the software they use. We investigate how interface design affects the performance of language model agents. As a result of this exploration, we introduce SWE-agent: a system that facilitates LM agents to autonomously use computers to solve software engineering tasks. SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs. We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both with a pass@1 rate of 12.5% and 87.7%, respectively, far exceeding the previous state-of-the-art achieved with non-interactive LMs. Finally, we provide insight on how the design of the ACI can impact agents' behavior and performance.

Updated: 2024-11-11 20:01:15

Domains: cs.SE,cs.AI,cs.CL,cs.HC,cs.LG

Download: http://arxiv.org/abs/2405.15793v3

Compact Model Parameter Extraction via Derivative-Free Optimization

In this paper, we address the problem of compact model parameter extraction to simultaneously extract tens of parameters via derivative-free optimization. Traditionally, parameter extraction is performed manually by dividing the complete set of parameters into smaller subsets, each targeting different operational regions of the device, a process that can take several days or weeks. Our approach streamlines this process by employing derivative-free optimization to identify a good parameter set that best fits the compact model without performing an exhaustive number of simulations. We further enhance the optimization process to address three critical issues in device modeling by carefully choosing a loss function that focuses on relative errors rather than absolute errors to ensure consistent performance across different orders of magnitude, prioritizes accuracy in key operational regions above a specific threshold, and reduces sensitivity to outliers. Furthermore, we utilize the concept of train-test split to assess the model fit and avoid overfitting. We demonstrate the effectiveness of our approach by successfully modeling a diamond Schottky diode with the SPICE diode model and a GaN-on-SiC HEMT with the ASM-HEMT model. For the latter, which involves extracting 35 parameters for the ASM-HEMT DC model, we identified the best set of parameters in under 6,000 trials. Additional examples using both devices are provided to demonstrate robustness to outliers, showing that an excellent fit is achieved even with over 25% of the data purposely corrupted. These examples demonstrate the practicality of our approach, highlighting the benefits of derivative-free optimization in device modeling.
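
The three loss-design ideas (relative rather than absolute error, extra weight on a key operating region, reduced outlier sensitivity) can be sketched as a single objective; the feature names, weights, and clipping rule here are illustrative assumptions, not the authors' exact formulation:

```python
import numpy as np

def extraction_loss(measured, simulated, bias, v_key=0.5, w_key=5.0, clip=1.0):
    # relative error keeps currents spanning many decades comparable
    rel = np.abs(simulated - measured) / (np.abs(measured) + 1e-12)
    rel = np.minimum(rel, clip)                        # cap outlier influence
    weights = np.where(bias >= v_key, w_key, 1.0)      # emphasize key region
    return float(np.mean(weights * rel))

bias = np.linspace(0.0, 1.0, 5)                        # sweep of bias points
measured = np.array([1e-9, 1e-6, 1e-3, 1e-1, 1.0])     # e.g. diode current
simulated = measured * 1.1                             # uniform 10% error
print(round(extraction_loss(measured, simulated, bias), 3))
```

Because the error is relative, a 10% deviation at 1 nA costs as much as a 10% deviation at 1 A, which is exactly the scale-consistency the abstract describes.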

Updated: 2024-11-11 20:00:02

Domains: cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2406.16355v2

Multimodal Fusion Balancing Through Game-Theoretic Regularization

Multimodal learning can complete the picture of information extraction by uncovering key dependencies between data sources. However, current systems fail to fully leverage multiple modalities for optimal performance. This has been attributed to modality competition, where modalities strive for training resources, leaving some underoptimized. We show that current balancing methods struggle to train multimodal models that surpass even simple baselines, such as ensembles. This raises the question: how can we ensure that all modalities in multimodal training are sufficiently trained, and that learning from new modalities consistently improves performance? This paper proposes the Multimodal Competition Regularizer (MCR), a new loss component inspired by mutual information (MI) decomposition designed to prevent the adverse effects of competition in multimodal training. Our key contributions are: 1) Introducing game-theoretic principles in multimodal learning, where each modality acts as a player competing to maximize its influence on the final outcome, enabling automatic balancing of the MI terms. 2) Refining lower and upper bounds for each MI term to enhance the extraction of task-relevant unique and shared information across modalities. 3) Suggesting latent space permutations for conditional MI estimation, significantly improving computational efficiency. MCR outperforms all previously suggested training strategies and is the first to consistently improve multimodal learning beyond the ensemble baseline, clearly demonstrating that combining modalities leads to significant performance gains on both synthetic and large real-world datasets.

Updated: 2024-11-11 19:53:05

Domains: cs.LG,cs.AI,cs.CV,cs.GT,cs.MM

Download: http://arxiv.org/abs/2411.07335v1

Rethinking LLM Memorization through the Lens of Adversarial Compression

Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage. One major question is whether these models "memorize" all their training data or they integrate many data sources in some way more akin to how a human would learn and synthesize information. The answer hinges, to a large degree, on how we define memorization. In this work, we propose the Adversarial Compression Ratio (ACR) as a metric for assessing memorization in LLMs. A given string from the training data is considered memorized if it can be elicited by a prompt (much) shorter than the string itself -- in other words, if these strings can be "compressed" with the model by computing adversarial prompts of fewer tokens. The ACR overcomes the limitations of existing notions of memorization by (i) offering an adversarial view of measuring memorization, especially for monitoring unlearning and compliance; and (ii) allowing for the flexibility to measure memorization for arbitrary strings at a reasonably low compute. Our definition serves as a practical tool for determining when model owners may be violating terms around data usage, providing a potential legal tool and a critical lens through which to address such scenarios.
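
Once an adversarial prompt has been found, the ACR definition reduces to a token-count ratio. The whitespace tokenizer and example strings below are illustrative stand-ins for a real tokenizer and the adversarial prompt search:

```python
def n_tokens(text: str) -> int:
    return len(text.split())              # crude stand-in tokenizer

def acr(target: str, prompt: str) -> float:
    # Adversarial Compression Ratio: target length over prompt length
    return n_tokens(target) / n_tokens(prompt)

def is_memorized(target: str, prompt: str) -> bool:
    return acr(target, prompt) > 1.0      # compressible => memorized

passage = ("the quick brown fox jumps over the lazy dog " * 4).strip()
prompt = "recite fox passage"             # hypothetical adversarial prompt
print(acr(passage, prompt), is_memorized(passage, prompt))
```

A ratio above 1 means the model can reproduce the string from fewer tokens than the string itself contains, i.e. the string is "compressible" by the model and counts as memorized under this definition.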

Updated: 2024-11-11 19:47:16

Domains: cs.LG,cs.CL

Download: http://arxiv.org/abs/2404.15146v3

Richer Output for Richer Countries: Uncovering Geographical Disparities in Generated Stories and Travel Recommendations

While a large body of work inspects language models for biases concerning gender, race, occupation, and religion, biases of a geographical nature are relatively less explored. Some recent studies benchmark the degree to which large language models encode geospatial knowledge. However, the impact of the encoded geographical knowledge (or lack thereof) on real-world applications has not been documented. In this work, we examine large language models for two common scenarios that require geographical knowledge: (a) travel recommendations and (b) geo-anchored story generation. Specifically, we study four popular language models, and across about $100$K travel requests and $200$K story generations, we observe that travel recommendations corresponding to poorer countries are less unique with fewer location references, and stories from these regions more often convey emotions of hardship and sadness compared to those from wealthier nations.

Updated: 2024-11-11 19:25:25

Domains: cs.CL,cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2411.07320v1

SynRL: Aligning Synthetic Clinical Trial Data with Human-preferred Clinical Endpoints Using Reinforcement Learning

Each year, hundreds of clinical trials are conducted to evaluate new medical interventions, but sharing patient records from these trials with other institutions can be challenging due to privacy concerns and federal regulations. To help mitigate privacy concerns, researchers have proposed methods for generating synthetic patient data. However, existing approaches for generating synthetic clinical trial data disregard the usage requirements of these data, including maintaining specific properties of clinical outcomes, and only use post hoc assessments that are not coupled with the data generation process. In this paper, we propose SynRL which leverages reinforcement learning to improve the performance of patient data generators by customizing the generated data to meet the user-specified requirements for synthetic data outcomes and endpoints. Our method includes a data value critic function to evaluate the quality of the generated data and uses reinforcement learning to align the data generator with the users' needs based on the critic's feedback. We performed experiments on four clinical trial datasets and demonstrated the advantages of SynRL in improving the quality of the generated synthetic data while keeping the privacy risks low. We also show that SynRL can be utilized as a general framework that can customize data generation of multiple types of synthetic data generators. Our code is available at https://anonymous.4open.science/r/SynRL-DB0F/.

Updated: 2024-11-11 19:19:46

Domains: cs.LG

Download: http://arxiv.org/abs/2411.07317v1

Modeling variable guide efficiency in pooled CRISPR screens with ContrastiveVI+

Genetic screens mediated via CRISPR-Cas9 combined with high-content readouts have emerged as powerful tools for biological discovery. However, computational analyses of these screens come with additional challenges beyond those found with standard scRNA-seq analyses. For example, perturbation-induced variations of interest may be subtle and masked by other dominant source of variation shared with controls, and variable guide efficiency results in some cells not undergoing genetic perturbation despite expressing a guide RNA. While a number of methods have been developed to address the former problem by explicitly disentangling perturbation-induced variations from those shared with controls, less attention has been paid to the latter problem of noisy perturbation labels. To address this issue, here we propose ContrastiveVI+, a generative modeling framework that both disentangles perturbation-induced from non-perturbation-related variations while also inferring whether cells truly underwent genomic edits. Applied to three large-scale Perturb-seq datasets, we find that ContrastiveVI+ better recovers known perturbation-induced variations compared to previous methods while successfully identifying cells that escaped the functional consequences of guide RNA expression. An open-source implementation of our model is available at \url{https://github.com/insitro/contrastive_vi_plus}.

Updated: 2024-11-11 19:16:34

Domains: q-bio.QM,cs.LG,q-bio.GN,stat.ML

Download: http://arxiv.org/abs/2411.08072v1

Harnessing Smartphone Sensors for Enhanced Road Safety: A Comprehensive Dataset and Review

Severe collisions can result from aggressive driving and poor road conditions, emphasizing the need for effective monitoring to ensure safety. Smartphones, with their array of built-in sensors, offer a practical and affordable solution for road-sensing. However, the lack of reliable, standardized datasets has hindered progress in assessing road conditions and driving patterns. This study addresses this gap by introducing a comprehensive dataset derived from smartphone sensors, which surpasses existing datasets by incorporating a diverse range of sensors including accelerometer, gyroscope, magnetometer, GPS, gravity, orientation, and uncalibrated sensors. These sensors capture extensive parameters such as acceleration force, gravitation, rotation rate, magnetic field strength, and vehicle speed, providing a detailed understanding of road conditions and driving behaviors. The dataset is designed to enhance road safety, infrastructure maintenance, traffic management, and urban planning. By making this dataset available to the community, the study aims to foster collaboration, inspire further research, and facilitate the development of innovative solutions in intelligent transportation systems.

Updated: 2024-11-11 19:15:29

Domains: cs.RO,cs.AI

Download: http://arxiv.org/abs/2411.07315v1

Anomaly Detection in OKTA Logs using Autoencoders

Okta logs are used today to detect cybersecurity events using various rule-based models with restricted look-back periods. These approaches have limitations, such as limited retrospective analysis, a predefined rule set, and a susceptibility to generating false positives. To address this, we adopt unsupervised techniques, specifically employing autoencoders. To properly use an autoencoder, we need to transform and simplify the complexity of the log data we receive from our users. This transformed and filtered data is then fed into the autoencoder, and the output is evaluated.
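
A minimal version of this pipeline can be sketched with a linear autoencoder (computed via SVD, the linear analogue of a trained neural autoencoder): fit on normal traffic, then flag events whose reconstruction error exceeds a threshold. The synthetic features and threshold rule are illustrative assumptions, not Okta's actual log schema:

```python
import numpy as np

rng = np.random.default_rng(1)

# stand-in for transformed, numeric log features (one row per event)
normal = rng.normal(0.0, 1.0, size=(500, 8))
anomaly = rng.normal(6.0, 1.0, size=(5, 8))      # events far from normal
X = np.vstack([normal, anomaly])

# "train" the linear autoencoder on normal traffic: 3-dim bottleneck
mu = normal.mean(axis=0)
_, _, Vt = np.linalg.svd(normal - mu, full_matrices=False)
W = Vt[:3]

recon = (X - mu) @ W.T @ W + mu                  # encode, then decode
err = np.linalg.norm(X - recon, axis=1)          # reconstruction error

threshold = np.percentile(err[:500], 99)         # set from normal data only
flags = err > threshold
print(int(flags[-5:].sum()))                     # anomalous events flagged
```

Unlike a fixed rule set, the threshold adapts to whatever "normal" looks like in the training window, which is the appeal of the unsupervised approach described above.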

Updated: 2024-11-11 19:15:05

Fields: cs.LG,cs.CR

Download: http://arxiv.org/abs/2411.07314v1

X-DFS: Explainable Artificial Intelligence Guided Design-for-Security Solution Space Exploration

Design and manufacturing of integrated circuits predominantly use a globally distributed semiconductor supply chain involving diverse entities. The modern semiconductor supply chain has been designed to boost production efficiency, but is filled with major security concerns such as malicious modifications (hardware Trojans), reverse engineering (RE), and cloning. While being deployed, digital systems are also subject to a plethora of threats such as power, timing, and electromagnetic (EM) side channel attacks. Many Design-for-Security (DFS) solutions have been proposed to deal with these vulnerabilities, and such solutions rely on strategic modifications (e.g., logic locking, side channel resilient masking, and dummy logic insertion) of the digital designs to ensure a higher level of security. However, most of these DFS strategies lack robust formalism, are often not human-understandable, and require an extensive amount of human expert effort during their development/use. All of these factors make it difficult to keep up with the ever-growing number of microelectronic vulnerabilities. In this work, we propose X-DFS, an explainable Artificial Intelligence (AI) guided DFS solution-space exploration approach that can dramatically cut down the mitigation strategy development/use time while enriching our understanding of the vulnerability by providing human-understandable decision rationale. We implement X-DFS and comprehensively evaluate it for reverse engineering threats (SAIL, SWEEP, and OMLA) and formalize a generalized mechanism for applying X-DFS to defend against other threats such as hardware Trojans, fault attacks, and side channel attacks for seamless future extensions.
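As an illustration of one DFS strategy the abstract names, logic locking inserts key gates so the design computes the intended function only under a secret key. The toy XOR example below is ours, not from X-DFS:

```python
def original(a, b):
    return a ^ b          # the combinational function to protect

def locked(a, b, k1, k2):
    """Logic locking: XOR key gates on the inputs; the circuit matches the
    original function only when the key bits cancel out (k1 ^ k2 == 0)."""
    return (a ^ k1) ^ (b ^ k2)

SECRET = (0, 0)           # a correct key for this toy circuit
ok = all(locked(a, b, *SECRET) == original(a, b)
         for a in (0, 1) for b in (0, 1))
wrong = all(locked(a, b, 1, 0) == original(a, b)
            for a in (0, 1) for b in (0, 1))
print(ok, not wrong)  # True True: right key recovers the function, wrong key corrupts it
```

Attacks like SAIL learn to undo such insertions, which is why tools that explore the defense solution space, and explain their choices, are useful.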

Updated: 2024-11-11 19:04:29

Fields: cs.CR,cs.AI

Download: http://arxiv.org/abs/2411.07308v1

Merit-Based Sortition in Decentralized Systems

In decentralized systems, it is often necessary to select an 'active' subset of participants from the total participant pool, with the goal of satisfying computational limitations or optimizing resource efficiency. This selection can sometimes be made at random, mirroring the sortition practice invented in classical antiquity aimed at achieving a high degree of statistical representativeness. However, the recent emergence of specialized decentralized networks that solve concrete coordination problems and are characterized by measurable success metrics often requires prioritizing performance optimization over representativeness. We introduce a simple algorithm for 'merit-based sortition', in which the quality of each participant influences its probability of being drafted into the active set, while simultaneously retaining representativeness by allowing inactive participants an infinite number of chances to be drafted into the active set with non-zero probability. Using a suite of numerical experiments, we demonstrate that our algorithm boosts the quality metric describing the performance of the active set by $>2$ times the intrinsic stochasticity. This implies that merit-based sortition ensures a statistically significant performance boost to the drafted, 'active' set, while retaining the property of classical, random sortition that it enables upward mobility from a much larger 'inactive' set. This way, merit-based sortition fulfils a key requirement for decentralized systems in need of performance optimization.
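A minimal sketch of the drafting rule described above: each participant's probability of entering the active set grows with its quality, while a small additive floor (the `epsilon` constant is our assumption, not from the paper) keeps every inactive participant's draft probability non-zero:

```python
import random

def merit_based_draft(quality, k, epsilon=0.05, rng=random):
    """Draft an active set of size k by weighted sampling without replacement.
    Weight = quality + epsilon, so merit is rewarded but nobody's
    probability of being drafted is ever zero."""
    pool = {p: q + epsilon for p, q in quality.items()}  # strictly positive
    active = []
    for _ in range(k):
        participants, weights = zip(*pool.items())
        pick = rng.choices(participants, weights=weights, k=1)[0]
        active.append(pick)
        del pool[pick]          # without replacement within one draft
    return active

rng = random.Random(0)
quality = {f"p{i}": i / 10 for i in range(10)}   # p9 is the best performer
draws = [merit_based_draft(quality, k=3, rng=rng) for _ in range(2000)]
freq = {p: sum(p in d for d in draws) for p in quality}
print(freq["p9"] > freq["p0"] > 0)   # merit is rewarded, yet p0 still gets drafted
```

Over repeated drafts this realizes the paper's key property: a statistically better active set while preserving upward mobility from the inactive set.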

Updated: 2024-11-11 19:00:31

Fields: cs.MA,cs.CY,cs.DC,cs.LG

Download: http://arxiv.org/abs/2411.07302v1

Artificial Intelligence Ecosystem for Automating Self-Directed Teaching

This research introduces an innovative artificial intelligence-driven educational concept designed to optimize self-directed learning through personalized course delivery and automated teaching assistance. The system leverages fine-tuned AI models to create an adaptive learning environment that encompasses customized roadmaps, automated presentation generation, and three-dimensional modeling for complex concept visualization. By integrating real-time virtual assistance for doubt resolution, the platform addresses the immediate educational needs of learners while promoting autonomous learning practices. This study explores the psychological advantages of self-directed learning and demonstrates how AI automation can enhance educational outcomes through personalized content delivery and interactive support mechanisms. The research contributes to the growing field of educational technology by presenting a comprehensive framework that combines automated content generation, visual learning aids, and intelligent tutoring to create an efficient, scalable solution for modern educational needs. Preliminary findings suggest that this approach not only accommodates diverse learning styles but also strengthens student engagement and knowledge retention through its emphasis on self-paced, independent learning methodologies.

Updated: 2024-11-11 19:00:22

Fields: cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2411.07300v1

The Surprising Effectiveness of Test-Time Training for Abstract Reasoning

Language models have shown impressive performance on tasks within their training distribution, but often struggle with novel problems requiring complex reasoning. We investigate the effectiveness of test-time training (TTT) -- updating model parameters temporarily during inference using a loss derived from input data -- as a mechanism for improving models' reasoning capabilities, using the Abstraction and Reasoning Corpus (ARC) as a benchmark. Through systematic experimentation, we identify three crucial components for successful TTT: (1) initial finetuning on similar tasks, (2) auxiliary task format and augmentations, and (3) per-instance training. TTT significantly improves performance on ARC tasks, achieving up to 6x improvement in accuracy compared to base fine-tuned models; applying TTT to an 8B-parameter language model, we achieve 53% accuracy on the ARC's public validation set, improving the state-of-the-art by nearly 25% for public and purely neural approaches. By ensembling our method with recent program generation approaches, we get SoTA public validation accuracy of 61.9%, matching the average human score. Our findings suggest that explicit symbolic search is not the only path to improved abstract reasoning in neural language models; additional test-time computation applied to continued training on few-shot examples can also be extremely effective.
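The per-instance training idea can be illustrated on a toy model: temporarily fit a copy of the parameters on the test task's own demonstration pairs, predict, then discard the update. The scalar model and learning rate below are hypothetical, vastly simpler than the 8B-parameter setting in the paper:

```python
def predict(a, x):
    return a * x

def ttt_predict(a_base, demos, query_x, steps=200, lr=0.01):
    """Test-time training: adapt a *copy* of the parameters on the task's own
    demonstration pairs (squared-error loss), predict the query, then discard
    the update so the base model is untouched."""
    a = a_base
    for _ in range(steps):
        grad = sum(2 * (predict(a, x) - y) * x for x, y in demos) / len(demos)
        a -= lr * grad
    return predict(a, query_x)

a_base = 1.0                       # "pretrained" parameter
demos = [(1, 3), (2, 6), (3, 9)]   # this test task happens to be y = 3x
print(round(ttt_predict(a_base, demos, query_x=4)))  # 12: adapted to the task
```

In the paper the same loop operates on a fine-tuned LLM with augmented, leave-one-out versions of the task's few-shot examples, but the structure (temporary, per-instance updates) is the same.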

Updated: 2024-11-11 18:59:45

Fields: cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2411.07279v1

Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation

Social robot navigation can be helpful in various contexts of daily life but requires safe human-robot interactions and efficient trajectory planning. While modeling pairwise relations has been widely studied in multi-agent interacting systems, the ability to capture larger-scale group-wise activities is limited. In this paper, we propose a systematic relational reasoning approach with explicit inference of the underlying dynamically evolving relational structures, and we demonstrate its effectiveness for multi-agent trajectory prediction and social robot navigation. In addition to the edges between pairs of nodes (i.e., agents), we propose to infer hyperedges that adaptively connect multiple nodes to enable group-wise reasoning in an unsupervised manner. Our approach infers dynamically evolving relation graphs and hypergraphs to capture the evolution of relations, which the trajectory predictor employs to generate future states. Meanwhile, we propose to regularize the sharpness and sparsity of the learned relations and the smoothness of the relation evolution, which proves to enhance training stability and model performance. The proposed approach is validated on synthetic crowd simulations and real-world benchmark datasets. Experiments demonstrate that the approach infers reasonable relations and achieves state-of-the-art prediction performance. In addition, we present a deep reinforcement learning (DRL) framework for social robot navigation, which incorporates relational reasoning and trajectory prediction systematically. In a group-based crowd simulation, our method outperforms the strongest baseline by a significant margin in terms of safety, efficiency, and social compliance in dense, interactive scenarios. We also demonstrate the practical applicability of our method with real-world robot experiments. The code and videos can be found at https://relational-reasoning-nav.github.io/.

Updated: 2024-11-11 18:59:07

Fields: cs.RO,cs.AI,cs.CV,cs.LG,cs.MA

Download: http://arxiv.org/abs/2401.12275v2

UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts

The evaluation of mathematical reasoning capabilities is essential for advancing Artificial General Intelligence (AGI). While Large Language Models (LLMs) have shown impressive performance in solving mathematical problems, existing benchmarks such as GSM8K and MATH present limitations, including narrow problem definitions with specific numbers and reliance on predetermined rules that hinder accurate assessments of reasoning and adaptability. This paper introduces the UTMath Benchmark, which robustly evaluates the models through extensive unit tests. It consists of 1,053 problems across 9 mathematical domains, with over 68 test cases per problem. We propose an innovative evaluation framework inspired by unit testing in software development, focusing on both accuracy and reliability of results. Furthermore, we introduce the Reasoning-to-Coding of Thoughts (RCoT) approach, which encourages LLMs to perform explicit reasoning before generating code, leading to more advanced solutions and improved performance. Finally, we are releasing not only the UTMath benchmark but also the UTMath-Train training dataset (more than 70k samples), to support the community in further exploring mathematical reasoning.
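The unit-test evaluation idea reduces to running a generated solution against many input-output cases rather than matching one fixed answer. The problem and `candidate` function below are hypothetical, standing in for code an LLM would generate after explicit reasoning (the RCoT idea):

```python
def evaluate_solution(solution, test_cases):
    """Score a generated solution function against unit tests: a problem
    counts as solved only if every test case agrees."""
    passed = sum(solution(arg) == expected for arg, expected in test_cases)
    return passed, len(test_cases)

# Hypothetical benchmark problem: the n-th triangular number.
def candidate(n):
    return n * (n + 1) // 2

tests = [(0, 0), (1, 1), (4, 10), (100, 5050)]
passed, total = evaluate_solution(candidate, tests)
print(passed == total)  # True: all unit tests pass, so the problem is solved
```

Checking many cases per problem is what makes the evaluation robust to solutions that memorize specific numbers.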

Updated: 2024-11-11 18:59:02

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.07240v1

DeepONet as a Multi-Operator Extrapolation Model: Distributed Pretraining with Physics-Informed Fine-Tuning

We propose a novel fine-tuning method to achieve multi-operator learning through training a distributed neural operator with diverse function data and then zero-shot fine-tuning the neural network using physics-informed losses for downstream tasks. Operator learning effectively approximates solution operators for PDEs and various PDE-related problems, yet it often struggles to generalize to new tasks. To address this, we investigate fine-tuning a pretrained model, while carefully selecting an initialization that enables rapid adaptation to new tasks with minimal data. Our approach combines distributed learning to integrate data from various operators in pre-training, while physics-informed methods enable zero-shot fine-tuning, minimizing the reliance on downstream data. We investigate standard fine-tuning and Low-Rank Adaptation fine-tuning, applying both to train complex nonlinear target operators that are difficult to learn only using random initialization. Through comprehensive numerical examples, we demonstrate the advantages of our approach, showcasing significant improvements in accuracy. Our findings provide a robust framework for advancing multi-operator learning and highlight the potential of transfer learning techniques in this domain.
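Low-Rank Adaptation, one of the two fine-tuning modes studied, trains only a small rank-r update on top of frozen pretrained weights. A minimal numpy sketch (dimensions, seed, and initialization below are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 6, 2                         # full dimension vs. low rank

W0 = rng.normal(size=(d, d))        # frozen pretrained weights
A = rng.normal(size=(r, d))         # LoRA down-projection (trainable)
B = np.zeros((d, r))                # LoRA up-projection, zero-init: starts as a no-op

def forward(x, A, B):
    """Low-Rank Adaptation: W0 stays frozen; only A and B (2*r*d parameters
    instead of d*d) would be updated during fine-tuning."""
    return W0 @ x + B @ (A @ x)

x = rng.normal(size=d)
print(np.allclose(forward(x, A, B), W0 @ x))   # True: zero-init LoRA changes nothing
B_t = rng.normal(size=(d, r))                  # stands in for fine-tuned values
print(np.allclose(forward(x, A, B_t), W0 @ x)) # False: the adapter now modifies the output
```

The zero-initialized up-projection is what makes LoRA a safe starting point: fine-tuning begins exactly at the pretrained model and departs from it only as far as the low-rank update allows.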

Updated: 2024-11-11 18:58:46

Fields: cs.LG

Download: http://arxiv.org/abs/2411.07239v1

Score-based generative diffusion with "active" correlated noise sources

Diffusion models exhibit robust generative properties by approximating the underlying distribution of a dataset and synthesizing data by sampling from the approximated distribution. In this work, we explore how the generative performance may be modulated if noise sources with temporal correlations -- akin to those used in the field of active matter -- are used for the destruction of the data in the forward process. Our numerical and analytical experiments suggest that the corresponding reverse process may exhibit improved generative properties.
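A standard way to generate temporally correlated ("active"-matter-style) Gaussian noise is an Ornstein-Uhlenbeck process; the sketch below uses its exact discretization. The specific parameters are illustrative, not taken from the paper:

```python
import math, random

def ou_noise(n, dt=0.01, tau=0.5, sigma=1.0, rng=random):
    """Exact discretization of an Ornstein-Uhlenbeck process: Gaussian noise
    whose autocorrelation decays as exp(-lag/tau), unlike the white noise
    used in standard diffusion forward processes."""
    decay = math.exp(-dt / tau)
    scale = sigma * math.sqrt(1.0 - decay ** 2)   # keeps the variance stationary
    x, out = 0.0, []
    for _ in range(n):
        x = decay * x + scale * rng.gauss(0.0, 1.0)
        out.append(x)
    return out

rng = random.Random(0)
xs = ou_noise(500_000, rng=rng)
var = sum(v * v for v in xs) / len(xs)
print(abs(var - 1.0) < 0.1)   # stationary variance stays near sigma^2
```

In the limit tau -> 0 this reduces to ordinary white noise, so the correlation time is the knob that interpolates between the standard forward process and the "active" one.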

Updated: 2024-11-11 18:51:08

Fields: cs.LG,cond-mat.dis-nn

Download: http://arxiv.org/abs/2411.07233v1

Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

Adding objects to images based on text instructions is a challenging task in semantic image editing, requiring a balance between preserving the original scene and seamlessly integrating the new object in a fitting location. Despite extensive efforts, existing models often struggle with this balance, particularly with finding a natural location for adding an object in complex scenes. We introduce Add-it, a training-free approach that extends diffusion models' attention mechanisms to incorporate information from three key sources: the scene image, the text prompt, and the generated image itself. Our weighted extended-attention mechanism maintains structural consistency and fine details while ensuring natural object placement. Without task-specific fine-tuning, Add-it achieves state-of-the-art results on both real and generated image insertion benchmarks, including our newly constructed "Additing Affordance Benchmark" for evaluating object placement plausibility, outperforming supervised methods. Human evaluations show that Add-it is preferred in over 80% of cases, and it also demonstrates improvements in various automated metrics.

Updated: 2024-11-11 18:50:09

Fields: cs.CV,cs.AI,cs.GR,cs.LG

Download: http://arxiv.org/abs/2411.07232v1

Watermark Anything with Localized Messages

Image watermarking methods are not tailored to handle small watermarked areas. This restricts applications in real-world scenarios where parts of the image may come from different sources or have been edited. We introduce a deep-learning model for localized image watermarking, dubbed the Watermark Anything Model (WAM). The WAM embedder imperceptibly modifies the input image, while the extractor segments the received image into watermarked and non-watermarked areas and recovers one or several hidden messages from the areas found to be watermarked. The models are jointly trained at low resolution and without perceptual constraints, then post-trained for imperceptibility and multiple watermarks. Experiments show that WAM is competitive with state-of-the-art methods in terms of imperceptibility and robustness, especially against inpainting and splicing, even on high-resolution images. Moreover, it offers new capabilities: WAM can locate watermarked areas in spliced images and extract distinct 32-bit messages with less than 1 bit error from multiple small regions - no larger than 10% of the image surface - even for small $256\times 256$ images.

Updated: 2024-11-11 18:49:58

Fields: cs.CV,cs.CR

Download: http://arxiv.org/abs/2411.07231v1

INQUIRE: A Natural World Text-to-Image Retrieval Benchmark

We introduce INQUIRE, a text-to-image retrieval benchmark designed to challenge multimodal vision-language models on expert-level queries. INQUIRE includes iNaturalist 2024 (iNat24), a new dataset of five million natural world images, along with 250 expert-level retrieval queries. These queries are paired with all relevant images comprehensively labeled within iNat24, comprising 33,000 total matches. Queries span categories such as species identification, context, behavior, and appearance, emphasizing tasks that require nuanced image understanding and domain expertise. Our benchmark evaluates two core retrieval tasks: (1) INQUIRE-Fullrank, a full dataset ranking task, and (2) INQUIRE-Rerank, a reranking task for refining top-100 retrievals. Detailed evaluation of a range of recent multimodal models demonstrates that INQUIRE poses a significant challenge, with the best models failing to achieve an mAP@50 above 50%. In addition, we show that reranking with more powerful multimodal models can enhance retrieval performance, yet there remains a significant margin for improvement. By focusing on scientifically-motivated ecological challenges, INQUIRE aims to bridge the gap between AI capabilities and the needs of real-world scientific inquiry, encouraging the development of retrieval systems that can assist with accelerating ecological and biodiversity research. Our dataset and code are available at https://inquire-benchmark.github.io

Updated: 2024-11-11 18:49:52

Fields: cs.CV,cs.AI,cs.CL,cs.IR

Download: http://arxiv.org/abs/2411.02537v3

PyRelationAL: a python library for active learning research and development

Active learning (AL) is a sub-field of ML focused on the development of methods to iteratively and economically acquire data by strategically querying new data points that are the most useful for a particular task. Here, we introduce PyRelationAL, an open source library for AL research. We describe a modular toolkit based around a two step design methodology for composing pool-based active learning strategies applicable to both single-acquisition and batch-acquisition strategies. This framework allows for the mathematical and practical specification of a broad number of existing and novel strategies under a consistent programming model and abstraction. Furthermore, we incorporate datasets and active learning tasks applicable to them to simplify comparative evaluation and benchmarking, along with an initial group of benchmarks across datasets included in this library. The toolkit is compatible with existing ML frameworks. PyRelationAL is maintained using modern software engineering practices -- with an inclusive contributor code of conduct -- to promote long term library quality and utilisation. PyRelationAL is available under a permissive Apache licence on PyPi and at https://github.com/RelationRx/pyrelational.
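The two-step design methodology (an informativeness score, then a selection step) can be illustrated with a generic pool-based loop. This is not PyRelationAL's actual API, purely a toy least-confidence example of the paradigm the library organizes:

```python
def uncertainty(prob):
    """Step 1 -- informativeness: least-confidence score of one prediction."""
    return 1.0 - max(prob)

def select_batch(pool, model_probs, k):
    """Step 2 -- selection: pick the k most informative unlabelled points."""
    ranked = sorted(pool, key=lambda i: uncertainty(model_probs[i]), reverse=True)
    return ranked[:k]

# Toy pool: index -> predicted class distribution from the current model.
model_probs = {
    0: [0.98, 0.02],   # confident
    1: [0.55, 0.45],   # uncertain -> most useful to label next
    2: [0.80, 0.20],
    3: [0.51, 0.49],   # uncertain
}
pool = list(model_probs)
batch = select_batch(pool, model_probs, k=2)
print(sorted(batch))  # [1, 3]: the model's least confident points
```

A full active-learning loop would label the selected batch, retrain the model, and repeat; libraries like PyRelationAL abstract exactly these two steps so strategies can be swapped under one programming model.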

Updated: 2024-11-11 18:49:02

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2205.11117v3

Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving

To enhance large language models (LLMs) for chemistry problem solving, several LLM-based agents augmented with tools have been proposed, such as ChemCrow and Coscientist. However, their evaluations are narrow in scope, leaving a large gap in understanding the benefits of tools across diverse chemistry tasks. To bridge this gap, we develop ChemAgent, an enhanced chemistry agent over ChemCrow, and conduct a comprehensive evaluation of its performance on both specialized chemistry tasks and general chemistry questions. Surprisingly, ChemAgent does not consistently outperform its base LLMs without tools. Our error analysis with a chemistry expert suggests that: For specialized chemistry tasks, such as synthesis prediction, we should augment agents with specialized tools; however, for general chemistry questions like those in exams, agents' ability to reason correctly with chemistry knowledge matters more, and tool augmentation does not always help.

Updated: 2024-11-11 18:46:37

Fields: cs.AI,cs.CE

Download: http://arxiv.org/abs/2411.07228v1

TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling

Deep learning architectures for supervised learning on tabular data range from simple multilayer perceptrons (MLP) to sophisticated Transformers and retrieval-augmented methods. This study highlights a major, yet so far overlooked opportunity for substantially improving tabular MLPs: namely, parameter-efficient ensembling -- a paradigm for implementing an ensemble of models as one model producing multiple predictions. We start by developing TabM -- a simple model based on MLP and our variations of BatchEnsemble (an existing technique). Then, we perform a large-scale evaluation of tabular DL architectures on public benchmarks in terms of both task performance and efficiency, which renders the landscape of tabular DL in a new light. Generally, we show that MLPs, including TabM, form a line of stronger and more practical models compared to attention- and retrieval-based architectures. In particular, we find that TabM demonstrates the best performance among tabular DL models. Lastly, we conduct an empirical analysis on the ensemble-like nature of TabM. For example, we observe that the multiple predictions of TabM are weak individually, but powerful collectively. Overall, our work brings an impactful technique to tabular DL, analyses its behaviour, and advances the performance-efficiency trade-off with TabM -- a simple and powerful baseline for researchers and practitioners.
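The parameter-efficient ensembling behind BatchEnsemble (which TabM builds on) shares one weight matrix across k members, each owning only cheap rank-1 scaling factors, so k predictions cost roughly the parameters of one layer. A sketch with illustrative dimensions (not TabM's exact formulation):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, k = 4, 3, 8          # k ensemble members share one weight matrix

W = rng.normal(size=(d_in, d_out))    # shared weights
R = rng.normal(size=(k, d_in))        # per-member input scaling (rank-1 factor)
S = rng.normal(size=(k, d_out))       # per-member output scaling (rank-1 factor)

def batch_ensemble_layer(x):
    """One BatchEnsemble-style linear layer: member i effectively uses
    W * outer(r_i, s_i), computed as (x * r_i) @ W * s_i in one batched matmul."""
    return ((x * R) @ W) * S          # -> (k, d_out): k predictions at once

x = rng.normal(size=(d_in,))
preds = batch_ensemble_layer(x)       # k individually weak predictions ...
ensemble = preds.mean(axis=0)         # ... averaged into one collective output
print(preds.shape, ensemble.shape)    # (8, 3) (3,)
```

This mirrors the abstract's empirical observation: the multiple predictions are weak individually but powerful collectively, at almost no extra parameter cost.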

Updated: 2024-11-11 18:46:06

Fields: cs.LG

Download: http://arxiv.org/abs/2410.24210v2

TempCharBERT: Keystroke Dynamics for Continuous Access Control Based on Pre-trained Language Models

With the widespread adoption of digital environments, reliable authentication and continuous access control have become crucial. They can minimize cyber attacks and prevent fraud, especially fraud associated with identity theft. A particular interest lies in keystroke dynamics (KD), which refers to the task of recognizing individuals' identity based on their unique typing style. In this work, we propose the use of pre-trained language models (PLMs) to recognize such patterns. Although PLMs have shown high performance on multiple NLP benchmarks, the use of these models on specific tasks requires customization. BERT and RoBERTa, for instance, rely on subword tokenization, and they cannot be directly applied to KD, which requires temporal-character information to recognize users. Recent character-aware PLMs are able to process both subword and character-level information and can be an alternative solution. Notwithstanding, they are still not suitable to be directly fine-tuned for KD, as they are not optimized to account for users' temporal typing information (e.g., hold time and flight time). To overcome this limitation, we propose TempCharBERT, an architecture that incorporates temporal-character information in the embedding layer of CharBERT. This allows modeling keystroke dynamics for the purpose of user identification and authentication. Our results show a significant improvement with this customization. We also show the feasibility of training TempCharBERT in a federated learning setting in order to foster data privacy.
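The temporal typing features named in the abstract are simple to compute from raw key events: hold time is key-up minus key-down for one key, and flight time is the gap between releasing one key and pressing the next. The event timings below are made up for illustration:

```python
def temporal_features(events):
    """Hold time = key-up minus key-down for the same key; flight time = next
    key-down minus previous key-up. These per-character timings are what
    subword tokenizers like BERT's cannot represent directly."""
    holds = [(ch, up - down) for ch, down, up in events]
    flights = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return holds, flights

# (char, key_down_ms, key_up_ms) for typing "hi!"
events = [("h", 0, 95), ("i", 150, 230), ("!", 410, 500)]
holds, flights = temporal_features(events)
print(holds)    # [('h', 95), ('i', 80), ('!', 90)]
print(flights)  # [55, 180]
```

TempCharBERT's contribution is injecting exactly this kind of per-character timing into the embedding layer, rather than discarding it at tokenization time.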

Updated: 2024-11-11 18:44:17

Fields: cs.CR,cs.CL

Download: http://arxiv.org/abs/2411.07224v1

Grounding Video Models to Actions through Goal Conditioned Exploration

Large video models, pretrained on massive amounts of Internet video, provide a rich source of physical knowledge about the dynamics and motions of objects and tasks. However, video models are not grounded in the embodiment of an agent, and do not describe how to actuate the world to reach the visual states depicted in a video. To tackle this problem, current methods use a separate vision-based inverse dynamics model trained on embodiment-specific data to map image states to actions. Gathering data to train such a model is often expensive and challenging, and this model is limited to visual settings similar to the ones in which data are available. In this paper, we investigate how to directly ground video models to continuous actions through self-exploration in the embodied environment -- using generated video states as visual goals for exploration. We propose a framework that uses trajectory level action generation in combination with video guidance to enable an agent to solve complex tasks without any external supervision, e.g., rewards, action labels, or segmentation masks. We validate the proposed approach on 8 tasks in Libero, 6 tasks in MetaWorld, 4 tasks in Calvin, and 12 tasks in iThor Visual Navigation. We show how our approach is on par with or even surpasses multiple behavior cloning baselines trained on expert demonstrations, without requiring any action annotations.

Updated: 2024-11-11 18:43:44

Fields: cs.RO,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2411.07223v1

Nteasee: A mixed methods study of expert and general population perspectives on deploying AI for health in African countries

Artificial Intelligence (AI) for health has the potential to significantly change and improve healthcare. However in most African countries, identifying culturally and contextually attuned approaches for deploying these solutions is not well understood. To bridge this gap, we conduct a qualitative study to investigate the best practices, fairness indicators, and potential biases to mitigate when deploying AI for health in African countries, as well as explore opportunities where artificial intelligence could make a positive impact in health. We used a mixed methods approach combining in-depth interviews (IDIs) and surveys. We conduct 1.5-2 hour long IDIs with 50 experts in health, policy, and AI across 17 countries, and through an inductive approach we conduct a qualitative thematic analysis on expert IDI responses. We administer a blinded 30-minute survey with case studies to 672 general population participants across 5 countries in Africa and analyze responses on quantitative scales, statistically comparing responses by country, age, gender, and level of familiarity with AI. We thematically summarize open-ended responses from surveys. Our results find generally positive attitudes, high levels of trust, accompanied by moderate levels of concern among general population participants for AI usage for health in Africa. This contrasts with expert responses, where major themes revolved around trust/mistrust, ethical concerns, and systemic barriers to integration, among others. This work presents the first-of-its-kind qualitative research study of the potential of AI for health in Africa from an algorithmic fairness angle, with perspectives from both experts and the general population. We hope that this work guides policymakers and drives home the need for further research and the inclusion of general population perspectives in decision-making around AI usage.

Updated: 2024-11-11 18:42:57

Categories: cs.LG,cs.AI,cs.CY

Download: http://arxiv.org/abs/2409.12197v3

Content-Style Learning from Unaligned Domains: Identifiability under Unknown Latent Dimensions

Understanding identifiability of latent content and style variables from unaligned multi-domain data is essential for tasks such as domain translation and data generation. Existing works on content-style identification were often developed under somewhat stringent conditions, e.g., that all latent components are mutually independent and that the dimensions of the content and style variables are known. We introduce a new analytical framework via cross-domain latent distribution matching (LDM), which establishes content-style identifiability under substantially more relaxed conditions. Specifically, we show that restrictive assumptions such as component-wise independence of the latent variables can be removed. Most notably, we prove that prior knowledge of the content and style dimensions is not necessary for ensuring identifiability, if sparsity constraints are properly imposed onto the learned latent representations. Bypassing the knowledge of the exact latent dimension has been a longstanding aspiration in unsupervised representation learning -- our analysis is the first to underpin its theoretical and practical viability. On the implementation side, we recast the LDM formulation into a regularized multi-domain GAN loss with coupled latent variables. We show that the reformulation is equivalent to LDM under mild conditions -- yet requires considerably fewer computational resources. Experiments corroborate our theoretical claims.

Updated: 2024-11-11 18:40:09

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.03755v2

TreeCoders: Trees of Transformers

In this paper, we introduce TreeCoders, a novel family of transformer trees. We move away from traditional linear transformers to complete k-ary trees. Transformer blocks serve as nodes, and generic classifiers learn to select the best child and route the sequence of tokens to a specific leaf. The selectors, moved outside the transformer blocks, allow for the use of a variety of architectures without further modifications. Furthermore, our proposed architecture supports sparse node activation due to the logarithmic complexity of a tree search. We validate our idea by testing a series of decoder-only tree transformers, achieving competitive results across a diverse range of language datasets. Our study demonstrates that the proposed tree transformer model outperforms a size-equivalent linear transformer model 76% of the time over a wide range of tree architectures. Furthermore, our proposed model naturally lends itself to distributed implementation.

Updated: 2024-11-11 18:40:04

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.07218v1

An information field theory approach to Bayesian state and parameter estimation in dynamical systems

Dynamical system state estimation and parameter calibration problems are ubiquitous across science and engineering. Bayesian approaches to the problem are the gold standard as they allow for the quantification of uncertainties and enable the seamless fusion of different experimental modalities. When the dynamics are discrete and stochastic, one may employ powerful techniques such as Kalman, particle, or variational filters. Practitioners commonly apply these methods to continuous-time, deterministic dynamical systems after discretizing the dynamics and introducing fictitious transition probabilities. However, approaches based on time-discretization suffer from the curse of dimensionality since the number of random variables grows linearly with the number of time-steps. Furthermore, the introduction of fictitious transition probabilities is an unsatisfactory solution because it increases the number of model parameters and may lead to inference bias. To address these drawbacks, the objective of this paper is to develop a scalable Bayesian approach to state and parameter estimation suitable for continuous-time, deterministic dynamical systems. Our methodology builds upon information field theory. Specifically, we construct a physics-informed prior probability measure on the function space of system responses so that functions that satisfy the physics are more likely. This prior allows us to quantify model form errors. We connect the system's response to observations through a probabilistic model of the measurement process. The joint posterior over the system responses and all parameters is given by Bayes' rule. To approximate the intractable posterior, we develop a stochastic variational inference algorithm. In summary, the developed methodology offers a powerful framework for Bayesian estimation in dynamical systems.

Updated: 2024-11-11 18:38:32

Categories: physics.data-an,cs.LG

Download: http://arxiv.org/abs/2306.02150v2

Feature Selection Based on Wasserstein Distance

In this paper, we present a novel feature selection method based on the Wasserstein distance. Feature selection plays a critical role in reducing the dimensionality of input data, thereby improving machine learning efficiency and generalization performance. Unlike traditional feature selection approaches that rely on criteria such as correlation or KL divergence, our method leverages the Wasserstein distance to measure the similarity between distributions of selected features and original features. This approach inherently accounts for similarities between classes, making it robust in scenarios involving noisy labels. Experimental results demonstrate that our method outperforms traditional approaches, particularly in challenging settings involving noisy labeled data.
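
The scoring idea can be sketched with a toy implementation. This is a minimal illustration, not the paper's exact method: it ranks features by the empirical 1-D Wasserstein distance between each feature's class-conditional distributions (for equal-sized sorted samples, W1 reduces to the mean absolute difference of the order statistics). The function names and synthetic data below are hypothetical.

```python
import numpy as np

def wasserstein_1d(a, b):
    """Empirical 1-D Wasserstein-1 distance between equal-sized samples.

    For sorted samples of equal length, W1 is the mean absolute
    difference of the order statistics.
    """
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    assert a.shape == b.shape
    return float(np.mean(np.abs(a - b)))

def rank_features(X, y):
    """Score each feature by the Wasserstein distance between its
    class-conditional distributions (binary labels assumed)."""
    scores = np.array([
        wasserstein_1d(X[y == 0, j], X[y == 1, j]) for j in range(X.shape[1])
    ])
    return np.argsort(scores)[::-1], scores

rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n)
# Feature 0 separates the classes; feature 1 is pure noise.
X = np.column_stack([
    np.concatenate([rng.normal(0, 1, n), rng.normal(3, 1, n)]),
    rng.normal(0, 1, 2 * n),
])
order, scores = rank_features(X, y)
print(order[0])  # feature 0 ranks first
```

A distributional criterion like this, unlike a pointwise label-fit criterion, compares whole feature distributions, which is one way to see why it can tolerate a fraction of noisy labels.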

Updated: 2024-11-11 18:38:22

Categories: cs.LG

Download: http://arxiv.org/abs/2411.07217v1

Stronger Random Baselines for In-Context Learning

Evaluating the in-context learning classification performance of language models poses challenges due to small dataset sizes, extensive prompt-selection using the validation set, and intentionally difficult tasks that lead to near-random performance. The standard random baseline--the expected accuracy of guessing labels uniformly at random--is stable when the evaluation set is used only once or when the dataset is large. We account for the common practice of validation set reuse and existing small datasets with a stronger random baseline: the expected maximum accuracy across multiple random classifiers. When choosing the best prompt demonstrations across six quantized language models applied to 16 BIG-bench Lite tasks, more than 20% of the few-shot results that exceed the standard baseline do not exceed this stronger random baseline. When held-out test sets are available, this stronger baseline is also a better predictor of held-out performance than the standard baseline, avoiding unnecessary test set evaluations. This maximum random baseline provides an easily calculated drop-in replacement for the standard baseline.
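
The stronger baseline has a simple closed form. The sketch below is a hedged reconstruction from the abstract: it computes the expected maximum accuracy of `num_classifiers` independent uniformly-guessing classifiers on `n` examples, using the identity E[max] = Σ_{t≥1} P(max ≥ t) applied to the binomial counts of correct guesses.

```python
from math import comb

def binom_cdf(t, n, p):
    """P(X <= t) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(t + 1))

def expected_max_accuracy(n, num_classifiers, num_labels=2):
    """Expected maximum accuracy over independent classifiers that each
    guess n labels uniformly at random.

    Uses E[max] = sum_{t>=1} P(max >= t) over the integer correct counts,
    where P(max <= t) = CDF(t)^num_classifiers by independence.
    """
    p = 1.0 / num_labels
    e_max_count = sum(
        1.0 - binom_cdf(t, n, p) ** num_classifiers for t in range(n)
    )
    return e_max_count / n

# One classifier recovers the standard baseline 1/num_labels; reusing the
# validation set across many prompt choices inflates the bar to clear.
print(round(expected_max_accuracy(100, 1), 3))   # 0.5
print(round(expected_max_accuracy(100, 50), 3))  # noticeably above 0.5
```

This is what makes the maximum random baseline a drop-in replacement: it needs only the dataset size, the number of labels, and the number of validation-set reuses.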

Updated: 2024-11-11 18:37:22

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2404.13020v2

Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks

A key objective of interpretability research on large language models (LLMs) is to develop methods for robustly steering models toward desired behaviors. To this end, two distinct approaches to interpretability -- "bottom-up" and "top-down" -- have been presented, but there has been little quantitative comparison between them. We present a case study comparing the effectiveness of representative vector steering methods from each branch: function vectors (FV; arXiv:2310.15213) as a bottom-up method, and in-context vectors (ICV; arXiv:2311.06668) as a top-down method. While both aim to capture compact representations of broad in-context learning tasks, we find they are effective only on specific types of tasks: ICVs outperform FVs in behavioral shifting, whereas FVs excel in tasks requiring more precision. We discuss the implications of these findings for future evaluations of steering methods and for further research into top-down and bottom-up steering.

Updated: 2024-11-11 18:36:17

Categories: cs.LG

Download: http://arxiv.org/abs/2411.07213v1

General Geospatial Inference with a Population Dynamics Foundation Model

Supporting the health and well-being of dynamic populations around the world requires governmental agencies, organizations and researchers to understand and reason over complex relationships between human behavior and local contexts in order to identify high-risk groups and strategically allocate limited resources. Traditional approaches to these classes of problems often entail developing manually curated, task-specific features and models to represent human behavior and the natural and built environment, which can be challenging to adapt to new, or even, related tasks. To address this, we introduce a Population Dynamics Foundation Model (PDFM) that aims to capture the relationships between diverse data modalities and is applicable to a broad range of geospatial tasks. We first construct a geo-indexed dataset for postal codes and counties across the United States, capturing rich aggregated information on human behavior from maps, busyness, and aggregated search trends, and environmental factors such as weather and air quality. We then model this data and the complex relationships between locations using a graph neural network, producing embeddings that can be adapted to a wide range of downstream tasks using relatively simple models. We evaluate the effectiveness of our approach by benchmarking it on 27 downstream tasks spanning three distinct domains: health indicators, socioeconomic factors, and environmental measurements. The approach achieves state-of-the-art performance on all 27 geospatial interpolation tasks, and on 25 out of the 27 extrapolation and super-resolution tasks. We combined the PDFM with a state-of-the-art forecasting foundation model, TimesFM, to predict unemployment and poverty, achieving performance that surpasses fully supervised forecasting. The full set of embeddings and sample code are publicly available for researchers.

Updated: 2024-11-11 18:32:44

Categories: cs.LG,cs.CY

Download: http://arxiv.org/abs/2411.07207v1

Scaling Law Hypothesis for Multimodal Model

We propose a scaling law hypothesis for multimodal models processing text, audio, images, and video within a shared token and embedding space. Our framework predicts model performance based on modality-specific compression and tokenization efficiency, extending established scaling laws from text-based decoder models to mixed-modality systems. We explore whether leveraging more training data in multiple modalities can reduce the size of the multimodal model, enabling efficient deployment on resource-constrained devices.

Updated: 2024-11-11 18:32:16

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.06754v4

Evolving to the Future: Unseen Event Adaptive Fake News Detection on Social Media

With the rapid development of social media, the wide dissemination of fake news on social media is increasingly threatening both individuals and society. One of the unique challenges for fake news detection on social media is how to detect fake news on future events. Recently, numerous fake news detection models that utilize textual information and the propagation structure of posts have been proposed. Unfortunately, most of the existing approaches can hardly handle this challenge since they rely heavily on event-specific features for prediction and cannot generalize to unseen events. To address this, we introduce the Future ADaptive Event-based fake news Detection (FADE) framework. Specifically, we train a target predictor through an adaptive augmentation strategy and graph contrastive learning to obtain higher-quality features and make more accurate overall predictions. Simultaneously, we independently train an event-only predictor to obtain biased predictions. We further mitigate event bias by subtracting the event-only predictor's output from the target predictor's output to obtain the final prediction. Encouraging results from experiments designed to emulate real-world social media conditions validate the effectiveness of our method in comparison to existing state-of-the-art approaches.
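
The debiasing step described above can be illustrated as a subtraction in logit space. This is a minimal sketch with hypothetical names and numbers, assuming the two predictors emit per-class logits:

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, float)
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def debiased_prediction(target_logits, event_only_logits):
    """FADE-style final prediction: subtract the event-only (biased)
    predictor's logits from the target predictor's logits."""
    return softmax(np.asarray(target_logits) - np.asarray(event_only_logits))

# Hypothetical two-class example (real, fake): the target predictor leans
# "fake", partly because the event-only predictor is biased the same way.
target = [1.0, 2.0]
event_only = [0.0, 1.5]
p = debiased_prediction(target, event_only)
print(p.round(3))  # leans toward "real" once the event bias is removed
```

The intuition is that whatever part of the target predictor's score is explained by the event alone cancels out, leaving the event-independent evidence.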

Updated: 2024-11-11 18:27:18

Categories: cs.SI,cs.AI,cs.CL

Download: http://arxiv.org/abs/2403.00037v2

Programming Distributed Collective Processes in the eXchange Calculus

Recent trends like the Internet of Things (IoT) suggest a vision of dense and multi-scale deployments of computing devices in nearly all kinds of environments. A prominent engineering challenge revolves around programming the collective adaptive behaviour of such computational ecosystems. This requires abstractions able to capture concepts like ensembles (dynamic groups of cooperating devices) and collective tasks (joint activities carried out by ensembles). In this work, we consider collections of devices interacting with neighbours and that execute in nearly-synchronised sense-compute-interact rounds, where the computation is given by a single program mapping sensing values and incoming messages to output and outgoing messages. To support programming whole computational collectives, we propose the abstraction of a distributed collective process, which can be used to define at once the ensemble formation logic and its collective task. We formalise the abstraction in the eXchange Calculus (XC), a core functional language based on neighbouring values (maps from neighbours to values) where state and interaction are handled through a single primitive, exchange, and provide a corresponding implementation in the FCPP language. Then, we exercise distributed collective processes using two case studies: multi-hop message propagation and distributed monitoring of spatial properties. Finally, we discuss the features of the abstraction and its suitability for different kinds of distributed computing applications.

Updated: 2024-11-11 18:26:31

Categories: cs.DC,cs.AI,cs.MA,cs.PL,D.1.3; F.1.1; F.4.3; I.2.11; J.7

Download: http://arxiv.org/abs/2401.11212v2

'Explaining RL Decisions with Trajectories': A Reproducibility Study

This work investigates the reproducibility of the paper 'Explaining RL decisions with trajectories'. The original paper introduces a novel approach in explainable reinforcement learning based on attributing the decisions of an agent to specific clusters of trajectories encountered during training. We verify the main claims from the paper, which state that (i) training on fewer trajectories induces a lower initial state value, (ii) trajectories in a cluster present similar high-level patterns, (iii) distant trajectories influence the decision of an agent, and (iv) humans correctly identify the attributed trajectories to the decision of the agent. We recover the environments used by the authors based on the partial original code they provided for one of the environments (Grid-World), and implement the remaining ones from scratch (Seaquest, HalfCheetah, Breakout and Q*Bert). While we confirm that (i), (ii), and (iii) partially hold, we extend the authors' largely qualitative experiments by introducing a quantitative metric to further support (iii), and new experiments and visual results for (i). Moreover, we investigate the use of different clustering algorithms and encoder architectures to further support (ii). We could not support (iv), given the limited extent of the original experiments. We conclude that, while some of the claims can be supported, further investigations and experiments could be of interest. We recognise the novelty of the authors' work and hope that our work paves the way for clearer and more transparent approaches.

Updated: 2024-11-11 18:24:27

Categories: cs.AI

Download: http://arxiv.org/abs/2411.07200v1

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

Instruction-guided image editing methods have demonstrated significant potential by training diffusion models on automatically synthesized or manually annotated image editing pairs. However, these methods remain far from practical, real-life applications. We identify three primary challenges contributing to this gap. Firstly, existing models have limited editing skills due to the biased synthesis process. Secondly, these methods are trained with datasets with a high volume of noise and artifacts. This is due to the application of simple filtering methods like CLIP-score. Thirdly, all these datasets are restricted to a single low resolution and fixed aspect ratio, limiting the versatility to handle real-world use cases. In this paper, we present OmniEdit, an omnipotent editor that handles seven different image editing tasks with any aspect ratio seamlessly. Our contribution is fourfold: (1) OmniEdit is trained by utilizing the supervision from seven different specialist models to ensure task coverage. (2) We utilize importance sampling based on the scores provided by large multimodal models (like GPT-4o) instead of CLIP-score to improve the data quality. (3) We propose a new editing architecture called EditNet to greatly boost the editing success rate. (4) We provide images with different aspect ratios to ensure that our model can handle any image in the wild. We have curated a test set containing images of different aspect ratios, accompanied by diverse instructions to cover different tasks. Both automatic evaluation and human evaluations demonstrate that OmniEdit can significantly outperform all the existing models. Our code, dataset and model will be available at https://tiger-ai-lab.github.io/OmniEdit/

Updated: 2024-11-11 18:21:43

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.07199v1

LLM-based Continuous Intrusion Detection Framework for Next-Gen Networks

In this paper, we present an adaptive framework designed for the continuous detection, identification and classification of emerging attacks in network traffic. The framework employs a transformer encoder architecture, which captures hidden patterns in a bidirectional manner to differentiate between malicious and legitimate traffic. Initially, the framework focuses on the accurate detection of malicious activities, achieving a perfect recall of 100% in distinguishing between attack and benign traffic. Subsequently, the system incrementally identifies unknown attack types by leveraging a Gaussian Mixture Model (GMM) to cluster features derived from high-dimensional BERT embeddings. This approach allows the framework to dynamically adjust its identification capabilities as new attack clusters are discovered, maintaining high detection accuracy. Even after integrating additional unknown attack clusters, the framework continues to perform at a high level, achieving 95.6% in both classification accuracy and recall. The results demonstrate the effectiveness of the proposed framework in adapting to evolving threats while maintaining high accuracy in both detection and identification tasks. Our ultimate goal is to develop a scalable, real-time intrusion detection system that can continuously evolve with the ever-changing network threat landscape.

Updated: 2024-11-11 18:19:22

Categories: cs.CR,cs.NI

Download: http://arxiv.org/abs/2411.03354v2

Data-Driven Predictive Control of Nonholonomic Robots Based on a Bilinear Koopman Realization: Data Does Not Replace Geometry

Advances in machine learning and the growing trend towards effortless data generation in real-world systems have led to an increasing interest for data-inferred models and data-based control in robotics. It seems appealing to govern robots solely based on data, bypassing the traditional, more elaborate pipeline of system modeling through first-principles and subsequent controller design. One promising data-driven approach is the Extended Dynamic Mode Decomposition (EDMD) for control-affine systems, a system class which contains many vehicles and machines of immense practical importance including, e.g., typical wheeled mobile robots. EDMD can be highly data-efficient, computationally inexpensive, can deal with nonlinear dynamics as prevalent in robotics and mechanics, and has a sound theoretical foundation rooted in Koopman theory. On this background, this present paper examines how EDMD models can be integrated into predictive controllers for nonholonomic mobile robots. In addition to the conventional kinematic mobile robot, we also cover the complete data-driven control pipeline - from data acquisition to control design - when the robot is not treated in terms of first-order kinematics but in a second-order manner, allowing to account for actuator dynamics. Using only real-world measurement data, it is shown in both simulations and hardware experiments that the surrogate models enable high-precision predictive controllers in the studied cases. However, the findings raise significant concerns about purely data-centric approaches that overlook the underlying geometry of nonholonomic systems, showing that, for nonholonomic systems, some geometric insight seems necessary and cannot be easily compensated for with large amounts of data.
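
At its core, EDMD is a least-squares procedure on lifted states. The toy sketch below is not the paper's bilinear realization: it uses a scalar system with a hand-picked dictionary (all names hypothetical), fits a one-step predictor by least squares, and checks that the surrogate reproduces the dynamics when the dictionary happens to contain them.

```python
import numpy as np

def lift(x, u):
    """Dictionary of observables for EDMD with control; a toy choice."""
    x, u = np.atleast_1d(x), np.atleast_1d(u)
    return np.column_stack([np.ones_like(x), x, x**2, u, x * u])

# Toy control-affine system: x+ = 0.9 x + 0.1 x^2 + 0.5 u
def step(x, u):
    return 0.9 * x + 0.1 * x**2 + 0.5 * u

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 500)
u = rng.uniform(-1, 1, 500)
x_next = step(x, u)

# Least-squares fit of the lifted one-step predictor: x+ ≈ lift(x, u) @ w
Phi = lift(x, u)
w, *_ = np.linalg.lstsq(Phi, x_next, rcond=None)

# The dictionary spans the true dynamics, so the surrogate is (near) exact.
x0, u0 = 0.3, -0.2
pred = float(lift(x0, u0) @ w)
print(abs(pred - step(x0, u0)) < 1e-8)  # True
```

This also hints at the paper's cautionary point: the surrogate is only as good as the chosen observables, and for nonholonomic robots the abstract argues that geometric structure cannot simply be learned away with more data.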

Updated: 2024-11-11 18:08:17

Categories: eess.SY,cs.LG,cs.RO,cs.SY

Download: http://arxiv.org/abs/2411.07192v1

Diffusion Models for Audio Restoration

With the development of audio playback devices and fast data transmission, the demand for high sound quality is rising for both entertainment and communications. In this quest for better sound quality, challenges emerge from distortions and interferences originating at the recording side or caused by an imperfect transmission pipeline. To address this problem, audio restoration methods aim to recover clean sound signals from the corrupted input data. We present here audio restoration algorithms based on diffusion models, with a focus on speech enhancement and music restoration tasks. Traditional approaches, often grounded in handcrafted rules and statistical heuristics, have shaped our understanding of audio signals. In the past decades, there has been a notable shift towards data-driven methods that exploit the modeling capabilities of DNNs. Deep generative models, and among them diffusion models, have emerged as powerful techniques for learning complex data distributions. However, relying solely on DNN-based learning approaches carries the risk of reducing interpretability, particularly when employing end-to-end models. Nonetheless, data-driven approaches allow more flexibility in comparison to statistical model-based frameworks, whose performance depends on distributional and statistical assumptions that can be difficult to guarantee. Here, we aim to show that diffusion models can combine the best of both worlds and offer the opportunity to design audio restoration algorithms with a good degree of interpretability and a remarkable performance in terms of sound quality. We explain the diffusion formalism and its application to the conditional generation of clean audio signals. We believe that diffusion models open an exciting field of research with the potential to spawn new audio restoration algorithms that are natural-sounding and remain robust in difficult acoustic situations.

Updated: 2024-11-11 18:07:26

Categories: eess.AS,cs.LG,cs.SD

Download: http://arxiv.org/abs/2402.09821v3

The Super Weight in Large Language Models

Recent works have shown a surprising result: a small fraction of Large Language Model (LLM) parameter outliers are disproportionately important to the quality of the model. LLMs contain billions of parameters, so these small fractions, such as 0.01%, translate to hundreds of thousands of parameters. In this work, we present an even more surprising finding: Pruning as few as a single parameter can destroy an LLM's ability to generate text -- increasing perplexity by 3 orders of magnitude and reducing zero-shot accuracy to guessing. We propose a data-free method for identifying such parameters, termed super weights, using a single forward pass through the model. We additionally find that these super weights induce correspondingly rare and large activation outliers, termed super activations. When preserved with high precision, super activations can improve simple round-to-nearest quantization to become competitive with state-of-the-art methods. For weight quantization, we similarly find that by preserving the super weight and clipping other weight outliers, round-to-nearest quantization can scale to much larger block sizes than previously considered. To facilitate further research into super weights, we provide an index of super weight coordinates for common, openly available LLMs.
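
The effect of a single dominant parameter can be illustrated on a toy network. The sketch below is not the paper's data-free detection method (which identifies super weights from activation outliers in a single forward pass); it simply plants one large weight in an otherwise small random network and shows that brute-force single-weight ablation singles it out.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy two-layer map with one artificially dominant ("super") weight.
W1 = rng.normal(0, 0.02, (8, 8))
W1[3, 5] = 4.0  # the planted super weight
W2 = rng.normal(0, 0.02, (8, 8))
X = rng.normal(0, 1, (64, 8))

def forward(W1, W2, X):
    return np.maximum(X @ W1, 0.0) @ W2  # ReLU MLP, no biases

baseline = forward(W1, W2, X)

# Ablate each first-layer weight in turn and measure output distortion.
distortion = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        W1_ab = W1.copy()
        W1_ab[i, j] = 0.0
        distortion[i, j] = np.linalg.norm(forward(W1_ab, W2, X) - baseline)

idx = np.unravel_index(np.argmax(distortion), distortion.shape)
print(tuple(int(v) for v in idx))  # the planted coordinate
```

In a real LLM this brute-force sweep is infeasible, which is exactly why the paper's single-forward-pass identification of super weights (via the super activations they induce) matters.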

Updated: 2024-11-11 18:05:48

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.07191v1

NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics

Large language models (LLMs) prompted with text and audio represent the state of the art in various auditory tasks, including speech, music, and general audio, showing emergent abilities on unseen tasks. However, these capabilities have yet to be fully demonstrated in bioacoustics tasks, such as detecting animal vocalizations in large recordings, classifying rare and endangered species, and labeling context and behavior - tasks that are crucial for conservation, biodiversity monitoring, and the study of animal behavior. In this work, we present NatureLM-audio, the first audio-language foundation model specifically designed for bioacoustics. Our carefully curated training dataset comprises text-audio pairs spanning a diverse range of bioacoustics, speech, and music data, designed to address the challenges posed by limited annotated datasets in the field. We demonstrate successful transfer of learned representations from music and speech to bioacoustics, and our model shows promising generalization to unseen taxa and tasks. Importantly, we test NatureLM-audio on a novel benchmark (BEANS-Zero) and it sets the new state of the art (SotA) on several bioacoustics tasks, including zero-shot classification of unseen species. To advance bioacoustics research, we also open-source the code for generating training and benchmark data, as well as for training the model.

Updated: 2024-11-11 18:01:45

Subjects: cs.SD,cs.AI,cs.LG,eess.AS

Download: http://arxiv.org/abs/2411.07186v1

Constructing Gaussian Processes via Samplets

Gaussian Processes face two primary challenges: constructing models for large datasets and selecting the optimal model. This master's thesis tackles these challenges in the low-dimensional case. We examine recent convergence results to identify models with optimal convergence rates and pinpoint essential parameters. Utilizing this model, we propose a Samplet-based approach to efficiently construct and train the Gaussian Processes, reducing the cubic computational complexity to a log-linear scale. This method facilitates optimal regression while maintaining efficient performance.

Updated: 2024-11-11 18:01:03

Subjects: stat.ML,cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2411.07277v1

Gradual Fine-Tuning with Graph Routing for Multi-Source Unsupervised Domain Adaptation

Multi-source unsupervised domain adaptation aims to leverage labeled data from multiple source domains for training a machine learning model to generalize well on a target domain without labels. Source domain selection plays a crucial role in determining the model's performance; it relies on the similarities amongst source and target domains. Nonetheless, existing work on source domain selection often involves heavyweight computational procedures, especially when dealing with numerous source domains and the need to identify the best ones from them. In this paper, we introduce a framework for gradual fine-tuning (GFT) of machine learning models on multiple source domains. We represent the multiple source domains as an undirected weighted graph. We then give a new generalization error bound for GFT along any path within the graph, which is used to determine the optimal path corresponding to the optimal training order. With this formulation, we introduce three lightweight graph-routing strategies that tend to minimize the error bound. Our best strategy improves accuracy by 2.3% over the state-of-the-art on the Natural Language Inference (NLI) task and achieves competitive performance on the Sentiment Analysis (SA) task, notably a 3.9% improvement on a more diverse subset of the data we use for SA.
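
The routing idea can be sketched with plain Dijkstra over a toy domain graph, reading a candidate training order off the cheapest path. This is an illustrative stand-in, not the paper's three specific strategies; the graph and weights below are made up:

```python
import heapq

def shortest_path(graph, start, goal):
    # Plain Dijkstra over an undirected weighted graph given as
    # {node: [(neighbor, weight), ...]}.  In a GFT-style setting the
    # nodes would be source domains and the edge weights domain
    # distances; the returned path is one candidate fine-tuning order.
    # (Illustrative routing only -- not the paper's strategies.)
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

# Hypothetical domain graph: edge weight = dissimilarity between domains.
g = {"A": [("B", 1.0), ("C", 4.0)],
     "B": [("A", 1.0), ("C", 1.5)],
     "C": [("A", 4.0), ("B", 1.5)]}
print(shortest_path(g, "A", "C"))   # -> ['A', 'B', 'C']
```

The detour through B is cheaper (1.0 + 1.5) than the direct edge (4.0), so fine-tuning would proceed A, then B, then C.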

Updated: 2024-11-11 17:59:21

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.07185v1

Revisiting Ensembling in One-Shot Federated Learning

Federated learning (FL) is an appealing approach to training machine learning models without sharing raw data. However, standard FL algorithms are iterative and thus induce a significant communication cost. One-shot federated learning (OFL) trades the iterative exchange of models between clients and the server with a single round of communication, thereby saving substantially on communication costs. Not surprisingly, OFL exhibits a performance gap in terms of accuracy with respect to FL, especially under high data heterogeneity. We introduce FENS, a novel federated ensembling scheme that approaches the accuracy of FL with the communication efficiency of OFL. Learning in FENS proceeds in two phases: first, clients train models locally and send them to the server, similar to OFL; second, clients collaboratively train a lightweight prediction aggregator model using FL. We showcase the effectiveness of FENS through exhaustive experiments spanning several datasets and heterogeneity levels. In the particular case of heterogeneously distributed CIFAR-10 dataset, FENS achieves up to a 26.9% higher accuracy over state-of-the-art (SOTA) OFL, being only 3.1% lower than FL. At the same time, FENS incurs at most 4.3x more communication than OFL, whereas FL is at least 10.9x more communication-intensive than FENS.

Updated: 2024-11-11 17:58:28

Subjects: cs.LG,cs.DC

Download: http://arxiv.org/abs/2411.07182v1

Counterfactual Generation from Language Models

Understanding and manipulating the causal generation mechanisms in language models is essential for controlling their behavior. Previous work has primarily relied on techniques such as representation surgery -- e.g., model ablations or manipulation of linear subspaces tied to specific concepts -- to intervene on these models. To understand the impact of interventions precisely, it is useful to examine counterfactuals -- e.g., how a given sentence would have appeared had it been generated by the model following a specific intervention. We highlight that counterfactual reasoning is conceptually distinct from interventions, as articulated in Pearl's causal hierarchy. Based on this observation, we propose a framework for generating true string counterfactuals by reformulating language models as Generalized Structural-equation Models using the Gumbel-max trick. This allows us to model the joint distribution over original strings and their counterfactuals resulting from the same instantiation of the sampling noise. We develop an algorithm based on hindsight Gumbel sampling that allows us to infer the latent noise variables and generate counterfactuals of observed strings. Our experiments demonstrate that the approach produces meaningful counterfactuals while at the same time showing that commonly used intervention techniques have considerable undesired side effects.
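
The Gumbel-max reformulation at the core of the framework can be sketched in a three-token toy: fixing the Gumbel noise fixes the sampling "fate", so replaying the same noise under intervened logits yields a counterfactual sample. The logits below are invented for illustration; the hindsight-sampling algorithm itself is not reproduced:

```python
import math, random

def gumbel_max_sample(logits, noise):
    # Gumbel-max trick: argmax(logits + Gumbel noise) is an exact
    # categorical sample; holding the noise fixed fixes the outcome
    # as a deterministic function of the logits.
    return max(range(len(logits)), key=lambda i: logits[i] + noise[i])

random.seed(0)
logits = [1.0, 0.5, -1.0]               # factual next-token scores (toy)
noise = [-math.log(-math.log(random.random())) for _ in logits]
factual = gumbel_max_sample(logits, noise)

# Counterfactual query: same noise instantiation, intervened logits --
# what would have been sampled had the model scored tokens differently?
intervened = [1.0, 0.5, 3.0]
counterfactual = gumbel_max_sample(intervened, noise)
print(factual, counterfactual)   # -> 0 2
```

Because both draws share one noise instantiation, the pair (factual, counterfactual) is a sample from the joint distribution the paper models.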

Updated: 2024-11-11 17:57:30

Subjects: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.07180v1

Joint Age-State Belief is All You Need: Minimizing AoII via Pull-Based Remote Estimation

Age of incorrect information (AoII) is a recently proposed freshness and mismatch metric that penalizes an incorrect estimation along with its duration. Therefore, keeping track of AoII requires knowledge of both the source and estimation processes. In this paper, we consider a time-slotted pull-based remote estimation system under a sampling rate constraint where the information source is a general discrete-time Markov chain (DTMC) process. Moreover, packet transmission times from the source to the monitor are non-zero, which prevents the monitor from having perfect information on the actual AoII process at any time. Hence, for this pull-based system, we propose that the monitor maintain a sufficient statistic called belief, the joint distribution of the age and source processes obtained from the history of all observations. Using the belief, we first propose a maximum a posteriori (MAP) estimator to be used at the monitor, as opposed to the existing martingale estimators in the literature. Second, we obtain the optimality equations from the belief-MDP (Markov decision process) formulation. Finally, we propose two belief-dependent policies, one based on deep reinforcement learning and the other a threshold-based policy based on the instantaneous expected AoII.

Updated: 2024-11-11 17:57:25

Subjects: cs.IT,cs.LG,cs.NI,cs.SY,eess.SP,eess.SY,math.IT

Download: http://arxiv.org/abs/2411.07179v1

More Expressive Attention with Negative Weights

We propose a novel attention mechanism, named Cog Attention, that enables attention weights to be negative for enhanced expressiveness, which stems from two key factors: (1) Cog Attention can shift the token deletion and copying function from a static OV matrix to dynamic QK inner products, with the OV matrix now focusing more on refinement or modification. The attention head can simultaneously delete, copy, or retain tokens by assigning them negative, positive, or minimal attention weights, respectively. As a result, a single attention head becomes more flexible and expressive. (2) Cog Attention improves the model's robustness against representational collapse, which can occur when earlier tokens are over-squashed into later positions, leading to homogeneous representations. Negative weights reduce effective information paths from earlier to later tokens, helping to mitigate this issue. We develop Transformer-like models which use Cog Attention as attention modules, including decoder-only models for language modeling and U-ViT diffusion models for image generation. Experiments show that models using Cog Attention exhibit superior performance compared to those employing traditional softmax attention modules. Our approach suggests a promising research direction for rethinking and breaking the entrenched constraints of traditional softmax attention, such as the requirement for non-negative weights.
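
For intuition, one simple way to produce signed attention weights is to softmax the score magnitudes and then restore the raw signs. This construction and its names are assumptions for the sketch, not necessarily Cog Attention's actual formulation:

```python
import math

def signed_attention(scores):
    # Illustrative signed attention (an assumption for this sketch, not
    # necessarily the paper's mechanism): softmax over the magnitudes of
    # the query-key scores, re-signed by the raw scores.  A negative
    # weight lets a head subtract (delete) a token's value vector, a
    # positive weight copies it, and a near-zero weight retains little.
    mags = [math.exp(abs(s)) for s in scores]
    z = sum(mags)
    return [math.copysign(m / z, s) for m, s in zip(mags, scores)]

w = signed_attention([2.0, -2.0, 0.1])
print([round(v, 3) for v in w])   # mixed-sign weights, |weights| sum to 1
```

The point of the toy is only that a single head can now express "copy token 0, delete token 1" in one weight vector, which ordinary non-negative softmax attention cannot.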

Updated: 2024-11-11 17:56:28

Subjects: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.07176v1

Mutual Information Estimation via $f$-Divergence and Data Derangements

Estimating mutual information accurately is pivotal across diverse applications, from machine learning to communications and biology, enabling us to gain insights into the inner mechanisms of complex systems. Yet, dealing with high-dimensional data presents a formidable challenge, due to its size and the presence of intricate relationships. Recently proposed neural methods employing variational lower bounds on the mutual information have gained prominence. However, these approaches suffer from either high bias or high variance, as the sample size and the structure of the loss function directly influence the training process. In this paper, we propose a novel class of discriminative mutual information estimators based on the variational representation of the $f$-divergence. We investigate the impact of the permutation function used to obtain the marginal training samples and present a novel architectural solution based on derangements. The proposed estimator is flexible since it exhibits an excellent bias/variance trade-off. The comparison with state-of-the-art neural estimators, through extensive experimentation within established reference scenarios, shows that our approach offers higher accuracy and lower complexity.
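
The derangement ingredient can be sketched directly by rejection sampling; the seed and helper name are choices for this sketch, not the paper's architecture:

```python
import random

def random_derangement(n, seed=1):
    # Rejection-sample a permutation with no fixed points (a
    # derangement).  Pairing (x_i, y_sigma(i)) under a derangement
    # guarantees that no original joint couple (x_i, y_i) survives in
    # the "marginal" batch -- the architectural idea the estimator
    # builds on, in contrast to a plain random permutation, which
    # leaves a pair matched with probability about 1 - 1/e.
    rng = random.Random(seed)
    while True:
        p = list(range(n))
        rng.shuffle(p)
        if all(p[i] != i for i in range(n)):
            return p

sigma = random_derangement(6)
print(sigma)
```

Since roughly a 1/e fraction of permutations are derangements, the expected number of rejection rounds is small and independent of n.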

Updated: 2024-11-11 17:53:15

Subjects: cs.LG,cs.IT,eess.SP,math.IT

Download: http://arxiv.org/abs/2305.20025v2

Anytime Sequential Halving in Monte-Carlo Tree Search

Monte-Carlo Tree Search (MCTS) typically uses multi-armed bandit (MAB) strategies designed to minimize cumulative regret, such as UCB1, as its selection strategy. However, in the root node of the search tree, it is more sensible to minimize simple regret. Previous work has proposed using Sequential Halving as selection strategy in the root node, as, in theory, it performs better with respect to simple regret. However, Sequential Halving requires a budget of iterations to be predetermined, which is often impractical. This paper proposes an anytime version of the algorithm, which can be halted at any arbitrary time and still return a satisfactory result, while being designed such that it approximates the behavior of Sequential Halving. Empirical results in synthetic MAB problems and ten different board games demonstrate that the algorithm's performance is competitive with Sequential Halving and UCB1 (and their analogues in MCTS).
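
The fixed-budget baseline the paper builds on, Sequential Halving, can be sketched on a toy bandit with deterministic rewards (the anytime variant proposed in the paper is not reproduced here):

```python
import math

def sequential_halving(means, budget):
    # Sequential Halving: split the budget over ~log2(K) rounds, pull
    # every surviving arm equally, and keep the better-scoring half
    # each round.  The "pull" here deterministically returns the true
    # mean, to keep the sketch self-contained; a real bandit would
    # return a stochastic reward.
    arms = list(range(len(means)))
    est = {a: 0.0 for a in arms}
    cnt = {a: 0 for a in arms}
    rounds = math.ceil(math.log2(len(arms)))
    for _ in range(rounds):
        per_arm = max(1, budget // (len(arms) * rounds))
        for a in arms:
            for _ in range(per_arm):
                r = means[a]                    # stand-in for a reward draw
                cnt[a] += 1
                est[a] += (r - est[a]) / cnt[a]
        arms.sort(key=lambda a: est[a], reverse=True)
        arms = arms[: max(1, len(arms) // 2)]   # halve the surviving set
    return arms[0]

print(sequential_halving([0.2, 0.9, 0.4, 0.6], budget=64))   # -> 1
```

The need to fix `budget` up front is exactly the limitation the anytime version removes.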

Updated: 2024-11-11 17:49:47

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.07171v1

Enhancing Predictive Maintenance in Mining Mobile Machinery through a TinyML-enabled Hierarchical Inference Network

Mining machinery operating in variable environments faces high wear and unpredictable stress, challenging Predictive Maintenance (PdM). This paper introduces the Edge Sensor Network for Predictive Maintenance (ESN-PdM), a hierarchical inference framework across edge devices, gateways, and cloud services for real-time condition monitoring. The system dynamically adjusts inference locations--on-device, on-gateway, or on-cloud--based on trade-offs among accuracy, latency, and battery life, leveraging Tiny Machine Learning (TinyML) techniques for model optimization on resource-constrained devices. Performance evaluations showed that on-sensor and on-gateway inference modes achieved over 90% classification accuracy, while cloud-based inference reached 99%. On-sensor inference reduced power consumption by approximately 44%, enabling up to 104 hours of operation. Latency was lowest for on-device inference (3.33 ms), increasing when offloading to the gateway (146.67 ms) or cloud (641.71 ms). The ESN-PdM framework provides a scalable, adaptive solution for reliable anomaly detection and PdM, crucial for maintaining machinery uptime in remote environments. By balancing accuracy, latency, and energy consumption, this approach advances PdM frameworks for industrial applications.

Updated: 2024-11-11 17:48:04

Subjects: cs.LG,cs.DC,cs.MA,cs.NI,eess.SP

Download: http://arxiv.org/abs/2411.07168v1

Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models. Existing autoencoder models have demonstrated impressive results at a moderate spatial compression ratio (e.g., 8x), but fail to maintain satisfactory reconstruction accuracy for high spatial compression ratios (e.g., 64x). We address this challenge by introducing two key techniques: (1) Residual Autoencoding, where we design our models to learn residuals based on the space-to-channel transformed features to alleviate the optimization difficulty of high spatial-compression autoencoders; (2) Decoupled High-Resolution Adaptation, an efficient decoupled three-phases training strategy for mitigating the generalization penalty of high spatial-compression autoencoders. With these designs, we improve the autoencoder's spatial compression ratio up to 128 while maintaining the reconstruction quality. Applying our DC-AE to latent diffusion models, we achieve significant speedup without accuracy drop. For example, on ImageNet 512x512, our DC-AE provides 19.1x inference speedup and 17.9x training speedup on H100 GPU for UViT-H while achieving a better FID, compared with the widely used SD-VAE-f8 autoencoder. Our code is available at https://github.com/mit-han-lab/efficientvit.
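
The space-to-channel transform underlying Residual Autoencoding can be sketched in plain Python as a minimal pixel-unshuffle on nested lists (the function name and toy sizes are ours, not taken from the DC-AE code):

```python
def space_to_channel(img, f):
    # Space-to-channel (pixel-unshuffle) rearrangement: fold each f x f
    # spatial patch into the channel dimension, trading resolution for
    # channels.  DC-AE's residual branch learns residuals on top of
    # features transformed this way; this sketch shows only the
    # rearrangement itself.
    C, H, W = len(img), len(img[0]), len(img[0][0])
    out = [[[0.0] * (W // f) for _ in range(H // f)] for _ in range(C * f * f)]
    for c in range(C):
        for i in range(H):
            for j in range(W):
                oc = c * f * f + (i % f) * f + (j % f)
                out[oc][i // f][j // f] = img[c][i][j]
    return out

x = [[[1, 2], [3, 4]]]                       # 1 channel, 2x2
y = space_to_channel(x, 2)
print(len(y), len(y[0]), len(y[0][0]))       # -> 4 1 1
```

A 2x spatial fold quadruples the channel count while preserving every value, so no information is lost before the learned compression stages.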

Updated: 2024-11-11 17:42:37

Subjects: cs.CV,cs.AI

Download: http://arxiv.org/abs/2410.10733v2

A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19

Monitoring public sentiment via social media is potentially helpful during health crises such as the COVID-19 pandemic. However, traditional frequency-based, data-driven neural approaches can miss newly relevant content due to the evolving nature of language in a dynamically evolving environment. Human-curated symbolic knowledge sources, such as lexicons for standard language and slang terms, can potentially elevate social media signals in evolving language. We introduce a neurosymbolic method that integrates neural networks with symbolic knowledge sources, enhancing the detection and interpretation of mental health-related tweets relevant to COVID-19. Our method was evaluated using a corpus of large datasets (approximately 12 billion tweets, 2.5 million subreddit posts, and 700k news articles) and multiple knowledge graphs. This method dynamically adapts to evolving language, outperforming purely data-driven models with an F1 score exceeding 92%. This approach also showed faster adaptation to new data and lower computational demands than fine-tuning pre-trained large language models (LLMs). This study demonstrates the benefit of neurosymbolic methods in interpreting text in a dynamic environment for tasks such as health surveillance.

Updated: 2024-11-11 17:41:54

Subjects: cs.AI,I.2.4; I.2.6; I.2.7; I.2.0

Download: http://arxiv.org/abs/2411.07163v1

RoundTable: Investigating Group Decision-Making Mechanism in Multi-Agent Collaboration

This study investigates the efficacy of Multi-Agent Systems in eliciting cross-agent communication and enhancing collective intelligence through group decision-making in a decentralized setting. Unlike centralized mechanisms, where a fixed hierarchy governs social choice, decentralized group decision-making allows agents to engage in joint deliberation. Our research focuses on the dynamics of communication and decision-making within various social choice methods. By applying different voting rules in various environments, we find that moderate decision flexibility yields better outcomes. Additionally, exploring the linguistic features of agent-to-agent conversations reveals indicators of effective collaboration, offering insights into communication patterns that facilitate or hinder collaboration. Finally, we propose various methods for determining the optimal stopping point in multi-agent collaborations based on linguistic cues. Our findings contribute to a deeper understanding of how decentralized decision-making and group conversation shape multi-agent collaboration, with implications for the design of more effective MAS environments.

Updated: 2024-11-11 17:37:47

Subjects: cs.MA,cs.AI

Download: http://arxiv.org/abs/2411.07161v1

CDR: Customizable Density Ratios of Strong-over-weak LLMs for Preference Annotation

Preference tuning of large language models (LLMs) relies on high-quality human preference data, which is often expensive and time-consuming to gather. While existing methods can use trained reward models or proprietary models as judges for preference annotation, they have notable drawbacks: training reward models remains dependent on initial human data, and using proprietary models imposes license restrictions that inhibit commercial usage. In this paper, we introduce customized density ratio (CDR), a training-free and highly effective method that leverages off-the-shelf LLMs for preference data annotation. Our approach uses the log-density ratio between a better-aligned LLM and a less aligned LLM as a reward signal. We explore 221 different LLM pairs and empirically demonstrate that increasing the performance gap between paired LLMs correlates with better reward generalization. Furthermore, we show that tailoring the density ratio reward function with specific criteria and preference exemplars enhances performance across domains and within target areas. In our experiments, using the density ratio from a pair of Mistral-7B models, CDR achieves a RewardBench score of 82.6, outperforming the best trained reward functions from the same model class and demonstrating competitive performance against SoTA models in the Safety (91.0) and Reasoning (88.0) domains. We use CDR to annotate an on-policy preference dataset with which we preference-tune Llama-3-8B-Instruct using SimPO. Using reward signals from two relatively weak models, our approach pushes Llama-3-8B to achieve a 37.4% (+15.1%) win rate on ArenaHard and a 40.7% (+17.8%) win rate on Length-Controlled AlpacaEval 2.0, along with a score of 8.0 on MT-Bench.
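
The reward signal itself is easy to sketch: the log-density ratio of a stronger over a weaker model, summed over the response tokens. The toy per-token log-probabilities below are invented stand-ins for real LLM scores:

```python
import math

def density_ratio_reward(logp_strong, logp_weak, tokens):
    # CDR-style reward sketch: log-density ratio between a
    # better-aligned ("strong") and a less-aligned ("weak") model,
    # summed over response tokens.  The dicts stand in for real
    # per-token log-probs from two off-the-shelf LLMs.
    return sum(logp_strong[t] - logp_weak[t] for t in tokens)

# Hypothetical per-token log-probabilities for two toy "models": the
# strong model prefers the helpful token far more than the weak one does.
strong = {"helpful": math.log(0.6), "rude": math.log(0.05)}
weak = {"helpful": math.log(0.3), "rude": math.log(0.2)}

r_good = density_ratio_reward(strong, weak, ["helpful"])
r_bad = density_ratio_reward(strong, weak, ["rude"])
print(r_good > r_bad)   # the aligned response gets the higher reward
```

No training is involved: the annotation quality comes entirely from the gap between the two frozen models, which is the effect the 221-pair study measures.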

Updated: 2024-11-11 17:34:00

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.02481v2

Conditional simulation via entropic optimal transport: Toward non-parametric estimation of conditional Brenier maps

Conditional simulation is a fundamental task in statistical modeling: generate samples from the conditionals given finitely many data points from a joint distribution. One promising approach is to construct conditional Brenier maps, where the components of the map pushforward a reference distribution to conditionals of the target. While many estimators exist, few, if any, come with statistical or algorithmic guarantees. To this end, we propose a non-parametric estimator for conditional Brenier maps based on the computational scalability of entropic optimal transport. Our estimator leverages a result of Carlier et al. (2010), which shows that optimal transport maps under a rescaled quadratic cost asymptotically converge to conditional Brenier maps; our estimator is precisely the entropic analogue of these converging maps. We provide heuristic justifications for choosing the scaling parameter in the cost as a function of the number of samples by fully characterizing the Gaussian setting. We conclude by comparing the performance of the estimator to other machine learning and non-parametric approaches on benchmark datasets and Bayesian inference problems.
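
The entropic-OT workhorse behind such estimators, Sinkhorn iteration, can be sketched for discrete marginals. This is a generic textbook sketch; the rescaled-cost limit and the conditional-map construction from the paper are not reproduced:

```python
import math

def sinkhorn(a, b, cost, eps, iters=200):
    # Standard Sinkhorn iterations for entropic optimal transport
    # between discrete distributions a and b with the given cost
    # matrix; eps is the entropic regularisation strength (the
    # quantity that is rescaled in the paper's limiting argument).
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b)))
             for i in range(len(a))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(a)))
             for j in range(len(b))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))]
            for i in range(len(a))]

P = sinkhorn([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]], eps=0.1)
# Row marginals of the plan match a; small eps concentrates mass on
# the cheap diagonal, approaching the unregularised transport map.
print([round(sum(row), 6) for row in P])
```

Each iteration is a pair of matrix-vector scalings, which is the source of the computational scalability the paper exploits.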

Updated: 2024-11-11 17:32:47

Subjects: stat.ML,cs.LG,math.OC

Download: http://arxiv.org/abs/2411.07154v1

Using Large Language Models for Hyperparameter Optimization

This paper explores the use of foundational large language models (LLMs) in hyperparameter optimization (HPO). Hyperparameters are critical in determining the effectiveness of machine learning models, yet their optimization often relies on manual approaches in limited-budget settings. By prompting LLMs with dataset and model descriptions, we develop a methodology where LLMs suggest hyperparameter configurations, which are iteratively refined based on model performance. Our empirical evaluations on standard benchmarks reveal that within constrained search budgets, LLMs can match or outperform traditional HPO methods like Bayesian optimization across different models. Furthermore, we propose to treat the code specifying our model as a hyperparameter, which the LLM outputs, affording greater flexibility than existing HPO approaches.
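
The suggest-evaluate-refine loop can be sketched with a stubbed "LLM". The stub, the toy objective, and the five-trial budget are all assumptions for this sketch; a real system would prompt an actual model with the dataset and model descriptions and parse its reply:

```python
def llm_suggest(history):
    # Stand-in for prompting an LLM with the task description plus past
    # (config, score) pairs.  (Hypothetical stub: it simply halves the
    # best learning rate seen so far, mimicking iterative refinement.)
    if not history:
        return {"lr": 0.1}
    best_cfg, _ = max(history, key=lambda h: h[1])
    return {"lr": best_cfg["lr"] * 0.5}

def evaluate(cfg):
    # Toy objective peaked near lr = 0.01 (higher score is better).
    return -abs(cfg["lr"] - 0.01)

history = []
for _ in range(5):                      # constrained search budget
    cfg = llm_suggest(history)
    history.append((cfg, evaluate(cfg)))

best = max(history, key=lambda h: h[1])[0]
print(best)                             # -> {'lr': 0.0125}
```

Swapping the stub for a real chat-completion call, and the dict for generated model code, recovers the paper's code-as-hyperparameter setting.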

Updated: 2024-11-11 17:30:55

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2312.04528v2

HierTOD: A Task-Oriented Dialogue System Driven by Hierarchical Goals

Task-Oriented Dialogue (TOD) systems assist users in completing tasks through natural language interactions, often relying on a single-layered workflow structure for slot-filling in public tasks, such as hotel bookings. However, in enterprise environments, which involve rich domain-specific knowledge, TOD systems face challenges due to task complexity and the lack of standardized documentation. In this work, we introduce HierTOD, an enterprise TOD system driven by hierarchical goals and can support composite workflows. By focusing on goal-driven interactions, our system serves a more proactive role, facilitating mixed-initiative dialogue and improving task completion. Equipped with components for natural language understanding, composite goal retriever, dialogue management, and response generation, backed by a well-organized data service with domain knowledge base and retrieval engine, HierTOD delivers efficient task assistance. Furthermore, our system implementation unifies two TOD paradigms: slot-filling for information collection and step-by-step guidance for task execution. Our human study demonstrates the effectiveness and helpfulness of HierTOD in performing both paradigms.

Updated: 2024-11-11 17:28:19

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.07152v1

Why the p-norms $p{=}1$, $p{=}2$ and $p{=}\infty$ are so special? An answer based on spatial uniformity

Among all metrics based on p-norms, the Manhattan (p=1), euclidean (p=2) and Chebyshev distances (p=infinity) are the most widely used for their interpretability, simplicity and technical convenience. But these are not the only arguments for the ubiquity of these three p-norms. This article proves that there is a volume-surface correspondence property that is unique to them. More precisely, it is shown that sampling uniformly from the volume of an n-dimensional p-ball and projecting to its surface is equivalent to directly sampling uniformly from its surface if and only if p is 1, 2 or infinity. Sampling algorithms and their implementations in Python are also provided.
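
A minimal Monte-Carlo illustration of the claimed equivalence for p=2: sampling uniformly in the ball and projecting radially matches direct surface sampling. The statistic E|x_1| = 2/pi on the unit circle is used as a check; the function names are ours, not the paper's provided implementations:

```python
import math, random

rng = random.Random(42)

def surface_direct(n):
    # Direct uniform sample on the Euclidean (p=2) unit sphere:
    # normalise a standard Gaussian vector.
    g = [rng.gauss(0, 1) for _ in range(n)]
    r = math.sqrt(sum(c * c for c in g))
    return [c / r for c in g]

def volume_then_project(n):
    # Uniform sample in the p=2 unit ball (rejection from the cube),
    # then radial projection to the surface.  The paper's result says
    # the two routes give the same surface distribution iff p is 1, 2,
    # or infinity.
    while True:
        x = [rng.uniform(-1, 1) for _ in range(n)]
        if sum(c * c for c in x) <= 1:
            r = math.sqrt(sum(c * c for c in x))
            return [c / r for c in x]

N = 20000
m1 = sum(abs(surface_direct(2)[0]) for _ in range(N)) / N
m2 = sum(abs(volume_then_project(2)[0]) for _ in range(N)) / N
print(round(m1, 2), round(m2, 2))   # both near 2/pi ~ 0.64
```

For a p-norm with, say, p=4, the projected distribution would over-weight the diagonal directions and the two estimates would disagree, which is exactly the paper's point.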

Updated: 2024-11-11 17:27:11

Categories: math.ST,cs.CR,cs.NA,math.NA,stat.TH

Download: http://arxiv.org/abs/2411.13567v1

On the Counting of Involutory MDS Matrices

The optimal branch number of MDS matrices has established their importance in designing diffusion layers for various block ciphers and hash functions. As a result, numerous matrix structures, including Hadamard and circulant matrices, have been proposed for constructing MDS matrices. Also, in the literature, significant attention is typically given to identifying MDS candidates with optimal implementations or proposing new constructions across different orders. However, this paper takes a different approach by not emphasizing efficiency issues or introducing new constructions. Instead, its primary objective is to enumerate Hadamard MDS and involutory Hadamard MDS matrices of order $4$ within the field $\mathbb{F}_{2^r}$. Specifically, it provides an explicit formula for the count of both Hadamard MDS and involutory Hadamard MDS matrices of order $4$ over $\mathbb{F}_{2^r}$. Additionally, it derives the count of Hadamard Near-MDS (NMDS) and involutory Hadamard NMDS matrices, each with exactly one zero in each row, of order $4$ over $\mathbb{F}_{2^r}$. Furthermore, the paper discusses some circulant-like matrices for constructing NMDS matrices and proves that when $n$ is even, any $2n \times 2n$ Type-II circulant-like matrix can never be an NMDS matrix. While it is known that NMDS matrices may be singular, this paper establishes that singular Hadamard matrices can never be NMDS matrices. Moreover, it proves that there exist exactly two orthogonal Type-I circulant-like matrices of order $4$ over $\mathbb{F}_{2^r}$.

Updated: 2024-11-11 17:25:53

Categories: cs.CR,cs.IT,math.IT

Download: http://arxiv.org/abs/2310.00090v4

Knowledge Transfer in Deep Reinforcement Learning via an RL-Specific GAN-Based Correspondence Function

Deep reinforcement learning has demonstrated superhuman performance in complex decision-making tasks, but it struggles with generalization and knowledge reuse - key aspects of true intelligence. This article introduces a novel approach that modifies Cycle Generative Adversarial Networks specifically for reinforcement learning, enabling effective one-to-one knowledge transfer between two tasks. Our method enhances the loss function with two new components: model loss, which captures dynamic relationships between source and target tasks, and Q-loss, which identifies states significantly influencing the target decision policy. Tested on the 2-D Atari game Pong, our method achieved 100% knowledge transfer in identical tasks and either 100% knowledge transfer or a 30% reduction in training time for a rotated task, depending on the network architecture. In contrast, using standard Generative Adversarial Networks or Cycle Generative Adversarial Networks led to worse performance than training from scratch in the majority of cases. The results demonstrate that the proposed method ensured enhanced knowledge generalization in deep reinforcement learning.

Updated: 2024-11-11 17:23:26

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2209.06604v2

Variational Graph Contrastive Learning

Graph representation learning (GRL) is a fundamental task in machine learning, aiming to encode high-dimensional graph-structured data into low-dimensional vectors. Self-supervised learning (SSL) methods are widely used in GRL because they can avoid expensive human annotation. In this work, we propose a novel Subgraph Gaussian Embedding Contrast (SGEC) method. Our approach introduces a subgraph Gaussian embedding module, which adaptively maps subgraphs to a structured Gaussian space, ensuring the preservation of graph characteristics while controlling the distribution of generated subgraphs. We employ optimal transport distances, including Wasserstein and Gromov-Wasserstein distances, to effectively measure the similarity between subgraphs, enhancing the robustness of the contrastive learning process. Extensive experiments across multiple benchmarks demonstrate that SGEC outperforms or presents competitive performance against state-of-the-art approaches. Our findings provide insights into the design of SSL methods for GRL, emphasizing the importance of the distribution of the generated contrastive pairs.
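
Since SGEC embeds subgraphs as Gaussians and compares them with optimal-transport distances, the relevant 2-Wasserstein distance admits a closed form between Gaussians. The sketch below (my illustration of that standard closed form, not the paper's code) computes it with plain numpy:

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def w2_gaussian(mu1, cov1, mu2, cov2):
    """Closed-form 2-Wasserstein distance between N(mu1, cov1) and N(mu2, cov2):
    W2^2 = ||mu1 - mu2||^2 + tr(cov1 + cov2 - 2 (cov2^{1/2} cov1 cov2^{1/2})^{1/2})."""
    s2 = psd_sqrt(cov2)
    cross = psd_sqrt(s2 @ cov1 @ s2)
    w2_sq = np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2.0 * cross)
    return float(np.sqrt(max(w2_sq, 0.0)))

# With equal covariances the formula reduces to the distance between means.
d = w2_gaussian(np.zeros(2), np.eye(2), np.array([3.0, 4.0]), np.eye(2))
```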

Updated: 2024-11-11 17:23:07

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.07150v1

Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization

We propose adaptive, line search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving only one linear system per iteration, eliminating the need for line search or backtracking mechanisms. Specifically, we base our algorithms on the optimistic method and appropriately combine it with second-order information. Moreover, distinct from common adaptive schemes, we define the step size recursively as a function of the gradient norm and the prediction error in the optimistic update. We first analyze a variant where the step size requires knowledge of the Lipschitz constant of the Hessian. Under the additional assumption of Lipschitz continuous gradients, we further design a parameter-free version by tracking the Hessian Lipschitz constant locally and ensuring the iterates remain bounded. We also evaluate the practical performance of our algorithm by comparing it to existing second-order algorithms for minimax optimization.
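
The paper's algorithm is second-order with an adaptive, recursively defined step size; for intuition only, here is the classical first-order optimistic update it builds on, with a fixed step size of my choosing, applied to the bilinear toy problem f(x, y) = x * y:

```python
def optimistic_gda(eta=0.1, steps=2000):
    """First-order optimistic gradient descent-ascent on f(x, y) = x * y.
    The update uses the extrapolated gradient 2*g_t - g_{t-1}; plain
    simultaneous GDA spirals outward on this bilinear problem, while the
    optimistic correction makes the iterates contract to the saddle (0, 0)."""
    x, y = 1.0, 1.0
    gx_prev, gy_prev = y, -x          # gradients for (min over x, max over y)
    for _ in range(steps):
        gx, gy = y, -x
        x, y = x - eta * (2 * gx - gx_prev), y - eta * (2 * gy - gy_prev)
        gx_prev, gy_prev = gx, gy
    return x, y

x, y = optimistic_gda()
```

The prediction error between g_t and g_{t-1} is exactly the quantity the paper feeds into its adaptive step-size recursion.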

Updated: 2024-11-11 17:19:18

Categories: math.OC,cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.02016v2

Edify 3D: Scalable High-Quality 3D Asset Generation

We introduce Edify 3D, an advanced solution designed for high-quality 3D asset generation. Our method first synthesizes RGB and surface normal images of the described object at multiple viewpoints using a diffusion model. The multi-view observations are then used to reconstruct the shape, texture, and PBR materials of the object. Our method can generate high-quality 3D assets with detailed geometry, clean shape topologies, high-resolution textures, and materials within 2 minutes of runtime.

Updated: 2024-11-11 17:07:43

Categories: cs.CV,cs.AI,cs.GR

Download: http://arxiv.org/abs/2411.07135v1

Stronger Models are NOT Stronger Teachers for Instruction Tuning

Instruction tuning has been widely adopted to ensure large language models (LLMs) follow user instructions effectively. The resulting instruction-following capabilities of LLMs heavily rely on the instruction datasets used for tuning. Recently, synthetic instruction datasets have emerged as an economically viable solution to provide LLMs diverse and high-quality instructions. However, existing approaches typically assume that larger or stronger models are stronger teachers for instruction tuning, and hence simply adopt these models as response generators to the synthetic instructions. In this paper, we challenge this commonly-adopted assumption. Our extensive experiments across five base models and twenty response generators reveal that larger and stronger models are not necessarily stronger teachers of smaller models. We refer to this phenomenon as the Larger Models' Paradox. We observe that existing metrics cannot precisely predict the effectiveness of response generators since they ignore the compatibility between teachers and base models being fine-tuned. We thus develop a novel metric, named as Compatibility-Adjusted Reward (CAR) to measure the effectiveness of response generators. Our experiments across five base models demonstrate that CAR outperforms almost all baselines.

Updated: 2024-11-11 17:06:48

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2411.07133v1

Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis

Although text-to-image (T2I) models exhibit remarkable generation capabilities, they frequently fail to accurately bind semantically related objects or attributes in the input prompts; a challenge termed semantic binding. Previous approaches either involve intensive fine-tuning of the entire T2I model or require users or large language models to specify generation layouts, adding complexity. In this paper, we define semantic binding as the task of associating a given object with its attribute, termed attribute binding, or linking it to other related sub-objects, referred to as object binding. We introduce a novel method called Token Merging (ToMe), which enhances semantic binding by aggregating relevant tokens into a single composite token. This ensures that the object, its attributes and sub-objects all share the same cross-attention map. Additionally, to address potential confusion among main objects with complex textual prompts, we propose end token substitution as a complementary strategy. To further refine our approach in the initial stages of T2I generation, where layouts are determined, we incorporate two auxiliary losses, an entropy loss and a semantic binding loss, to iteratively update the composite token to improve the generation integrity. We conducted extensive experiments to validate the effectiveness of ToMe, comparing it against various existing methods on the T2I-CompBench and our proposed GPT-4o object binding benchmark. Our method is particularly effective in complex scenarios that involve multiple objects and attributes, which previous methods often fail to address. The code will be publicly available at \url{https://github.com/hutaihang/ToMe}.

Updated: 2024-11-11 17:05:15

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.07132v1

Adam-mini: Use Fewer Learning Rates To Gain More

We propose Adam-mini, an optimizer that achieves performance on par with or better than AdamW with a 50% smaller memory footprint. Adam-mini reduces memory by cutting down the learning rate resources in Adam (i.e., $1/\sqrt{v}$). By investigating the Hessian structure of neural nets, we find that Adam's $v$ might not function at its full potential as effectively as we expected. We find that $\geq$ 99.9% of these learning rates in $v$ could be harmlessly removed if we (1) carefully partition the parameters into blocks following our new principle on Hessian structure; and (2) assign a single but good learning rate to each parameter block. We then provide one simple way to find good learning rates and propose Adam-mini. Empirically, we verify that Adam-mini performs on par with or better than AdamW on various language models sized from 39M to 13B for pre-training, supervised fine-tuning, and RLHF. The reduced memory footprint of Adam-mini also alleviates communication overheads among GPUs, thereby increasing throughput. For instance, Adam-mini achieves 49.6% higher throughput than AdamW when pre-training Llama 2-7B on $2\times$ A800-80GB GPUs, which saves 33% wall-clock time for pre-training.
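
The core idea of steps (1) and (2) can be sketched in a few lines of numpy: keep the coordinate-wise first moment but share one second-moment scalar per parameter block. The block partition, hyperparameters, and toy objective below are my illustrative choices, not the paper's implementation:

```python
import numpy as np

def adam_mini_step(params, grads, m, v, blocks, t,
                   lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """One simplified Adam-mini-style step: coordinate-wise first moment,
    but a single shared second-moment scalar per parameter block (mean
    squared gradient in the block), cutting v from O(d) to O(#blocks)."""
    for i, idx in enumerate(blocks):
        g = grads[idx]
        m[idx] = b1 * m[idx] + (1 - b1) * g
        v[i] = b2 * v[i] + (1 - b2) * float(np.mean(g * g))
        m_hat = m[idx] / (1 - b1 ** t)          # bias-corrected moments
        v_hat = v[i] / (1 - b2 ** t)
        params[idx] -= lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy run: minimize ||x - 1||^2 with two blocks of three coordinates each.
x, m, v = np.zeros(6), np.zeros(6), np.zeros(2)
blocks = [np.arange(0, 3), np.arange(3, 6)]
for t in range(1, 301):
    adam_mini_step(x, 2.0 * (x - 1.0), m, v, blocks, t)
```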

Updated: 2024-11-11 16:59:58

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.16793v6

ZT-RIC:A Zero Trust RIC Framework for ensuring data Privacy and Confidentiality in Open RAN

The advancement of 5G and NextG networks through Open Radio Access Network (O-RAN) architecture enables a shift toward virtualized, modular, and disaggregated configurations. A core component of O-RAN is the RAN Intelligent Controller (RIC), which manages RAN using machine learning-driven xApps that access sensitive data from RAN and User Equipment (UE), stored in the near Real-Time RIC (Near-RT RIC) database. This shared, open environment increases the risk of unauthorized data exposure. To address these concerns, this paper proposes a zero-trust RIC (ZT-RIC) framework that preserves data privacy across the RIC platform, including the RIC database, xApps, and E2 interface. ZT-RIC employs Inner Product Functional Encryption (IPFE) to encrypt RAN/UE data at the base station, preventing leaks through the E2 interface and shared database. Additionally, ZT-RIC enables xApps to perform inference on encrypted data without exposing sensitive information. For evaluation, a state-of-the-art InterClass xApp, which detects jamming signals using RAN key performance metrics (KPMs), is implemented. Testing on an LTE/5G O-RAN testbed shows that ZT-RIC preserves data confidentiality while achieving 97.9% accuracy in jamming detection and meeting sub-second latency requirements, with a round-trip time (RTT) of 0.527 seconds.

Updated: 2024-11-11 16:59:22

Categories: cs.CR

Download: http://arxiv.org/abs/2411.07128v1

Enhancing learning in spiking neural networks through neuronal heterogeneity and neuromodulatory signaling

Recent progress in artificial intelligence (AI) has been driven by insights from neuroscience, particularly with the development of artificial neural networks (ANNs). This has significantly enhanced the replication of complex cognitive tasks such as vision and natural language processing. Despite these advances, ANNs struggle with continual learning, adaptable knowledge transfer, robustness, and resource efficiency - capabilities that biological systems handle seamlessly. Specifically, ANNs often overlook the functional and morphological diversity of the brain, hindering their computational capabilities. Furthermore, incorporating cell-type specific neuromodulatory effects into ANNs with neuronal heterogeneity could enable learning at two spatial scales: spiking behavior at the neuronal level, and synaptic plasticity at the circuit level, thereby potentially enhancing their learning abilities. In this article, we summarize recent bio-inspired models, learning rules and architectures and propose a biologically-informed framework for enhancing ANNs. Our proposed dual-framework approach highlights the potential of spiking neural networks (SNNs) for emulating diverse spiking behaviors and dendritic compartments to simulate morphological and functional diversity of neuronal computations. Finally, we outline how the proposed approach integrates brain-inspired compartmental models and task-driven SNNs, balances bioinspiration and complexity, and provides scalable solutions for pressing AI challenges, such as continual learning, adaptability, robustness, and resource-efficiency.
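
As a minimal illustration of how per-neuron heterogeneity diversifies spiking (a toy leaky integrate-and-fire population of my own construction, not the article's model), neurons driven by the same input but with different membrane time constants produce very different firing rates:

```python
import numpy as np

def simulate_lif(taus, current=1.5, v_th=1.0, dt=1.0, steps=200):
    """Leaky integrate-and-fire population with heterogeneous per-neuron
    membrane time constants (in ms). Returns spike counts per neuron."""
    taus = np.asarray(taus, dtype=float)
    v = np.zeros_like(taus)
    counts = np.zeros_like(taus, dtype=int)
    for _ in range(steps):
        v += dt / taus * (current - v)   # leaky integration toward the input
        fired = v >= v_th
        counts[fired] += 1
        v[fired] = 0.0                   # reset after a spike
    return counts

# Faster membranes (small tau) reach threshold sooner and fire more often.
counts = simulate_lif([5.0, 10.0, 20.0, 40.0])
```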

Updated: 2024-11-11 16:58:38

Categories: q-bio.NC,cs.AI,cs.LG,92B20

Download: http://arxiv.org/abs/2407.04525v4

Benchmarking LLMs' Judgments with No Gold Standard

We introduce GEM (Generative Estimator for Mutual Information), an evaluation metric for assessing language generation by Large Language Models (LLMs), particularly in generating informative judgments, without the need for a gold standard reference. GEM broadens the scenarios where we can benchmark LLM generation performance: from traditional ones, like machine translation and summarization, where gold standard references are readily available, to subjective tasks without clear gold standards, such as academic peer review. GEM uses a generative model to estimate mutual information between candidate and reference responses, without requiring the reference to be a gold standard. In experiments on a human-annotated dataset, GEM demonstrates competitive correlations with human scores compared to the state-of-the-art GPT-4o Examiner, and outperforms all other baselines. Additionally, GEM is more robust against strategic manipulations, such as rephrasing or elongation, which can artificially inflate scores under a GPT-4o Examiner. We also present GRE-bench (Generating Review Evaluation Benchmark), which evaluates LLMs based on how well they can generate high-quality peer reviews for academic research papers. Because GRE-bench is based upon GEM, it inherits its robustness properties. Additionally, GRE-bench circumvents data contamination problems (or data leakage) by using the continuous influx of new open-access research papers and peer reviews each year. We show GRE-bench results of various popular LLMs on their peer review capabilities using the ICLR2023 dataset.

Updated: 2024-11-11 16:58:36

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2411.07127v1

Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models

We introduce Edify Image, a family of diffusion models capable of generating photorealistic image content with pixel-perfect accuracy. Edify Image utilizes cascaded pixel-space diffusion models trained using a novel Laplacian diffusion process, in which image signals at different frequency bands are attenuated at varying rates. Edify Image supports a wide range of applications, including text-to-image synthesis, 4K upsampling, ControlNets, 360 HDR panorama generation, and finetuning for image customization.
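
The exact Laplacian diffusion process is the paper's; as a much-simplified 1-D analogue (my construction, with a box filter standing in for a proper low-pass), a signal can be split into frequency bands whose sum reconstructs it exactly, so that each band can then be attenuated at its own rate:

```python
import numpy as np

def lowpass(sig, k):
    """Box-filter low-pass with wrap-around padding (k odd)."""
    kernel = np.ones(k) / k
    return np.convolve(np.pad(sig, k // 2, mode="wrap"), kernel, mode="valid")

def band_stack(sig, widths=(3, 9, 27)):
    """Split a signal into band-pass layers plus a low-pass residual.
    Each band is the difference of two successive blurs, so the layers
    telescope: summing them reconstructs the input exactly."""
    bands, current = [], sig
    for k in widths:
        low = lowpass(current, k)
        bands.append(current - low)   # detail at this frequency band
        current = low
    bands.append(current)             # low-pass residual
    return bands

rng = np.random.default_rng(0)
sig = rng.standard_normal(256)
bands = band_stack(sig)

# Frequency-dependent shaping: attenuate high-frequency bands more strongly.
shaped = 0.5 * bands[0] + 0.8 * bands[1] + bands[2] + bands[3]
```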

Updated: 2024-11-11 16:58:31

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2411.07126v1

Interpretable Machine Learning for Resource Allocation with Application to Ventilator Triage

Rationing of healthcare resources is a challenging decision that policy makers and providers may be forced to make during a pandemic, natural disaster, or mass casualty event. Well-defined guidelines to triage scarce life-saving resources must be designed to promote transparency, trust, and consistency. To facilitate buy-in and use during high-stress situations, these guidelines need to be interpretable and operational. We propose a novel data-driven model to compute interpretable triage guidelines based on policies for Markov Decision Process that can be represented as simple sequences of decision trees ("tree policies"). In particular, we characterize the properties of optimal tree policies and present an algorithm based on dynamic programming recursions to compute good tree policies. We utilize this methodology to obtain simple, novel triage guidelines for ventilator allocations for COVID-19 patients, based on real patient data from Montefiore hospitals. We also compare the performance of our guidelines to the official New York State guidelines that were developed in 2015 (well before the COVID-19 pandemic). Our empirical study shows that the number of excess deaths associated with ventilator shortages could be reduced significantly using our policy. Our work highlights the limitations of the existing official triage guidelines, which need to be adapted specifically to COVID-19 before being successfully deployed.
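
The dynamic-programming recursion behind tree policies can be sketched as plain backward induction: at each stage, pick the action maximizing immediate reward plus the value-to-go, yielding one decision rule per stage (the simplest form of a "sequence of decision trees"). The toy deterministic MDP below is my own, not the paper's ventilator model:

```python
def backward_induction(n_states, n_actions, horizon, reward, next_state):
    """Finite-horizon backward induction on a deterministic toy MDP.
    reward(s, a) and next_state(s, a) define the dynamics; returns the
    value tables V[t][s] and a per-stage decision rule policy[t][s]."""
    V = [[0.0] * n_states for _ in range(horizon + 1)]
    policy = [[0] * n_states for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in range(n_states):
            best_a, best_q = 0, float("-inf")
            for a in range(n_actions):
                q = reward(s, a) + V[t + 1][next_state(s, a)]
                if q > best_q:
                    best_a, best_q = a, q
            V[t][s], policy[t][s] = best_q, best_a
    return V, policy

# Toy: 2 states, 2 actions, reward s + a, and the action picks the next state.
V, pi = backward_induction(2, 2, 2, lambda s, a: s + a, lambda s, a: a)
```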

Updated: 2024-11-11 16:57:36

Categories: cs.LG,cs.CY

Download: http://arxiv.org/abs/2110.10994v2

Fast and Robust Contextual Node Representation Learning over Dynamic Graphs

Real-world graphs grow rapidly with edge and vertex insertions over time, motivating the problem of efficiently maintaining robust node representations over evolving graphs. Recent efficient GNNs are designed to decouple recursive message passing from the learning process, and favor Personalized PageRank (PPR) as the underlying feature propagation mechanism. However, most PPR-based GNNs are designed for static graphs, and efficient PPR maintenance remains an open problem. Further, there is surprisingly little theoretical justification for the choice of PPR, despite its impressive empirical performance. In this paper, we are inspired by the recent PPR formulation as an explicit $\ell_1$-regularized optimization problem and propose a unified dynamic graph learning framework based on sparse node-wise attention. We also present a set of desired properties that justify the choice of PPR in state-of-the-art GNNs and serve as a guideline for future node attention designs. Meanwhile, we take advantage of the PPR-equivalent optimization formulation and employ the proximal gradient method (ISTA) to improve the efficiency of PPR-based GNNs by up to 6 times. Finally, we instantiate a simple-yet-effective model (\textsc{GoPPE}) with robust positional encodings obtained by maximizing the PPR previously used as attention. The model performs comparably to or better than the state-of-the-art baselines, and greatly outperforms them when the initial node attributes are noisy during graph evolution, demonstrating the effectiveness and robustness of \textsc{GoPPE}.
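
The object being maintained is the PPR vector, the fixed point pi = alpha * e_s + (1 - alpha) * P^T pi. The paper solves an l1-regularized equivalent with ISTA; the plain power-iteration sketch below (mine, for intuition only) shows the fixed point itself:

```python
import numpy as np

def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Power iteration for the PPR fixed point
    pi = alpha * e_s + (1 - alpha) * P^T pi,
    where P is the row-stochastic random-walk matrix of the graph."""
    adj = np.asarray(adj, dtype=float)
    P = adj / adj.sum(axis=1, keepdims=True)
    e = np.zeros(len(adj))
    e[source] = 1.0
    pi = e.copy()
    for _ in range(iters):
        pi = alpha * e + (1 - alpha) * P.T @ pi
    return pi

# Path graph 0 - 1 - 2: probability mass concentrates near the source.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
pi = personalized_pagerank(adj, source=0)
```

Each iteration contracts by a factor (1 - alpha), which is also why incremental maintenance of pi under edge insertions is attractive compared with recomputing from scratch.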

Updated: 2024-11-11 16:51:51

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.07123v1

Tighter Confidence Bounds for Sequential Kernel Regression

Confidence bounds are an essential tool for rigorously quantifying the uncertainty of predictions. They are a core component in many sequential learning and decision-making algorithms, with tighter confidence bounds giving rise to algorithms with better empirical performance and better performance guarantees. In this work, we use martingale tail inequalities to establish new confidence bounds for sequential kernel regression. Our confidence bounds can be computed by solving a conic program, although this bare version quickly becomes impractical, because the number of variables grows with the sample size. However, we show that the dual of this conic program allows us to efficiently compute tight confidence bounds. We prove that our new confidence bounds are always tighter than existing ones in this setting. We apply our confidence bounds to kernel bandit problems, and we find that when our confidence bounds replace existing ones, the KernelUCB (GP-UCB) algorithm has better empirical performance, a matching worst-case performance guarantee and comparable computational cost.

Updated: 2024-11-11 16:50:48

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2403.12732v2

Enabling Efficient Equivariant Operations in the Fourier Basis via Gaunt Tensor Products

Developing equivariant neural networks for the E(3) group plays an important role in modeling 3D data across real-world applications. Enforcing this equivariance primarily involves the tensor products of irreducible representations (irreps). However, the computational complexity of such operations increases significantly as higher-order tensors are used. In this work, we propose a systematic approach to substantially accelerate the computation of the tensor products of irreps. We mathematically connect the commonly used Clebsch-Gordan coefficients to the Gaunt coefficients, which are integrals of products of three spherical harmonics. Through Gaunt coefficients, the tensor product of irreps becomes equivalent to the multiplication between spherical functions represented by spherical harmonics. This perspective further allows us to change the basis for the equivariant operations from spherical harmonics to a 2D Fourier basis. Consequently, the multiplication between spherical functions represented by a 2D Fourier basis can be efficiently computed via the convolution theorem and Fast Fourier Transforms. This transformation reduces the complexity of full tensor products of irreps from $\mathcal{O}(L^6)$ to $\mathcal{O}(L^3)$, where $L$ is the max degree of irreps. Leveraging this approach, we introduce the Gaunt Tensor Product, which serves as a new method to construct efficient equivariant operations across different model architectures. Our experiments on the Open Catalyst Project and 3BPA datasets demonstrate both the increased efficiency and improved performance of our approach.
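
The key reduction is the convolution theorem: pointwise multiplication of functions corresponds to (circular) convolution of their Fourier coefficients, which FFTs make cheap. A 1-D numpy check of that identity (my simplified analogue of the paper's 2-D Fourier basis on the sphere):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
f = rng.standard_normal(N)
g = rng.standard_normal(N)

F = np.fft.fft(f)
G = np.fft.fft(g)

# Circular convolution of the coefficient sequences, done directly in O(N^2).
conv = np.array([sum(F[m] * G[(k - m) % N] for m in range(N)) for k in range(N)])

# Convolution theorem: DFT(f * g) (pointwise product) = (1/N) * (F circ-conv G).
lhs = np.fft.fft(f * g)
```

Replacing the O(N^2) loop with an FFT-based convolution is exactly the kind of saving that drops the tensor-product cost from $\mathcal{O}(L^6)$ to $\mathcal{O}(L^3)$ in the paper's setting.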

Updated: 2024-11-11 16:50:36

Categories: cs.LG,cond-mat.mtrl-sci,math.GR,physics.chem-ph,q-bio.BM

Download: http://arxiv.org/abs/2401.10216v2

Federated Learning under Periodic Client Participation and Heterogeneous Data: A New Communication-Efficient Algorithm and Analysis

In federated learning, it is common to assume that clients are always available to participate in training, which may not be feasible with user devices in practice. Recent works analyze federated learning under more realistic participation patterns, such as cyclic client availability or arbitrary participation. However, all such works either require strong assumptions (e.g., all clients participate almost surely within a bounded window), do not achieve linear speedup and reduced communication rounds, or are not applicable in the general non-convex setting. In this work, we focus on nonconvex optimization and consider participation patterns in which the chance of participation over a fixed window of rounds is equal among all clients, which includes cyclic client availability as a special case. Under this setting, we propose a new algorithm, named Amplified SCAFFOLD, and prove that it achieves linear speedup, reduced communication, and resilience to data heterogeneity simultaneously. In particular, for cyclic participation, our algorithm is proved to enjoy $\mathcal{O}(\epsilon^{-2})$ communication rounds to find an $\epsilon$-stationary point in the non-convex stochastic setting. In contrast, the prior work under the same setting requires $\mathcal{O}(\kappa^2 \epsilon^{-4})$ communication rounds, where $\kappa$ denotes the data heterogeneity. Therefore, our algorithm significantly reduces communication rounds due to better dependency in terms of $\epsilon$ and $\kappa$. Our analysis relies on a fine-grained treatment of the nested dependence between client participation and errors in the control variates, which results in tighter guarantees than previous work. We also provide experimental results with (1) synthetic data and (2) real-world data with a large number of clients $(N = 250)$, demonstrating the effectiveness of our algorithm under periodic client participation.
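
Amplified SCAFFOLD's amplification and participation analysis are beyond a digest sketch, but the classical SCAFFOLD control-variate correction it builds on can be shown on a scalar toy problem (clients, objectives, and hyperparameters below are my own; full participation is assumed):

```python
def scaffold(grads, x0=0.0, eta=0.05, K=10, rounds=100):
    """Classical SCAFFOLD with full participation on scalar objectives.
    grads: per-client gradient functions. Control variates c_i correct
    the client drift introduced by K local steps per round."""
    n = len(grads)
    x, c = x0, 0.0
    ci = [0.0] * n
    for _ in range(rounds):
        dx, dc = 0.0, 0.0
        for i, g in enumerate(grads):
            y = x
            for _ in range(K):
                y -= eta * (g(y) - ci[i] + c)   # drift-corrected local step
            ci_new = ci[i] - c + (x - y) / (K * eta)
            dx += (y - x) / n
            dc += (ci_new - ci[i]) / n
            ci[i] = ci_new
        x += dx
        c += dc
    return x

# Two heterogeneous clients: f1 = (x - 0)^2, f2 = (x - 4)^2; global optimum x = 2.
x = scaffold([lambda y: 2.0 * y, lambda y: 2.0 * (y - 4.0)])
```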

Updated: 2024-11-11 16:48:48

Categories: cs.LG,cs.DC

Download: http://arxiv.org/abs/2410.23131v3

Efficient Adaptive Optimization via Subset-Norm and Subspace-Momentum: Fast, Memory-Reduced Training with Convergence Guarantees

We introduce two complementary techniques for efficient adaptive optimization that reduce memory requirements while accelerating training of large-scale neural networks. The first technique, Subset-Norm adaptive step size, generalizes AdaGrad-Norm and AdaGrad(-Coordinate) by reducing the second moment term's memory footprint from $O(d)$ to $O(\sqrt{d})$ through step-size sharing, where $d$ is the model size. For non-convex smooth objectives under coordinate-wise sub-gaussian gradient noise, we prove a noise-adapted high-probability convergence guarantee showing improved dimensional dependence over existing methods. Our second technique, Subspace-Momentum, reduces the momentum state's memory footprint by operating in a low-dimensional subspace while applying standard SGD in the orthogonal complement. We establish high-probability convergence rates under similar relaxed assumptions. Empirical evaluation on LLaMA models from 60M to 1B parameters demonstrates the effectiveness of our methods, where combining subset-norm with subspace-momentum achieves Adam's validation perplexity in approximately half the training tokens (6.8B vs 13.1B) while using only 20% of Adam's optimizer-state memory footprint and requiring minimal additional hyperparameter tuning.
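
The step-size-sharing idea behind Subset-Norm can be sketched as follows: keep one AdaGrad-style second-moment accumulator per subset of roughly $\sqrt{d}$ coordinates instead of one per coordinate, so the optimizer state shrinks from $O(d)$ to $O(\sqrt{d})$. A minimal sketch (the contiguous partitioning and all names are illustrative assumptions, not the paper's exact scheme):

```python
import math

def subset_norm_step(params, grads, state, lr=0.1, eps=1e-8):
    """AdaGrad-style update with one shared second-moment accumulator per
    subset of ~sqrt(d) coordinates (step-size sharing). The dict `state`
    then holds O(sqrt(d)) scalars rather than O(d)."""
    d = len(params)
    k = max(1, math.isqrt(d))           # subset size ~ sqrt(d)
    for start in range(0, d, k):
        idx = range(start, min(start + k, d))
        j = start // k                  # subset id
        state[j] = state.get(j, 0.0) + sum(grads[i] ** 2 for i in idx)
        denom = math.sqrt(state[j]) + eps
        for i in idx:
            params[i] -= lr * grads[i] / denom
    return params
```

Each subset shares a single adaptive denominator, which is where the memory saving comes from.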

Updated: 2024-11-11 16:48:07

Categories: cs.LG,cs.NE,math.OC

Download: http://arxiv.org/abs/2411.07120v1

Privacy-Preserving Graph-Based Machine Learning with Fully Homomorphic Encryption for Collaborative Anti-Money Laundering

Combating money laundering has become increasingly complex with the rise of cybercrime and digitalization of financial transactions. Graph-based machine learning techniques have emerged as promising tools for Anti-Money Laundering (AML) detection, capturing intricate relationships within money laundering networks. However, the effectiveness of AML solutions is hindered by data silos within financial institutions, limiting collaboration and overall efficacy. This research presents a novel privacy-preserving approach for collaborative AML machine learning, facilitating secure data sharing across institutions and borders while preserving privacy and regulatory compliance. Leveraging Fully Homomorphic Encryption (FHE), computations are directly performed on encrypted data, ensuring the confidentiality of financial data. Notably, FHE over the Torus (TFHE) was integrated with graph-based machine learning using Zama Concrete ML. The research contributes two key privacy-preserving pipelines. First, the development of a privacy-preserving Graph Neural Network (GNN) pipeline was explored. Optimization techniques like quantization and pruning were used to render the GNN FHE-compatible. Second, a privacy-preserving graph-based XGBoost pipeline leveraging Graph Feature Preprocessor (GFP) was successfully developed. Experiments demonstrated strong predictive performance, with the XGBoost model consistently achieving over 99% accuracy, F1-score, precision, and recall on the balanced AML dataset in both unencrypted and FHE-encrypted inference settings. On the imbalanced dataset, the incorporation of graph-based features improved the F1-score by 8%. The research highlights the need to balance the trade-off between privacy and computational efficiency.

Updated: 2024-11-11 16:47:58

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2411.02926v2

ConvMixFormer- A Resource-efficient Convolution Mixer for Transformer-based Dynamic Hand Gesture Recognition

Transformer models have demonstrated remarkable success in domains such as natural language processing (NLP) and computer vision, and with the growing interest in transformer-based architectures they are now also applied to gesture recognition. We therefore devise a novel ConvMixFormer architecture for dynamic hand gestures. Transformer self-attention scales quadratically with sequence length, which makes these models computationally complex and heavy. To address this drawback, we design a resource-efficient model that replaces self-attention with a simple convolutional layer-based token mixer, whose computational cost and parameter count are considerably lower than those of quadratic self-attention. The convolution mixer also helps the model capture local spatial features that self-attention, owing to its sequential processing, struggles to capture. Further, an efficient gating mechanism is employed in place of the transformer's conventional feed-forward network to help the model control the flow of features across the stages of the proposed model. This design uses fewer learnable parameters, nearly half those of the vanilla transformer, which enables fast and efficient training. Evaluated on the NVidia Dynamic Hand Gesture and Briareo datasets, our model achieves state-of-the-art results on single and multimodal inputs. We also demonstrate the parameter efficiency of the proposed ConvMixFormer model compared to other methods. The source code is available at https://github.com/mallikagarg/ConvMixFormer.
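
As a rough illustration of why a convolutional token mixer is cheaper than self-attention: mixing $n$ tokens with a $k$-tap depthwise convolution costs $O(nk)$ per channel instead of attention's $O(n^2)$. A minimal pure-Python sketch with illustrative names (not the paper's implementation):

```python
def conv_token_mix(tokens, taps):
    """Depthwise 1-D convolution along the token (sequence) axis as a
    stand-in for self-attention. tokens: list of equal-length feature
    vectors; taps: odd-length kernel shared across channels; sequence
    borders are zero-padded."""
    n, dim = len(tokens), len(tokens[0])
    r = len(taps) // 2
    out = []
    for i in range(n):
        mixed = [0.0] * dim
        for j, w in enumerate(taps):
            src = i + j - r
            if 0 <= src < n:            # zero padding outside the sequence
                for ch in range(dim):
                    mixed[ch] += w * tokens[src][ch]
        out.append(mixed)
    return out
```

Unlike attention, the mixing weights are fixed per offset rather than computed from pairwise token similarity, which is what removes the quadratic cost.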

Updated: 2024-11-11 16:45:18

Categories: cs.CV,cs.HC,cs.LG

Download: http://arxiv.org/abs/2411.07118v1

TinyML Security: Exploring Vulnerabilities in Resource-Constrained Machine Learning Systems

Tiny Machine Learning (TinyML) systems, which enable machine learning inference on highly resource-constrained devices, are transforming edge computing but encounter unique security challenges. These devices, whose RAM and CPU capabilities are two to three orders of magnitude smaller than those of conventional systems, render traditional software and hardware security solutions impractical. The physical accessibility of these devices exacerbates their susceptibility to side-channel attacks and information leakage. Additionally, TinyML models pose security risks, with weights potentially encoding sensitive data and query interfaces that can be exploited. This paper offers the first thorough survey of TinyML security threats. We present a device taxonomy that differentiates between IoT, EdgeML, and TinyML, highlighting vulnerabilities unique to TinyML. We list various attack vectors, assess their threat levels using the Common Vulnerability Scoring System, and evaluate both existing and possible defenses. Our analysis identifies where traditional security measures are adequate and where solutions tailored to TinyML are essential. Our results underscore the pressing need for specialized security solutions in TinyML to ensure robust and secure edge computing applications. We aim to inform the research community and inspire innovative approaches to protecting this rapidly evolving and critical field.

Updated: 2024-11-11 16:41:22

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2411.07114v1

AMARO: All Heavy-Atom Transferable Neural Network Potentials of Protein Thermodynamics

All-atom molecular simulations offer detailed insights into macromolecular phenomena, but their substantial computational cost hinders the exploration of complex biological processes. We introduce Advanced Machine-learning Atomic Representation Omni-force-field (AMARO), a new neural network potential (NNP) that combines an O(3)-equivariant message-passing neural network architecture, TensorNet, with a coarse-graining map that excludes hydrogen atoms. AMARO demonstrates the feasibility of training coarser NNP, without prior energy terms, to run stable protein dynamics with scalability and generalization capabilities.

Updated: 2024-11-11 16:41:16

Categories: q-bio.BM,cs.LG,physics.bio-ph,physics.comp-ph

Download: http://arxiv.org/abs/2409.17852v3

Learning Dynamics from Multicellular Graphs with Deep Neural Networks

Multicellular self-assembly into functional structures is a dynamic process that is critical in development and disease, including embryo development, organ formation, and tumor invasion. Being able to infer collective cell migratory dynamics from a static configuration is valuable for both understanding and predicting these complex processes. However, identifying structural features that can indicate multicellular motion has been difficult, and existing metrics largely rely on physical intuition. Here we show that, using a graph neural network (GNN), the motion of multicellular collectives can be inferred from a static snapshot of cell positions, in both experimental and synthetic datasets.

Updated: 2024-11-11 16:40:18

Categories: physics.bio-ph,cond-mat.soft,cs.LG

Download: http://arxiv.org/abs/2401.12196v3

Stochastic Newton Proximal Extragradient Method

Stochastic second-order methods achieve fast local convergence in strongly convex optimization by using noisy Hessian estimates to precondition the gradient. However, these methods typically reach superlinear convergence only when the stochastic Hessian noise diminishes, increasing per-iteration costs over time. Recent work in [arXiv:2204.09266] addressed this with a Hessian averaging scheme that achieves superlinear convergence without higher per-iteration costs. Nonetheless, the method has slow global convergence, requiring up to $\tilde{O}(\kappa^2)$ iterations to reach the superlinear rate of $\tilde{O}((1/t)^{t/2})$, where $\kappa$ is the problem's condition number. In this paper, we propose a novel stochastic Newton proximal extragradient method that improves these bounds, achieving a faster global linear rate and reaching the same fast superlinear rate in $\tilde{O}(\kappa)$ iterations. We accomplish this by extending the Hybrid Proximal Extragradient (HPE) framework, achieving fast global and local convergence rates for strongly convex functions with access to a noisy Hessian oracle.
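
The Hessian-averaging idea from [arXiv:2204.09266], which the proposed stochastic Newton proximal extragradient method builds on, can be written schematically as follows (a sketch: $\hat{H}_s$ denotes the noisy Hessian-oracle output, and the proximal extragradient steps of this paper are omitted):

$$\bar{H}_t = \frac{1}{t}\sum_{s=1}^{t} \hat{H}_s, \qquad x_{t+1} = x_t - \eta_t\,\bar{H}_t^{-1}\nabla f(x_t).$$

Averaging keeps the per-iteration cost flat while the noise in $\bar{H}_t$ decays, which is what allows the superlinear rate $\tilde{O}((1/t)^{t/2})$ to kick in.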

Updated: 2024-11-11 16:37:02

Categories: math.OC,cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.01478v2

Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding

Recent models for natural language understanding are inclined to exploit simple patterns in datasets, commonly known as shortcuts. These shortcuts hinge on spurious correlations between labels and latent features existing in the training data. At inference time, shortcut-dependent models are likely to generate erroneous predictions under distribution shifts, particularly when some latent features are no longer correlated with the labels. To avoid this, previous studies have trained models to eliminate the reliance on shortcuts. In this study, we explore a different direction: pessimistically aggregating the predictions of a mixture-of-experts, assuming each expert captures relatively different latent features. The experimental results demonstrate that our post-hoc control over the experts significantly enhances the model's robustness to distribution shifts in shortcuts. We also show that our approach has practical advantages, and we analyze our model, providing results that support the assumption.
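
One simple way to realize pessimistic aggregation over a mixture-of-experts is to score each class by its worst-case (minimum) probability across experts, so a class wins only if every expert lends it some support. This is an illustrative sketch with assumed names, not necessarily the paper's exact rule:

```python
def pessimistic_aggregate(expert_probs):
    """Aggregate per-expert class distributions by taking the minimum
    score per class across experts, then renormalizing.
    expert_probs: list of probability vectors over the same classes."""
    n_classes = len(expert_probs[0])
    worst = [min(p[c] for p in expert_probs) for c in range(n_classes)]
    total = sum(worst) or 1.0          # guard for the degenerate all-zero case
    return [w / total for w in worst]
```

A shortcut-reliant expert's overconfident class can no longer dominate the aggregate, since the minimum discounts confidence that other experts do not share.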

Updated: 2024-11-11 16:33:25

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.12060v2

Training Neural Networks as Recognizers of Formal Languages

Characterizing the computational power of neural network architectures in terms of formal language theory remains a crucial line of research, as it describes lower and upper bounds on the reasoning capabilities of modern AI. However, when empirically testing these bounds, existing work often leaves a discrepancy between experiments and the formal claims they are meant to support. The problem is that formal language theory pertains specifically to recognizers: machines that receive a string as input and classify whether it belongs to a language. On the other hand, it is common to instead use proxy tasks that are similar in only an informal sense, such as language modeling or sequence-to-sequence transduction. We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings, using a general method that can be applied to a wide variety of languages. As part of this, we extend an algorithm recently proposed by Sn{\ae}bjarnarson et al. (2024) to do length-controlled sampling of strings from regular languages, with much better asymptotic time complexity than previous methods. We provide results on a variety of languages across the Chomsky hierarchy for three neural architectures: a simple RNN, an LSTM, and a causally-masked transformer. We find that the RNN and LSTM often outperform the transformer, and that auxiliary training objectives such as language modeling can help, although no single objective uniformly improves performance across languages and architectures. Our contributions will facilitate theoretically sound empirical testing of language recognition claims in future work. We have released our datasets as a benchmark called FLaRe (Formal Language Recognition), along with our code.

Updated: 2024-11-11 16:33:25

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2411.07107v1

The Evolution of Cryptography through Number Theory

Cryptography, derived from the Greek for "hidden writing", uses mathematical techniques to secure information by converting it into an unreadable format. While cryptography as a science began only around 100 years ago, its roots trace back to ancient civilizations such as Mesopotamia and Egypt. Over time, cryptography evolved from basic methods into complex systems grounded in number theory, such as modular arithmetic, the Euclidean algorithm, and Euler's totient function. This paper explores the link between early information-hiding techniques and modern cryptographic algorithms like RSA, which use advanced number theory to secure data for billions of people. By analyzing historical methods, this study shows how the development of number theory enabled the transition from simple letter-shifting ciphers, like the Caesar and Vigenère ciphers, to more sophisticated encryption methods. This evolution reflects a profound impact on daily life and the importance of number theory in protecting information.
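
The letter-shifting ciphers discussed here reduce to modular arithmetic on letter indices: each letter at index $c$ maps to $(c + k) \bmod 26$. A minimal Caesar cipher (decryption is the same function with the negated shift):

```python
def caesar(text, shift):
    """Caesar cipher: shift each letter by `shift` positions mod 26,
    preserving case and leaving non-letters unchanged."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr(base + (ord(ch) - base + shift) % 26))
        else:
            out.append(ch)
    return "".join(out)
```

The Vigenère cipher is the same operation with a shift that cycles through the letters of a keyword instead of a single fixed $k$.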

Updated: 2024-11-11 16:27:57

Categories: cs.CR,94A60, 11T71, 11Y16, 11A41,E.3; F.2.1; G.2.1; K.6.5

Download: http://arxiv.org/abs/2411.14451v1

Learning Multi-Agent Collaborative Manipulation for Long-Horizon Quadrupedal Pushing

Recently, quadrupedal robots have achieved significant success in locomotion, but their manipulation capabilities, particularly in handling large objects, remain limited, restricting their usefulness in demanding real-world applications such as search and rescue, construction, industrial automation, and room organization. This paper tackles the task of obstacle-aware, long-horizon pushing by multiple quadrupedal robots. We propose a hierarchical multi-agent reinforcement learning framework with three levels of control. The high-level controller integrates an RRT planner and a centralized adaptive policy to generate subgoals, while the mid-level controller uses a decentralized goal-conditioned policy to guide the robots toward these subgoals. A pre-trained low-level locomotion policy executes the movement commands. We evaluate our method against several baselines in simulation, demonstrating significant improvements, with 36.0% higher success rates and 24.5% shorter completion times than the best baseline. Our framework successfully enables long-horizon, obstacle-aware manipulation tasks such as Push-Cuboid and Push-T on Go1 robots in the real world.

Updated: 2024-11-11 16:27:25

Categories: cs.RO,cs.AI,cs.LG,cs.MA

Download: http://arxiv.org/abs/2411.07104v1

Effectively Leveraging Momentum Terms in Stochastic Line Search Frameworks for Fast Optimization of Finite-Sum Problems

In this work, we address unconstrained finite-sum optimization problems, with particular focus on instances originating in large-scale deep learning scenarios. Our main interest lies in exploring the relationship between momentum directions and recent line search approaches for stochastic optimization in the overparametrized regime. First, we point out that combining these two elements with computational benefits is not straightforward. To this aim, we propose a solution based on mini-batch persistency. We then introduce an algorithmic framework that exploits a mix of data persistency, conjugate-gradient-type rules for the definition of the momentum parameter, and stochastic line searches. The resulting algorithm is empirically shown to outperform other popular methods from the literature, obtaining state-of-the-art results in both convex and nonconvex large-scale training problems.
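
A minimal sketch of how a momentum direction can be combined with a backtracking (Armijo-type) stochastic line search evaluated on a persistent mini-batch. This is a scalar toy with assumed hyperparameters, not the paper's conjugate-gradient-type rule for the momentum parameter:

```python
def sls_momentum_step(x, d_prev, grad_fn, loss_fn, beta=0.9, eta0=1.0, c=0.5):
    """One step along a momentum-averaged direction, with the step size
    chosen by backtracking until the Armijo sufficient-decrease condition
    holds. grad_fn/loss_fn are evaluated on the same (persistent) batch."""
    g = grad_fn(x)
    d = beta * d_prev + g              # momentum-averaged search direction
    f0 = loss_fn(x)
    eta = eta0
    # halve the step until sufficient decrease along d is achieved
    while loss_fn(x - eta * d) > f0 - c * eta * g * d:
        eta *= 0.5
        if eta < 1e-12:                # no decrease along d on this batch: skip
            return x, d
    return x - eta * d, d
```

Evaluating both the loss and the gradient on the same persistent mini-batch is what makes the Armijo test meaningful in the stochastic setting, which is the role of mini-batch persistency in the abstract above.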

Updated: 2024-11-11 16:26:33

Categories: math.OC,cs.LG,90C30, 90C26, 90C06, 65K05, 68T07

Download: http://arxiv.org/abs/2411.07102v1

Bounded Rationality Equilibrium Learning in Mean Field Games

Mean field games (MFGs) tractably model behavior in large agent populations. The literature on learning MFG equilibria typically focuses on finding Nash equilibria (NE), which assume perfectly rational agents and are hence implausible in many realistic situations. To overcome these limitations, we incorporate bounded rationality into MFGs by leveraging the well-known concept of quantal response equilibria (QRE). Two novel types of MFG QRE enable the modeling of large agent populations where individuals only noisily estimate the true objective. We also introduce a second source of bounded rationality to MFGs by restricting agents' planning horizon. The resulting novel receding horizon (RH) MFGs are combined with QRE and existing approaches to model different aspects of bounded rationality in MFGs. We formally define MFG QRE and RH MFGs and compare them to existing equilibrium concepts such as entropy-regularized NE. Subsequently, we design generalized fixed point iteration and fictitious play algorithms to learn QRE and RH equilibria. After a theoretical analysis, we give different examples to evaluate the capabilities of our learning algorithms and outline practical differences between the equilibrium concepts.

Updated: 2024-11-11 16:24:03

Categories: cs.GT,cs.AI,cs.LG,cs.MA

Download: http://arxiv.org/abs/2411.07099v1

A Multi-Agent Approach for REST API Testing with Semantic Graphs and LLM-Driven Inputs

As modern web services increasingly rely on REST APIs, their thorough testing has become crucial. Furthermore, the advent of REST API specifications such as the OpenAPI Specification has led to the emergence of many black-box REST API testing tools. However, these tools often focus on individual test elements in isolation (e.g., APIs, parameters, values), resulting in lower coverage and less effectiveness in detecting faults (i.e., 500 response codes). To address these limitations, we present AutoRestTest, the first black-box framework to adopt a dependency-embedded multi-agent approach for REST API testing, integrating Multi-Agent Reinforcement Learning (MARL) with a Semantic Property Dependency Graph (SPDG) and Large Language Models (LLMs). Our approach treats REST API testing as a separable problem, where four agents -- API, dependency, parameter, and value -- collaborate to optimize API exploration. LLMs handle domain-specific value restrictions, the SPDG model simplifies the search space for dependencies using a similarity score between API operations, and MARL dynamically optimizes the agents' behavior. Evaluated on 12 real-world REST services, AutoRestTest outperforms the four leading black-box REST API testing tools, including those assisted by RESTGPT (which augments realistic test inputs using LLMs), in terms of code coverage, operation coverage, and fault detection. Notably, AutoRestTest is the only tool able to identify an internal server error in Spotify. Our ablation study underscores the significant contributions of the agent learning, SPDG, and LLM components.

Updated: 2024-11-11 16:20:27

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2411.07098v1

Online Mirror Descent for Tchebycheff Scalarization in Multi-Objective Optimization

The goal of multi-objective optimization (MOO) is to learn under multiple, potentially conflicting, objectives. One widely used technique to tackle MOO is through linear scalarization, where one fixed preference vector is used to combine the objectives into a single scalar value for optimization. However, recent work (Hu et al., 2024) has shown linear scalarization often fails to capture the non-convex regions of the Pareto Front, failing to recover the complete set of Pareto optimal solutions. In light of the above limitations, this paper focuses on Tchebycheff scalarization that optimizes for the worst-case objective. In particular, we propose an online mirror descent algorithm for Tchebycheff scalarization, which we call OMD-TCH. We show that OMD-TCH enjoys a convergence rate of $O(\sqrt{\log m/T})$ where $m$ is the number of objectives and $T$ is the number of iteration rounds. We also propose a novel adaptive online-to-batch conversion scheme that significantly improves the practical performance of OMD-TCH while maintaining the same convergence guarantees. We demonstrate the effectiveness of OMD-TCH and the adaptive conversion scheme on both synthetic problems and federated learning tasks under fairness constraints, showing state-of-the-art performance.
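
The weight update of OMD-TCH can be sketched as entropic mirror descent on the probability simplex: a multiplicative update that up-weights the objectives with the largest current losses (the entropy mirror map is what yields the $\log m$ dependence in the rate). This sketch covers the weight step only; the model-parameter update and the online-to-batch conversion are omitted, and the names are illustrative:

```python
import math

def omd_tch_weight_step(weights, losses, eta=0.1):
    """Entropic mirror-descent update of simplex weights toward the
    worst-case objective, as in Tchebycheff scalarization: each weight is
    scaled by exp(eta * loss) and the vector is renormalized."""
    scaled = [w * math.exp(eta * l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]
```

Because the update is multiplicative, the weights stay on the simplex by construction, and the objective with the largest loss gradually dominates the scalarization.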

Updated: 2024-11-11 16:17:07

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.21764v2

Efficient Online Scheduling and Routing for Automated Guided Vehicles In Loop-Based Graphs

Automated guided vehicles (AGVs) are widely used in various industries, and scheduling and routing them in a conflict-free manner is crucial to their efficient operation. We propose a loop-based algorithm that solves the online, conflict-free scheduling and routing problem for AGVs with any capacity and ordered jobs in loop-based graphs. The proposed algorithm is compared against an exact method, a greedy heuristic and a metaheuristic. We experimentally show, using theoretical and real instances on a model representing a real manufacturing plant, that this algorithm either outperforms the other algorithms or gets an equally good solution in less computing time.

Updated: 2024-11-11 16:15:48

Categories: cs.CE,cs.AI

Download: http://arxiv.org/abs/2310.02195v3

Differentially-Private Collaborative Online Personalized Mean Estimation

We consider the problem of collaborative personalized mean estimation under a privacy constraint in an environment of several agents continuously receiving data according to arbitrary unknown agent-specific distributions. In particular, we provide a method based on hypothesis testing coupled with differential privacy and data variance estimation. Two privacy mechanisms and two data variance estimation schemes are proposed, and we provide a theoretical convergence analysis of the proposed algorithm for any bounded unknown distributions on the agents' data, showing that collaboration provides faster convergence than a fully local approach where agents do not share data. Moreover, we provide analytical performance curves for the case with an oracle class estimator, i.e., the class structure of the agents, where agents receiving data from distributions with the same mean are considered to be in the same class, is known. The theoretical faster-than-local convergence guarantee is backed up by extensive numerical results showing that for a considered scenario the proposed approach indeed converges much faster than a fully local approach, and performs comparably to ideal performance where all data is public. This illustrates the benefit of private collaboration in an online setting.
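
As a sketch of the kind of privacy mechanism involved, an agent can release its local sample mean under $\epsilon$-differential privacy by adding Laplace noise calibrated to the mean's sensitivity. The Laplace mechanism here is a standard stand-in and the names are illustrative; the paper proposes two specific mechanisms of its own:

```python
import random

def private_local_mean(samples, epsilon, value_range):
    """Release a local sample mean under epsilon-differential privacy via
    the Laplace mechanism. Samples are assumed to lie in
    value_range = (lo, hi), so replacing one of the n samples moves the
    mean by at most (hi - lo) / n (the sensitivity)."""
    lo, hi = value_range
    n = len(samples)
    mean = sum(samples) / n
    scale = ((hi - lo) / n) / epsilon
    # Laplace(0, scale) noise: difference of two i.i.d. exponentials
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return mean + noise
```

Agents can then share these noised means for collaborative estimation; the larger the privacy budget $\epsilon$ (or the sample count $n$), the smaller the injected noise.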

Updated: 2024-11-11 16:14:56

Categories: cs.LG,cs.IT,math.IT

Download: http://arxiv.org/abs/2411.07094v1

Towards Characterizing Cyber Networks with Large Language Models

Threat hunting analyzes large, noisy, high-dimensional data to find sparse adversarial behavior. We believe adversarial activities, however they are disguised, are extremely difficult to completely obscure in high-dimensional space. In this paper, we employ these latent features of cyber data to find anomalies via a prototype tool called the Cyber Log Embeddings Model (CLEM). CLEM was trained on Zeek network traffic logs from both a real-world production network and an Internet of Things (IoT) cybersecurity testbed. The model is deliberately overtrained on a sliding window of data to characterize each window closely. We use the Adjusted Rand Index (ARI) to compare the k-means clustering of CLEM output to expert labeling of the embeddings. Our approach demonstrates that there is promise in using natural language modeling to understand cyber data.
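
The Adjusted Rand Index used to score CLEM's clusters against expert labels can be computed directly from a pair-counting contingency table; a minimal pure-Python version of the standard formula (production code would typically use scikit-learn's adjusted_rand_score, which implements the same computation):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two clusterings of the same items.

    1.0 means identical partitions (up to label permutation); values
    near 0 are what random labelings score in expectation.
    """
    n = len(labels_a)
    pairs = Counter(zip(labels_a, labels_b))   # contingency table cells
    a = Counter(labels_a)                      # row sums
    b = Counter(labels_b)                      # column sums
    idx = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_idx = (sum_a + sum_b) / 2
    return (idx - expected) / (max_idx - expected)
```

Because the index is permutation-invariant, it scores cluster agreement without needing cluster IDs to match between CLEM's output and the expert labels.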

Updated: 2024-11-11 16:09:13

Categories: cs.AI,cs.CR,cs.LG

Download: http://arxiv.org/abs/2411.07089v1

Eavesdropping on Semantic Communication: Timing Attacks and Countermeasures

Semantic communication is a new paradigm that considers the meaning of transmitted information to optimize communication. One possible application is the remote monitoring of a process under communication costs: scheduling updates based on semantic considerations can significantly reduce transmission frequency while maintaining high-quality tracking performance. However, semantic scheduling also opens a timing-based side-channel that an eavesdropper may exploit to obtain information about the state of the remote process, even if the content of updates is perfectly secure. In this work, we study an eavesdropping attack against pull-based semantic scheduling for the tracking of remote Markov processes. We provide a theoretical framework for defining the effectiveness of the attack and of possible countermeasures, as well as a practical heuristic that can provide a balance between the performance gains offered by semantic communication and the information leakage.

Updated: 2024-11-11 16:05:03

Categories: eess.SY,cs.CR,cs.IT,cs.MA,cs.SY,math.IT

Download: http://arxiv.org/abs/2411.07088v1

OCMDP: Observation-Constrained Markov Decision Process

In many practical applications, decision-making processes must balance the costs of acquiring information with the benefits it provides. Traditional control systems often assume full observability, an unrealistic assumption when observations are expensive. We tackle the challenge of simultaneously learning observation and control strategies in such cost-sensitive environments by introducing the Observation-Constrained Markov Decision Process (OCMDP), where the policy influences the observability of the true state. To manage the complexity arising from the combined observation and control actions, we develop an iterative, model-free deep reinforcement learning algorithm that separates the sensing and control components of the policy. This decomposition enables efficient learning in the expanded action space by focusing on when and what to observe, as well as determining optimal control actions, without requiring knowledge of the environment's dynamics. We validate our approach on a simulated diagnostic task and a realistic healthcare environment using HeartPole. In both scenarios, the experimental results demonstrate that our model achieves a substantial reduction in observation costs on average, significantly outperforming baseline methods by a notable margin in efficiency.

Updated: 2024-11-11 16:04:49

Categories: cs.AI,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2411.07087v1

To Train or Not to Train: Balancing Efficiency and Training Cost in Deep Reinforcement Learning for Mobile Edge Computing

Artificial Intelligence (AI) is a key component of 6G networks, as it enables communication and computing services to adapt to end users' requirements and demand patterns. The management of Mobile Edge Computing (MEC) is a meaningful example of AI application: computational resources available at the network edge need to be carefully allocated to users, whose jobs may have different priorities and latency requirements. The research community has developed several AI algorithms to perform this resource allocation, but it has neglected a key aspect: learning is itself a computationally demanding task, and assuming that training comes for free yields idealized conditions and performance in simulations. In this work, we consider a more realistic case in which the cost of learning is specifically accounted for, presenting a new algorithm to dynamically select when to train a Deep Reinforcement Learning (DRL) agent that allocates resources. Our method is highly general, as it can be directly applied to any scenario involving a training overhead, and it can approach the same performance as an ideal learning agent even under realistic training conditions.

Updated: 2024-11-11 16:02:12

Categories: cs.AI,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2411.07086v1

StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification

Existing large vision-language models (LVLMs) are largely limited to processing short, seconds-long videos and struggle to generate coherent descriptions for extended videos spanning minutes or more. Long video description introduces new challenges, such as plot-level consistency across descriptions. To address these, we identify audio-visual character identification, matching character names to each line of dialogue, as a key factor. We propose StoryTeller, a system for generating dense descriptions of long videos, incorporating both low-level visual concepts and high-level plot information. StoryTeller uses a multimodal large language model that integrates visual, audio, and text modalities to perform audio-visual character identification on minute-long video clips. The results are then fed into an LVLM to enhance the consistency of video descriptions. We validate our approach on movie description tasks and introduce MovieStory101, a dataset with dense descriptions for three-minute movie clips. To evaluate long video descriptions, we create MovieQA, a large set of multiple-choice questions for the MovieStory101 test set. We assess descriptions by inputting them into GPT-4 to answer these questions, using accuracy as an automatic evaluation metric. Experiments show that StoryTeller outperforms all open and closed-source baselines on MovieQA, achieving 9.5% higher accuracy than the strongest baseline, Gemini-1.5-pro, and demonstrating a +15.56% advantage in human side-by-side evaluations. Additionally, incorporating audio-visual character identification from StoryTeller improves the performance of all video description models, with Gemini-1.5-pro and GPT-4o showing relative improvements of 5.5% and 13.0%, respectively, in accuracy on MovieQA.

Updated: 2024-11-11 15:51:48

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.07076v1

Deep Augmentation: Self-Supervised Learning with Transformations in Activation Space

We introduce Deep Augmentation, an approach to implicit data augmentation using dropout or PCA to transform a targeted layer within a neural network to improve performance and generalization. We demonstrate Deep Augmentation through extensive experiments on contrastive learning tasks in NLP, computer vision, and graph learning. We observe substantial performance gains with Transformers, ResNets, and Graph Neural Networks as the underlying models in contrastive learning, but observe inverse effects on the corresponding supervised problems. Our analysis suggests that Deep Augmentation alleviates co-adaptation between layers, a problem exhibited by self-supervised learning where ground truth labels are not available. We use this observation to formulate a method for selecting which layer to target; in particular, our experimentation reveals that targeting deeper layers with Deep Augmentation outperforms augmenting the input data. The simple network- and modality-agnostic nature of this approach enables its integration into various machine learning pipelines.

Updated: 2024-11-11 15:49:16

Categories: cs.LG,cs.CL,cs.CV

Download: http://arxiv.org/abs/2303.14537v3

An Interpretable X-ray Style Transfer via Trainable Local Laplacian Filter

Radiologists have preferred visual impressions, or 'styles', of X-ray images, which are manually adjusted to their needs to support their diagnostic performance. In this work, we propose an automatic and interpretable X-ray style transfer by introducing a trainable version of the Local Laplacian Filter (LLF). From the shape of the LLF's optimized remap function, the characteristics of the style transfer can be inferred and reliability of the algorithm can be ensured. Moreover, we enable the LLF to capture complex X-ray style features by replacing the remap function with a Multi-Layer Perceptron (MLP) and adding a trainable normalization layer. We demonstrate the effectiveness of the proposed method by transforming unprocessed mammographic X-ray images into images that match the style of target mammograms, achieving a Structural Similarity Index (SSIM) of 0.94 compared to 0.82 for the baseline LLF style transfer method of Aubry et al.
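
The reported SSIM scores (0.94 vs. 0.82) come from the standard structural-similarity formula; as an illustration, here is a single-window "global" SSIM in pure Python, noting that real evaluations (including, presumably, this paper's) average the statistic over local windows rather than computing it once over the whole image:

```python
def ssim_global(x, y, data_range=255.0):
    """Single-window SSIM between two equal-length grayscale images.

    Standard implementations average SSIM over local (e.g. 11x11
    Gaussian) windows; this global variant only illustrates the
    formula with the usual constants C1=(0.01 L)^2, C2=(0.03 L)^2.
    """
    n = len(x)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((a - my) ** 2 for a in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical images score exactly 1.0; any luminance, contrast, or structure mismatch pulls the score below 1.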

Updated: 2024-11-11 15:47:25

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.07072v1

Universal Response and Emergence of Induction in LLMs

While induction is considered a key mechanism for in-context learning in LLMs, understanding its precise circuit decomposition beyond toy models remains elusive. Here, we study the emergence of induction behavior within LLMs by probing their response to weak single-token perturbations of the residual stream. We find that LLMs exhibit a robust, universal regime in which their response remains scale-invariant under changes in perturbation strength, thereby allowing us to quantify the build-up of token correlations throughout the model. By applying our method, we observe signatures of induction behavior within the residual stream of Gemma-2-2B, Llama-3.2-3B, and GPT-2-XL. Across all models, we find that these induction signatures gradually emerge within intermediate layers and identify the relevant model sections composing this behavior. Our results provide insights into the collective interplay of components within LLMs and serve as a benchmark for large-scale circuit analysis.

Updated: 2024-11-11 15:47:15

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2411.07071v1

On Active Privacy Auditing in Supervised Fine-tuning for White-Box Language Models

The pretraining and fine-tuning approach has become the leading technique for various NLP applications. However, recent studies reveal that fine-tuning data, due to their sensitive nature, domain-specific characteristics, and identifiability, pose significant privacy concerns. To help develop more privacy-resilient fine-tuning models, we introduce a novel active privacy auditing framework, dubbed Parsing, designed to identify and quantify privacy leakage risks during the supervised fine-tuning (SFT) of language models (LMs). The framework leverages improved white-box membership inference attacks (MIAs) as the core technology, utilizing novel learning objectives and a two-stage pipeline to monitor the privacy of the LMs' fine-tuning process, maximizing the exposure of privacy risks. Additionally, we have improved the effectiveness of MIAs on large LMs including GPT-2, Llama2, and certain variants of them. Our research aims to provide the SFT community of LMs with a reliable, ready-to-use privacy auditing tool, and to offer valuable insights into safeguarding privacy during the fine-tuning process. Experimental results confirm the framework's efficiency across various models and tasks, emphasizing notable privacy concerns in the fine-tuning process. Project code is available at https://github.com/mapleleavesss/PARSING.

Updated: 2024-11-11 15:46:07

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.07070v1

Recent Advances in Named Entity Recognition: A Comprehensive Survey and Comparative Study

Named Entity Recognition seeks to extract substrings within a text that name real-world objects and to determine their type (for example, whether they refer to persons or organizations). In this survey, we first present an overview of recent popular approaches, including advancements in Transformer-based methods and Large Language Models (LLMs) that have not had much coverage in other surveys. In addition, we discuss reinforcement learning and graph-based approaches, highlighting their role in enhancing NER performance. Second, we focus on methods designed for datasets with scarce annotations. Third, we evaluate the performance of the main NER implementations on a variety of datasets with differing characteristics (as regards their domain, their size, and their number of classes). We thus provide a deep comparison of algorithms that have never been considered together. Our experiments shed some light on how the characteristics of datasets affect the behavior of the methods we compare.

Updated: 2024-11-11 15:45:02

Categories: cs.CL,cs.LG,68T50, 68Q32

Download: http://arxiv.org/abs/2401.10825v2

Medication Recommendation via Dual Molecular Modalities and Multi-Step Enhancement

Existing works based on molecular knowledge neglect the 3D geometric structure of molecules and fail to learn the high-dimensional information of medications, leading to structural confusion. Additionally, they do not extract key substructures from a single patient visit, and thus fail to identify medication molecules suitable for the current visit. To address the above limitations, we propose a bimodal molecular recommendation framework named BiMoRec, which introduces 3D molecular structures to obtain atomic 3D coordinates and edge indices, overcoming the inherent lack of high-dimensional molecular information in 2D molecular structures. To retain the fast training and prediction efficiency of the recommendation system, we use bimodal graph contrastive pretraining to maximize the mutual information between the two molecular modalities, achieving the fusion of 2D and 3D molecular graphs. Additionally, we designed a molecular multi-step enhancement mechanism to re-calibrate the molecular weights. Specifically, we employ a pre-training method that captures both 2D and 3D molecular structure representations, along with substructure representations, and leverages contrastive learning to extract mutual information. We then use the pre-trained encoder to generate molecular representations, enhancing them through a three-step process: intra-visit, molecular per-visit, and latest-visit. Finally, we apply temporal information aggregation to generate the final medication combinations. Our implementation on the MIMIC-III and MIMIC-IV datasets demonstrates that our method achieves state-of-the-art performance.

Updated: 2024-11-11 15:37:29

Categories: cs.LG,q-bio.QM

Download: http://arxiv.org/abs/2405.20358v3

Empirical Quantum Advantage Analysis of Quantum Kernel in Gene Expression Data

The incorporation of a quantum ansatz into machine learning classification models demonstrates the ability to extract patterns from data for classification tasks. However, taking advantage of the enhanced computational power of quantum machine learning necessitates dealing with various constraints. In this paper, we focus on constraints like finding suitable datasets where quantum advantage is achievable and evaluating the relevance of features chosen by classical and quantum methods. Additionally, we compare quantum and classical approaches using benchmarks and estimate the computational complexity of quantum circuits to assess real-world usability. For our experimental validation, we selected the gene expression dataset, given the critical role of genetic variations in regulating physiological behavior and disease susceptibility. Through this study, we aim to contribute to the advancement of quantum machine learning methodologies, offering valuable insights into their potential for addressing complex classification challenges in various domains.

Updated: 2024-11-11 15:34:53

Categories: quant-ph,cs.ET,cs.LG

Download: http://arxiv.org/abs/2411.07276v1

Divide-and-Conquer Posterior Sampling for Denoising Diffusion Priors

Recent advancements in solving Bayesian inverse problems have spotlighted denoising diffusion models (DDMs) as effective priors. Although these have great potential, DDM priors yield complex posterior distributions that are challenging to sample. Existing approaches to posterior sampling in this context address this problem either by retraining model-specific components, leading to stiff and cumbersome methods, or by introducing approximations with uncontrolled errors that affect the accuracy of the produced samples. We present an innovative framework, divide-and-conquer posterior sampling, which leverages the inherent structure of DDMs to construct a sequence of intermediate posteriors that guide the produced samples to the target posterior. Our method significantly reduces the approximation error associated with current techniques without the need for retraining. We demonstrate the versatility and effectiveness of our approach for a wide range of Bayesian inverse problems. The code is available at https://github.com/Badr-MOUFAD/dcps.

Updated: 2024-11-11 15:31:42

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2403.11407v2

Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training

Network pruning is a set of computational techniques that aim to reduce a given model's computational cost by removing a subset of its parameters while having minimal impact on performance. Throughout the last decade, the most widely used pruning paradigm has focused on pruning and re-training, which nowadays is inconvenient due to the vast number of pre-trained models, which are in any case too expensive to re-train. In this paper, we exploit functional information from dense pre-trained models, i.e., their activations, to obtain sparse models that maximize the activations' alignment w.r.t. their corresponding dense models. Hence, we propose NeuroAl, a "top-up" algorithm that can be used on top of any given pruning algorithm for LLMs, and that modifies the block-wise and row-wise sparsity ratios to maximize the neuron alignment among activations. Moreover, differently from existing methods, our approach adaptively selects the best parameters for the block-wise and row-wise sparsity ratios w.r.t. the model and the desired sparsity (given as input), and requires no re-training. We test our method on 4 different LLM families and 3 different sparsity ratios, showing how it consistently outperforms the latest state-of-the-art techniques. The code is available at https://github.com/eliacunegatti/NeuroAL.
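
NeuroAl's contribution lies in choosing the block-wise and row-wise sparsity ratios; the mechanics of applying given per-row ratios via plain magnitude pruning can be sketched as follows (this is not the paper's alignment objective, only the final pruning step once the ratios are decided):

```python
def prune_rowwise(weights, row_sparsities):
    """Zero out the smallest-magnitude entries of each weight row.

    row_sparsities[i] is the fraction of row i to remove; methods
    like NeuroAl tune such per-row (and per-block) ratios, after
    which a simple magnitude criterion like this one is applied.
    """
    pruned = []
    for row, s in zip(weights, row_sparsities):
        k = int(len(row) * s)  # number of entries to drop in this row
        order = sorted(range(len(row)), key=lambda j: abs(row[j]))
        drop = set(order[:k])  # indices of the k smallest magnitudes
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned
```

Because each row gets its own ratio, the overall model sparsity is a weighted mix of the per-row choices, which is exactly the degree of freedom a top-up method can exploit.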

Updated: 2024-11-11 15:30:16

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2411.07066v1

Fast and Scalable Multi-Kernel Encoder Classifier

This paper introduces a new kernel-based classifier by viewing kernel matrices as generalized graphs and leveraging recent progress in graph embedding techniques. The proposed method facilitates fast and scalable kernel matrix embedding, and seamlessly integrates multiple kernels to enhance the learning process. Our theoretical analysis offers a population-level characterization of this approach using random variables. Empirically, our method demonstrates superior running time compared to standard approaches such as support vector machines and two-layer neural network, while achieving comparable classification accuracy across various simulated and real datasets.

Updated: 2024-11-11 15:29:59

Categories: cs.LG

Download: http://arxiv.org/abs/2406.02189v2

Statistical Inference with Limited Memory: A Survey

The problem of statistical inference in its various forms has been the subject of decades-long extensive research. Most of the effort has been focused on characterizing the behavior as a function of the number of available samples, with far less attention given to the effect of memory limitations on performance. Recently, this latter topic has drawn much interest in the engineering and computer science literature. In this survey paper, we attempt to review the state-of-the-art of statistical inference under memory constraints in several canonical problems, including hypothesis testing, parameter estimation, and distribution property testing/estimation. We discuss the main results in this developing field, and by identifying recurrent themes, we extract some fundamental building blocks for algorithmic construction, as well as useful techniques for lower bound derivations.
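
A classic instance of the memory/accuracy trade-off surveyed here is approximate counting; as a small sketch, Morris's randomized counter tracks only the exponent of the count, using O(log log n) bits instead of O(log n):

```python
import random

def morris_counter(n_events, rng):
    """Morris approximate counter.

    Only the exponent c is stored; each event increments c with
    probability 2^-c, and 2^c - 1 is an unbiased estimate of the
    number of events seen.
    """
    c = 0
    for _ in range(n_events):
        if rng.random() < 2.0 ** (-c):  # increment with prob. 2^-c
            c += 1
    return 2 ** c - 1
```

The estimator is unbiased but has high variance; the usual remedy (averaging several independent counters, or using a base closer to 1) is itself a memory/accuracy knob of the kind the survey formalizes.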

Updated: 2024-11-11 15:27:46

Categories: cs.LG,cs.IT,math.IT,stat.ML

Download: http://arxiv.org/abs/2312.15225v3

General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization

This work investigates the effectiveness of schedule-free methods, developed by A. Defazio et al. (NeurIPS 2024), in nonconvex optimization settings, inspired by their remarkable empirical success in training neural networks. Specifically, we show that schedule-free SGD achieves optimal iteration complexity for nonsmooth, nonconvex optimization problems. Our proof begins with the development of a general framework for online-to-nonconvex conversion, which converts a given online learning algorithm into an optimization algorithm for nonconvex losses. Our general framework not only recovers existing conversions but also leads to two novel conversion schemes. Notably, one of these new conversions corresponds directly to schedule-free SGD, allowing us to establish its optimality. Additionally, our analysis provides valuable insights into the parameter choices for schedule-free SGD, addressing a theoretical gap that the convex theory cannot explain.
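
The schedule-free SGD update of Defazio et al. queries the gradient at an interpolation of the base iterate and a running average, and returns the average as the solution; a toy deterministic sketch on a 1-D quadratic (the values of beta, the step size, and the objective are illustrative choices, not from the paper):

```python
def schedule_free_sgd(grad, z0, lr=0.1, beta=0.9, steps=200):
    """Schedule-free SGD on a 1-D objective.

    z is the base SGD iterate, x the running (Polyak-style) average,
    and gradients are evaluated at the interpolated point y.
    """
    z = x = z0
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x  # interpolated query point
        z = z - lr * grad(y)           # base SGD step
        c = 1.0 / t                    # uniform averaging weight
        x = (1 - c) * x + c * z
    return x

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = schedule_free_sgd(lambda v: 2.0 * (v - 3.0), z0=0.0)
```

No learning-rate schedule appears anywhere in the loop, which is the point: the averaging plays the role a decaying schedule usually would.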

Updated: 2024-11-11 15:25:48

Categories: cs.LG,math.OC,stat.ML

Download: http://arxiv.org/abs/2411.07061v1

Solving Hidden Monotone Variational Inequalities with Surrogate Losses

Deep learning has proven to be effective in a wide variety of loss minimization problems. However, many applications of interest, like minimizing projected Bellman error and min-max optimization, cannot be modelled as minimizing a scalar loss function but instead correspond to solving a variational inequality (VI) problem. This difference in setting has caused many practical challenges as naive gradient-based approaches from supervised learning tend to diverge and cycle in the VI case. In this work, we propose a principled surrogate-based approach compatible with deep learning to solve VIs. We show that our surrogate-based approach has three main benefits: (1) under assumptions that are realistic in practice (when hidden monotone structure is present, interpolation, and sufficient optimization of the surrogates), it guarantees convergence, (2) it provides a unifying perspective of existing methods, and (3) is amenable to existing deep learning optimizers like ADAM. Experimentally, we demonstrate our surrogate-based approach is effective in min-max optimization and minimizing projected Bellman error. Furthermore, in the deep reinforcement learning case, we propose a novel variant of TD(0) which is more compute and sample efficient.
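
The paper's TD(0) variant is not specified in the abstract; for reference, here is the vanilla tabular TD(0) update it builds on, where each transition nudges the value estimate toward the bootstrapped target r + gamma * V(s'):

```python
def td0(episodes, n_states, alpha=0.1, gamma=0.9):
    """Tabular TD(0): V(s) += alpha * (r + gamma * V(s') - V(s)).

    Each episode is a list of (state, reward, next_state) tuples,
    with next_state = None marking a terminal transition.
    """
    v = [0.0] * n_states
    for episode in episodes:
        for s, r, s_next in episode:
            target = r + (gamma * v[s_next] if s_next is not None else 0.0)
            v[s] += alpha * (target - v[s])
    return v
```

The projected-Bellman-error view of this update is what the surrogate-loss framework above generalizes.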

Updated: 2024-11-11 15:20:03

Categories: cs.LG,math.OC

Download: http://arxiv.org/abs/2411.05228v2

A Novel RFID Authentication Protocol Based on A Block-Order-Modulus Variable Matrix Encryption Algorithm

In this paper, authentication for mobile radio frequency identification (RFID) systems with low-cost RFID sensor tags is studied. Firstly, an adaptive modulus (AM) encryption algorithm is proposed. Subsequently, in order to enhance the security without additional storage of new key matrices, a self-updating encryption order (SUEO) algorithm is designed. Furthermore, a diagonal block local transpose key matrix (DBLTKM) encryption algorithm is presented, which effectively expands the feasible domain of the key space. Based on the above three algorithms, a novel joint AM-SUEO-DBLTKM encryption algorithm is constructed. Making full use of the advantages of the proposed joint algorithm, a two-way RFID authentication protocol, named AM-SUEO-DBLTKM-RFID, is proposed for mobile RFID systems. In addition, the Burrows-Abadi-Needham (BAN) logic and security analysis indicate that the proposed AM-SUEO-DBLTKM-RFID protocol can effectively combat various typical attacks. Numerical results demonstrate that the proposed AM-SUEO-DBLTKM algorithm can save 99.59% of tag storage over traditional algorithms. Finally, the low computational complexity as well as the low storage cost of the proposed AM-SUEO-DBLTKM-RFID protocol facilitates deployment within low-cost RFID sensor tags.
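
The AM-SUEO-DBLTKM algorithms themselves are not given in the abstract; as a toy illustration of the basic primitive such schemes generalize, matrix encryption over a modulus, here is a Hill-style 2x2 cipher (the key and modulus 256 are arbitrary examples, not the paper's construction):

```python
def mat_inv_2x2_mod(k, m):
    """Inverse of a 2x2 matrix modulo m (det must be coprime to m)."""
    (a, b), (c, d) = k
    det_inv = pow(a * d - b * c, -1, m)  # modular inverse of the determinant
    return [[d * det_inv % m, -b * det_inv % m],
            [-c * det_inv % m, a * det_inv % m]]

def mat_vec_mod(k, v, m):
    """Multiply a 2x2 matrix by a length-2 vector, modulo m."""
    return [(k[0][0] * v[0] + k[0][1] * v[1]) % m,
            (k[1][0] * v[0] + k[1][1] * v[1]) % m]

key = [[3, 3], [2, 5]]  # det = 9, coprime to 256, so the key is invertible
cipher = mat_vec_mod(key, [65, 66], 256)
plain = mat_vec_mod(mat_inv_2x2_mod(key, 256), cipher, 256)
```

The paper's variable-modulus and self-updating-order ideas can be read as ways of varying this kind of key material per session without storing new key matrices on the tag.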

Updated: 2024-11-11 15:16:27

Categories: cs.CR, eess.SP

Download: http://arxiv.org/abs/2312.10593v5

Reconstruction of neuromorphic dynamics from a single scalar time series using variational autoencoder and neural network map

This paper examines the reconstruction of a family of dynamical systems with neuromorphic behavior from a single scalar time series. A model of a physiological neuron based on the Hodgkin-Huxley formalism is considered. A single time series of one of its variables is shown to be enough to train a neural network that can operate as a discrete-time dynamical system with one control parameter. The neural network system is created in two steps. First, delay-coordinate embedding vectors are constructed from the original time series, and their dimension is reduced by means of a variational autoencoder to obtain the recovered state-space vectors. It is shown that an appropriate reduced dimension can be determined by analyzing the autoencoder training process. Second, pairs of recovered state-space vectors at consecutive time steps, supplied with a constant value playing the role of a control parameter, are used to train another neural network to operate as a recurrent map. The regimes of the resulting neural network system observed when its control parameter is varied are in very good accordance with those of the original system, even though they were not explicitly presented during training.
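
The first step of the pipeline, delay-coordinate embedding, can be sketched in a few lines. This is a generic illustration (the function name, dimension, and delay are hypothetical, not taken from the paper): each embedding vector stacks the current sample with lagged copies of the same scalar series.

```python
def delay_embed(series, dim, delay=1):
    """Build delay-coordinate vectors [x(t), x(t+tau), ..., x(t+(d-1)tau)]."""
    n = len(series) - (dim - 1) * delay
    return [[series[i + j * delay] for j in range(dim)] for i in range(n)]

# A toy scalar signal standing in for one Hodgkin-Huxley variable.
signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
vectors = delay_embed(signal, dim=3, delay=2)
print(vectors[0])   # [0.0, 1.0, 0.0]
print(len(vectors)) # 4
```

In the paper these vectors are then compressed by a variational autoencoder to obtain the recovered state-space vectors.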

Updated: 2024-11-11 15:15:55

Categories: nlin.PS, cs.LG, physics.bio-ph

Download: http://arxiv.org/abs/2411.07055v1

ZAHA: Introducing the Level of Facade Generalization and the Large-Scale Point Cloud Facade Semantic Segmentation Benchmark Dataset

Facade semantic segmentation is a long-standing challenge in photogrammetry and computer vision. Although the last decades have witnessed an influx of facade segmentation methods, there is a lack of comprehensive facade classes and of data covering architectural variability. In ZAHA, we introduce the Level of Facade Generalization (LoFG): novel hierarchical facade classes designed based on international urban modeling standards, ensuring compatibility with challenging real-world classes and enabling uniform comparison of methods. Realizing the LoFG, we present the largest semantic 3D facade segmentation dataset to date, providing 601 million annotated points at five and 15 classes for LoFG2 and LoFG3, respectively. Moreover, we analyze the performance of baseline semantic segmentation methods on the introduced LoFG classes and data, complementing this with a discussion of the unresolved challenges in facade segmentation. We firmly believe that ZAHA will facilitate further development of 3D facade semantic segmentation methods, enabling the robust segmentation indispensable for creating urban digital twins.

Updated: 2024-11-11 15:08:49

Categories: cs.CV, cs.AI

Download: http://arxiv.org/abs/2411.04865v3

Intelligent Green Efficiency for Intrusion Detection

Artificial Intelligence (AI) has recently surged in popularity, recording great progress across various industries. However, the environmental impact of AI is a growing concern, in terms of the energy consumption and carbon footprint of Machine Learning (ML) and Deep Learning (DL) models, making it essential to investigate Green AI, an attempt to reduce the climate impact of AI systems. This paper presents an assessment of different programming languages and Feature Selection (FS) methods for improving the computational performance of AI, focusing on Network Intrusion Detection (NID) and cyber-attack classification tasks. Experiments were conducted using five ML models (Random Forest, XGBoost, LightGBM, Multi-Layer Perceptron, and Long Short-Term Memory) implemented in four programming languages (Python, Java, R, and Rust) along with three FS methods (Information Gain, Recursive Feature Elimination, and Chi-Square). The results demonstrate that FS plays an important role in enhancing the computational efficiency of AI models without compromising detection accuracy, and they highlight languages like Python and R, which benefit from a rich ecosystem of AI libraries. These conclusions can be useful for designing efficient and sustainable AI systems that still provide good generalization and reliable detection.
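
One of the three FS methods named above, Information Gain, scores a feature by how much it reduces label entropy. A minimal stdlib sketch (toy data, not from the paper's NID experiments):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Label entropy minus the weighted entropy after splitting on the feature."""
    n = len(labels)
    split = {}
    for f, y in zip(feature, labels):
        split.setdefault(f, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

labels  = [1, 1, 1, 1, 0, 0, 0, 0]
perfect = [1, 1, 1, 1, 0, 0, 0, 0]   # separates the classes exactly
useless = [1, 0, 1, 0, 1, 0, 1, 0]   # independent of the label

print(information_gain(perfect, labels))  # 1.0 bit
print(information_gain(useless, labels))  # 0.0 bits
```

Ranking features by this score and keeping the top-k is what lets the paper shrink model inputs without hurting detection accuracy.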

Updated: 2024-11-11 15:01:55

Categories: cs.CR, cs.LG, cs.PF

Download: http://arxiv.org/abs/2411.08069v1

MAN TruckScenes: A multimodal dataset for autonomous trucking in diverse conditions

Autonomous trucking is a promising technology that can greatly impact modern logistics and the environment. Ensuring its safety on public roads is one of the main duties and requires an accurate perception of the environment. To achieve this, machine learning methods rely on large datasets, but to this day, no such datasets are available for autonomous trucks. In this work, we present MAN TruckScenes, the first multimodal dataset for autonomous trucking. MAN TruckScenes allows the research community to come into contact with truck-specific challenges, such as trailer occlusions, novel sensor perspectives, and terminal environments, for the first time. It comprises more than 740 scenes of 20 s each across a multitude of different environmental conditions. The sensor set includes 4 cameras, 6 lidar sensors, 6 radar sensors, 2 IMUs, and a high-precision GNSS. The dataset's 3D bounding boxes were manually annotated and carefully reviewed to achieve a high quality standard. Bounding boxes are available for 27 object classes, 15 attributes, and a range of more than 230 m. The scenes are tagged according to 34 distinct scene tags, and all objects are tracked throughout the scene to promote a wide range of applications. Additionally, MAN TruckScenes is the first dataset to provide 4D radar data with 360° coverage and is thereby the largest radar dataset with annotated 3D bounding boxes. Finally, we provide extensive dataset analysis and baseline results. The dataset, development kit, and more are available online.

Updated: 2024-11-11 14:59:22

Categories: cs.CV, cs.AI, cs.LG

Download: http://arxiv.org/abs/2407.07462v2

E3x: $\mathrm{E}(3)$-Equivariant Deep Learning Made Easy

This work introduces E3x, a software package for building neural networks that are equivariant with respect to the Euclidean group $\mathrm{E}(3)$, consisting of translations, rotations, and reflections of three-dimensional space. Compared to ordinary neural networks, $\mathrm{E}(3)$-equivariant models promise benefits whenever input and/or output data are quantities associated with three-dimensional objects. This is because the numeric values of such quantities (e.g. positions) typically depend on the chosen coordinate system. Under transformations of the reference frame, the values change predictably, but the underlying rules can be difficult to learn for ordinary machine learning models. With built-in $\mathrm{E}(3)$-equivariance, neural networks are guaranteed to satisfy the relevant transformation rules exactly, resulting in superior data efficiency and accuracy. The code for E3x is available from https://github.com/google-research/e3x, detailed documentation and usage examples can be found on https://e3x.readthedocs.io.
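
The property E3x builds into networks can be checked numerically without the library itself. A hedged stdlib sketch (it does not use the e3x API; the rotation and points are arbitrary): pairwise distance is an E(3)-invariant quantity, so its value must not change when both points are rotated, which is exactly the kind of transformation rule an equivariant model satisfies by construction.

```python
import math

def rotate_z(p, theta):
    """Rotate a 3D point about the z-axis (a member of E(3))."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

def distance(p, q):
    """Pairwise distance: an E(3)-invariant feature of two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

p, q = (1.0, 2.0, 3.0), (-1.0, 0.5, 2.0)
theta = 0.7  # any angle works

before = distance(p, q)
after = distance(rotate_z(p, theta), rotate_z(q, theta))
assert abs(before - after) < 1e-9  # invariant under the frame transformation
print(before)
```

An ordinary network fed raw coordinates must learn this rule from data; an E(3)-equivariant network gets it for free, which is the source of the claimed data efficiency.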

Updated: 2024-11-11 14:53:38

Categories: cs.LG, cs.AI, physics.chem-ph

Download: http://arxiv.org/abs/2401.07595v3

Unified Bayesian representation for high-dimensional multi-modal biomedical data for small-sample classification

We present BALDUR, a novel Bayesian algorithm designed to deal with multi-modal datasets and small sample sizes in high-dimensional settings while providing explainable solutions. To do so, the proposed model combines within a common latent space the different data views to extract the relevant information to solve the classification task and prune out the irrelevant/redundant features/data views. Furthermore, to provide generalizable solutions in small sample size scenarios, BALDUR efficiently integrates dual kernels over the views with a small sample-to-feature ratio. Finally, its linear nature ensures the explainability of the model outcomes, allowing its use for biomarker identification. This model was tested over two different neurodegeneration datasets, outperforming the state-of-the-art models and detecting features aligned with markers already described in the scientific literature.

Updated: 2024-11-11 14:51:24

Categories: stat.ML, cs.LG

Download: http://arxiv.org/abs/2411.07043v1

Minion: A Technology Probe for Resolving Value Conflicts through Expert-Driven and User-Driven Strategies in AI Companion Applications

AI companions based on large language models can role-play and converse very naturally. When value conflicts arise between the AI companion and the user, it may offend or upset the user. Yet, little research has examined such conflicts. We first conducted a formative study that analyzed 151 user complaints about conflicts with AI companions, providing design implications for our study. Based on these, we created Minion, a technology probe to help users resolve human-AI value conflicts. Minion applies a user-empowerment intervention method that provides suggestions by combining expert-driven and user-driven conflict resolution strategies. We conducted a technology probe study, creating 40 value conflict scenarios on Character.AI and Talkie. 22 participants completed 274 tasks and successfully resolved conflicts 94.16% of the time. We summarize user responses, preferences, and needs in resolving value conflicts, and propose design implications to reduce conflicts and empower users to resolve them more effectively.

Updated: 2024-11-11 14:49:43

Categories: cs.HC, cs.AI, cs.CL, cs.CY

Download: http://arxiv.org/abs/2411.07042v1

Designing Reliable Experiments with Generative Agent-Based Modeling: A Comprehensive Guide Using Concordia by Google DeepMind

In social sciences, researchers often face challenges when conducting large-scale experiments, particularly due to the simulations' complexity and the lack of technical expertise required to develop such frameworks. Agent-Based Modeling (ABM) is a computational approach that simulates agents' actions and interactions to evaluate how their behaviors influence the outcomes. However, the traditional implementation of ABM can be demanding and complex. Generative Agent-Based Modeling (GABM) offers a solution by enabling scholars to create simulations where AI-driven agents can generate complex behaviors based on underlying rules and interactions. This paper introduces a framework for designing reliable experiments using GABM, making sophisticated simulation techniques more accessible to researchers across various fields. We provide a step-by-step guide for selecting appropriate tools, designing the model, establishing experimentation protocols, and validating results.

Updated: 2024-11-11 14:45:08

Categories: cs.AI

Download: http://arxiv.org/abs/2411.07038v1

ProP: Efficient Backdoor Detection via Propagation Perturbation for Overparametrized Models

Backdoor attacks pose significant challenges to the security of machine learning models, particularly for overparameterized models like deep neural networks. In this paper, we propose ProP (Propagation Perturbation), a novel and scalable backdoor detection method that leverages statistical output distributions to identify backdoored models and their target classes without relying on exhaustive optimization strategies. ProP introduces a new metric, the benign score, to quantify output distributions and effectively distinguish between benign and backdoored models. Unlike existing approaches, ProP operates with minimal assumptions, requiring no prior knowledge of triggers or malicious samples, making it highly applicable to real-world scenarios. Extensive experimental validation across multiple popular backdoor attacks demonstrates that ProP achieves high detection accuracy and computational efficiency, outperforming existing methods. These results highlight ProP's potential as a robust and practical solution for backdoor detection.

Updated: 2024-11-11 14:43:44

Categories: cs.CR

Download: http://arxiv.org/abs/2411.07036v1

Evaluating the Accuracy of Chatbots in Financial Literature

We evaluate the reliability of two chatbots, ChatGPT (4o and o1-preview versions), and Gemini Advanced, in providing references on financial literature and employing novel methodologies. Alongside the conventional binary approach commonly used in the literature, we developed a nonbinary approach and a recency measure to assess how hallucination rates vary with how recent a topic is. After analyzing 150 citations, ChatGPT-4o had a hallucination rate of 20.0% (95% CI, 13.6%-26.4%), while the o1-preview had a hallucination rate of 21.3% (95% CI, 14.8%-27.9%). In contrast, Gemini Advanced exhibited higher hallucination rates: 76.7% (95% CI, 69.9%-83.4%). While hallucination rates increased for more recent topics, this trend was not statistically significant for Gemini Advanced. These findings emphasize the importance of verifying chatbot-provided references, particularly in rapidly evolving fields.
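
The reported intervals are consistent with a normal-approximation (Wald) confidence interval for a proportion; the abstract does not state which method was used, so this is an assumption. Under it, 30 hallucinated citations out of 150 reproduces the 20.0% (13.6%-26.4%) figure:

```python
import math

def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

# 30 of 150 citations hallucinated -> point estimate 20.0%
lo, hi = wald_ci(30, 150)
print(round(lo * 100, 1), round(hi * 100, 1))  # 13.6 26.4
```

The same formula with 115/150 recovers Gemini Advanced's 76.7% (69.9%-83.4%) to rounding, which supports the Wald assumption.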

Updated: 2024-11-11 14:37:57

Categories: cs.AI

Download: http://arxiv.org/abs/2411.07031v1

Aligning LLMs for FL-free Program Repair

Large language models (LLMs) have achieved decent results on automated program repair (APR). However, the next token prediction training objective of decoder-only LLMs (e.g., GPT-4) is misaligned with the masked span prediction objective of current infilling-style methods, which impedes LLMs from fully leveraging pre-trained knowledge for program repair. In addition, while some LLMs can locate and repair bugs in certain functions using the related artifacts (e.g., test cases), existing methods still depend on statement-level fault localization methods to provide a list of buggy hunks for repair. This restriction hinders LLMs from exploring potential patches beyond the given locations. In this paper, we investigate a new approach to adapt LLMs to program repair. Our core insight is that LLM's APR capability can be greatly improved by simply aligning the output to their training objective and allowing them to refine the whole program without first identifying faulty statements. Based on this insight, we designed D4C, a straightforward prompting framework for APR. D4C can repair 180 bugs correctly in Defects4J, with each patch being sampled only 10 times. This surpasses the SOTA APR methods with perfect fault localization by 10% and reduces the patch sampling number by 90%. Our findings reveal that (1) objective alignment is crucial for fully exploiting LLM's pre-trained capability, and (2) replacing the traditional localize-buggy-hunks-then-repair workflow with direct debugging is more effective for LLM-based APR methods. Thus, we believe this paper introduces a new mindset for harnessing LLMs in APR.

Updated: 2024-11-11 14:35:45

Categories: cs.SE, cs.CL, cs.LG

Download: http://arxiv.org/abs/2404.08877v2

Fair Generalized Linear Mixed Models

When using machine learning for automated prediction, it is important to account for fairness in the prediction. Fairness in machine learning aims to ensure that biases in the data and model inaccuracies do not lead to discriminatory decisions. For example, predictions from fair machine learning models should not discriminate against sensitive variables such as sexual orientation and ethnicity. The training data are often obtained from social surveys, where the data collection process is frequently a stratified sampling, e.g., due to cost restrictions. In stratified samples, the assumption of independence between observations is not fulfilled. Hence, if machine learning models do not account for the strata correlations, the results may be biased. The bias is especially high in cases where the stratum assignment is correlated with the variable of interest. We present in this paper an algorithm that can handle both problems simultaneously, and we demonstrate the impact of stratified sampling on the quality of fair machine learning predictions in a reproducible simulation study.
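
The bias mechanism described here can be demonstrated in a few lines. A toy simulation (not the paper's study; strata, means, and sample sizes are invented): when one stratum is oversampled and stratum assignment correlates with the outcome, a naive pooled estimate is badly biased while a stratum-reweighted one is not.

```python
import random

random.seed(0)

# Two strata whose assignment is correlated with the outcome of interest.
pop_weights  = {"a": 0.5, "b": 0.5}     # true population shares
means        = {"a": 0.0, "b": 2.0}     # stratum-specific outcome means (true mean: 1.0)
sample_sizes = {"a": 900, "b": 100}     # cost-driven, unequal sampling

sample = {s: [random.gauss(means[s], 1.0) for _ in range(n)]
          for s, n in sample_sizes.items()}

# Naive estimate: treats observations as i.i.d. and ignores the strata.
all_obs = [x for xs in sample.values() for x in xs]
naive = sum(all_obs) / len(all_obs)

# Stratified estimate: reweights each stratum to its population share.
stratified = sum(pop_weights[s] * sum(xs) / len(xs) for s, xs in sample.items())

print(naive)       # pulled toward stratum "a", near 0.2, far from the true mean
print(stratified)  # close to the true population mean of 1.0
```

A model trained on such a sample without stratum-aware corrections inherits the same distortion, which is the problem the proposed fair generalized linear mixed model addresses.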

Updated: 2024-11-11 14:32:58

Categories: cs.LG, math.OC

Download: http://arxiv.org/abs/2405.09273v3

The Inherent Adversarial Robustness of Analog In-Memory Computing

A key challenge for Deep Neural Network (DNN) algorithms is their vulnerability to adversarial attacks. Inherently non-deterministic compute substrates, such as those based on Analog In-Memory Computing (AIMC), have been speculated to provide significant adversarial robustness when performing DNN inference. In this paper, we experimentally validate this conjecture for the first time on an AIMC chip based on Phase Change Memory (PCM) devices. We demonstrate higher adversarial robustness against different types of adversarial attacks when implementing an image classification network. Additional robustness is also observed when performing hardware-in-the-loop attacks, for which the attacker is assumed to have full access to the hardware. A careful study of the various noise sources indicates that a combination of stochastic noise sources (both recurrent and non-recurrent) is responsible for the adversarial robustness, and that their type and magnitude disproportionately affect this property. Finally, it is demonstrated via simulations that additional robustness is still observed when a much larger transformer network is used to implement a Natural Language Processing (NLP) task.

Updated: 2024-11-11 14:29:59

Categories: cs.ET, cs.CR

Download: http://arxiv.org/abs/2411.07023v1

HeteroSample: Meta-path Guided Sampling for Heterogeneous Graph Representation Learning

The rapid expansion of the Internet of Things (IoT) has resulted in vast, heterogeneous graphs that capture complex interactions among devices, sensors, and systems. Efficient analysis of these graphs is critical for deriving insights in IoT scenarios such as smart cities, industrial IoT, and intelligent transportation systems. However, the scale and diversity of IoT-generated data present significant challenges, and existing methods often struggle to preserve the structural integrity and semantic richness of these complex graphs. Many current approaches fail to balance computational efficiency against the quality of the insights generated, leading to potential loss of critical information necessary for accurate decision-making in IoT applications. We introduce HeteroSample, a novel sampling method designed to address these challenges by preserving the structural integrity, node and edge type distributions, and semantic patterns of IoT-related graphs. HeteroSample works by incorporating novel top-leader selection, balanced neighborhood expansion, and meta-path guided sampling strategies. The key idea is to leverage the inherent heterogeneous structure and semantic relationships encoded by meta-paths to guide the sampling process. This approach ensures that the resulting subgraphs are representative of the original data while significantly reducing computational overhead. Extensive experiments demonstrate that HeteroSample outperforms state-of-the-art methods, achieving up to 15% higher F1 scores in tasks such as link prediction and node classification, while reducing runtime by 20%. These advantages make HeteroSample a transformative tool for scalable and accurate IoT applications, enabling more effective and efficient analysis of complex IoT systems, ultimately driving advancements in smart cities, industrial IoT, and beyond.
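
The meta-path guided step can be illustrated on a toy heterogeneous graph. This sketch covers only that one ingredient (not the top-leader selection or balanced expansion), and the graph, node types, and function name are invented for illustration: a walk is only allowed to visit neighbors whose type matches the next entry of the meta-path.

```python
import random

random.seed(42)

# A tiny heterogeneous IoT graph: node -> (type, neighbors).
graph = {
    "d1": ("device",  ["s1", "s2"]),
    "d2": ("device",  ["s2"]),
    "s1": ("sensor",  ["d1", "g1"]),
    "s2": ("sensor",  ["d1", "d2", "g1"]),
    "g1": ("gateway", ["s1", "s2"]),
}

def meta_path_walk(start, meta_path):
    """Random walk that only follows neighbors matching the next type in the meta-path."""
    walk = [start]
    for node_type in meta_path[1:]:
        candidates = [n for n in graph[walk[-1]][1] if graph[n][0] == node_type]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(random.choice(candidates))
    return walk

walk = meta_path_walk("d1", ["device", "sensor", "gateway"])
print(walk)  # every hop respects the device -> sensor -> gateway meta-path
```

Sampling subgraphs from many such type-constrained walks is what keeps node and edge type distributions, and the semantics the meta-path encodes, intact in the sample.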

Updated: 2024-11-11 14:27:30

Categories: cs.LG

Download: http://arxiv.org/abs/2411.07022v1

UniHR: Hierarchical Representation Learning for Unified Knowledge Graph Link Prediction

Beyond-triple fact representations, including hyper-relational facts with auxiliary key-value pairs, temporal facts with additional timestamps, and nested facts implying relationships between facts, are gaining significant attention. However, existing link prediction models are usually designed for one specific type of fact, making it difficult to generalize to other fact representations. To overcome this limitation, we propose a Unified Hierarchical Representation learning framework (UniHR) for unified knowledge graph link prediction. It consists of a unified Hierarchical Data Representation (HiDR) module and a unified Hierarchical Structure Learning (HiSL) module as the graph encoder. The HiDR module unifies hyper-relational KGs, temporal KGs, and nested factual KGs into triple-based representations. HiSL then incorporates intra-fact and inter-fact message passing, focusing on enhancing the semantic information within individual facts and enriching the structural information between facts. Experimental results across 7 datasets from 3 types of KGs demonstrate that our UniHR outperforms baselines designed for one specific kind of KG, indicating the strong generalization capability of the HiDR form and the effectiveness of the HiSL module. Code and data are available at https://github.com/Lza12a/UniHR.
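
The abstract does not spell out HiDR's exact encoding, but reducing hyper-relational facts to triples is commonly done by reification: introduce an auxiliary fact node and attach the head, relation, tail, and qualifiers to it. A hedged sketch of that general idea (function, predicate names, and example fact are all hypothetical):

```python
def reify(head, relation, tail, qualifiers=None, fact_id="f1"):
    """Flatten a hyper-relational fact into plain triples via an auxiliary fact node."""
    triples = [
        (fact_id, "has_head", head),
        (fact_id, "has_relation", relation),
        (fact_id, "has_tail", tail),
    ]
    for key, value in (qualifiers or {}).items():
        triples.append((fact_id, key, value))  # one triple per key-value qualifier
    return triples

fact = reify("MarieCurie", "educated_at", "UniversityOfParis",
             qualifiers={"degree": "PhD", "year": "1903"})
for t in fact:
    print(t)
```

Once every fact type is expressed as triples like these, a single triple-based encoder such as HiSL can process hyper-relational, temporal, and nested KGs uniformly.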

Updated: 2024-11-11 14:22:42

Categories: cs.CL, cs.AI

Download: http://arxiv.org/abs/2411.07019v1

Data-Driven Gradient Optimization for Field Emission Management in a Superconducting Radio-Frequency Linac

Field emission can cause significant problems in superconducting radio-frequency linear accelerators (linacs). When cavity gradients are pushed higher, radiation levels within the linacs may rise exponentially, causing degradation of many nearby systems. This research aims to utilize machine learning with uncertainty quantification to predict radiation levels at multiple locations throughout the linacs and ultimately optimize cavity gradients to reduce field emission induced radiation while maintaining the total linac energy gain necessary for the experimental physics program. The optimized solutions show over 40% reductions for both neutron and gamma radiation from the standard operational settings.

Updated: 2024-11-11 14:22:16

Categories: physics.acc-ph, cs.LG

Download: http://arxiv.org/abs/2411.07018v1

BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction

Simulating realistic behaviors of traffic agents is pivotal for efficiently validating the safety of autonomous driving systems. Existing data-driven simulators primarily use an encoder-decoder architecture to encode the historical trajectories before decoding the future. However, the heterogeneity between encoders and decoders complicates the models, and the manual separation of historical and future trajectories leads to low data utilization. Given these limitations, we propose BehaviorGPT, a homogeneous and fully autoregressive Transformer designed to simulate the sequential behavior of multiple agents. Crucially, our approach discards the traditional separation between "history" and "future" by modeling each time step as the "current" one for motion generation, leading to a simpler, more parameter- and data-efficient agent simulator. We further introduce the Next-Patch Prediction Paradigm (NP3) to mitigate the negative effects of autoregressive modeling, in which models are trained to reason at the patch level of trajectories and capture long-range spatial-temporal interactions. Despite having merely 3M model parameters, BehaviorGPT won first place in the 2024 Waymo Open Sim Agents Challenge with a realism score of 0.7473 and a minADE score of 1.4147, demonstrating its exceptional performance in traffic agent simulation.

Updated: 2024-11-11 14:20:39

Categories: cs.AI, cs.LG, cs.RO

Download: http://arxiv.org/abs/2405.17372v3

Leveraging LSTM for Predictive Modeling of Satellite Clock Bias

Satellite clock bias prediction plays a crucial role in enhancing the accuracy of satellite navigation systems. In this paper, we propose an approach utilizing Long Short-Term Memory (LSTM) networks to predict satellite clock bias. We gather data from the PRN 8 satellite of the Galileo and preprocess it to obtain a single difference sequence, crucial for normalizing the data. Normalization allows resampling of the data, ensuring that the predictions are equidistant and complete. Our methodology involves training the LSTM model on varying lengths of datasets, ranging from 7 days to 31 days. We employ a training set consisting of two days' worth of data in each case. Our LSTM model exhibits exceptional accuracy, with a Root Mean Square Error (RMSE) of 2.11 $\times$ 10$^{-11}$. Notably, our approach outperforms traditional methods used for similar time-series forecasting projects, being 170 times more accurate than RNN, 2.3 $\times$ 10$^7$ times more accurate than MLP, and 1.9 $\times$ 10$^4$ times more accurate than ARIMA. This study holds significant potential in enhancing the accuracy and efficiency of low-power receivers used in various devices, particularly those requiring power conservation. By providing more accurate predictions of satellite clock bias, the findings of this research can be integrated into the algorithms of such devices, enabling them to function with heightened precision while conserving power. Improved accuracy in clock bias predictions ensures that low-power receivers can maintain optimal performance levels, thereby enhancing the overall reliability and effectiveness of satellite navigation systems. Consequently, this advancement holds promise for a wide range of applications, including remote areas, IoT devices, wearable technology, and other devices where power efficiency and navigation accuracy are paramount.
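
The single-difference preprocessing and the RMSE metric mentioned above can be sketched without the LSTM itself. This is a toy illustration, not the paper's pipeline: the bias values are invented, and the predictor here is a naive persistence baseline rather than the trained network.

```python
def single_difference(series):
    """First-order (single) differences of a clock-bias sequence."""
    return [b - a for a, b in zip(series, series[1:])]

def rmse(pred, truth):
    """Root Mean Square Error, the accuracy metric used in the paper."""
    return (sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred)) ** 0.5

# Clock bias in seconds: a large, slowly drifting offset ...
bias = [1.00e-4, 1.02e-4, 1.04e-4, 1.07e-4, 1.09e-4]
diffs = single_difference(bias)   # ... becomes a small, near-stationary sequence
print(diffs)

# Persistence baseline on the differenced series: predict "same as last step".
pred = diffs[:-1]
print(rmse(pred, diffs[1:]))
```

Differencing removes the dominant drift so the network only has to model the small residual dynamics, and the predicted differences can be cumulatively summed back into absolute bias values.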

Updated: 2024-11-11 14:18:32

Categories: cs.LG,cs.AI,eess.SP

Download: http://arxiv.org/abs/2411.07015v1

Anticipatory Understanding of Resilient Agriculture to Climate

With billions of people facing moderate or severe food insecurity, the resilience of the global food supply will be of increasing concern due to the effects of climate change and geopolitical events. In this paper we describe a framework to better identify food security hotspots using a combination of remote sensing, deep learning, crop yield modeling, and causal modeling of the food distribution system. While we feel that the methods are adaptable to other regions of the world, we focus our analysis on the wheat breadbasket of northern India, which supplies a large percentage of the world's population. We present a quantitative analysis of deep learning domain adaptation methods for wheat farm identification based on curated remote sensing data from France. We model climate change impacts on crop yields using the existing crop yield modeling tool WOFOST and we identify key drivers of crop simulation error using a longitudinal penalized functional regression. A description of a system dynamics model of the food distribution system in India is also presented, along with results of food insecurity identification based on seeding this model with the predicted crop yields.

Updated: 2024-11-11 14:17:13

Categories: cs.CV,cs.LG,stat.AP

Download: http://arxiv.org/abs/2411.05219v2

A neural-network based anomaly detection system and a safety protocol to protect vehicular network

This thesis addresses the use of Cooperative Intelligent Transport Systems (CITS) to improve road safety and efficiency by enabling vehicle-to-vehicle communication, highlighting the importance of secure and accurate data exchange. To ensure safety, the thesis proposes a Machine Learning-based Misbehavior Detection System (MDS) using Long Short-Term Memory (LSTM) networks to detect and mitigate incorrect or misleading messages within vehicular networks. Trained offline on the VeReMi dataset, the detection model is tested in real-time within a platooning scenario, demonstrating that it can prevent nearly all accidents caused by misbehavior by triggering a defense protocol that dissolves the platoon if anomalies are detected. The results show that while the system can accurately detect general misbehavior, it struggles to label specific types due to varying traffic conditions, implying the difficulty of creating a universally adaptive protocol. However, the thesis suggests that with more data and further refinement, this MDS could be implemented in real-world CITS, enhancing driving safety by mitigating risks from misbehavior in cooperative driving networks.

Updated: 2024-11-11 14:15:59

Categories: cs.LG,cs.AI,cs.NI

Download: http://arxiv.org/abs/2411.07013v1

Identifiable Object-Centric Representation Learning via Probabilistic Slot Attention

Learning modular object-centric representations is crucial for systematic generalization. Existing methods show promising object-binding capabilities empirically, but theoretical identifiability guarantees remain relatively underdeveloped. Understanding when object-centric representations can theoretically be identified is crucial for scaling slot-based methods to high-dimensional images with correctness guarantees. To that end, we propose a probabilistic slot-attention algorithm that imposes an aggregate mixture prior over object-centric slot representations, thereby providing slot identifiability guarantees without supervision, up to an equivalence relation. We provide empirical verification of our theoretical identifiability result using both simple 2-dimensional data and high-resolution imaging datasets.

Updated: 2024-11-11 14:10:55

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.07141v2

Hierarchical Conditional Tabular GAN for Multi-Tabular Synthetic Data Generation

The generation of synthetic data is a state-of-the-art approach to leverage when access to real data is limited or privacy regulations limit the usability of sensitive data. A fair amount of research has been conducted on synthetic data generation for single-tabular datasets, but only a limited amount of research has been conducted on multi-tabular datasets with complex table relationships. In this paper we propose the algorithm HCTGAN to synthesize multi-tabular data from complex multi-tabular datasets. We compare our results to the probabilistic model HMA1. Our findings show that our proposed algorithm can more efficiently sample large amounts of synthetic data for deep and complex multi-tabular datasets, whilst achieving adequate data quality and always guaranteeing referential integrity. We conclude that the HCTGAN algorithm is suitable for generating large amounts of synthetic data efficiently for deep multi-tabular datasets with complex relationships. We additionally suggest that the HMA1 model should be used on smaller datasets when emphasis is on data quality.

Updated: 2024-11-11 14:09:26

Categories: cs.LG,cs.DB

Download: http://arxiv.org/abs/2411.07009v1

JUICER: Data-Efficient Imitation Learning for Robotic Assembly

While learning from demonstrations is powerful for acquiring visuomotor policies, high-performance imitation without large demonstration datasets remains challenging for tasks requiring precise, long-horizon manipulation. This paper proposes a pipeline for improving imitation learning performance with a small human demonstration budget. We apply our approach to assembly tasks that require precisely grasping, reorienting, and inserting multiple parts over long horizons and multiple task phases. Our pipeline combines expressive policy architectures and various techniques for dataset expansion and simulation-based data augmentation. These help expand dataset support and supervise the model with locally corrective actions near bottleneck regions requiring high precision. We demonstrate our pipeline on four furniture assembly tasks in simulation, enabling a manipulator to assemble up to five parts over nearly 2500 time steps directly from RGB images, outperforming imitation and data augmentation baselines. Project website: https://imitation-juicer.github.io/.

Updated: 2024-11-11 14:09:00

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2404.03729v3

Permutative redundancy and uncertainty of the objective in deep learning

Implications of uncertain objective functions and permutative symmetry of traditional deep learning architectures are discussed. It is shown that traditional architectures are polluted by an astronomical number of equivalent global and local optima. Uncertainty of the objective makes local optima unattainable, and, as the size of the network grows, the global optimization landscape likely becomes a tangled web of valleys and ridges. Some remedies which reduce or eliminate ghost optima are discussed including forced pre-pruning, re-ordering, ortho-polynomial activations, and modular bio-inspired architectures.

Updated: 2024-11-11 14:06:56

Categories: cs.AI

Download: http://arxiv.org/abs/2411.07008v1

Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching

In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment. Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models, and a learner optimizes the reward through repeated RL procedures. This game-solving approach is both computationally expensive and difficult to stabilize. In this work, we propose a novel approach to IRL by direct policy optimization: exploiting a linear factorization of the return as the inner product of successor features and a reward vector, we design an IRL algorithm by policy gradient descent on the gap between the learner and expert features. Our non-adversarial method does not require learning a reward function and can be solved seamlessly with existing actor-critic RL algorithms. Remarkably, our approach works in state-only settings without expert action labels, a setting which behavior cloning (BC) cannot solve. Empirical results demonstrate that our method learns from as few as a single expert demonstration and achieves improved performance on various control tasks.
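The linear factorization the method exploits — the return as an inner product of successor features and a reward vector — and the feature gap the policy gradient descends on can be sketched in a few lines. This is an illustration of those two quantities only (function names ours), not the paper's algorithm:

```python
def successor_features(phis, gamma=0.99):
    """Discounted sum of per-step feature vectors along one trajectory:
    psi = sum_t gamma^t * phi(s_t)."""
    dim = len(phis[0])
    psi = [0.0] * dim
    for t, phi in enumerate(phis):
        for i in range(dim):
            psi[i] += (gamma ** t) * phi[i]
    return psi

def linear_return(psi, w):
    """Return under reward vector w, via the factorization <psi, w>."""
    return sum(p * r for p, r in zip(psi, w))

def feature_gap(expert_psi, learner_psi):
    """Squared gap between expert and learner successor features; descending
    on this needs no explicit reward model, which is the paper's point."""
    return sum((e - l) ** 2 for e, l in zip(expert_psi, learner_psi))
```

Because successor features depend only on visited states, the gap can be computed in state-only settings where behavior cloning has no action labels to imitate.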

Updated: 2024-11-11 14:05:50

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.07007v1

Estimating Causal Effects in Partially Directed Parametric Causal Factor Graphs

Lifting uses a representative of indistinguishable individuals to exploit symmetries in probabilistic relational models, denoted as parametric factor graphs, to speed up inference while maintaining exact answers. In this paper, we show how lifting can be applied to causal inference in partially directed graphs, i.e., graphs that contain both directed and undirected edges to represent causal relationships between random variables. We present partially directed parametric causal factor graphs (PPCFGs) as a generalisation of previously introduced parametric causal factor graphs, which require a fully directed graph. We further show how causal inference can be performed on a lifted level in PPCFGs, thereby extending the applicability of lifted causal inference to a broader range of models requiring less prior knowledge about causal relationships.

Updated: 2024-11-11 14:05:39

Categories: cs.AI,cs.DS,cs.LG

Download: http://arxiv.org/abs/2411.07006v1

Fine Structure-Aware Sampling: A New Sampling Training Scheme for Pixel-Aligned Implicit Models in Single-View Human Reconstruction

Pixel-aligned implicit models, such as PIFu, PIFuHD, and ICON, are used for single-view clothed human reconstruction. These models need to be trained using a sampling training scheme. Existing sampling training schemes either fail to capture thin surfaces (e.g. ears, fingers) or cause noisy artefacts in reconstructed meshes. To address these problems, we introduce Fine Structure-Aware Sampling (FSS), a new sampling training scheme to train pixel-aligned implicit models for single-view human reconstruction. FSS resolves the aforementioned problems by proactively adapting to the thickness and complexity of surfaces. In addition, unlike existing sampling training schemes, FSS shows how normals of sample points can be capitalized in the training process to improve results. Lastly, to further improve the training process, FSS proposes a mesh thickness loss signal for pixel-aligned implicit models. It becomes computationally feasible to introduce this loss once a slight reworking of the pixel-aligned implicit function framework is carried out. Our results show that our methods significantly outperform SOTA methods qualitatively and quantitatively. Our code is publicly available at https://github.com/kcyt/FSS.

Updated: 2024-11-11 14:04:57

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.19197v2

Enhancing Robot Assistive Behaviour with Reinforcement Learning and Theory of Mind

The adaptation to users' preferences and the ability to infer and interpret humans' beliefs and intents, which is known as the Theory of Mind (ToM), are two crucial aspects for achieving effective human-robot collaboration. Despite its importance, very few studies have investigated the impact of adaptive robots with ToM abilities. In this work, we present an exploratory comparative study to investigate how social robots equipped with ToM abilities impact users' performance and perception. We design a two-layer architecture. The Q-learning agent on the first layer learns the robot's higher-level behaviour. On the second layer, a heuristic-based ToM infers the user's intended strategy and is responsible for implementing the robot's assistance, as well as providing the motivation behind its choice. We conducted a user study in a real-world setting, involving 56 participants who interacted with either an adaptive robot capable of ToM, or with a robot lacking such abilities. Our findings suggest that participants in the ToM condition performed better, accepted the robot's assistance more often, and perceived its ability to adapt, predict and recognise their intents to a higher degree. Our preliminary insights could inform future research and pave the way for designing more complex computation architectures for adaptive behaviour with ToM capabilities.
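The first layer of the architecture is a standard tabular Q-learning agent; its update rule is Q(s,a) ← Q(s,a) + α(r + γ·max_a' Q(s',a') − Q(s,a)). A minimal sketch of that rule (the dictionary-based table and signature are our illustration, not the study's code):

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.
    `q` maps (state, action) pairs to values; missing entries default to 0."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]
```

In the study's two-layer design, a heuristic Theory-of-Mind module sits on top of values like these, selecting how the learned high-level behaviour is delivered as assistance.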

Updated: 2024-11-11 14:01:15

Categories: cs.RO,cs.AI,cs.HC

Download: http://arxiv.org/abs/2411.07003v1

MixMask: Revisiting Masking Strategy for Siamese ConvNets

The recent progress in self-supervised learning has successfully combined Masked Image Modeling (MIM) with Siamese Networks, harnessing the strengths of both methodologies. Nonetheless, certain challenges persist when integrating conventional erase-based masking within Siamese ConvNets. Two primary concerns are: (1) The continuous data processing nature of ConvNets, which doesn't allow for the exclusion of non-informative masked regions, leading to reduced training efficiency compared to ViT architecture; (2) The misalignment between erase-based masking and the contrastive-based objective, distinguishing it from the MIM technique. To address these challenges, this work introduces a novel filling-based masking approach, termed \textbf{MixMask}. The proposed method replaces erased areas with content from a different image, effectively countering the information depletion seen in traditional masking methods. Additionally, we unveil an adaptive loss function that captures the semantics of the newly patched views, ensuring seamless integration within the architectural framework. We empirically validate the effectiveness of our approach through comprehensive experiments across various datasets and application scenarios. The findings underscore our framework's enhanced performance in areas such as linear probing, semi-supervised and supervised finetuning, object detection and segmentation. Notably, our method surpasses the MSCN, establishing MixMask as a more advantageous masking solution for Siamese ConvNets. Our code and models are publicly available at https://github.com/kirill-vish/MixMask.
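The core filling-based idea — replacing masked regions with content from another image instead of erasing them — reduces to a masked blend of two images. A toy 2-D sketch of that operation (ours, not the authors' implementation, which works on batched tensors):

```python
def mixmask(image_a, image_b, mask):
    """Filling-based masking: wherever mask == 1, take image_b's pixel in
    place of image_a's, so no region of the input is left uninformative.
    Images and mask are same-shape 2-D lists for illustration."""
    return [
        [b if m else a for a, b, m in zip(row_a, row_b, row_m)]
        for row_a, row_b, row_m in zip(image_a, image_b, mask)
    ]
```

Unlike erase-based masking, every pixel still carries content, which matters for ConvNets since their sliding-window processing cannot simply skip empty regions the way a ViT drops masked patches.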

Updated: 2024-11-11 14:00:40

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2210.11456v4

Data Diversity Matters for Robust Instruction Tuning

Recent works have shown that by curating high quality and diverse instruction tuning datasets, we can significantly improve instruction-following capabilities. However, creating such datasets is difficult and most works rely on manual curation or proprietary language models. Automatic data curation is difficult as it is still not clear how we can define diversity for instruction tuning, how diversity and quality depend on one another, and how we can optimize dataset quality and diversity. To resolve these issues, we propose a new algorithm, Quality-Diversity Instruction Tuning (QDIT). QDIT provides a simple method to simultaneously control dataset diversity and quality, allowing us to conduct an in-depth study on the effect of diversity and quality on instruction tuning performance. From this study we draw two key insights: (1) there is a natural tradeoff between data diversity and quality and (2) increasing data diversity significantly improves the worst case instruction following performance, therefore improving robustness. We validate the performance of QDIT on several large scale instruction tuning datasets, where we find it can substantially improve worst and average case performance compared to quality-driven data selection.
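One common way to control quality and diversity simultaneously is a greedy selection whose score mixes per-item quality with a facility-location coverage gain. The sketch below illustrates that tradeoff generically; the weighting, similarity function, and names are our assumptions, not QDIT's exact objective:

```python
def qd_select(pool, quality, sim, k, alpha=0.5):
    """Greedily pick k items, scoring each candidate by
    alpha * quality + (1 - alpha) * coverage gain, so larger alpha favours
    quality and smaller alpha favours diversity.
    `quality` maps item -> score; `sim(x, y)` is a similarity in [0, 1]."""
    selected = []
    cover = {y: 0.0 for y in pool}  # how well each item is already represented
    while len(selected) < k:
        best, best_score = None, float("-inf")
        for x in pool:
            if x in selected:
                continue
            gain = sum(max(0.0, sim(x, y) - cover[y]) for y in pool)
            score = alpha * quality[x] + (1 - alpha) * gain
            if score > best_score:
                best, best_score = x, score
        selected.append(best)
        for y in pool:
            cover[y] = max(cover[y], sim(best, y))
    return selected
```

With a high-quality but redundant candidate, the coverage term steers the second pick toward a dissimilar, lower-quality item — exactly the quality-diversity tradeoff the paper studies.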

Updated: 2024-11-11 13:58:11

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2311.14736v3

Which PPML Would a User Choose? A Structured Decision Support Framework for Developers to Rank PPML Techniques Based on User Acceptance Criteria

Using Privacy-Enhancing Technologies (PETs) for machine learning often influences the characteristics of a machine learning approach, e.g., the needed computational power, timing of the answers or how the data can be utilized. When designing a new service, the developer faces the problem that some decisions require a trade-off. For example, the use of a PET may cause a delay in the responses or adding noise to the data to improve the users' privacy might have a negative impact on the accuracy of the machine learning approach. As of now, there is no structured way in which users' perception of a machine-learning-based service can contribute to the selection of Privacy-Preserving Machine Learning (PPML) methods. This is especially challenging since one cannot assume that users have a deep technical understanding of these technologies. Therefore, they can only be asked about certain attributes that they can perceive when using the service and not directly which PPML technique they prefer. This study introduces a decision support framework with the aim of supporting the selection of PPML technologies based on user preferences. Based on prior work analysing User Acceptance Criteria (UAC), we translate these criteria into differentiating characteristics for various PPML techniques. As a final result, we achieve a technology ranking based on the User Acceptance Criteria while providing technology insights for the developers. We demonstrate its application using the use case of classifying privacy-relevant information. Our contribution consists of the decision support framework which consists of a process to connect PPML technologies with UAC, a process for evaluating the characteristics that separate PPML techniques, and a ranking method to evaluate the best PPML technique for the use case.

Updated: 2024-11-11 13:53:33

Categories: cs.AI,cs.CR,cs.LG,cs.SE

Download: http://arxiv.org/abs/2411.06995v1

Wave Network: An Ultra-Small Language Model

We propose an innovative token representation and update method in a new ultra-small language model: the Wave network. Specifically, we use a complex vector to represent each token, encoding both global and local semantics of the input text. A complex vector consists of two components: a magnitude vector representing the global semantics of the input text, and a phase vector capturing the relationships between individual tokens and global semantics. Experiments on the AG News text classification task demonstrate that, when generating complex vectors from randomly initialized token embeddings, our single-layer Wave Network achieves 90.91% accuracy with wave interference and 91.66% with wave modulation - outperforming a single Transformer layer using BERT pre-trained embeddings by 19.23% and 19.98%, respectively, and approaching the accuracy of the pre-trained and fine-tuned BERT base model (94.64%). Additionally, compared to BERT base, the Wave Network reduces video memory usage and training time by 77.34% and 85.62% during wave modulation. In summary, we used a 2.4-million-parameter small language model to achieve accuracy comparable to a 100-million-parameter BERT model in text classification.
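The two wave operations named in the abstract have natural complex-arithmetic readings: interference as superposition (complex addition) and modulation as complex multiplication (magnitudes multiply, phases add). The sketch below illustrates that intuition only, not the network itself:

```python
import cmath
import math

def to_complex(magnitude, phase):
    """A token as a complex number: |z| carries the global-semantics magnitude,
    arg(z) the token's phase relative to it (per the Wave Network abstract)."""
    return magnitude * cmath.exp(1j * phase)

def interfere(z1, z2):
    """Wave interference: superposition of two token waves."""
    return z1 + z2

def modulate(z1, z2):
    """Wave modulation: complex multiplication; magnitudes multiply and
    phases add, mixing global and relational information."""
    return z1 * z2
```

Note that opposite-phase waves of equal magnitude cancel under interference, which is what makes the phase component informative rather than decorative.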

Updated: 2024-11-11 13:49:30

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.02674v4

Causal-discovery-based root-cause analysis and its application in time-series prediction error diagnosis

Recent rapid advancements of machine learning have greatly enhanced the accuracy of prediction models, but most models remain "black boxes", making prediction error diagnosis challenging, especially with outliers. This lack of transparency hinders trust and reliability in industrial applications. Heuristic attribution methods, while helpful, often fail to capture true causal relationships, leading to inaccurate error attributions. Various root-cause analysis methods have been developed using Shapley values, yet they typically require predefined causal graphs, limiting their applicability for prediction errors in machine learning models. To address these limitations, we introduce the Causal-Discovery-based Root-Cause Analysis (CD-RCA) method that estimates causal relationships between the prediction error and the explanatory variables, without needing a pre-defined causal graph. By simulating synthetic error data, CD-RCA can identify variable contributions to outliers in prediction errors by Shapley values. Extensive simulations show CD-RCA outperforms current heuristic attribution methods, and a sensitivity analysis reveals new patterns where Shapley values may misattribute errors, paving the way for more accurate error attribution methods.
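For a handful of explanatory variables, Shapley attribution can be computed exactly by enumerating coalitions. The sketch below is the standard Shapley formula stated generically; treating the prediction error of a variable subset as the value function is our framing of how such attribution is applied here:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value_fn):
    """Exact Shapley values phi[p] for a small set of players, where
    value_fn(coalition) returns the payoff (e.g. explained error) of a
    variable subset. Cost is exponential in len(players)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(n):
            for coal in combinations(others, r):
                w = factorial(len(coal)) * factorial(n - len(coal) - 1) / factorial(n)
                total += w * (value_fn(set(coal) | {p}) - value_fn(set(coal)))
        phi[p] = total
    return phi
```

For an additive value function the Shapley value of each variable recovers its own contribution exactly; the paper's sensitivity analysis concerns precisely the non-additive cases where such attributions can mislead.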

Updated: 2024-11-11 13:48:13

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2411.06990v1

Token2Wave

This paper provides an in-depth analysis of Token2Wave, a novel token representation method derived from the Wave Network, designed to capture both global and local semantics of input text through wave-inspired complex vectors. In Token2Wave, each token is represented with a magnitude component, capturing the global semantics of the entire input text, and a phase component, encoding the relationships between individual tokens and the global semantics. Building on prior research that demonstrated the effectiveness of wave-like operations, such as interference and modulation, during forward propagation, this study investigates the convergence behavior, backpropagation characteristics, and embedding independence within the Token2Wave framework. A detailed computational complexity analysis shows that Token2Wave can significantly reduce video memory usage and training time compared to BERT. Gradient comparisons for the [CLS] token, total input text, and classifier parameters further highlight Token2Wave's unique characteristics. This research offers new insights into wave-based token representations, demonstrating their potential to enable efficient and computationally friendly language model architectures.

Updated: 2024-11-11 13:48:01

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.06989v1

Unified Lexical Representation for Interpretable Visual-Language Alignment

Visual-Language Alignment (VLA) has gained a lot of attention since CLIP's groundbreaking work. Although CLIP performs well, the typical direct latent feature alignment lacks clarity in its representation and similarity scores. On the other hand, lexical representation, a vector whose element represents the similarity between the sample and a word from the vocabulary, is a natural sparse representation and interpretable, providing exact matches for individual words. However, lexical representations are difficult to learn due to no ground-truth supervision and false-discovery issues, and thus requires complex design to train effectively. In this paper, we introduce LexVLA, a more interpretable VLA framework by learning a unified lexical representation for both modalities without complex design. We use DINOv2 as our visual model for its local-inclined features and Llama 2, a generative language model, to leverage its in-context lexical prediction ability. To avoid the false discovery, we propose an overuse penalty to refrain the lexical representation from falsely frequently activating meaningless words. We demonstrate that these two pre-trained uni-modal models can be well-aligned by fine-tuning on the modest multi-modal dataset and avoid intricate training configurations. On cross-modal retrieval benchmarks, LexVLA, trained on the CC-12M multi-modal dataset, outperforms baselines fine-tuned on larger datasets (e.g., YFCC15M) and those trained from scratch on even bigger datasets (e.g., 1.1B data, including CC-12M). We conduct extensive experiments to analyze LexVLA. Codes are available at https://github.com/Clementine24/LexVLA.
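A simple form an "overuse penalty" can take is to penalize vocabulary entries whose activation, averaged over a batch, is high regardless of content — i.e., words that fire for every sample are likely meaningless. The quadratic form below is our illustrative stand-in, not LexVLA's exact loss:

```python
def overuse_penalty(batch_lex, weight=0.01):
    """Penalize vocabulary dimensions that activate for (almost) every sample.
    `batch_lex` is a batch of lexical vectors (lists of per-word activations);
    the penalty grows with each word's mean activation squared."""
    batch_size = len(batch_lex)
    vocab = len(batch_lex[0])
    means = [sum(row[j] for row in batch_lex) / batch_size for j in range(vocab)]
    return weight * sum(m * m for m in means)
```

A word activating strongly on one sample but not others contributes little, while a word activating on all samples is penalized — which is the false-discovery behaviour the paper wants to suppress.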

Updated: 2024-11-11 13:46:50

Categories: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2407.17827v2

MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling

A major consideration in multilingual language modeling is how to best represent languages with diverse vocabularies and scripts. Although contemporary text encoding methods cover most of the world's writing systems, they exhibit bias towards the high-resource languages of the Global West. As a result, texts of underrepresented languages tend to be segmented into long sequences of linguistically meaningless units. To address the disparities, we introduce a new paradigm that encodes the same information with segments of consistent size across diverse languages. Our encoding convention (MYTE) is based on morphemes, as their inventories are more balanced across languages than characters, which are used in previous methods. We show that MYTE produces shorter encodings for all 99 analyzed languages, with the most notable improvements for non-European languages and non-Latin scripts. This, in turn, improves multilingual LM performance and diminishes the perplexity gap throughout diverse languages.
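The claim that morpheme-level segments yield shorter encodings than character-level ones can be seen with a greedy longest-match segmenter. The three-morpheme inventory below is invented for illustration — MYTE's real inventories are learned from corpora:

```python
def segment(text, inventory):
    """Greedy longest-match segmentation against a morpheme inventory;
    spans not covered by the inventory fall back to single characters."""
    units, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in inventory or j == i + 1:
                units.append(piece)
                i = j
                break
    return units
```

With an inventory covering a language's morphemes, a word encodes into a handful of units instead of one unit per character, and since morpheme inventories are more balanced across languages than character repertoires, sequence lengths become more comparable across languages too.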

Updated: 2024-11-11 13:33:25

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2403.10691v2

IntegratedPIFu: Integrated Pixel Aligned Implicit Function for Single-view Human Reconstruction

We propose IntegratedPIFu, a new pixel-aligned implicit model that builds on the foundation set by PIFuHD. IntegratedPIFu shows how depth and human parsing information can be predicted and capitalised upon in a pixel-aligned implicit model. In addition, IntegratedPIFu introduces depth-oriented sampling, a novel training scheme that improves the ability of any pixel-aligned implicit model to reconstruct important human features without noisy artefacts. Lastly, IntegratedPIFu presents a new architecture that, despite using fewer model parameters than PIFuHD, improves the structural correctness of reconstructed meshes. Our results show that IntegratedPIFu significantly outperforms existing state-of-the-art methods on single-view human reconstruction. Our code has been made available online.

Updated: 2024-11-11 13:32:00

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2211.07955v3

Examining Attacks on Consensus and Incentive Systems in Proof-of-Work Blockchains: A Systematic Literature Review

Cryptocurrencies have gained popularity due to their transparency, security, and accessibility compared to traditional financial systems, with Bitcoin, introduced in 2009, leading the market. Bitcoin's security relies on blockchain technology - a decentralized ledger consisting of a consensus and an incentive mechanism. The consensus mechanism, Proof of Work (PoW), requires miners to solve difficult cryptographic puzzles to add new blocks, while the incentive mechanism rewards them with newly minted bitcoins. However, as Bitcoin's acceptance grows, it faces increasing threats from attacks targeting these mechanisms, such as selfish mining, double-spending, and block withholding. These attacks compromise security, efficiency, and reward distribution. Recent research shows that these attacks can be combined with each other or with either malicious strategies, such as network-layer attacks, or non-malicious strategies, like honest mining. These combinations lead to more sophisticated attacks, increasing the attacker's success rates and profitability. Therefore, understanding and evaluating these attacks is essential for developing effective countermeasures and ensuring long-term security. This paper begins by examining individual attacks executed in isolation and their profitability. It then explores how combining these attacks with each other or with other malicious and non-malicious strategies can enhance their overall effectiveness and profitability. The analysis further explores how the deployment of attacks such as selfish mining and block withholding by multiple competing mining pools against each other impacts their economic returns. Lastly, a set of design guidelines is provided, outlining areas future work should focus on to prevent or mitigate the identified threats.

Updated: 2024-11-11 13:30:26

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2411.00349v2

StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration

The advent of AI-Generated Content (AIGC) has spurred research into automated video generation to streamline conventional processes. However, automating storytelling video production, particularly for customized narratives, remains challenging due to the complexity of maintaining subject consistency across shots. While existing approaches like Mora and AesopAgent integrate multiple agents for Story-to-Video (S2V) generation, they fall short in preserving protagonist consistency and supporting Customized Storytelling Video Generation (CSVG). To address these limitations, we propose StoryAgent, a multi-agent framework designed for CSVG. StoryAgent decomposes CSVG into distinct subtasks assigned to specialized agents, mirroring the professional production process. Notably, our framework includes agents for story design, storyboard generation, video creation, agent coordination, and result evaluation. Leveraging the strengths of different models, StoryAgent enhances control over the generation process, significantly improving character consistency. Specifically, we introduce a customized Image-to-Video (I2V) method, LoRA-BE, to enhance intra-shot temporal consistency, while a novel storyboard generation pipeline is proposed to maintain subject consistency across shots. Extensive experiments demonstrate the effectiveness of our approach in synthesizing highly consistent storytelling videos, outperforming state-of-the-art methods. Our contributions include the introduction of StoryAgent, a versatile framework for video generation tasks, and novel techniques for preserving protagonist consistency.

Updated: 2024-11-11 13:24:18

Categories: cs.CV,cs.AI,cs.MA

Download: http://arxiv.org/abs/2411.04925v2

Imitation from Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration

Learning diverse and high-performance behaviors from a limited set of demonstrations is a grand challenge. Traditional imitation learning methods usually fail in this task because most of them are designed to learn one specific behavior even with multiple demonstrations. Therefore, novel techniques for quality diversity imitation learning are needed to solve the above challenge. This work introduces Wasserstein Quality Diversity Imitation Learning (WQDIL), which 1) improves the stability of imitation learning in the quality diversity setting with latent adversarial training based on a Wasserstein Auto-Encoder (WAE), and 2) mitigates a behavior-overfitting issue using a measure-conditioned reward function with a single-step archive exploration bonus. Empirically, our method significantly outperforms state-of-the-art IL methods, achieving near-expert or beyond-expert QD performance on the challenging continuous control tasks derived from MuJoCo environments.

Updated: 2024-11-11 13:11:18

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.06965v1

ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis

Recently, token-based generation has demonstrated its effectiveness in image synthesis. As a representative example, non-autoregressive Transformers (NATs) can generate decent-quality images in a few steps. NATs perform generation in a progressive manner, where the latent tokens of the resulting image are incrementally revealed. At each step, the unrevealed image regions are padded with mask tokens and inferred by the NAT. In this paper, we delve into the mechanisms behind the effectiveness of NATs and uncover two important patterns that naturally emerge from them. Spatially (within a step), although mask and visible tokens are processed uniformly by NATs, the interactions between them are highly asymmetric: mask tokens mainly gather information for decoding, while visible tokens primarily provide information, and their deep representations can be built upon themselves alone. Temporally (across steps), the interactions between adjacent generation steps mostly concentrate on updating the representations of a few critical tokens, while the computation for the majority of tokens is generally repetitive. Driven by these findings, we propose EfficientNAT (ENAT), a NAT model that explicitly encourages these critical interactions inherent in NATs. At the spatial level, we disentangle the computations of visible and mask tokens by encoding visible tokens independently, while decoding mask tokens conditioned on the fully encoded visible tokens. At the temporal level, we prioritize the computation of the critical tokens at each step, while maximally reusing previously computed token representations to supplement the necessary information. ENAT notably improves the performance of NATs at significantly reduced computational cost. Experiments on ImageNet-256, ImageNet-512 and MS-COCO validate the effectiveness of ENAT. Code is available at https://github.com/LeapLabTHU/ENAT.

Updated: 2024-11-11 13:05:39

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.06959v1

Data-driven discovery of mechanical models directly from MRI spectral data

Finding interpretable biomechanical models can provide insight into the functionality of organs with regard to physiology and disease. However, identifying broadly applicable dynamical models for in vivo tissue remains challenging. In this proof-of-concept study we propose a reconstruction framework for data-driven discovery of dynamical models from experimentally obtained undersampled MRI spectral data. The method makes use of the previously developed spectro-dynamic framework, which allows reconstruction of displacement fields at the high spatial and temporal resolution required for model identification. The proposed framework combines this method with data-driven discovery of interpretable models using Sparse Identification of Non-linear Dynamics (SINDy). The reconstruction algorithm is designed so that a symbiotic relation is created between the reconstruction of the displacement fields and the model identification. Our method does not rely on periodicity of the motion. It is successfully validated using spectral data of a dynamic phantom gathered on a clinical MRI scanner. The dynamic phantom is programmed to perform motion adhering to 5 different (non-linear) ordinary differential equations. The proposed framework performed better than a 2-step approach in which the displacement fields were first reconstructed from the undersampled data without any information on the model, followed by data-driven discovery of the model using the reconstructed displacement fields. This study serves as a first step towards data-driven discovery of in vivo models.
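The SINDy ingredient can be illustrated in isolation on a synthetic 1-D signal (a minimal sketch of sequentially thresholded least squares, not the authors' MRI pipeline; the candidate library and threshold are illustrative choices):

```python
import numpy as np

# Synthetic 1-D displacement signal obeying dx/dt = -2x, standing in for
# the reconstructed MRI displacement fields of the paper.
t = np.linspace(0.0, 2.0, 401)
x = np.exp(-2.0 * t)
dxdt = -2.0 * x  # analytic derivative for clarity; in practice it is estimated

# Library of candidate right-hand-side terms.
Theta = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Sequentially thresholded least squares, the standard SINDy regression.
xi = np.linalg.lstsq(Theta, dxdt, rcond=None)[0]
for _ in range(10):
    small = np.abs(xi) < 0.1    # zero out negligible coefficients...
    xi[small] = 0.0
    active = ~small
    # ...and refit the remaining active terms.
    xi[active] = np.linalg.lstsq(Theta[:, active], dxdt, rcond=None)[0]

print(xi)  # ~[0, -2, 0, 0]: the sparse model dx/dt = -2x is recovered
```

The framework described above couples this regression with the displacement-field reconstruction itself rather than running it as a post-processing step.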

Updated: 2024-11-11 13:05:29

Categories: physics.med-ph,cs.LG,eess.IV

Download: http://arxiv.org/abs/2411.06958v1

Visual Data Diagnosis and Debiasing with Concept Graphs

The widespread success of deep learning models today is owed to the curation of extensive datasets significant in size and complexity. However, such models frequently pick up inherent biases in the data during training, leading to unreliable predictions. Diagnosing and debiasing datasets is thus a necessity for ensuring reliable model performance. In this paper, we present ConBias, a novel framework for diagnosing and mitigating Concept co-occurrence Biases in visual datasets. ConBias represents visual datasets as knowledge graphs of concepts, enabling meticulous analysis of spurious concept co-occurrences to uncover concept imbalances across the whole dataset. Moreover, we show that by employing a novel clique-based concept balancing strategy, we can mitigate these imbalances, leading to enhanced performance on downstream tasks. Extensive experiments show that data augmentation based on the balanced concept distribution produced by ConBias improves generalization performance across multiple datasets compared to state-of-the-art methods.
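The diagnosis step can be pictured as counting concept co-occurrences over per-image annotations to weight the edges of the concept graph (a toy sketch with made-up annotations; ConBias's clique-based balancing is more involved than this):

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-image concept annotations.
images = [
    {"dog", "frisbee", "grass"},
    {"dog", "frisbee"},
    {"dog", "frisbee", "person"},
    {"cat", "sofa"},
]

# Edge weights of the concept graph = co-occurrence counts over the dataset.
cooc = Counter()
for concepts in images:
    for a, b in combinations(sorted(concepts), 2):
        cooc[(a, b)] += 1

# A pair is suspicious when it dominates the contexts of its concepts:
# here 'dog' never appears without 'frisbee', a spurious co-occurrence.
print(cooc[("dog", "frisbee")])  # 3, i.e. all three 'dog' images
```

Balancing would then augment the dataset toward a more uniform distribution over such pairs (e.g. adding dog-without-frisbee images).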

Updated: 2024-11-11 12:56:11

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2409.18055v2

Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference

Large language models have demonstrated promising capabilities upon scaling up parameters. However, serving large language models incurs substantial computation and memory movement costs due to their large scale. Quantization methods have been employed to reduce service costs and latency. Nevertheless, outliers in activations hinder the development of INT4 weight-activation quantization. Existing approaches separate outliers and normal values into two matrices or migrate outliers from activations to weights, suffering from high latency or accuracy degradation. Based on observing activations from large language models, outliers can be classified into channel-wise and spike outliers. In this work, we propose Rotated Runtime Smooth (RRS), a plug-and-play activation smoother for quantization, consisting of Runtime Smooth and the Rotation operation. Runtime Smooth (RS) is introduced to eliminate channel-wise outliers by smoothing activations with channel-wise maximums during runtime. The rotation operation can narrow the gap between spike outliers and normal values, alleviating the effect of victims caused by channel-wise smoothing. The proposed method outperforms the state-of-the-art method in the LLaMA and Qwen families and improves WikiText-2 perplexity from 57.33 to 6.66 for INT4 inference.
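The channel-wise smoothing idea admits a compact sanity check: dividing activations by their per-channel maxima and folding those maxima into the weights leaves the matmul result unchanged while flattening the range the INT4 quantizer sees. A numpy sketch (not the paper's implementation; shapes and the outlier channel are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 8))   # activations: (tokens, channels)
X[:, 3] *= 50.0                # inject a channel-wise outlier
W = rng.normal(size=(8, 4))    # weights: (channels, out)

s = np.abs(X).max(axis=0)      # per-channel maxima, computed at runtime
X_smooth = X / s               # smoothed activations now lie in [-1, 1]
W_scaled = s[:, None] * W      # fold the scales into the weights

# Exact equivalence: (X / s) @ (diag(s) W) == X @ W
assert np.allclose(X_smooth @ W_scaled, X @ W)
assert np.abs(X_smooth).max() <= 1.0 + 1e-12
```

Quantizing `X_smooth` instead of `X` avoids spending the INT4 range on the outlier channel; the rotation operation described above additionally targets spike outliers, which this per-channel rescaling alone does not address.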

Updated: 2024-11-11 12:45:51

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2409.20361v2

Computing excited states of molecules using normalizing flows

Calculating highly excited and delocalized molecular vibrational states is a computationally challenging task that strongly depends on the choice of coordinates for describing vibrational motions. We introduce a new method that utilizes normalizing flows (parametrized invertible functions) to optimize vibrational coordinates to satisfy the variational principle. This approach produces coordinates specifically tailored to the vibrational problem at hand, significantly increasing the accuracy and enhancing basis-set convergence of the calculated energy spectrum. The efficiency of the method is demonstrated in calculations of the 100 lowest excited vibrational states of H$_2$S, H$_2$CO, and HCN/CNH. The method effectively captures the essential vibrational behavior of molecules by enhancing the separability of the Hamiltonian. We further demonstrate that the optimized coordinates are transferable across different levels of basis-set truncation, enabling a cost-efficient protocol for computing vibrational spectra of high-dimensional systems.

Updated: 2024-11-11 12:34:53

Categories: physics.chem-ph,cs.LG

Download: http://arxiv.org/abs/2308.16468v2

Electroencephalogram-based Multi-class Decoding of Attended Speakers' Direction with Audio Spatial Spectrum

Decoding the directional focus of an attended speaker from listeners' electroencephalogram (EEG) signals is essential for developing brain-computer interfaces to improve the quality of life for individuals with hearing impairment. Previous works have concentrated on binary directional focus decoding, i.e., determining whether the attended speaker is on the left or right side of the listener. However, a more precise decoding of the exact direction of the attended speaker is necessary for effective speech processing. Additionally, audio spatial information has not been effectively leveraged, resulting in suboptimal decoding results. In this paper, we observe that, on our recently presented dataset with 15-class directional focus, models relying exclusively on EEG inputs exhibit significantly lower accuracy when decoding the directional focus in both leave-one-subject-out and leave-one-trial-out scenarios. By integrating audio spatial spectra with EEG features, the decoding accuracy can be effectively improved. We employ the CNN, LSM-CNN, and EEG-Deformer models to decode the directional focus from listeners' EEG signals with the auxiliary audio spatial spectra. The proposed Sp-Aux-Deformer model achieves notable 15-class decoding accuracies of 57.48% and 61.83% in the leave-one-subject-out and leave-one-trial-out scenarios, respectively.

Updated: 2024-11-11 12:32:26

Categories: cs.SD,cs.AI,cs.CL,eess.AS

Download: http://arxiv.org/abs/2411.06928v1

Multi-modal Iterative and Deep Fusion Frameworks for Enhanced Passive DOA Sensing via a Green Massive H2AD MIMO Receiver

Most existing DOA estimation methods assume ideal source incident angles with minimal noise. Moreover, directly using pre-estimated angles to calculate weighted coefficients can lead to performance loss. Thus, a green multi-modal (MM) fusion DOA framework is proposed to realize more practical, low-cost and highly time-efficient DOA estimation for an H$^2$AD array. Firstly, two more efficient clustering methods, global maximum cos\_similarity clustering (GMaxCS) and global minimum distance clustering (GMinD), are presented to infer more precise true solutions from the candidate solution sets. Based on this, an iteration weighted fusion (IWF)-based method is introduced to iteratively update the weighted fusion coefficients and the clustering center of the true-solution classes by using the estimated values. In particular, the coarse DOA calculated by the fully digital (FD) subarray serves as the initial cluster center. The above process yields two methods called MM-IWF-GMaxCS and MM-IWF-GMinD. To further provide a higher-accuracy DOA estimation, a fusion network (fusionNet) is proposed to aggregate the inferred two-part true angles, generating two effective approaches called MM-fusionNet-GMaxCS and MM-fusionNet-GMinD. The simulation outcomes show that the four proposed approaches can achieve ideal DOA performance and the CRLB. Meanwhile, the proposed MM-fusionNet-GMaxCS and MM-fusionNet-GMinD exhibit superior DOA performance compared to MM-IWF-GMaxCS and MM-IWF-GMinD, especially in the extremely low SNR range.
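The GMinD-plus-iteration idea can be pictured as repeatedly keeping, from each candidate set, the angle nearest the current cluster center and then re-centering. The sketch below uses made-up candidate sets, an unweighted mean in place of the learned fusion weights, and a coarse FD estimate as the initial center, as described above:

```python
import numpy as np

# Hypothetical candidate DOA solution sets (degrees) from three subarrays;
# each set contains one true-solution candidate near 30 degrees.
candidates = [
    np.array([30.2, -51.0, 75.3]),
    np.array([12.4, 29.7, -64.9]),
    np.array([29.9, 88.1, -20.5]),
]
center = 28.0  # coarse DOA from the fully digital (FD) subarray

for _ in range(5):
    # Minimum-distance selection: keep the candidate nearest the center.
    picks = np.array([c[np.argmin(np.abs(c - center))] for c in candidates])
    # Re-center; the paper uses iteratively updated fusion weights here.
    center = picks.mean()

print(round(center, 2))  # ~29.93, the fused estimate of the true DOA
```

The ambiguous candidates far from the coarse estimate are discarded immediately, which is why a reasonable initial center from the FD subarray matters.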

Updated: 2024-11-11 12:32:18

Categories: cs.AI

Download: http://arxiv.org/abs/2411.06927v1

Understanding Generalization in Quantum Machine Learning with Margins

Understanding and improving generalization capabilities is crucial for both classical and quantum machine learning (QML). Recent studies have revealed shortcomings in current generalization theories, particularly those relying on uniform bounds, across both classical and quantum settings. In this work, we present a margin-based generalization bound for QML models, providing a more reliable framework for evaluating generalization. Our experimental studies on the quantum phase recognition (QPR) dataset demonstrate that margin-based metrics are strong predictors of generalization performance, outperforming traditional metrics like parameter count. By connecting this margin-based metric to quantum information theory, we demonstrate how to enhance the generalization performance of QML through a classical-quantum hybrid approach when applied to classical data.
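The margin such bounds are stated in terms of is the standard multiclass margin, computable directly from a model's output scores (a generic sketch; the logits here are made up, not quantum-circuit outputs):

```python
import numpy as np

def margin(logits, label):
    """Multiclass margin: correct-class score minus the best wrong score.
    Positive iff the sample is classified correctly; its magnitude is the
    quantity margin-based generalization bounds are expressed in.
    """
    logits = np.asarray(logits, dtype=float)
    wrong = np.delete(logits, label)  # scores of all incorrect classes
    return float(logits[label] - wrong.max())

assert margin([2.0, 0.5, -1.0], 0) == 1.5    # confidently correct
assert margin([2.0, 0.5, -1.0], 1) == -1.5   # misclassified
```

The distribution of these per-sample margins over the training set, rather than the parameter count, is what the abstract reports as the stronger predictor of generalization.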

Updated: 2024-11-11 12:22:18

Categories: quant-ph,cs.LG

Download: http://arxiv.org/abs/2411.06919v1

Efficient Unsupervised Domain Adaptation Regression for Spatial-Temporal Air Quality Sensor Fusion

The deployment of affordable Internet of Things (IoT) sensors for air pollution monitoring has increased in recent years due to their scalability and cost-effectiveness. However, accurately calibrating these sensors in uncontrolled environments remains a significant challenge. While expensive reference sensors can provide accurate ground truth data, they are often deployed on a limited scale due to high costs, leading to a scarcity of labeled data. In diverse urban environments, data distributions constantly shift due to varying factors such as traffic patterns, industrial activities, and weather conditions, which impact sensor readings. Consequently, traditional machine learning models -- despite their increasing deployment for environmental sensor calibration -- often struggle to provide reliable pollutant measurements across different locations due to domain shifts. To address these challenges, we propose a novel unsupervised domain adaptation (UDA) method specifically tailored for regression tasks on graph-structured data. Our approach leverages Graph Neural Networks (GNNs) to model the relationships between sensors. To effectively capture critical spatial-temporal interactions, we incorporate spatial-temporal graph neural networks (STGNNs), which extend GNNs by incorporating temporal dynamics. To handle the resulting larger embeddings, we propose a domain adaptation method using a closed-form solution inspired by the Tikhonov-regularized least-squares problem. This method leverages Cholesky decomposition and power iteration to align the subspaces between source and target domains. By aligning these subspaces, our approach allows low-cost IoT sensors to learn calibration parameters from expensive reference sensors. This facilitates reliable pollutant measurements in new locations without the need for additional costly equipment.
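The closed-form primitive referenced above, Tikhonov-regularized least squares solved via a Cholesky factorization, can be sketched on its own (a generic ridge-regression example on synthetic data, not the full subspace-alignment procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # e.g. low-cost sensor features
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.01 * rng.normal(size=100)  # e.g. reference readings
lam = 0.1

# Tikhonov-regularized normal equations: (X^T X + lam I) w = X^T y
A = X.T @ X + lam * np.eye(X.shape[1])
b = X.T @ y

# A is symmetric positive definite by construction, so solve with a
# Cholesky factorization A = L L^T: two triangular solves.
L = np.linalg.cholesky(A)
z = np.linalg.solve(L, b)       # forward solve:  L z = b
w = np.linalg.solve(L.T, z)     # backward solve: L^T w = z

assert np.allclose(w, np.linalg.solve(A, b))
```

The point of the closed form is that the calibration weights come out in one factorization rather than an iterative fit, which is what makes the method cheap enough for low-cost IoT deployments.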

Updated: 2024-11-11 12:20:57

Categories: cs.LG,eess.SP

Download: http://arxiv.org/abs/2411.06917v1

Inferentialist Resource Semantics

In systems modelling, a 'system' typically comprises located resources relative to which processes execute. One important use of logic in informatics is in modelling such systems for the purpose of reasoning (perhaps automated) about their behaviour and properties. To this end, one requires an interpretation of logical formulae in terms of the resources and states of the system; such an interpretation is called a 'resource semantics' of the logic. This paper shows how inferentialism -- the view that meaning is given in terms of inferential behaviour -- enables a versatile and expressive framework for resource semantics. Specifically, how inferentialism seamlessly incorporates the assertion-based approach of the logic of Bunched Implications, foundational in program verification (e.g., as the basis of Separation Logic), and the renowned number-of-uses reading of Linear Logic. This integration enables reasoning about shared and separated resources in intuitive and familiar ways, as well as about the composition and interfacing of system components.

Updated: 2024-11-11 12:20:34

Categories: cs.LO,cs.CR,cs.SY,eess.SY,math.LO

Download: http://arxiv.org/abs/2402.09217v5

Slowing Down Forgetting in Continual Learning

A common challenge in continual learning (CL) is catastrophic forgetting, where the performance on old tasks drops after new, additional tasks are learned. In this paper, we propose a novel framework called ReCL to slow down forgetting in CL. Our framework exploits an implicit bias of gradient-based neural networks due to which these converge to margin maximization points. Such convergence points allow us to reconstruct old data from previous tasks, which we then combine with the current training data. Our framework is flexible and can be applied on top of existing, state-of-the-art CL methods to slow down forgetting. We further demonstrate the performance gain from our framework across a large series of experiments, including different CL scenarios (class-incremental, domain-incremental, and task-incremental learning), different datasets (MNIST, CIFAR10), and different network architectures. Across all experiments, we find large performance gains through ReCL. To the best of our knowledge, our framework is the first to address catastrophic forgetting by leveraging models in CL as their own memory buffers.

Updated: 2024-11-11 12:19:28

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2411.06916v1

Revisiting, Benchmarking and Understanding Unsupervised Graph Domain Adaptation

Unsupervised Graph Domain Adaptation (UGDA) involves the transfer of knowledge from a label-rich source graph to an unlabeled target graph under domain discrepancies. Despite the proliferation of methods designed for this emerging task, the lack of standard experimental settings and fair performance comparisons makes it challenging to understand which and when models perform well across different scenarios. To fill this gap, we present the first comprehensive benchmark for unsupervised graph domain adaptation named GDABench, which encompasses 16 algorithms across 5 datasets with 74 adaptation tasks. Through extensive experiments, we observe that the performance of current UGDA models varies significantly across different datasets and adaptation scenarios. Specifically, we recognize that when the source and target graphs face significant distribution shifts, it is imperative to formulate strategies to effectively address and mitigate graph structural shifts. We also find that with appropriate neighbourhood aggregation mechanisms, simple GNN variants can even surpass state-of-the-art UGDA baselines. To facilitate reproducibility, we have developed an easy-to-use library PyGDA for training and evaluating existing UGDA methods, providing a standardized platform in this community. Our source codes and datasets can be found at: https://github.com/pygda-team/pygda.

Updated: 2024-11-11 12:16:49

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.11052v2

Gaussian Process Emulators for Few-Shot Segmentation in Cardiac MRI

Segmentation of cardiac magnetic resonance images (MRI) is crucial for the analysis and assessment of cardiac function, helping to diagnose and treat various cardiovascular diseases. Most recent techniques rely on deep learning and usually require an extensive amount of labeled data. To overcome this problem, few-shot learning has the capability of reducing data dependency on labeled data. In this work, we introduce a new method that merges few-shot learning with a U-Net architecture and Gaussian Process Emulators (GPEs), enhancing data integration from a support set for improved performance. GPEs are trained to learn the relation between the support images and the corresponding masks in latent space, facilitating the segmentation of unseen query images given only a small labeled support set at inference. We test our model with the M&Ms-2 public dataset to assess its ability to segment the heart in cardiac magnetic resonance imaging from different orientations, and compare it with state-of-the-art unsupervised and few-shot methods. Our architecture shows higher DICE coefficients compared to these methods, especially in the more challenging setups where the size of the support set is considerably small.

Updated: 2024-11-11 12:13:58

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.06911v1

Deep Riemannian Networks for End-to-End EEG Decoding

State-of-the-art performance in electroencephalography (EEG) decoding tasks is currently often achieved with either Deep-Learning (DL) or Riemannian-Geometry-based decoders (RBDs). Recently, there has been growing interest in Deep Riemannian Networks (DRNs), which could combine the advantages of both previous classes of methods. However, there are still a range of topics where additional insight is needed to pave the way for a more widespread application of DRNs in EEG. These include architecture design questions such as network size and end-to-end ability. How these factors affect model performance has not been explored. Additionally, it is not clear how the data within these networks is transformed, and whether this would correlate with traditional EEG decoding. Our study aims to lay the groundwork in the area of these topics through the analysis of DRNs for EEG with a wide range of hyperparameters. Networks were tested on five public EEG datasets and compared with state-of-the-art ConvNets. Here we propose EE(G)-SPDNet, and we show that this wide, end-to-end DRN can outperform the ConvNets, and in doing so use physiologically plausible frequency regions. We also show that the end-to-end approach learns more complex filters than traditional band-pass filters targeting the classical alpha, beta, and gamma frequency bands of the EEG, and that performance can benefit from channel-specific filtering approaches. Additionally, architectural analysis revealed areas for further improvement due to the possible under-utilisation of Riemannian-specific information throughout the network. Our study thus shows how to design and train DRNs to infer task-related information from the raw EEG without the need for handcrafted filterbanks, and highlights the potential of end-to-end DRNs such as EE(G)-SPDNet for high-performance EEG decoding.

Updated: 2024-11-11 12:05:10

Categories: cs.LG,eess.SP,stat.ML

Download: http://arxiv.org/abs/2212.10426v7

Preserving correlations: A statistical method for generating synthetic data

We propose a method to generate statistically representative synthetic data from a given dataset. The main goal of our method is for the created data set to mimic the between-feature correlations present in the original data, while also offering a tunable parameter to influence the privacy level. In particular, our method constructs a statistical map by using the empirical conditional distributions between the features of the original dataset. We describe in detail the algorithms used both to construct the statistical map and to use this map to generate synthetic observations. This approach is tested in three different ways: with a hand-calculated example; a manufactured dataset; and a real-world energy-related dataset of consumption/production of households in Madeira Island. We test our method's performance by comparing the datasets using the Pearson correlation matrix. The proposed methodology is general in the sense that it does not rely on the particular test dataset used. We expect it to be applicable in a much broader context than indicated here.
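The evaluation criterion described above, comparing the Pearson correlation matrices of the original and synthetic datasets, can be sketched as follows. The toy data and the resampling-based stand-in for the synthetic generator are illustrative assumptions, not the paper's statistical-map construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical original dataset: two correlated features plus an independent one.
n = 2000
x = rng.normal(size=n)
original = np.column_stack([x, 0.8 * x + 0.6 * rng.normal(size=n), rng.normal(size=n)])

# Stand-in "synthetic" data: a bootstrap resample with small added noise. The
# paper's generator is more involved; this only serves to exercise the check.
idx = rng.integers(0, n, size=n)
synthetic = original[idx] + 0.05 * rng.normal(size=original.shape)

# Compare the between-feature Pearson correlation matrices, as the abstract describes.
corr_orig = np.corrcoef(original, rowvar=False)
corr_syn = np.corrcoef(synthetic, rowvar=False)
max_abs_diff = np.abs(corr_orig - corr_syn).max()
print(round(float(max_abs_diff), 3))
```

A small maximum entry-wise difference indicates that the synthetic data preserves the original correlation structure.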

Updated: 2024-11-11 12:01:06

Categories: cs.LG,math.PR,physics.data-an

Download: http://arxiv.org/abs/2403.01471v2

CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence

Cyber threat intelligence (CTI) is crucial in today's cybersecurity landscape, providing essential insights to understand and mitigate the ever-evolving cyber threats. The recent rise of Large Language Models (LLMs) has shown potential in this domain, but concerns about their reliability, accuracy, and hallucinations persist. While existing benchmarks provide general evaluations of LLMs, there are no benchmarks that address the practical and applied aspects of CTI-specific tasks. To bridge this gap, we introduce CTIBench, a benchmark designed to assess LLMs' performance in CTI applications. CTIBench includes multiple datasets focused on evaluating knowledge acquired by LLMs in the cyber-threat landscape. Our evaluation of several state-of-the-art models on these tasks provides insights into their strengths and weaknesses in CTI contexts, contributing to a better understanding of LLM capabilities in CTI.

Updated: 2024-11-11 12:00:35

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2406.07599v3

LongSafetyBench: Long-Context LLMs Struggle with Safety Issues

With the development of large language models (LLMs), the sequence length of these models continues to increase, drawing significant attention to long-context language models. However, the evaluation of these models has been primarily limited to their capabilities, with a lack of research focusing on their safety. Existing work, such as ManyShotJailbreak, has to some extent demonstrated that long-context language models can exhibit safety concerns. However, the methods used are limited and lack comprehensiveness. In response, we introduce LongSafetyBench, the first benchmark designed to objectively and comprehensively evaluate the safety of long-context models. LongSafetyBench consists of 10 task categories, with an average length of 41,889 words. After testing eight long-context language models on LongSafetyBench, we found that existing models generally exhibit insufficient safety capabilities. The proportion of safe responses from most mainstream long-context LLMs is below 50%. Moreover, models' safety performance in long-context scenarios does not always align with that in short-context scenarios. Further investigation revealed that long-context models tend to overlook harmful content within lengthy texts. We also proposed a simple yet effective solution, allowing open-source models to achieve performance comparable to that of top-tier closed-source models. We believe that LongSafetyBench can serve as a valuable benchmark for evaluating the safety capabilities of long-context language models. We hope that our work will encourage the broader community to pay attention to the safety of long-context models and contribute to the development of solutions to improve the safety of long-context LLMs.

Updated: 2024-11-11 11:57:37

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.06899v1

Problem Solving Through Human-AI Preference-Based Cooperation

While there is a widespread belief that artificial general intelligence (AGI) -- or even superhuman AI -- is imminent, complex problems in expert domains are far from being solved. We argue that such problems require human-AI cooperation and that the current state of the art in generative AI is unable to play the role of a reliable partner due to a multitude of shortcomings, including inability to keep track of a complex solution artifact (e.g., a software program), limited support for versatile human preference expression and lack of adapting to human preference in an interactive setting. To address these challenges, we propose HAI-Co2, a novel human-AI co-construction framework. We formalize HAI-Co2 and discuss the difficult open research problems that it faces. Finally, we present a case study of HAI-Co2 and demonstrate its efficacy compared to monolithic generative AI models.

Updated: 2024-11-11 11:44:20

Categories: cs.AI,cs.HC

Download: http://arxiv.org/abs/2408.07461v3

SPARTAN: A Sparse Transformer Learning Local Causation

Causal structures play a central role in world models that flexibly adapt to changes in the environment. While recent works motivate the benefits of discovering local causal graphs for dynamics modelling, in this work we demonstrate that accurately capturing these relationships in complex settings remains challenging for the current state-of-the-art. To remedy this shortcoming, we postulate that sparsity is a critical ingredient for the discovery of such local causal structures. To this end we present the SPARse TrANsformer World model (SPARTAN), a Transformer-based world model that learns local causal structures between entities in a scene. By applying sparsity regularisation on the attention pattern between object-factored tokens, SPARTAN identifies sparse local causal models that accurately predict future object states. Furthermore, we extend our model to capture sparse interventions with unknown targets on the dynamics of the environment. This results in a highly interpretable world model that can efficiently adapt to changes. Empirically, we evaluate SPARTAN against the current state-of-the-art in object-centric world models on observation-based environments and demonstrate that our model can learn accurate local causal graphs and achieve significantly improved few-shot adaptation to changes in the dynamics of the environment as well as robustness against removing irrelevant distractors.
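The central mechanism described above, a sparsity regulariser applied to the attention pattern between object-factored tokens, can be illustrated with a minimal numpy sketch. The entropy-style penalty and its weight below are assumptions for illustration; the paper's exact regulariser may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy attention logits between 4 object-factored tokens.
logits = rng.normal(size=(4, 4))
attn = softmax(logits)

# Entropy of each attention row: low entropy means a near one-hot row, i.e. a
# sparse set of causal parents for that object token.
row_entropy = -(attn * np.log(attn + 1e-12)).sum(axis=-1)
sparsity_penalty = row_entropy.mean()

# The penalty is added to the prediction loss with a small weight, so training
# trades off predictive accuracy against sparse local causal structure.
task_loss = 1.0  # placeholder for the future-state prediction loss
total_loss = task_loss + 0.1 * sparsity_penalty
print(round(float(total_loss), 3))
```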

Updated: 2024-11-11 11:42:48

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2411.06890v1

Evaluation of Environmental Conditions on Object Detection using Oriented Bounding Boxes for AR Applications

The objective of augmented reality (AR) is to add digital content to natural images and videos to create an interactive experience between the user and the environment. Scene analysis and object recognition play a crucial role in AR, as they must be performed quickly and accurately. In this study, a new approach is proposed that involves using oriented bounding boxes with a detection and recognition deep network to improve performance and processing time. The approach is evaluated using two datasets: a real image dataset (DOTA dataset) commonly used for computer vision tasks, and a synthetic dataset that simulates different environmental, lighting, and acquisition conditions. The focus of the evaluation is on small objects, which are difficult to detect and recognise. The results indicate that the proposed approach tends to produce better Average Precision and greater accuracy for small objects in most of the tested conditions.

Updated: 2024-11-11 11:41:13

Categories: cs.CV,cs.AI,I.2.10

Download: http://arxiv.org/abs/2306.16798v2

Meaningful Learning: Enhancing Abstract Reasoning in Large Language Models via Generic Fact Guidance

Large language models (LLMs) have developed impressive performance and strong explainability across various reasoning scenarios, marking a significant stride towards mimicking human-like intelligence. Despite this, when tasked with several simple questions supported by a generic fact, LLMs often struggle to abstract and apply the generic fact to provide consistent and precise answers, revealing a deficiency in abstract reasoning abilities. This has sparked a vigorous debate about whether LLMs are genuinely reasoning or merely memorizing. In light of this, we design a preliminary study to quantify and delve into the abstract reasoning abilities of existing LLMs. Our findings reveal a substantial discrepancy between their general reasoning and abstract reasoning performances. To relieve this problem, we tailor an abstract reasoning dataset (AbsR) together with a meaningful learning paradigm to teach LLMs how to leverage generic facts for reasoning purposes. The results show that our approach not only boosts the general reasoning performance of LLMs but also makes considerable strides towards their capacity for abstract reasoning, moving beyond simple memorization or imitation to a more nuanced understanding and application of generic facts. The code is available at https://github.com/Waste-Wood/MeanLearn.

Updated: 2024-11-11 11:35:28

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2403.09085v2

LMLPA: Language Model Linguistic Personality Assessment

Large Language Models (LLMs) are increasingly used in everyday life and research. One of the most common use cases is conversational interactions, enabled by the language generation capabilities of LLMs. Just as between two humans, a conversation between an LLM-powered entity and a human depends on the personality of the conversants. However, measuring the personality of a given LLM is currently a challenge. This paper introduces the Language Model Linguistic Personality Assessment (LMLPA), a system designed to evaluate the linguistic personalities of LLMs. Our system helps to understand LLMs' language generation capabilities by quantitatively assessing the distinct personality traits reflected in their linguistic outputs. Unlike traditional human-centric psychometrics, the LMLPA adapts a personality assessment questionnaire, specifically the Big Five Inventory, to align with the operational capabilities of LLMs, and also incorporates the findings from previous language-based personality measurement literature. To mitigate sensitivity to the order of options, our questionnaire is designed to be open-ended, resulting in textual answers. Thus, an AI rater is needed to transform ambiguous personality information from text responses into clear numerical indicators of personality traits. Utilising Principal Component Analysis and reliability validations, our findings demonstrate that LLMs possess distinct personality traits that can be effectively quantified by the LMLPA. This research contributes to Human-Computer Interaction and Human-Centered AI, providing a robust framework for future studies to refine AI personality assessments and expand their applications in multiple areas, including education and manufacturing.

Updated: 2024-11-11 11:32:21

Categories: cs.CL,cs.AI,I.2

Download: http://arxiv.org/abs/2410.17632v2

Generative Data Assimilation of Sparse Weather Station Observations at Kilometer Scales

Data assimilation of observational data into full atmospheric states is essential for weather forecast model initialization. Recently, methods for deep generative data assimilation have been proposed which allow for using new input data without retraining the model. They could also dramatically accelerate the costly data assimilation process used in operational regional weather models. Here, in a central US testbed, we demonstrate the viability of score-based data assimilation in the context of realistically complex km-scale weather. We train an unconditional diffusion model to generate snapshots of a state-of-the-art km-scale analysis product, the High Resolution Rapid Refresh. Then, using score-based data assimilation to incorporate sparse weather station data, the model produces maps of precipitation and surface winds. The generated fields display physically plausible structures, such as gust fronts, and sensitivity tests confirm learnt physics through multivariate relationships. Preliminary skill analysis shows the approach already outperforms a naive baseline of the High-Resolution Rapid Refresh system itself. By incorporating observations from 40 weather stations, 10% lower RMSEs on left-out stations are attained. Despite some lingering imperfections such as insufficiently dispersed ensemble DA estimates, we find the results overall an encouraging proof of concept, and the first at km-scale. It is a ripe time to explore extensions that combine increasingly ambitious regional state generators with an increasing set of in situ, ground-based, and satellite remote sensing data streams.

Updated: 2024-11-11 11:26:26

Categories: cs.LG,physics.ao-ph,J.2

Download: http://arxiv.org/abs/2406.16947v2

WassFFed: Wasserstein Fair Federated Learning

Federated Learning (FL) employs a training approach to address scenarios where users' data cannot be shared across clients. Achieving fairness in FL is imperative since training data in FL is inherently geographically distributed among diverse user groups. Existing research on fairness predominantly assumes access to the entire training data, making direct transfer to FL challenging. However, the limited existing research on fairness in FL does not effectively address two key challenges, i.e., (CH1) Current methods fail to deal with the inconsistency between fair optimization results obtained with surrogate functions and fair classification results. (CH2) Directly aggregating local fair models does not always yield a globally fair model due to non Identical and Independent data Distributions (non-IID) among clients. To address these challenges, we propose a Wasserstein Fair Federated Learning framework, namely WassFFed. To tackle CH1, we ensure that the outputs of local models, rather than the loss calculated with surrogate functions or classification results with a threshold, remain independent of various user groups. To resolve CH2, we employ a Wasserstein barycenter calculation of all local models' outputs for each user group, bringing local model outputs closer to the global output distribution to ensure consistency between the global model and local models. We conduct extensive experiments on three real-world datasets, demonstrating that WassFFed outperforms existing approaches in striking a balance between accuracy and fairness.
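The barycenter aggregation step sketched above can be illustrated in one dimension, where the Wasserstein-2 barycenter of a set of distributions is obtained by averaging their quantile functions. The per-client output distributions below are synthetic placeholders, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-client model output scores for one user group (1-D outputs).
client_outputs = [
    rng.normal(loc=0.3, scale=0.1, size=500),
    rng.normal(loc=0.5, scale=0.1, size=500),
    rng.normal(loc=0.7, scale=0.1, size=500),
]

# In 1-D, the Wasserstein-2 barycenter is given by the average of the input
# distributions' quantile functions, evaluated here on a fixed grid.
qs = np.linspace(0.0, 1.0, 101)
quantiles = np.stack([np.quantile(o, qs) for o in client_outputs])
barycenter_quantiles = quantiles.mean(axis=0)

# The barycenter's median lies between the clients' medians, pulling local
# output distributions toward a shared global distribution.
print(round(float(barycenter_quantiles[50]), 2))
```

Higher-dimensional barycenters need dedicated solvers (e.g. fixed-point or Sinkhorn-based schemes), but the 1-D case captures the aggregation idea.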

Updated: 2024-11-11 11:26:22

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2411.06881v1

GraphRPM: Risk Pattern Mining on Industrial Large Attributed Graphs

Graph-based patterns are extensively employed and favored by practitioners within industrial companies due to their capacity to represent the behavioral attributes and topological relationships among users, thereby offering enhanced interpretability in comparison to black-box models commonly utilized for classification and recognition tasks. For instance, within the scenario of transaction risk management, a graph pattern that is characteristic of a particular risk category can be readily employed to discern transactions fraught with risk, delineate networks of criminal activity, or investigate the methodologies employed by fraudsters. Nonetheless, graph data in industrial settings is often characterized by its massive scale, encompassing data sets with millions or even billions of nodes, making the manual extraction of graph patterns not only labor-intensive but also necessitating specialized knowledge in particular domains of risk. Moreover, existing methodologies for mining graph patterns encounter significant obstacles when tasked with analyzing large-scale attributed graphs. In this work, we introduce GraphRPM, an industry-purpose parallel and distributed risk pattern mining framework on large attributed graphs. The framework incorporates a novel edge-involved graph isomorphism network alongside optimized operations for parallel graph computation, which collectively contribute to a considerable reduction in computational complexity and resource expenditure. Moreover, the intelligent filtration of efficacious risky graph patterns is facilitated by the proposed evaluation metrics. Comprehensive experimental evaluations conducted on real-world datasets of varying sizes substantiate the capability of GraphRPM to adeptly address the challenges inherent in mining patterns from large-scale industrial attributed graphs, thereby underscoring its substantial value for industrial deployment.

Updated: 2024-11-11 11:20:30

Categories: cs.LG,cs.AI,cs.DC,cs.SI

Download: http://arxiv.org/abs/2411.06878v1

CE-QArg: Counterfactual Explanations for Quantitative Bipolar Argumentation Frameworks (Technical Report)

There is a growing interest in understanding arguments' strength in Quantitative Bipolar Argumentation Frameworks (QBAFs). Most existing studies focus on attribution-based methods that explain an argument's strength by assigning importance scores to other arguments but fail to explain how to change the current strength to a desired one. To solve this issue, we introduce counterfactual explanations for QBAFs. We discuss problem variants and propose an iterative algorithm named Counterfactual Explanations for Quantitative bipolar Argumentation frameworks (CE-QArg). CE-QArg can identify valid and cost-effective counterfactual explanations based on two core modules, polarity and priority, which help determine the updating direction and magnitude for each argument, respectively. We discuss some formal properties of our counterfactual explanations and empirically evaluate CE-QArg on randomly generated QBAFs.

Updated: 2024-11-11 11:19:27

Categories: cs.AI

Download: http://arxiv.org/abs/2407.08497v2

Multi-Modal interpretable automatic video captioning

Video captioning aims to describe video content in natural language, which involves understanding and interpreting the scenes, actions and events that occur simultaneously in the view. Current approaches have mainly concentrated on visual cues, often neglecting the rich information available from the other important modality, audio, including its inter-dependencies with the visual stream. In this work, we introduce a novel video captioning method trained with a multi-modal contrastive loss that emphasizes both multi-modal integration and interpretability. Our approach is designed to capture the dependency between these modalities, resulting in more accurate and thus more pertinent captions. Furthermore, we highlight the importance of interpretability, employing multiple attention mechanisms that provide insight into the model's decision-making process. Our experimental results demonstrate that our proposed method performs favorably against state-of-the-art models on the commonly used benchmark datasets MSR-VTT and VATEX.

Updated: 2024-11-11 11:12:23

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.06872v1

AI-Native Multi-Access Future Networks -- The REASON Architecture

The development of the sixth generation of communication networks (6G) has been gaining momentum over the past years, with a target of being introduced by 2030. Several initiatives worldwide are developing innovative solutions and setting the direction for the key features of these networks. Some common emerging themes are the tight integration of AI, the convergence of multiple access technologies and sustainable operation, aiming to meet stringent performance and societal requirements. To that end, we are introducing REASON - Realising Enabling Architectures and Solutions for Open Networks. The REASON project aims to address technical challenges in future network deployments, such as E2E service orchestration, sustainability, security and trust management, and policy management, utilising AI-native principles, considering multiple access technologies and cloud-native solutions. This paper presents REASON's architecture and the identified requirements for future networks. The architecture is meticulously designed for modularity, interoperability, scalability, simplified troubleshooting, flexibility, and enhanced security, taking into consideration current and future standardisation efforts, and the ease of implementation and training. It is structured into four horizontal layers: Physical Infrastructure, Network Service, Knowledge, and End-User Application, complemented by two vertical layers: Management and Orchestration, and E2E Security. This layered approach ensures a robust, adaptable framework to support the diverse and evolving requirements of 6G networks, fostering innovation and facilitating seamless integration of advanced technologies.

Updated: 2024-11-11 11:10:39

Categories: cs.NI,cs.AI,cs.SY,eess.SY

Download: http://arxiv.org/abs/2411.06870v1

CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models

Category-agnostic pose estimation (CAPE) has traditionally relied on support images with annotated keypoints, a process that is often cumbersome and may fail to fully capture the necessary correspondences across diverse object categories. Recent efforts have begun exploring the use of text-based queries, where the need for support keypoints is eliminated. However, the optimal use of textual descriptions for keypoints remains an underexplored area. In this work, we introduce CapeLLM, a novel approach that leverages a text-based multimodal large language model (MLLM) for CAPE. Our method employs only a query image and detailed text descriptions as input to estimate category-agnostic keypoints. We conduct extensive experiments to systematically explore the design space of LLM-based CAPE, investigating factors such as choosing the optimal description for keypoints, neural network architectures, and training strategies. Thanks to the advanced reasoning capabilities of the pre-trained MLLM, CapeLLM demonstrates superior generalization and robust performance. Our approach sets a new state-of-the-art on the MP-100 benchmark in the challenging 1-shot setting, marking a significant advancement in the field of category-agnostic pose estimation.

Updated: 2024-11-11 11:08:26

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2411.06869v1

Effect sizes as a statistical feature-selector-based learning to detect breast cancer

Breast cancer detection is still an open research field, despite a tremendous effort devoted to work in this area. Effect size is a statistical concept that measures the strength of the relationship between two variables on a numeric scale. Feature selection is widely used to reduce the dimensionality of data by selecting only a subset of predictor variables to improve a learning model. In this work, an algorithm and experimental results demonstrate the feasibility of developing a statistical feature-selector-based learning tool capable of reducing the data dimensionality using parametric effect size measures from features extracted from cell nuclei images. The SVM classifier with a linear kernel as a learning tool achieved an accuracy of over 90%. These excellent results suggest that the effect size is within the standards of the feature-selector methods.
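To make the selection step concrete, here is a minimal illustrative sketch using Cohen's d, one common parametric effect size; this is a hypothetical example, not the paper's code, and the feature values are invented:

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(xs, ys):
    """Pooled-variance effect size between two groups of one feature."""
    nx, ny = len(xs), len(ys)
    pooled = ((nx - 1) * variance(xs) + (ny - 1) * variance(ys)) / (nx + ny - 2)
    return (mean(xs) - mean(ys)) / sqrt(pooled) if pooled > 0 else 0.0

def select_features(X_pos, X_neg, k):
    """Rank features by |Cohen's d| between classes and keep the k strongest."""
    n_feat = len(X_pos[0])
    scores = []
    for j in range(n_feat):
        d = cohens_d([row[j] for row in X_pos], [row[j] for row in X_neg])
        scores.append((abs(d), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]
```

A linear-kernel SVM (or any classifier) would then be trained on the selected columns only.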

Updated: 2024-11-11 11:07:38

Categories: stat.ML,cs.LG,eess.IV

Download: http://arxiv.org/abs/2411.06868v1

Exploiting Precision Mapping and Component-Specific Feature Enhancement for Breast Cancer Segmentation and Identification

Breast cancer is a leading cause of mortality worldwide, underscoring the critical need for early and accurate diagnostic tools. Ultrasound imaging is a widely used modality for breast cancer screening, yet the precise segmentation and classification of tumors in these images are challenging due to variations in tumor morphology and image quality. To address these challenges, we propose novel deep learning (DL) frameworks leveraging a precision mapping mechanism (PMM) along with a component-specific feature enhancement module (CSFEM) to improve breast cancer lesion segmentation and identification. Our PMM ensures that the segmentation accurately reflects the true shape and extent of the tumor by meticulously delineating its boundaries. The CSFEM focuses on extracting and amplifying features unique to different tumor types, enabling the model to effectively distinguish between benign, malignant, and normal tissues. Integrating PMM and CSFEM into our segmentation model yielded an accuracy of 98.1%, an IoU of 96.9%, and a Dice Coefficient of 97.2%. Similarly, our classification model achieved an accuracy of 99.2%, with F1-score, precision, and recall values of 99.1%, 99.3%, and 99.1%, respectively. Our results indicate significant improvement in evaluation metrics in comparison to state-of-the-art (SOTA) models, demonstrating the effectiveness of precision mapping and component-specific feature enhancement in advancing breast cancer lesion analysis.

Updated: 2024-11-11 11:05:49

Categories: eess.IV,cs.CV,cs.LG,F.2.2; I.2.7

Download: http://arxiv.org/abs/2407.02844v4

Evaluating Synthetically Generated Data from Small Sample Sizes: An Experimental Study

This work proposes a method to evaluate the similarity between low-sample tabular data and synthetically generated data with a larger number of samples than the original. The technique is known as data augmentation. However, significance values derived from non-parametric tests are questionable when the sample size is limited. Our approach uses a combination of geometry, topology, and robust statistics for hypothesis testing to evaluate the "validity" of generated data. We additionally contrast the findings with prominent global metric practices described in the literature for large sample size data.
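In the small-sample regime the abstract describes, one defensible way to combine robust statistics with hypothesis testing is a permutation test on a robust statistic such as the median. The sketch below is my own illustration of that general idea, not the paper's method:

```python
import random
from statistics import median

def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of medians,
    a robust alternative to parametric tests for small samples."""
    rng = random.Random(seed)
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(median(pa) - median(pb)) >= observed:
            hits += 1
    # add-one smoothing keeps the p-value strictly positive
    return (hits + 1) / (n_perm + 1)
```

Applied to an original sample versus a synthetic one, a large p-value is (weak) evidence that the generator preserved the statistic.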

Updated: 2024-11-11 11:04:06

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2211.10760v4

A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective

Offline reinforcement learning endeavors to leverage offline datasets to craft effective agent policy without online interaction, which imposes proper conservative constraints with the support of behavior policies to tackle the out-of-distribution problem. However, existing works often suffer from the constraint conflict issue when offline datasets are collected from multiple behavior policies, i.e., different behavior policies may exhibit inconsistent actions with distinct returns across the state space. To remedy this issue, recent advantage-weighted methods prioritize samples with high advantage values for agent training while inevitably ignoring the diversity of behavior policy. In this paper, we introduce a novel Advantage-Aware Policy Optimization (A2PO) method to explicitly construct advantage-aware policy constraints for offline learning under mixed-quality datasets. Specifically, A2PO employs a conditional variational auto-encoder to disentangle the action distributions of intertwined behavior policies by modeling the advantage values of all training data as conditional variables. Then the agent can follow such disentangled action distribution constraints to optimize the advantage-aware policy towards high advantage values. Extensive experiments conducted on both the single-quality and mixed-quality datasets of the D4RL benchmark demonstrate that A2PO yields results superior to its counterparts. Our code is available at https://github.com/Plankson/A2PO

Updated: 2024-11-11 10:59:52

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2403.07262v4

Subgraph Retrieval Enhanced by Graph-Text Alignment for Commonsense Question Answering

Commonsense question answering is a crucial task that requires machines to employ reasoning according to commonsense. Previous studies predominantly employ an extracting-and-modeling paradigm to harness the information in KG, which first extracts relevant subgraphs based on pre-defined rules and then proceeds to design various strategies aiming to improve the representations and fusion of the extracted structural knowledge. Despite their effectiveness, there are still two challenges. On one hand, subgraphs extracted by rule-based methods may have the potential to overlook critical nodes and result in uncontrollable subgraph size. On the other hand, the misalignment between graph and text modalities undermines the effectiveness of knowledge fusion, ultimately impacting the task performance. To deal with the problems above, we propose a novel framework: \textbf{S}ubgraph R\textbf{E}trieval Enhanced by Gra\textbf{P}h-\textbf{T}ext \textbf{A}lignment, named \textbf{SEPTA}. Firstly, we transform the knowledge graph into a database of subgraph vectors and propose a BFS-style subgraph sampling strategy to avoid information loss, leveraging the analogy between BFS and the message-passing mechanism. In addition, we propose a bidirectional contrastive learning approach for graph-text alignment, which effectively enhances both subgraph retrieval and knowledge fusion. Finally, all the retrieved information is combined for reasoning in the prediction module. Extensive experiments on five datasets demonstrate the effectiveness and robustness of our framework.
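The BFS-style sampling idea is straightforward to sketch. The following is a hypothetical minimal version over plain adjacency lists, with a size cap to keep the subgraph controllable; it illustrates the strategy, not SEPTA's implementation:

```python
from collections import deque

def bfs_subgraph(adj, seed, max_nodes):
    """Sample a connected subgraph by breadth-first expansion from `seed`.

    Expanding level by level mirrors message passing: the k-th BFS ring is
    exactly the set of nodes a k-layer GNN at `seed` can see. The cap on
    `max_nodes` keeps the subgraph size controllable.
    """
    visited = {seed}
    order = [seed]
    queue = deque([seed])
    while queue and len(order) < max_nodes:
        node = queue.popleft()
        for nb in adj.get(node, []):
            if nb not in visited and len(order) < max_nodes:
                visited.add(nb)
                order.append(nb)
                queue.append(nb)
    return order
```

Rule-based extraction can miss critical nodes; BFS from the question entities guarantees every node within the visited radius is considered until the budget runs out.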

Updated: 2024-11-11 10:57:31

Categories: cs.LG,cs.AI,cs.CL,cs.SI

Download: http://arxiv.org/abs/2411.06866v1

Computable Model-Independent Bounds for Adversarial Quantum Machine Learning

By leveraging the principles of quantum mechanics, QML opens doors to novel approaches in machine learning and offers potential speedup. However, machine learning models are well-documented to be vulnerable to malicious manipulations, and this susceptibility extends to the models of QML. This situation necessitates a thorough understanding of QML's resilience against adversarial attacks, particularly in an era where quantum computing capabilities are expanding. In this regard, this paper examines model-independent bounds on adversarial performance for QML. To the best of our knowledge, we introduce the first computation of an approximate lower bound for adversarial error when evaluating model resilience against sophisticated quantum-based adversarial attacks. Experimental results are compared to the computed bound, demonstrating the potential of QML models to achieve high robustness. In the best case, the experimental error is only 10% above the estimated bound, offering evidence of the inherent robustness of quantum models. This work not only advances our theoretical understanding of quantum model resilience but also provides a precise reference bound for the future development of robust QML algorithms.

Updated: 2024-11-11 10:56:31

Categories: cs.LG,cs.AI,cs.ET,quant-ph

Download: http://arxiv.org/abs/2411.06863v1

Enhancing Phishing Detection through Feature Importance Analysis and Explainable AI: A Comparative Study of CatBoost, XGBoost, and EBM Models

Phishing attacks remain a persistent threat to online security, demanding robust detection methods. This study investigates the use of machine learning to identify phishing URLs, emphasizing the crucial role of feature selection and model interpretability for improved performance. Employing Recursive Feature Elimination, the research pinpointed key features like "length_url," "time_domain_activation" and "Page_rank" as strong indicators of phishing attempts. The study evaluated various algorithms, including CatBoost, XGBoost, and Explainable Boosting Machine, assessing their robustness and scalability. XGBoost emerged as highly efficient in terms of runtime, making it well-suited for large datasets. CatBoost, on the other hand, demonstrated resilience by maintaining high accuracy even with reduced features. To enhance transparency and trustworthiness, Explainable AI techniques, such as SHAP, were employed to provide insights into feature importance. The study's findings highlight that effective feature selection and model interpretability can significantly bolster phishing detection systems, paving the way for more efficient and adaptable defenses against evolving cyber threats.

Updated: 2024-11-11 10:49:24

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2411.06860v1

Solving Kernel Ridge Regression with Gradient Descent for a Non-Constant Kernel

Kernel ridge regression, KRR, is a generalization of linear ridge regression that is non-linear in the data, but linear in the parameters. The solution can be obtained either as a closed-form solution, which includes solving a system of linear equations, or iteratively through gradient descent. Using the iterative approach opens up the possibility of changing the kernel during training, which is what we investigate in this paper. We theoretically address the effects this has on model complexity and generalization. Based on our findings, we propose an update scheme for the bandwidth of translational-invariant kernels, where we let the bandwidth decrease to zero during training, thus circumventing the need for hyper-parameter selection. We demonstrate on real and synthetic data how decreasing the bandwidth during training outperforms using a constant bandwidth, selected by cross-validation and marginal likelihood maximization. We also show theoretically and empirically that using a decreasing bandwidth, we are able to achieve both zero training error in combination with good generalization, and a double descent behavior, phenomena that do not occur for KRR with constant bandwidth but are known to appear for neural networks.
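The decreasing-bandwidth idea is easy to prototype. The toy sketch below runs gradient descent on the unregularized KRR least-squares objective for 1-D data while annealing the RBF bandwidth geometrically; the schedule, constants, and geometric annealing are my illustrative assumptions, not the authors' exact scheme:

```python
import math

def rbf_gram(X, bw):
    """Gram matrix of the Gaussian (RBF) kernel for 1-D inputs."""
    return [[math.exp(-(xi - xj) ** 2 / (2 * bw ** 2)) for xj in X] for xi in X]

def krr_gd_decreasing_bw(X, y, steps=500, lr=0.05, bw_start=2.0, bw_end=0.15):
    """Gradient descent on 0.5 * ||K(bw) a - y||^2 while the bandwidth
    shrinks from bw_start to bw_end, so no fixed bandwidth is chosen."""
    n = len(X)
    a = [0.0] * n
    for t in range(steps):
        bw = bw_start * (bw_end / bw_start) ** (t / (steps - 1))
        K = rbf_gram(X, bw)
        resid = [sum(K[i][j] * a[j] for j in range(n)) - y[i] for i in range(n)]
        # gradient of the squared loss w.r.t. a is K (K a - y); K is symmetric
        grad = [sum(K[i][j] * resid[j] for j in range(n)) for i in range(n)]
        a = [a[i] - lr * grad[i] for i in range(n)]
    K = rbf_gram(X, bw_end)
    resid = [sum(K[i][j] * a[j] for j in range(n)) - y[i] for i in range(n)]
    mse = sum(r * r for r in resid) / n
    return a, mse
```

The intuition, consistent with the abstract: a wide early kernel fits the smooth trend, and as the bandwidth shrinks the Gram matrix approaches the identity, letting the iterates drive the training error toward zero.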

Updated: 2024-11-11 10:43:06

Categories: stat.ML,cs.LG,math.OC,stat.ME

Download: http://arxiv.org/abs/2311.01762v2

Scientific machine learning in ecological systems: A study on the predator-prey dynamics

In this study, we apply two pillars of Scientific Machine Learning: Neural Ordinary Differential Equations (Neural ODEs) and Universal Differential Equations (UDEs) to the Lotka Volterra Predator Prey Model, a fundamental ecological model describing the dynamic interactions between predator and prey populations. The Lotka-Volterra model is critical for understanding ecological dynamics, population control, and species interactions, as it is represented by a system of differential equations. In this work, we aim to uncover the underlying differential equations without prior knowledge of the system, relying solely on training data and neural networks. Using robust modeling in the Julia programming language, we demonstrate that both Neural ODEs and UDEs can be effectively utilized for prediction and forecasting of the Lotka-Volterra system. More importantly, we introduce the forecasting breakdown point: the time at which forecasting fails for both Neural ODEs and UDEs. We observe how UDEs outperform Neural ODEs by effectively recovering the underlying dynamics and achieving accurate forecasting with significantly less training data. Additionally, we introduce Gaussian noise of varying magnitudes (from mild to high) to simulate real-world data perturbations and show that UDEs exhibit superior robustness, effectively recovering the underlying dynamics even in the presence of noisy data, while Neural ODEs struggle with high levels of noise. Through extensive hyperparameter optimization, we offer insights into neural network architectures, activation functions, and optimizers that yield the best results. This study opens the door to applying Scientific Machine Learning frameworks for forecasting tasks across a wide range of ecological and scientific domains.
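For reference, the Lotka-Volterra system the study tries to recover from data is simple to simulate. The paper works in Julia; the sketch below is an equivalent stdlib-Python RK4 integration with arbitrary illustrative parameter values, not the study's configuration:

```python
def lotka_volterra(alpha, beta, delta, gamma):
    """Right-hand side of the Lotka-Volterra ODEs:
    prey x' = alpha*x - beta*x*y, predator y' = delta*x*y - gamma*y."""
    def f(state):
        x, y = state
        return (alpha * x - beta * x * y, delta * x * y - gamma * y)
    return f

def rk4(f, state, dt, steps):
    """Classic 4th-order Runge-Kutta integrator for a tuple-valued state."""
    traj = [state]
    for _ in range(steps):
        x = traj[-1]
        k1 = f(x)
        k2 = f(tuple(xi + dt / 2 * ki for xi, ki in zip(x, k1)))
        k3 = f(tuple(xi + dt / 2 * ki for xi, ki in zip(x, k2)))
        k4 = f(tuple(xi + dt * ki for xi, ki in zip(x, k3)))
        traj.append(tuple(xi + dt / 6 * (a + 2 * b + 2 * c + d)
                          for xi, a, b, c, d in zip(x, k1, k2, k3, k4)))
    return traj
```

A Neural ODE replaces the whole right-hand side with a network; a UDE keeps the known terms (e.g. alpha*x, -gamma*y) and learns only the interaction terms, which is why it needs less data.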

Updated: 2024-11-11 10:40:45

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2411.06858v1

The Multiple Dimensions of Spuriousness in Machine Learning

Learning correlations from data forms the foundation of today's machine learning (ML) and artificial intelligence (AI) research. While such an approach enables the automatic discovery of patterned relationships within big data corpora, it is susceptible to failure modes when unintended correlations are captured. This vulnerability has expanded interest in interrogating spuriousness, often critiqued as an impediment to model performance, fairness, and robustness. In this article, we trace deviations from the conventional definition of statistical spuriousness-which denotes a non-causal observation arising from either coincidence or confounding variables-to articulate how ML researchers make sense of spuriousness in practice. Drawing on a broad survey of ML literature, we conceptualize the "multiple dimensions of spuriousness," encompassing: relevance ("Models should only use correlations that are relevant to the task."), generalizability ("Models should only use correlations that generalize to unseen data"), human-likeness ("Models should only use correlations that a human would use to perform the same task"), and harmfulness ("Models should only use correlations that are not harmful"). These dimensions demonstrate that ML spuriousness goes beyond the causal/non-causal dichotomy and that the disparate interpretative paths researchers choose could meaningfully influence the trajectory of ML development. By underscoring how a fundamental problem in ML is contingently negotiated in research contexts, we contribute to ongoing debates about responsible practices in AI development.

Updated: 2024-11-11 10:38:39

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.04696v2

Evaluating Large Language Models on Financial Report Summarization: An Empirical Study

In recent years, Large Language Models (LLMs) have demonstrated remarkable versatility across various applications, including natural language understanding, domain-specific knowledge tasks, etc. However, applying LLMs to complex, high-stakes domains like finance requires rigorous evaluation to ensure reliability, accuracy, and compliance with industry standards. To address this need, we conduct a comprehensive and comparative study on three state-of-the-art LLMs, GLM-4, Mistral-NeMo, and LLaMA3.1, focusing on their effectiveness in generating automated financial reports. Our primary motivation is to explore how these models can be harnessed within finance, a field demanding precision, contextual relevance, and robustness against erroneous or misleading information. By examining each model's capabilities, we aim to provide an insightful assessment of their strengths and limitations. Our paper offers benchmarks for financial report analysis, encompassing proposed metrics such as ROUGE-1, BERT Score, and LLM Score. We introduce an innovative evaluation framework that integrates both quantitative metrics (e.g., precision, recall) and qualitative analyses (e.g., contextual fit, consistency) to provide a holistic view of each model's output quality. Additionally, we make our financial dataset publicly available, inviting researchers and practitioners to leverage, scrutinize, and enhance our findings through broader community engagement and collaborative improvement. Our dataset is available on huggingface.

Updated: 2024-11-11 10:36:04

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.06852v1

Fast and Efficient Transformer-based Method for Bird's Eye View Instance Prediction

Accurate object detection and prediction are critical to ensure the safety and efficiency of self-driving architectures. Predicting object trajectories and occupancy enables autonomous vehicles to anticipate movements and make decisions with future information, increasing their adaptability and reducing the risk of accidents. Current State-Of-The-Art (SOTA) approaches often isolate the detection, tracking, and prediction stages, which can lead to significant prediction errors due to accumulated inaccuracies between stages. Recent advances have improved the feature representation of multi-camera perception systems through Bird's-Eye View (BEV) transformations, boosting the development of end-to-end systems capable of predicting environmental elements directly from vehicle sensor data. These systems, however, often suffer from high processing times and number of parameters, creating challenges for real-world deployment. To address these issues, this paper introduces a novel BEV instance prediction architecture based on a simplified paradigm that relies only on instance segmentation and flow prediction. The proposed system prioritizes speed, aiming at reduced parameter counts and inference times compared to existing SOTA architectures, thanks to the incorporation of an efficient transformer-based architecture. Furthermore, the implementation of the proposed architecture is optimized for performance improvements in PyTorch version 2.1. Code and trained models are available at https://github.com/miguelag99/Efficient-Instance-Prediction

Updated: 2024-11-11 10:35:23

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2411.06851v1

1-800-SHARED-TASKS @ NLU of Devanagari Script Languages: Detection of Language, Hate Speech, and Targets using LLMs

This paper presents a detailed system description of our entry for the CHiPSAL 2025 shared task, focusing on language detection, hate speech identification, and target detection in Devanagari script languages. We experimented with a combination of large language models and their ensembles, including MuRIL, IndicBERT, and Gemma-2, and leveraged unique techniques like focal loss to address challenges in the natural understanding of Devanagari languages, such as multilingual processing and class imbalance. Our approach achieved competitive results across all tasks: F1 of 0.9980, 0.7652, and 0.6804 for Sub-tasks A, B, and C respectively. This work provides insights into the effectiveness of transformer models in tasks with domain-specific and linguistic challenges, as well as areas for potential improvement in future iterations.
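Focal loss, which the system uses against class imbalance, down-weights examples the model already classifies well. A minimal binary sketch (the multi-class version used with transformer classifiers is analogous; defaults below are the commonly cited ones, not necessarily this system's):

```python
import math

def cross_entropy(p, y):
    """Plain binary cross-entropy for predicted probability p and label y."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: the (1 - p_t)**gamma factor shrinks the loss of
    easy, well-classified examples, and alpha re-balances the two classes."""
    p_t = p if y == 1 else 1.0 - p
    a_t = alpha if y == 1 else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 and alpha = 1 the focal loss reduces to ordinary cross-entropy; increasing gamma focuses training on the hard minority-class examples.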

Updated: 2024-11-11 10:34:36

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.06850v1

Generative Feature Training of Thin 2-Layer Networks

We consider the approximation of functions by 2-layer neural networks with a small number of hidden weights based on the squared loss and small datasets. Due to the highly non-convex energy landscape, gradient-based training often suffers from local minima. As a remedy, we initialize the hidden weights with samples from a learned proposal distribution, which we parameterize as a deep generative model. To train this model, we exploit the fact that with fixed hidden weights, the optimal output weights solve a linear equation. After learning the generative model, we refine the sampled weights with a gradient-based post-processing in the latent space. Here, we also include a regularization scheme to counteract potential noise. Finally, we demonstrate the effectiveness of our approach by numerical examples.
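The fact the training scheme exploits — with hidden weights fixed, the squared loss is quadratic in the output weights, so they solve a linear system — can be checked directly. A stdlib sketch for 1-D inputs with tanh units (a hypothetical illustration with a small ridge term for stability, not the paper's code):

```python
import math

def features(X, hidden):
    """Hidden-layer activations tanh(w*x + b) for each unit (w, b)."""
    return [[math.tanh(w * x + b) for (w, b) in hidden] for x in X]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def optimal_output_weights(X, y, hidden, ridge=1e-6):
    """With hidden weights fixed, the squared loss is quadratic in the output
    weights, which therefore solve the (ridge-regularized) normal equations."""
    Phi = features(X, hidden)
    m = len(hidden)
    A = [[sum(Phi[i][p] * Phi[i][q] for i in range(len(X)))
          + (ridge if p == q else 0.0) for q in range(m)] for p in range(m)]
    rhs = [sum(Phi[i][p] * y[i] for i in range(len(X))) for p in range(m)]
    return solve(A, rhs)
```

This inner linear solve is what lets the generative proposal model for the hidden weights be trained without ever optimizing the output layer by gradient descent.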

Updated: 2024-11-11 10:32:33

Categories: cs.LG,cs.NA,math.NA,stat.ML

Download: http://arxiv.org/abs/2411.06848v1

GLinSAT: The General Linear Satisfiability Neural Network Layer By Accelerated Gradient Descent

Ensuring that the outputs of neural networks satisfy specific constraints is crucial for applying neural networks to real-life decision-making problems. In this paper, we consider making a batch of neural network outputs satisfy bounded and general linear constraints. We first reformulate the neural network output projection problem as an entropy-regularized linear programming problem. We show that such a problem can be equivalently transformed into an unconstrained convex optimization problem with Lipschitz continuous gradient according to the duality theorem. Then, based on an accelerated gradient descent algorithm with numerical performance enhancement, we present our architecture, GLinSAT, to solve the problem. To the best of our knowledge, this is the first general linear satisfiability layer in which all the operations are differentiable and matrix-factorization-free. Despite the fact that we can explicitly perform backpropagation based on automatic differentiation mechanism, we also provide an alternative approach in GLinSAT to calculate the derivatives based on implicit differentiation of the optimality condition. Experimental results on constrained traveling salesman problems, partial graph matching with outliers, predictive portfolio allocation and power system unit commitment demonstrate the advantages of GLinSAT over existing satisfiability layers. Our implementation is available at \url{https://github.com/HunterTracer/GLinSAT}.
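The entropy-regularized reformulation has a familiar special case: over the probability simplex, maximizing c.x - tau * sum(x log x) subject to sum(x) = 1, x >= 0 is solved in closed form by a softmax. The sketch below shows only that special case to convey why entropy smoothing makes the projection differentiable; the general bounded linear constraints GLinSAT handles require the iterative accelerated-gradient dual solver described in the paper:

```python
import math

def entropic_projection_simplex(c, tau=1.0):
    """Closed-form maximizer of c.x - tau * sum(x log x) over the probability
    simplex: softmax(c / tau). Smaller tau tracks the unregularized LP more
    closely; larger tau gives a smoother, better-conditioned map."""
    m = max(ci / tau for ci in c)          # shift for numerical stability
    exps = [math.exp(ci / tau - m) for ci in c]
    s = sum(exps)
    return [e / s for e in exps]
```

As tau -> 0 the output concentrates on the argmax, recovering the hard LP solution; the smooth map at tau > 0 is what gives the layer usable gradients.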

Updated: 2024-11-11 10:17:00

Categories: cs.AI,cs.SY,eess.SY,math.OC

Download: http://arxiv.org/abs/2409.17500v2

LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models

In this paper, we propose a novel LLM-Neo framework that efficiently transfers knowledge from a large language model (LLM) teacher to a compact student. Initially, we revisit knowledge distillation (KD) and low-rank adaptation (LoRA), and argue that they share the same paradigm. Inspired by this observation, we explore the strategy that combines LoRA and KD to enhance the efficiency of knowledge transfer. We first summarize some guidelines for this design and further develop the LLM-Neo. Experimental results on compressing Llama 2 and Llama 3 show that LLM-Neo outperforms various baselines. Further analysis demonstrates the robustness of the proposed LLM-Neo on variants of LoRA. The trained models have been available at \href{https://huggingface.co/collections/yang31210999/llm-neo-66e3c882f5579b829ff57eba}{this repository}.
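The two ingredients the framework combines can be stated concretely: LoRA adds a trainable low-rank update B @ A to a frozen weight matrix, and KD matches the student's temperature-softened output distribution to the teacher's. A stdlib sketch of both pieces with toy shapes (illustrative only, not LLM-Neo's code; the usual T**2 loss scaling is omitted):

```python
import math

def softmax(logits, temp=1.0):
    """Temperature-softened softmax over a list of logits."""
    m = max(l / temp for l in logits)
    exps = [math.exp(l / temp - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_kl(teacher_logits, student_logits, temp=2.0):
    """Forward KL(teacher || student) on softened distributions,
    the usual knowledge-distillation objective."""
    p = softmax(teacher_logits, temp)
    q = softmax(student_logits, temp)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def lora_delta(B, A):
    """Low-rank update Delta W = B @ A added to a frozen weight in LoRA."""
    r = len(A)
    return [[sum(B[i][k] * A[k][j] for k in range(r)) for j in range(len(A[0]))]
            for i in range(len(B))]
```

The paper's observation is that both operate on the same student parameters: the KD gradient flows only into the low-rank factors B and A, so distillation inherits LoRA's parameter efficiency.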

Updated: 2024-11-11 10:07:51

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.06839v1

Spatially Constrained Transformer with Efficient Global Relation Modelling for Spatio-Temporal Prediction

Accurate spatio-temporal prediction is crucial for the sustainable development of smart cities. However, current approaches often struggle to capture important spatio-temporal relationships, particularly overlooking global relations among distant city regions. Most existing techniques predominantly rely on Convolutional Neural Networks (CNNs) to capture global relations. However, CNNs exhibit neighbourhood bias, making them insufficient for capturing distant relations. To address this limitation, we propose ST-SampleNet, a novel transformer-based architecture that combines CNNs with self-attention mechanisms to capture both local and global relations effectively. Moreover, as the number of regions increases, the quadratic complexity of self-attention becomes a challenge. To tackle this issue, we introduce a lightweight region sampling strategy that prunes non-essential regions and enhances the efficiency of our approach. Furthermore, we introduce a spatially constrained position embedding that incorporates spatial neighbourhood information into the self-attention mechanism, aiding in semantic interpretation and improving the performance of ST-SampleNet. Our experimental evaluation on three real-world datasets demonstrates the effectiveness of ST-SampleNet. Additionally, our efficient variant achieves a 40% reduction in computational costs with only a marginal compromise in performance, approximately 1%.
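
The cost argument behind region sampling can be sketched minimally: pruning regions before self-attention shrinks the quadratic score matrix. The per-region importance scores below are hypothetical stand-ins for ST-SampleNet's learned sampling strategy.

```python
def attention_cost(n):
    """Number of pairwise self-attention score computations over n regions (quadratic)."""
    return n * n

def sample_regions(scores, keep_ratio):
    """Keep the top fraction of regions ranked by an importance score; a
    lightweight stand-in for the paper's learned region sampling."""
    k = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

scores = [0.9, 0.1, 0.8, 0.05, 0.7, 0.3, 0.6, 0.2]   # hypothetical importance per region
kept = sample_regions(scores, keep_ratio=0.5)
saving = 1 - attention_cost(len(kept)) / attention_cost(len(scores))
```

Keeping half the regions removes three quarters of the attention-score computations, which is how a modest sampling ratio yields the reported ~40% end-to-end cost reduction once other layers are accounted for.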

Updated: 2024-11-11 10:03:59

Categories: cs.LG

Download: http://arxiv.org/abs/2411.06836v1

Multimodal Structure-Aware Quantum Data Processing

While large language models (LLMs) have advanced the field of natural language processing (NLP), their "black box" nature obscures their decision-making processes. To address this, researchers developed structured approaches using higher-order tensors. These are able to model linguistic relations, but stall when training on classical computers due to their excessive size. Tensors are natural inhabitants of quantum systems, and training on quantum computers provides a solution by translating text to variational quantum circuits. In this paper, we develop MultiQ-NLP: a framework for structure-aware data processing with multimodal text+image data. Here, "structure" refers to syntactic and grammatical relationships in language, as well as the hierarchical organization of visual elements in images. We enrich the translation with new types and type homomorphisms and develop novel architectures to represent structure. When tested on a mainstream image classification task (SVO Probes), our best model performed on par with state-of-the-art classical models; moreover, the best model was fully structured.

Updated: 2024-11-11 10:03:47

Categories: cs.LG,68T45, 68T50, 68Q12, 68U15, 68U10, 81P45, 81P68,I.2.7; I.2.10; H.5.1

Download: http://arxiv.org/abs/2411.04242v3

HarmLevelBench: Evaluating Harm-Level Compliance and the Impact of Quantization on Model Alignment

With the introduction of the transformer architecture, LLMs have revolutionized the NLP field with ever more powerful models. Nevertheless, their development has brought several challenges. The exponential growth in computational power and reasoning capabilities of language models has heightened concerns about their security. As models become more powerful, ensuring their safety has become a crucial focus in research. This paper aims to address gaps in the current literature on jailbreaking techniques and the evaluation of LLM vulnerabilities. Our contributions include the creation of a novel dataset designed to assess the harmfulness of model outputs across multiple harm levels, as well as a focus on fine-grained harm-level analysis. Using this framework, we provide a comprehensive benchmark of state-of-the-art jailbreaking attacks, specifically targeting the Vicuna 13B v1.5 model. Additionally, we examine how quantization techniques, such as AWQ and GPTQ, influence the alignment and robustness of models, revealing trade-offs between enhanced robustness with regards to transfer attacks and potential increases in vulnerability on direct ones. This study aims to demonstrate the influence of harmful input queries on the complexity of jailbreaking techniques, as well as to deepen our understanding of LLM vulnerabilities and improve methods for assessing model robustness when confronted with harmful content, particularly in the context of compression strategies.

Updated: 2024-11-11 10:02:49

Categories: cs.CL,cs.CR

Download: http://arxiv.org/abs/2411.06835v1

MacroSwarm: A Field-based Compositional Framework for Swarm Programming

Swarm behaviour engineering is an area of research that seeks to investigate methods and techniques for coordinating computation and action within groups of simple agents to achieve complex global goals like pattern formation, collective movement, clustering, and distributed sensing. Despite recent progress in the analysis and engineering of swarms (of drones, robots, vehicles), there is still a need for general design and implementation methods and tools that can be used to define complex swarm behaviour in a principled way. To contribute to this quest, this article proposes a new field-based coordination approach, called MacroSwarm, to design and program swarm behaviour in terms of reusable and fully composable functional blocks embedding collective computation and coordination. Based on the macroprogramming paradigm of aggregate computing, MacroSwarm builds on the idea of expressing each swarm behaviour block as a pure function, mapping sensing fields into actuation goal fields, e.g., including movement vectors. In order to demonstrate the expressiveness, compositionality, and practicality of MacroSwarm as a framework for swarm programming, we perform a variety of simulations covering common patterns of flocking, pattern formation, and collective decision-making. The implications of the inherent self-stabilisation properties of field-based computations in MacroSwarm are discussed, which formally guarantee some resilience properties and guided the design of the library.
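
The "behaviour block as a pure function" idea can be sketched in plain Python: each block maps an agent's position and sensed neighbour positions (a sensing field) to a movement vector (an actuation goal field), and blocks compose by summing their goal vectors. The cohesion and separation blocks below are toy illustrations, not MacroSwarm's aggregate-computing implementation.

```python
def cohesion(pos, neighbours):
    """Steer toward the centroid of the sensed neighbourhood."""
    if not neighbours:
        return (0.0, 0.0)
    cx = sum(p[0] for p in neighbours) / len(neighbours)
    cy = sum(p[1] for p in neighbours) / len(neighbours)
    return (cx - pos[0], cy - pos[1])

def separation(pos, neighbours, radius=1.0):
    """Steer away from neighbours that are closer than the given radius."""
    vx = vy = 0.0
    for p in neighbours:
        dx, dy = pos[0] - p[0], pos[1] - p[1]
        if (dx * dx + dy * dy) ** 0.5 < radius:
            vx, vy = vx + dx, vy + dy
    return (vx, vy)

def compose(*behaviours):
    """Combine behaviour blocks by summing their goal vectors; each block
    stays a pure function from sensing to actuation."""
    def combined(pos, neighbours):
        vs = [b(pos, neighbours) for b in behaviours]
        return (sum(v[0] for v in vs), sum(v[1] for v in vs))
    return combined

flock = compose(cohesion, separation)
v = flock((0.0, 0.0), [(2.0, 0.0), (0.5, 0.0)])
```

Because every block is a pure function over fields, composition is just function combination, which is the property the framework's self-stabilisation guarantees build on.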

Updated: 2024-11-11 09:51:28

Categories: cs.AI,cs.LO,cs.SE

Download: http://arxiv.org/abs/2401.10969v2

Learning Interpretable Network Dynamics via Universal Neural Symbolic Regression

Discovering governing equations of complex network dynamics is a fundamental challenge in contemporary science with rich data, which can uncover the mysterious patterns and mechanisms of the formation and evolution of complex phenomena in various fields and assist in decision-making. In this work, we develop a universal computational tool that can automatically, efficiently, and accurately learn the symbolic changing patterns of complex system states by combining the excellent fitting ability from deep learning and the equation inference ability from pre-trained symbolic regression. We conduct intensive experimental verifications on more than ten representative scenarios from physics, biochemistry, ecology, epidemiology, etc. Results demonstrate the outstanding effectiveness and efficiency of our tool by comparing with the state-of-the-art symbolic regression techniques for network dynamics. The application to real-world systems including global epidemic transmission and pedestrian movements has verified its practical applicability. We believe that our tool can serve as a universal solution to dispel the fog of hidden mechanisms of changes in complex phenomena, advance toward interpretability, and inspire more scientific discoveries.
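
A toy sketch of the equation-inference step, assuming derivative samples have already been produced by the deep-learning fitting stage: score a small library of candidate symbolic terms against the observed derivatives and keep the best match. The library and data here are hypothetical stand-ins for the pre-trained symbolic regression component.

```python
import math

def fit_symbolic(x_samples, dx_samples, library):
    """Return the name of the library term whose predictions best match
    the observed derivatives (least sum of squared errors)."""
    def sse(f):
        return sum((f(x) - dx) ** 2 for x, dx in zip(x_samples, dx_samples))
    return min(library, key=lambda term: sse(term[1]))[0]

xs = [0.0, 1.0, 2.0, 3.0]
dxs = [0.0, 2.0, 4.0, 6.0]   # hypothetical samples generated by dx/dt = 2x
library = [("x^2", lambda x: x * x), ("2x", lambda x: 2 * x), ("sin(x)", math.sin)]
best = fit_symbolic(xs, dxs, library)
```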

Updated: 2024-11-11 09:51:22

Categories: cs.AI,cs.LG,cs.MA,cs.SC

Download: http://arxiv.org/abs/2411.06833v1

Optimized Quality of Service prediction in FSO Links over South Africa using Ensemble Learning

Fibre-optic communication systems are expected to grow exponentially in application owing to their numerous advantages over copper wires. The evolution of optical networks presents several advantages, such as long-distance transmission, low power requirements, higher carrying capacity and high bandwidth, among others; such network bandwidth surpasses transmission methods that include copper cables and microwaves. Despite these benefits, free-space optical communications are severely impacted by harsh weather conditions like mist, precipitation, blizzards, fumes, dust and drizzle debris in the atmosphere, all of which affect the Quality of Service (QoS) rendered by the systems. The primary goal of this article is to optimize the QoS using the ensemble learning models Random Forest, AdaBoost Regression, Stacking Regression, Gradient Boost Regression, and a Multilayer Neural Network. To accomplish the stated goal, meteorological data, visibility, wind speed and altitude were obtained from the South African Weather Service archive over a ten-year period (2010 to 2019) at four different locations: Polokwane, Kimberley, Bloemfontein, and George. We estimated the data rate, received power, fog-induced attenuation, bit error rate and power penalty using the collected and processed data. The RMSE and R-squared values of the model across all the study locations, Polokwane, Kimberley, Bloemfontein, and George, are 0.0073 and 0.9951, 0.0065 and 0.9998, 0.0060 and 0.9941, and 0.0032 and 0.9906, respectively. The results showed that using ensemble learning techniques in transmission modelling can significantly enhance service quality and meet customer service-level agreements, and that the ensemble method efficiently optimized the signal-to-noise ratio, which in turn enhanced the QoS at the point of reception.
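
A minimal sketch of the ensemble idea on hypothetical received-power predictions: averaging several base regressors typically lowers the RMSE relative to any single model. The numbers are illustrative, not from the paper's dataset, and simple averaging is only one of the ensemble strategies the paper evaluates.

```python
import math

def ensemble_average(predictions):
    """Average per-sample predictions from several base regressors."""
    n = len(predictions[0])
    return [sum(model[i] for model in predictions) / len(predictions) for i in range(n)]

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical received-power predictions (dBm) from three base models.
base = [
    [-20.1, -25.3, -30.2],
    [-19.8, -24.9, -30.5],
    [-20.3, -25.1, -29.9],
]
y_true = [-20.0, -25.0, -30.0]
combined = ensemble_average(base)
err = rmse(y_true, combined)
```

In this toy example the averaged predictions beat every individual model, which is the basic mechanism stacking and boosting refine with learned weights.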

Updated: 2024-11-11 09:48:38

Categories: stat.ML,cs.LG,eess.SP,physics.optics

Download: http://arxiv.org/abs/2411.06832v1

Adaptive Conditional Expert Selection Network for Multi-domain Recommendation

Mixture-of-Experts (MOE) has recently become the de facto standard in multi-domain recommendation (MDR) due to its powerful expressive ability. However, such MOE-based methods typically employ all experts for each instance, leading to scalability issues and low discriminability between domains and experts. Furthermore, the design of commonly used domain-specific networks exacerbates the scalability issues. To tackle these problems, we propose a novel method named CESAA, consisting of a Conditional Expert Selection (CES) module and an Adaptive Expert Aggregation (AEA) module. Specifically, CES first combines a sparse gating strategy with domain-shared experts. Then AEA utilizes a mutual information loss to strengthen the correlations between experts and specific domains, significantly improving the distinction between experts. As a result, only domain-shared experts and selected domain-specific experts are activated for each instance, striking a balance between computational efficiency and model performance. Experimental results on both public ranking and industrial retrieval datasets verify the effectiveness of our method in MDR tasks.
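
The conditional selection step can be sketched as follows: domain-shared experts are always active, and a sparse gate picks only the top-k domain-specific experts per instance. The gate scores below are hypothetical; in CESAA the gate is learned and shaped by the mutual-information loss.

```python
def select_experts(gate_scores, shared_ids, k=2):
    """Activate all domain-shared experts plus the top-k domain-specific
    experts by gate score, a sketch of sparse conditional expert selection."""
    specific = [i for i in range(len(gate_scores)) if i not in shared_ids]
    topk = sorted(specific, key=lambda i: gate_scores[i], reverse=True)[:k]
    return sorted(set(shared_ids) | set(topk))

# Hypothetical per-expert gate scores for one instance; expert 0 is domain-shared.
scores = [0.0, 0.9, 0.1, 0.4, 0.8, 0.2]
active = select_experts(scores, shared_ids={0}, k=2)
```

Only three of six experts run for this instance, which is where the scalability gain over dense MOE comes from.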

Updated: 2024-11-11 09:39:31

Categories: cs.LG,cs.IR

Download: http://arxiv.org/abs/2411.06826v1

GenCode: A Generic Data Augmentation Framework for Boosting Deep Learning-Based Code Understanding

Pre-trained code models lead the era of code intelligence, with multiple models designed that deliver impressive performance. However, one important problem, data augmentation for code data, which automatically helps developers prepare training data, lacks study in this field. In this paper, we introduce a generic data augmentation framework, GenCode, to enhance the training of code understanding models. Simply speaking, GenCode follows a generation-and-selection paradigm to prepare useful training code data. Specifically, it employs code transformation techniques to generate new code candidates first and then selects the important ones as the training data according to importance metrics. To evaluate the effectiveness of GenCode, we conduct experiments on four code understanding tasks (e.g., code clone detection) and three pre-trained code models (e.g., CodeT5). Compared to the state-of-the-art (SOTA) code augmentation method, MixCode, GenCode produces code models with 2.92% higher accuracy and 4.90% higher robustness on average.
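
The generation-and-selection paradigm can be sketched in a few lines: code transformations produce candidates, and an importance metric decides which ones enter training. The transformations and the length-based metric below are toy stand-ins for GenCode's semantic-preserving transformations and its model-based importance metrics (e.g., predicted loss).

```python
def generate_candidates(code, transforms):
    """Apply each code transformation to produce augmented candidates."""
    return [t(code) for t in transforms]

def select_by_importance(candidates, importance):
    """Keep the candidate the importance metric ranks highest."""
    return max(candidates, key=importance)

def rename(src):
    # Toy variable rename; safe here because 'x' only appears as the variable.
    return src.replace("x", "value")

def add_comment(src):
    return "# helper\n" + src

code = "def f(x):\n    return x + 1"
candidates = generate_candidates(code, [rename, add_comment])
# Toy importance metric: prefer the longer candidate. GenCode uses
# model-based metrics instead of raw length.
chosen = select_by_importance(candidates, importance=len)
```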

Updated: 2024-11-11 09:38:23

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2402.15769v2

TX-Gen: Multi-Objective Optimization for Sparse Counterfactual Explanations for Time-Series Classification

In time-series classification, understanding model decisions is crucial for their application in high-stakes domains such as healthcare and finance. Counterfactual explanations, which provide insights by presenting alternative inputs that change model predictions, offer a promising solution. However, existing methods for generating counterfactual explanations for time-series data often struggle with balancing key objectives like proximity, sparsity, and validity. In this paper, we introduce TX-Gen, a novel algorithm for generating counterfactual explanations based on the Non-dominated Sorting Genetic Algorithm II (NSGA-II). TX-Gen leverages evolutionary multi-objective optimization to find a diverse set of counterfactuals that are both sparse and valid, while maintaining minimal dissimilarity to the original time series. By incorporating a flexible reference-guided mechanism, our method improves the plausibility and interpretability of the counterfactuals without relying on predefined assumptions. Extensive experiments on benchmark datasets demonstrate that TX-Gen outperforms existing methods in generating high-quality counterfactuals, making time-series models more transparent and interpretable.
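
The multi-objective core can be illustrated with a Pareto-front extraction over hypothetical (proximity, sparsity) scores, both minimized; NSGA-II's non-dominated sorting builds on exactly this dominance test, adding ranking and crowding-distance selection on top.

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly
    better on at least one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(objectives):
    """Indices of the Pareto front: solutions no other solution dominates."""
    return [i for i, a in enumerate(objectives)
            if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)]

# Hypothetical (proximity, sparsity) scores of candidate counterfactuals.
candidates = [(0.2, 3), (0.5, 1), (0.3, 3), (0.6, 2)]
front = non_dominated(candidates)
```

The front keeps one candidate that is close but dense and one that is sparse but farther, which is the diversity of trade-offs TX-Gen returns to the user.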

Updated: 2024-11-11 09:37:09

Categories: cs.LG,cs.AI,cs.NE

Download: http://arxiv.org/abs/2409.09461v2

Reward is not enough: can we liberate AI from the reinforcement learning paradigm?

I present arguments against the hypothesis put forward by Silver, Singh, Precup, and Sutton ( https://www.sciencedirect.com/science/article/pii/S0004370221000862 ) : reward maximization is not enough to explain many activities associated with natural and artificial intelligence including knowledge, learning, perception, social intelligence, evolution, language, generalisation and imitation. I show that such reductio ad lucrum has its intellectual origins in the political economy of Homo economicus and substantially overlaps with the radical version of behaviourism. I show why the reinforcement learning paradigm, despite its demonstrable usefulness in some practical applications, is an incomplete framework for intelligence -- natural and artificial. Complexities of intelligent behaviour are not simply second-order complications on top of reward maximisation. This fact has profound implications for the development of practically usable, smart, safe and robust artificially intelligent agents.

Updated: 2024-11-11 09:34:57

Categories: cs.AI,I.2.0

Download: http://arxiv.org/abs/2202.03192v3

Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs

There is a growing interest in training domain-expert LLMs that excel in specific technical fields compared to their general-purpose instruction-tuned counterparts. However, these expert models often experience a loss in their safety abilities in the process, making them capable of generating harmful content. As a solution, we introduce an efficient and effective merging-based alignment method called \textsc{MergeAlign} that interpolates the domain and alignment vectors, creating safer domain-specific models while preserving their utility. We apply \textsc{MergeAlign} on Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks. We study the impact of model merging through model similarity metrics and contributions of individual models being merged. We hope our findings open new research avenues and inspire more efficient development of safe expert LLMs.
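
A sketch of task-vector interpolation in the spirit of \textsc{MergeAlign}: subtract the base weights to obtain domain and alignment vectors, interpolate, and add the result back. The exact merging scheme in the paper may differ, and the parameter values here are hypothetical.

```python
def merge_align(base, domain, aligned, alpha=0.5):
    """Per-parameter interpolation of the domain vector (domain - base) and
    the alignment vector (aligned - base), added back onto the base weights."""
    merged = {}
    for name in base:
        d = domain[name] - base[name]       # domain task vector
        a = aligned[name] - base[name]      # alignment task vector
        merged[name] = base[name] + alpha * d + (1 - alpha) * a
    return merged

# Hypothetical one-parameter "models" standing in for full weight dicts.
base = {"w": 1.0}
domain = {"w": 3.0}     # e.g. a medical expert fine-tune
aligned = {"w": 0.0}    # e.g. the safety-aligned instruction model
m = merge_align(base, domain, aligned, alpha=0.5)
```

`alpha` controls the knowledge-safety trade-off: larger values weight domain expertise, smaller ones weight alignment.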

Updated: 2024-11-11 09:32:20

Categories: cs.AI

Download: http://arxiv.org/abs/2411.06824v1

Large Language Model in Medical Informatics: Direct Classification and Enhanced Text Representations for Automatic ICD Coding

Addressing the complexity of accurately classifying International Classification of Diseases (ICD) codes from medical discharge summaries is challenging due to the intricate nature of medical documentation. This paper explores the use of Large Language Models (LLM), specifically the LLAMA architecture, to enhance ICD code classification through two methodologies: direct application as a classifier and as a generator of enriched text representations within a Multi-Filter Residual Convolutional Neural Network (MultiResCNN) framework. We evaluate these methods by comparing them against state-of-the-art approaches, revealing LLAMA's potential to significantly improve classification outcomes by providing deep contextual insights into medical texts.

Updated: 2024-11-11 09:31:46

Categories: cs.LG,cs.IR

Download: http://arxiv.org/abs/2411.06823v1

Instruction Tuning for Large Language Models: A Survey

This paper surveys research works in the quickly advancing field of instruction tuning (IT), which can also be referred to as supervised fine-tuning (SFT)\footnote{In this paper, unless specified otherwise, instruction tuning (IT) will be equivalent to supervised fine-tuning (SFT).}, a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of \textsc{(instruction, output)} pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and use cases, along with an analysis of aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc.). We also review the potential pitfalls of IT and criticism against it, point out current deficiencies of existing strategies, and suggest some avenues for fruitful research. Project page: github.com/xiaoya-li/Instruction-Tuning-Survey
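
The (instruction, output) supervision can be sketched as standard loss masking: instruction tokens receive an ignore label so that only response tokens contribute to the next-word loss. The token ids below are hypothetical; this is the common SFT data-preparation pattern, not any single paper's recipe.

```python
IGNORE = -100  # conventional label value skipped by the cross-entropy loss

def build_sft_example(instruction_ids, output_ids):
    """Concatenate an (instruction, output) pair and mask the instruction
    tokens so the loss is computed only on the response."""
    input_ids = instruction_ids + output_ids
    labels = [IGNORE] * len(instruction_ids) + output_ids
    return input_ids, labels

inp, labels = build_sft_example([5, 8, 2], [7, 9])
```

This masking is what redirects the generic next-word objective toward instruction following: gradients flow only through the response span.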

Updated: 2024-11-11 09:25:48

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2308.10792v7

Federated Graph Condensation with Information Bottleneck Principles

Graph condensation, which reduces the size of a large-scale graph by synthesizing a small-scale condensed graph as its substitution, has immediately benefited various graph learning tasks. However, existing graph condensation methods rely on centralized data storage, which is unfeasible for real-world decentralized data distribution, and overlook data holders' privacy-preserving requirements. To bridge the gap, we propose and study the novel problem of federated graph condensation for graph neural networks (GNNs). Specifically, we first propose a general framework for federated graph condensation, in which we decouple the typical gradient matching process for graph condensation into client-side gradient calculation and server-side gradient matching. In this way, the burdensome computation cost in client-side is largely alleviated. Besides, our empirical studies show that under the federated setting, the condensed graph will consistently leak data membership privacy, i.e., the condensed graph during the federated training can be utilized to steal the training data under the membership inference attacks (MIA). To tackle this issue, we innovatively incorporate information bottleneck principles into the federated graph condensation, which only needs to extract partial node features in one local pre-training step and utilize the features during federated training. Extensive experiments on real-world datasets demonstrate that our framework can consistently protect membership privacy during training. Meanwhile, it also achieves comparable and even superior performance against existing centralized graph condensation and federated graph learning methods.
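
The decoupling described above can be sketched as follows: clients ship gradients (never raw graphs), and the server matches the condensed graph's gradient to their average. This toy version uses flat gradient vectors and omits the information-bottleneck feature-extraction component.

```python
def client_gradient(local_grads):
    """Each client computes gradients on its private graph; only these
    gradients, not the raw data, are sent to the server."""
    return local_grads

def server_gradient_matching_loss(client_grads, condensed_grad):
    """The server averages client gradients and measures the squared
    distance from the condensed graph's gradient (the matching loss)."""
    n = len(client_grads[0])
    avg = [sum(g[i] for g in client_grads) / len(client_grads) for i in range(n)]
    return sum((a - c) ** 2 for a, c in zip(avg, condensed_grad))

grads = [client_gradient([1.0, 2.0]), client_gradient([3.0, 4.0])]
loss = server_gradient_matching_loss(grads, condensed_grad=[2.0, 3.0])
```

Minimizing this loss over the condensed graph's parameters drives it to reproduce the federated training signal without centralizing the data.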

Updated: 2024-11-11 09:23:00

Categories: cs.LG,cs.AI,cs.CR,cs.DC

Download: http://arxiv.org/abs/2405.03911v2

Streetwise Agents: Empowering Offline RL Policies to Outsmart Exogenous Stochastic Disturbances in RTC

The difficulty of exploring and training online on real production systems limits the scope of real-time online data/feedback-driven decision making. The most feasible approach is to adopt offline reinforcement learning from limited trajectory samples. However, after deployment, such policies fail due to exogenous factors that temporarily or permanently disturb/alter the transition distribution of the assumed decision process structure induced by offline samples. This results in critical policy failures and generalization errors in sensitive domains like Real-Time Communication (RTC). We solve this crucial problem of identifying robust actions in the presence of domain shifts due to unseen exogenous stochastic factors in the wild. As it is impossible to learn generalized offline policies within the support of offline data that are robust to these unseen exogenous disturbances, we propose a novel post-deployment shaping of policies (Streetwise), conditioned on real-time characterization of out-of-distribution sub-spaces. This leads to robust actions in bandwidth estimation (BWE) of network bottlenecks in RTC and in standard benchmarks. Our extensive experimental results on BWE and other standard offline RL benchmark environments demonstrate a significant improvement ($\approx$ 18% in some scenarios) in final returns, with respect to end-user metrics, over state-of-the-art baselines.
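
A minimal sketch of post-deployment shaping, using a simple z-score test as a stand-in for the paper's real-time out-of-distribution characterization: in-distribution inputs pass the offline policy's action through unchanged, while OOD inputs trigger a conservative fallback. The thresholds and values are hypothetical.

```python
def is_out_of_distribution(obs, mean, std, z_threshold=3.0):
    """Flag an observation whose z-score exceeds a threshold; a simple
    stand-in for learned OOD sub-space characterization."""
    return abs(obs - mean) / std > z_threshold

def shape_action(action, obs, mean, std, fallback):
    """Fall back to a conservative action when the input looks OOD,
    leaving the offline policy untouched in-distribution."""
    return fallback if is_out_of_distribution(obs, mean, std) else action

# In-distribution observation: the offline policy's action is kept.
a1 = shape_action(8.0, obs=0.5, mean=0.0, std=1.0, fallback=1.0)
# OOD observation (e.g. an unseen network disturbance): fall back.
a2 = shape_action(8.0, obs=9.0, mean=0.0, std=1.0, fallback=1.0)
```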

Updated: 2024-11-11 09:22:09

Categories: cs.LG

Download: http://arxiv.org/abs/2411.06815v1

On Provable Length and Compositional Generalization

Out-of-distribution generalization capabilities of sequence-to-sequence models can be studied through the lens of two crucial forms of generalization: length generalization, the ability to generalize to longer sequences than ones seen during training, and compositional generalization, the ability to generalize to token combinations not seen during training. In this work, we provide the first provable guarantees on length and compositional generalization for common sequence-to-sequence models -- deep sets, transformers, state space models, and recurrent neural nets -- trained to minimize the prediction error. Taking a first-principles perspective, we study the realizable case, i.e., the labeling function is realizable on the architecture. We show that \emph{simple limited capacity} versions of these different architectures achieve both length and compositional generalization. In all our results across different architectures, we find that the learned representations are linearly related to the representations generated by the true labeling function.

Updated: 2024-11-11 09:22:02

Categories: cs.LG,cs.CL,stat.ML

Download: http://arxiv.org/abs/2402.04875v4

CUDRT: Benchmarking the Detection Models of Human vs. Large Language Models Generated Texts

While large language models (LLMs) have greatly enhanced text generation across industries, their human-like outputs make distinguishing between human and AI authorship challenging. Although many LLM-generated text detectors exist, current benchmarks mainly rely on static datasets, limiting their effectiveness in assessing model-based detectors requiring prior training. Furthermore, these benchmarks focus on specific scenarios like question answering and text refinement and are primarily limited to English, overlooking broader linguistic applications and LLM subtleties. To address these gaps, we construct a comprehensive bilingual benchmark in Chinese and English to rigorously evaluate mainstream LLM-generated text detection methods. We categorize LLM text generation into five key operations-Create, Update, Delete, Rewrite, and Translate (CUDRT)-covering the full range of LLM activities. For each CUDRT category, we developed extensive datasets enabling thorough assessment of detection performance, incorporating the latest mainstream LLMs for each language. We also establish a robust evaluation framework to support scalable, reproducible experiments, facilitating an in-depth analysis of how LLM operations, different LLMs, datasets, and multilingual training sets impact detector performance, particularly for model-based methods. Our extensive experiments provide critical insights for optimizing LLM-generated text detectors and suggest future directions to improve detection accuracy and generalization across diverse scenarios. Source code and dataset are available on GitHub.

Updated: 2024-11-11 09:19:46

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.09056v2

ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models

Planning is a crucial element of both human intelligence and contemporary large language models (LLMs). In this paper, we initiate a theoretical investigation into the emergence of planning capabilities in Transformer-based LLMs via their next-word prediction mechanisms. We model planning as a network path-finding task, where the objective is to generate a valid path from a specified source node to a designated target node. Our mathematical characterization shows that Transformer architectures can execute path-finding by embedding the adjacency and reachability matrices within their weights. Furthermore, our theoretical analysis of gradient-based learning dynamics reveals that LLMs can learn both the adjacency and a limited form of the reachability matrices. These theoretical insights are then validated through experiments, which demonstrate that Transformer architectures indeed learn the adjacency and an incomplete reachability matrices, consistent with our theoretical predictions. When applying our methodology to the real-world planning benchmark Blocksworld, our observations remain consistent. Additionally, our analyses uncover a fundamental limitation of current Transformer architectures in path-finding: these architectures cannot identify reachability relationships through transitivity, which leads to failures in generating paths when concatenation is required. These findings provide new insights into how the internal mechanisms of autoregressive learning facilitate intelligent planning and deepen our understanding of how future LLMs might achieve more advanced and general planning-and-reasoning capabilities across diverse applications.
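
The claimed limitation, that the model learns adjacency plus only the reachability pairs observed together on training paths and cannot chain them transitively, can be sketched as follows; the greedy decoder is a toy stand-in for next-word prediction, not the paper's Transformer.

```python
def observed_facts(paths):
    """Record what co-occurs in training paths: adjacency edges, and
    reachability only for node pairs appearing together on some path
    (the 'incomplete reachability matrix' the paper describes)."""
    adj, reach = set(), set()
    for p in paths:
        adj.update(zip(p, p[1:]))
        reach.update((p[i], p[j]) for i in range(len(p)) for j in range(i + 1, len(p)))
    return adj, reach

def generate_path(src, dst, adj, reach):
    """Greedy decoding: step along an edge whose endpoint is recorded as
    reaching dst. Reachability that requires transitive inference was
    never recorded, so such queries fail."""
    path = [src]
    while path[-1] != dst:
        nxt = [b for a, b in adj if a == path[-1] and (b == dst or (b, dst) in reach)]
        if not nxt:
            return None
        path.append(min(nxt))
    return path

# Training covers 1->2->3 and 3->4, but no single path links 1 to 4.
adj, reach = observed_facts([[1, 2, 3], [3, 4]])
```

Here `generate_path(1, 3, ...)` succeeds (the pair was seen on one path), while `generate_path(1, 4, ...)` fails even though 4 is reachable from 1 through concatenation.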

Updated: 2024-11-11 09:16:56

Fields: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.09220v3

Two-stage Learning-to-Defer for Multi-Task Learning

The Learning-to-Defer approach has been explored for classification and, more recently, regression tasks separately. Many contemporary learning tasks, however, involve both classification and regression components. In this paper, we introduce a Learning-to-Defer approach for multi-task learning that encompasses both classification and regression tasks. Our two-stage approach utilizes a rejector that defers decisions to the most accurate agent among a pre-trained joint classifier-regressor model and one or more external experts. We show that our surrogate loss is $(\mathcal{H}, \mathcal{F}, \mathcal{R})$- and Bayes-consistent, ensuring an effective approximation of the optimal solution. Additionally, we derive learning bounds that demonstrate the benefits of employing multiple confident experts alongside a rich model in a two-stage learning framework. Empirical experiments conducted on electronic health record analysis tasks underscore the performance enhancements achieved through our method.
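
A minimal sketch of the two-stage inference rule: a trained rejector scores each agent and routes the query to the highest-scoring one, with the model and experts frozen. The names and the score-based rejector below are illustrative assumptions, not the paper's exact formulation.

```python
def defer(x, model, experts, rejector):
    """Route the query to the agent the rejector scores highest; the
    model and experts are frozen at this second stage."""
    agents = [model] + experts
    scores = rejector(x)
    return agents[max(range(len(agents)), key=scores.__getitem__)](x)

# Toy setup: the rejector trusts the model on small inputs and the
# expert on large ones (scores play the role of estimated accuracy).
model = lambda x: "model-answer"
expert = lambda x: "expert-answer"
rejector = lambda x: [0.9, 0.2] if x < 5 else [0.3, 0.8]
```

The surrogate-loss machinery in the paper is what makes training such a rejector consistent with the Bayes-optimal deferral rule.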

Updated: 2024-11-11 09:15:21

Fields: stat.ML,cs.HC,cs.LG

Download: http://arxiv.org/abs/2410.15729v2

Generative midtended cognition and Artificial Intelligence. Thinging with thinging things

This paper introduces the concept of "generative midtended cognition", exploring the integration of generative AI with human cognition. The term "generative" reflects AI's ability to iteratively produce structured outputs, while "midtended" captures the potential hybrid (human-AI) nature of the process. It stands between traditional conceptions of intended creation, understood as directed from within, and extended processes that bring exo-biological processes into the creative process. We examine current generative technologies (based on multimodal transformer architectures typical of large language models like ChatGPT) to explain how they can transform human cognitive agency beyond what standard theories of extended cognition can capture. We suggest that the type of cognitive activity typical of the coupling between a human and generative technologies is closer (but not equivalent) to social cognition than to classical extended cognitive paradigms. Yet, it deserves a specific treatment. We provide an explicit definition of generative midtended cognition in which we treat interventions by AI systems as constitutive of the agent's intentional creative processes. Furthermore, we distinguish two dimensions of generative hybrid creativity: 1. Width: captures the sensitivity of the generative process to its context (from a single letter to the whole historical and surrounding data); 2. Depth: captures the granularity of the iteration loops involved in the process. Generative midtended cognition stands at a middle depth between conversational forms of cognition, in which complete utterances or creative units are exchanged, and micro-cognitive (e.g. neural) subpersonal processes. Finally, the paper discusses the potential risks and benefits of widespread generative AI adoption, including the challenges of authenticity, generative power asymmetry, and creative boost or atrophy.

Updated: 2024-11-11 09:14:27

Fields: cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2411.06812v1

JPEG AI Image Compression Visual Artifacts: Detection Methods and Dataset

Learning-based image compression methods have improved in recent years and started to outperform traditional codecs. However, neural-network approaches can unexpectedly introduce visual artifacts in some images. We therefore propose methods to separately detect three types of artifacts (texture and boundary degradation, color change, and text corruption), to localize the affected regions, and to quantify the artifact strength. We consider only those regions that exhibit distortion due solely to the neural compression but that a traditional codec recovers successfully at a comparable bitrate. We employed our methods to collect artifacts for the JPEG AI verification model with respect to HM-18.0, the H.265 reference software. We processed about 350,000 unique images from the Open Images dataset using different compression-quality parameters; the result is a dataset of 46,440 artifacts validated through crowd-sourced subjective assessment. Our proposed dataset and methods are valuable for testing neural-network-based image codecs, identifying bugs in these codecs, and enhancing their performance. We make source code of the methods and the dataset publicly available.
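
The region-selection rule, keeping only regions the neural codec distorts but a traditional codec recovers at a comparable bitrate, can be sketched as a simple per-region filter; the per-region error inputs and the threshold value below are illustrative, not the paper's actual metric.

```python
def artifact_regions(neural_err, traditional_err, thresh=10.0):
    """Flag regions distorted by the neural codec but recovered by a
    traditional codec at a comparable bitrate. Inputs are per-region
    distortions against the pristine image; the threshold is illustrative."""
    return [i for i, (n, t) in enumerate(zip(neural_err, traditional_err))
            if n > thresh and t <= thresh]

# Region 1 is a neural-only artifact; region 2 fails under both codecs,
# so it is excluded (the distortion is not specific to neural compression).
flagged = artifact_regions([2.0, 25.0, 30.0], [3.0, 4.0, 28.0])
```

This filtering is what isolates artifacts attributable to the neural compression itself rather than to compression in general.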

Updated: 2024-11-11 09:11:01

Fields: cs.AI,cs.CV,cs.MM

Download: http://arxiv.org/abs/2411.06810v1

Learning-to-Defer for Extractive Question Answering

Pre-trained language models have profoundly impacted the field of extractive question-answering, leveraging large-scale textual corpora to enhance contextual language understanding. Despite their success, these models struggle in complex scenarios that demand nuanced interpretation or inferential reasoning beyond immediate textual cues. Furthermore, their size poses deployment challenges on resource-constrained devices. Addressing these limitations, we introduce an adapted two-stage Learning-to-Defer mechanism that enhances decision-making by enabling selective deference to human experts or larger models without retraining language models in the context of question-answering. This approach not only maintains computational efficiency but also significantly improves model reliability and accuracy in ambiguous contexts. We establish the theoretical soundness of our methodology by proving Bayes and $(\mathcal{H}, \mathcal{R})$-consistency of our surrogate loss function, guaranteeing the optimality of the final solution. Empirical evaluations on the SQuADv2 dataset illustrate performance gains from integrating human expertise and leveraging larger models. Our results further demonstrate that deferring a minimal number of queries allows the smaller model to achieve performance comparable to its larger counterparts while preserving computational efficiency, thus broadening the applicability of pre-trained language models in diverse operational environments.

Updated: 2024-11-11 09:06:51

Fields: cs.CL,cs.LG,stat.ML

Download: http://arxiv.org/abs/2410.15761v2

Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

The emergence of contemporary deepfakes has attracted significant attention in machine learning research, as artificial intelligence (AI) generated synthetic media increases the incidence of misinterpretation and is difficult to distinguish from genuine content. Currently, machine learning techniques have been extensively studied for automatically detecting deepfakes. However, human perception has been less explored. Malicious deepfakes could ultimately cause public and social problems. Can we humans correctly perceive the authenticity of the content of the videos we watch? The answer is obviously uncertain; therefore, this paper aims to evaluate the human ability to discern deepfake videos through a subjective study. We present our findings by comparing human observers to five state-of-the-art audiovisual deepfake detection models. To this end, we used gamification concepts to provide 110 participants (55 native English speakers and 55 non-native English speakers) with a web-based platform where they could access a series of 40 videos (20 real and 20 fake) to determine their authenticity. Each participant performed the experiment twice with the same 40 videos in different random orders. The videos are manually selected from the FakeAVCeleb dataset. We found that all AI models performed better than humans when evaluated on the same 40 videos. The study also reveals that while deception is not impossible, humans tend to overestimate their detection capabilities. Our experimental results may help benchmark human versus machine performance, advance forensics analysis, and enable adaptive countermeasures.

Updated: 2024-11-11 09:05:15

Fields: cs.CV,cs.AI,cs.CY,cs.LG,cs.MM

Download: http://arxiv.org/abs/2405.04097v2

AssistRAG: Boosting the Potential of Large Language Models with an Intelligent Information Assistant

The emergence of Large Language Models (LLMs) has significantly advanced natural language processing, but these models often generate factually incorrect information, known as "hallucination". Initial retrieval-augmented generation (RAG) methods like the "Retrieve-Read" framework were inadequate for complex reasoning tasks. Subsequent prompt-based RAG strategies and Supervised Fine-Tuning (SFT) methods improved performance but required frequent retraining and risked altering foundational LLM capabilities. To cope with these challenges, we propose Assistant-based Retrieval-Augmented Generation (AssistRAG), integrating an intelligent information assistant within LLMs. This assistant manages memory and knowledge through tool usage, action execution, memory building, and plan specification. Using a two-phase training approach, Curriculum Assistant Learning and Reinforced Preference Optimization, AssistRAG enhances information retrieval and decision-making. Experiments show AssistRAG significantly outperforms benchmarks, especially benefiting less advanced LLMs, by providing superior reasoning capabilities and accurate responses.
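
For context, the baseline "Retrieve-Read" pipeline that AssistRAG improves on can be sketched as below; the term-overlap ranking and echo LLM are toy stand-ins for a real retriever and model, and AssistRAG's assistant (memory, tools, plans) would sit between the two steps.

```python
def retrieve_read(query, corpus, llm, k=1):
    """Baseline 'Retrieve-Read': rank documents by naive term overlap
    with the query, then let the LLM answer from the top-k documents."""
    q = set(query.lower().split())
    top = sorted(corpus, key=lambda d: len(q & set(d.lower().split())),
                 reverse=True)[:k]
    return llm(query, top)

echo_llm = lambda query, docs: docs[0]  # stand-in LLM: returns its context
corpus = ["the sky is blue", "paris is the capital of france"]
answer = retrieve_read("capital of france", corpus, echo_llm)
```

The abstract's point is that this fixed retrieve-then-read flow breaks down on multi-step reasoning, which motivates inserting a trained assistant that can plan and act between retrieval and reading.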

Updated: 2024-11-11 09:03:52

Fields: cs.CL,cs.AI,cs.IR

Download: http://arxiv.org/abs/2411.06805v1

CIMRL: Combining IMitation and Reinforcement Learning for Safe Autonomous Driving

Modern approaches to autonomous driving rely heavily on learned components trained with large amounts of human driving data via imitation learning. However, these methods require large amounts of expensive data collection and even then face challenges with safely handling long-tail scenarios and compounding errors over time. At the same time, pure Reinforcement Learning (RL) methods can fail to learn performant policies in sparse, constrained, and challenging-to-define reward settings such as autonomous driving. Both of these challenges make deploying purely cloned or pure RL policies in safety-critical applications such as autonomous vehicles challenging. In this paper we propose the Combining IMitation and Reinforcement Learning (CIMRL) approach, a safe reinforcement learning framework that enables training driving policies in simulation by leveraging imitative motion priors and safety constraints. CIMRL does not require extensive reward specification and improves on the closed-loop behavior of pure cloning methods. By combining RL and imitation, we demonstrate that our method achieves state-of-the-art results in closed-loop simulation and real-world driving benchmarks.

Updated: 2024-11-11 09:02:49

Fields: cs.LG

Download: http://arxiv.org/abs/2406.08878v4

Predicting ionic conductivity in solids from the machine-learned potential energy landscape

Discovering new superionic materials is essential for advancing solid-state batteries, which offer improved energy density and safety compared to the traditional lithium-ion batteries with liquid electrolytes. Conventional computational methods for identifying such materials are resource-intensive and not easily scalable. Recently, universal interatomic potential models have been developed using equivariant graph neural networks. These models are trained on extensive datasets of first-principles force and energy calculations. One can achieve significant computational advantages by leveraging them as the foundation for traditional methods of assessing the ionic conductivity, such as molecular dynamics or nudged elastic band techniques. However, the generalization error from model inference on diverse atomic structures arising in such calculations can compromise the reliability of the results. In this work, we propose an approach for the quick and reliable evaluation of ionic conductivity through the analysis of a universal interatomic potential. Our method incorporates a set of heuristic structure descriptors that effectively employ the rich knowledge of the underlying model while requiring minimal generalization capabilities. Using our descriptors, we rank lithium-containing materials in the Materials Project database according to their expected ionic conductivity. Eight out of the ten highest-ranked materials are confirmed to be superionic at room temperature in first-principles calculations. Notably, our method achieves a speed-up factor of approximately 50 compared to molecular dynamics driven by a machine-learning potential, and is at least 3,000 times faster compared to first-principles molecular dynamics.

Updated: 2024-11-11 09:01:36

Fields: cond-mat.mtrl-sci,cs.LG

Download: http://arxiv.org/abs/2411.06804v1

Structuring the Processing Frameworks for Data Stream Evaluation and Application

This work addresses the problem of frameworks for data stream processing that can be used to evaluate solutions in an environment that resembles real-world applications. The definition of structured frameworks stems from a need to reliably evaluate data stream classification methods, considering the constraints of delayed and limited label access. Current experimental evaluation often boundlessly exploits the assumption of complete and immediate label access to monitor recognition quality and to adapt the methods to changing concepts. The problem is addressed by reviewing currently described methods and techniques for data stream processing and verifying their outcomes in a simulated environment. The outcome of this work is a proposed taxonomy of data stream processing frameworks, showing the linkage between drift detection and classification methods in light of the natural phenomenon of label delay.
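
The delayed-label constraint the taxonomy is built around can be made concrete with a prequential (test-then-train) loop in which each label arrives only a fixed number of samples late; this is an illustrative sketch, not a framework from the paper.

```python
from collections import deque

def evaluate_with_delay(stream, predict, update, delay=3):
    """Prequential evaluation under label delay: each sample is tested
    first, but its label only becomes available `delay` samples later,
    so the model cannot adapt immediately (unlike the common
    instant-label assumption)."""
    pending, correct, seen = deque(), 0, 0
    for x, y in stream:
        correct += predict(x) == y
        seen += 1
        pending.append((x, y))
        if len(pending) > delay:       # label of the oldest sample arrives
            update(*pending.popleft())
    return correct / seen

# Toy majority-class learner on a stream whose true label is always 1:
# the first predictions miss because no label has arrived yet.
counts = {}
predict = lambda x: max(counts, key=counts.get) if counts else 0
def update(x, y):
    counts[y] = counts.get(y, 0) + 1

acc = evaluate_with_delay([(i, 1) for i in range(10)], predict, update)
```

Setting `delay=0` recovers the idealized immediate-label evaluation that the paper argues is over-used in current experiments.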

Updated: 2024-11-11 08:53:02

Fields: cs.LG,cs.DB

Download: http://arxiv.org/abs/2411.06799v1

LA4SR: illuminating the dark proteome with generative AI

AI language models (LMs) show promise for biological sequence analysis. We re-engineered open-source LMs (GPT-2, BLOOM, DistilRoBERTa, ELECTRA, and Mamba, ranging from 70M to 12B parameters) for microbial sequence classification. The models achieved F1 scores up to 95 and operated 16,580x faster and at 2.9x the recall of BLASTP. They effectively classified the algal dark proteome - uncharacterized proteins comprising about 65% of total proteins - validated on new data including a new, complete Hi-C/Pacbio Chlamydomonas genome. Larger (>1B) LA4SR models reached high accuracy (F1 > 86) when trained on less than 2% of available data, rapidly achieving strong generalization capacity. High accuracy was achieved when training data had intact or scrambled terminal information, demonstrating robust generalization to incomplete sequences. Finally, we provide custom AI explainability software tools for attributing amino acid patterns to AI generative processes and interpret their outputs in evolutionary and biophysical contexts.

Updated: 2024-11-11 08:51:18

Fields: q-bio.GN,cs.AI,cs.CL,q-bio.QM

Download: http://arxiv.org/abs/2411.06798v1

Evolving Efficient Genetic Encoding for Deep Spiking Neural Networks

By exploiting discrete signal processing and simulating brain neuron communication, Spiking Neural Networks (SNNs) offer a low-energy alternative to Artificial Neural Networks (ANNs). However, existing SNN models still face high computational costs due to the numerous time steps as well as network depth and scale. The tens of billions of neurons and trillions of synapses in the human brain are developed from only 20,000 genes, which inspires us to design an efficient genetic encoding strategy that dynamically evolves to regulate large-scale deep SNNs at low cost. Therefore, we first propose a genetically scaled SNN encoding scheme that incorporates globally shared genetic interactions to indirectly optimize neuronal encoding instead of weights, which brings clear reductions in parameters and energy consumption. Then, a spatio-temporal evolutionary framework is designed to optimize the inherent initial wiring rules. Two dynamic regularization operators in the fitness function evolve the neuronal encoding toward a suitable distribution and enhance the information quality of the genetic interaction, respectively, substantially accelerating evolutionary speed and improving efficiency. Experiments show that our approach compresses parameters by approximately 50\% to 80\%, while outperforming models on the same architectures by 0.21\% to 4.38\% on CIFAR-10, CIFAR-100 and ImageNet. In summary, the consistent trends of the proposed genetically encoded spatio-temporal evolution across different datasets and architectures highlight its significant enhancements in terms of efficiency, broad scalability and robustness, demonstrating the advantages of brain-inspired evolutionary genetic coding for SNN optimization.

Updated: 2024-11-11 08:40:52

Fields: cs.NE,cs.AI

Download: http://arxiv.org/abs/2411.06792v1

Ultra-marginal Feature Importance: Learning from Data with Causal Guarantees

Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. Marginal contribution feature importance (MCI) was developed to break this trend by providing a useful framework for quantifying the relationships in data. In this work, we aim to improve upon the theoretical properties, performance, and runtime of MCI by introducing ultra-marginal feature importance (UMFI), which uses dependence removal techniques from the AI fairness literature as its foundation. We first propose axioms for feature importance methods that seek to explain the causal and associative relationships in data, and we prove that UMFI satisfies these axioms under basic assumptions. We then show on real and simulated data that UMFI performs better than MCI, especially in the presence of correlated interactions and unrelated features, while partially learning the structure of the causal graph and reducing the exponential runtime of MCI to super-linear.
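
A minimal sketch of the dependence-removal step UMFI builds on, here plain linear residualization (the paper's actual removal techniques come from the AI fairness literature): after removing a background feature's dependence on feature f, the residual carries no linear information about f.

```python
def residualize(z, f):
    """Linearly remove the dependence of background feature z on f: a
    simple stand-in for UMFI's dependence-removal preprocessing."""
    mf, mz = sum(f) / len(f), sum(z) / len(z)
    var = sum((a - mf) ** 2 for a in f) or 1.0
    beta = sum((a - mf) * (b - mz) for a, b in zip(f, z)) / var
    return [b - beta * (a - mf) for a, b in zip(f, z)]

f = [1.0, 2.0, 3.0, 4.0]
z = [2.0 * v for v in f]     # z depends entirely on f
res = residualize(z, f)      # the linear dependence is removed
```

UMFI would then score f by comparing model performance on this dependence-removed background with and without f itself, so that correlated features cannot absorb f's credit.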

Updated: 2024-11-11 08:34:20

Fields: stat.ML,cs.IT,cs.LG,math.IT,stat.AP

Download: http://arxiv.org/abs/2204.09938v5

ScaleKD: Strong Vision Transformers Could Be Excellent Teachers

In this paper, we ask whether well pre-trained vision transformer (ViT) models could be used as teachers that exhibit scalable properties to advance cross-architecture knowledge distillation (KD) research, in the context of using large-scale datasets for evaluation. To make this possible, our analysis underlines the importance of seeking effective strategies to align (1) feature computing paradigm differences, (2) model scale differences, and (3) knowledge density differences. By combining three coupled components, namely a cross attention projector, dual-view feature mimicking and teacher parameter perception, tailored to address the above problems, we present a simple and effective KD method, called ScaleKD. Our method can train student backbones that span across a variety of convolutional neural network (CNN), multi-layer perceptron (MLP), and ViT architectures on image classification datasets, achieving state-of-the-art distillation performance. For instance, taking a well pre-trained Swin-L as the teacher model, our method gets 75.15%|82.03%|84.16%|78.63%|81.96%|83.93%|83.80%|85.53% top-1 accuracies for MobileNet-V1|ResNet-50|ConvNeXt-T|Mixer-S/16|Mixer-B/16|ViT-S/16|Swin-T|ViT-B/16 models trained on the ImageNet-1K dataset from scratch, showing 3.05%|3.39%|2.02%|4.61%|5.52%|4.03%|2.62%|3.73% absolute gains over the individually trained counterparts. Intriguingly, when scaling up the size of teacher models or their pre-training datasets, our method showcases the desired scalable properties, bringing increasingly larger gains to student models. The student backbones trained by our method transfer well on downstream MS-COCO and ADE20K datasets. More importantly, our method could be used as a more efficient alternative to the time-intensive pre-training paradigm for any target student model if a strong pre-trained ViT is available, reducing the number of viewed training samples by up to 195x.
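
At its core, feature-mimicking distillation penalizes the gap between projected student features and teacher features. The sketch below uses a plain linear projector as an illustrative stand-in; ScaleKD's actual method replaces this with a cross-attention projector and adds dual-view mimicking and teacher parameter perception.

```python
def mimic_loss(student_feat, teacher_feat, proj):
    """Mean squared gap between linearly projected student features and
    teacher features; `proj` is a toy stand-in for ScaleKD's
    cross-attention projector."""
    mapped = [sum(w * s for w, s in zip(row, student_feat)) for row in proj]
    return sum((m - t) ** 2 for m, t in zip(mapped, teacher_feat)) / len(teacher_feat)
```

The projector is what lets architecturally different students (CNN, MLP, ViT) be trained against one frozen ViT teacher: the student only needs to be mappable into the teacher's feature space, not to share its computing paradigm.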

Updated: 2024-11-11 08:25:21

Fields: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.06786v1

White-Box Diffusion Transformer for single-cell RNA-seq generation

As a powerful tool for characterizing cellular subpopulations and cellular heterogeneity, single-cell RNA sequencing (scRNA-seq) technology offers the advantages of high throughput and multidimensional analysis. However, the process of data acquisition is often constrained by high cost and limited sample availability. To overcome these limitations, we propose a hybrid model based on the Diffusion model and the White-Box transformer that aims to generate synthetic and biologically plausible scRNA-seq data. The Diffusion model progressively introduces noise into the data and then recovers the original data through a denoising process, a forward-and-reverse process that is particularly suitable for generating complex data distributions. The White-Box transformer is a deep learning architecture that emphasizes mathematical interpretability. By minimizing the encoding rate of the data and maximizing the sparsity of the representation, it not only reduces the computational burden but also provides clear insight into the underlying structure. Our White-Box Diffusion Transformer combines the generative capabilities of the Diffusion model with the mathematical interpretability of the White-Box transformer. Through experiments on six different single-cell RNA-seq datasets, we visualize both generated and real data using the t-SNE dimensionality reduction technique, and we quantify the similarity between generated and real data using various metrics to demonstrate performance of the White-Box Diffusion Transformer comparable to the Diffusion Transformer in generating scRNA-seq data, alongside significant improvements in training efficiency and resource utilization. Our code is available at https://github.com/lingximamo/White-Box-Diffusion-Transformer
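
The forward half of the diffusion process, progressively mixing data with Gaussian noise, can be sketched as below; the linear schedule is an illustrative assumption (practical models typically use cosine or learned schedules), and the reverse, denoising half is where the White-Box transformer would be trained.

```python
import math, random

def forward_noise(x0, t, T, rng):
    """One forward-diffusion sample: mix the clean vector with Gaussian
    noise under a linear schedule. t = 0 returns the data unchanged;
    t = T yields (almost) pure noise."""
    alpha_bar = 1.0 - t / T
    return [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * rng.gauss(0, 1)
            for x in x0]

clean = [1.0, 2.0, 3.0]                                # toy expression vector
noisy = forward_noise(clean, 9, 10, random.Random(0))  # nearly pure noise
```

Generation then runs the learned reverse process from pure noise back toward the data distribution, which is why the scheme suits complex distributions like scRNA-seq expression profiles.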

Updated: 2024-11-11 08:24:59

Fields: cs.LG,q-bio.GN

Download: http://arxiv.org/abs/2411.06785v1

Optimizing Noise for $f$-Differential Privacy via Anti-Concentration and Stochastic Dominance

In this paper, we establish anti-concentration inequalities for additive noise mechanisms which achieve $f$-differential privacy ($f$-DP), a notion of privacy phrased in terms of a tradeoff function $f$ which limits the ability of an adversary to determine which individuals were in the database. We show that canonical noise distributions (CNDs), proposed by Awan and Vadhan (2023), match the anti-concentration bounds at half-integer values, indicating that their tail behavior is near-optimal. We also show that all CNDs are sub-exponential, regardless of the $f$-DP guarantee. In the case of log-concave CNDs, we show that they are the stochastically smallest noise compared to any other noise distributions with the same privacy guarantee. In terms of integer-valued noise, we propose a new notion of discrete CND and prove that a discrete CND always exists, can be constructed by rounding a continuous CND, and that the discrete CND is unique when designed for a statistic with sensitivity 1. We further show that the discrete CND at sensitivity 1 is stochastically smallest compared to other integer-valued noises. Our theoretical results shed light on the different types of privacy guarantees possible in the $f$-DP framework and can be incorporated in more complex mechanisms to optimize performance.
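
The paper's construction of a discrete CND by rounding a continuous one can be sketched for a sensitivity-1 counting query; the Laplace sampler below is only an illustrative continuous mechanism, not a canonical noise distribution for a general tradeoff function f.

```python
import random

def discrete_noise(continuous_sampler, rng):
    """Integer-valued noise obtained by rounding a continuous sample,
    mirroring the paper's rounding construction of a discrete CND for
    sensitivity-1 statistics."""
    return round(continuous_sampler(rng))

def private_count(true_count, rng, scale=1.0):
    # Laplace noise as a difference of two exponentials (illustrative
    # sampler, standing in for a continuous canonical noise distribution).
    lap = lambda r: r.expovariate(1.0 / scale) - r.expovariate(1.0 / scale)
    return true_count + discrete_noise(lap, rng)

out = private_count(42, random.Random(1))
```

Rounding keeps the released statistic integral, which matters for counting queries where real-valued noise would reveal that noise was added.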

Updated: 2024-11-11 08:23:46


Domains: cs.CR,math.PR,math.ST,stat.TH,68P27, 60E15

Download: http://arxiv.org/abs/2308.08343v3

QuadWBG: Generalizable Quadrupedal Whole-Body Grasping

Legged robots with advanced manipulation capabilities have the potential to significantly improve household duties and urban maintenance. Despite considerable progress in developing robust locomotion and precise manipulation methods, seamlessly integrating these into cohesive whole-body control for real-world applications remains challenging. In this paper, we present a modular framework for robust and generalizable whole-body loco-manipulation controller based on a single arm-mounted camera. By using reinforcement learning (RL), we enable a robust low-level policy for command execution over 5 dimensions (5D) and a grasp-aware high-level policy guided by a novel metric, Generalized Oriented Reachability Map (GORM). The proposed system achieves state-of-the-art one-time grasping accuracy of 89% in the real world, including challenging tasks such as grasping transparent objects. Through extensive simulations and real-world experiments, we demonstrate that our system can effectively manage a large workspace, from floor level to above body height, and perform diverse whole-body loco-manipulation tasks.

Updated: 2024-11-11 08:19:54


Domains: cs.RO,cs.AI,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2411.06782v1

MP-PINN: A Multi-Phase Physics-Informed Neural Network for Epidemic Forecasting

Forecasting temporal processes such as virus spreading in epidemics often requires more than just observed time-series data, especially at the beginning of a wave when data is limited. Traditional methods employ mechanistic models like the SIR family, which make strong assumptions about the underlying spreading process, often represented as a small set of compact differential equations. Data-driven methods such as deep neural networks make no such assumptions and can capture the generative process in more detail, but fail in long-term forecasting due to data limitations. We propose a new hybrid method called MP-PINN (Multi-Phase Physics-Informed Neural Network) to overcome the limitations of these two major approaches. MP-PINN instils the spreading mechanism into a neural network, enabling the mechanism to update in phases over time, reflecting the dynamics of the epidemics due to policy interventions. Experiments on COVID-19 waves demonstrate that MP-PINN achieves superior performance over pure data-driven or model-driven approaches for both short-term and long-term forecasting.
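The mechanism MP-PINN instils can be sketched as a physics residual penalizing deviation from the SIR equations dS/dt = -beta*S*I and dI/dt = beta*S*I - gamma*I, with phase-specific parameters. The closed-form surrogate below is a stand-in for the neural network, and the (beta, gamma) values are illustrative.

```python
# Illustrative phase-specific SIR parameters; in MP-PINN these update in
# phases over time to reflect policy interventions.
beta, gamma = 0.3, 0.1

def surrogate(t):
    # Placeholder for the network output (S(t), I(t)); any smooth
    # functions work for illustrating the residual computation.
    S = 0.99 * (1.0 - 0.01 * t)
    I = 0.01 * (1.0 + 0.05 * t)
    return S, I

def sir_residual(t, h=1e-4):
    # Finite-difference time derivatives of the surrogate.
    S, I = surrogate(t)
    S2, I2 = surrogate(t + h)
    dS, dI = (S2 - S) / h, (I2 - I) / h
    rS = dS - (-beta * S * I)             # residual of the dS/dt equation
    rI = dI - (beta * S * I - gamma * I)  # residual of the dI/dt equation
    return rS * rS + rI * rI

# Physics loss over collocation points; in MP-PINN each phase contributes
# such a term with its own (beta, gamma).
physics_loss = sum(sir_residual(t * 0.5) for t in range(20)) / 20
```

Training minimizes this physics loss jointly with a data-fit loss, which is what lets the hybrid extrapolate beyond the observed window.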

Updated: 2024-11-11 08:19:22


Domains: cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.06781v1

FactorSim: Generative Simulation via Factorized Representation

Generating simulations to train intelligent agents in game-playing and robotics from natural language input, such as user input or task documentation, remains an open-ended challenge. Existing approaches focus on parts of this challenge, such as generating reward functions or task hyperparameters. Unlike previous work, we introduce FACTORSIM, which generates full simulations in code from language input that can be used to train agents. Exploiting the structural modularity specific to coded simulations, we propose to use a factored partially observable Markov decision process representation that allows us to reduce context dependence during each step of the generation. For evaluation, we introduce a generative simulation benchmark that assesses the generated simulation code's accuracy and effectiveness in facilitating zero-shot transfer in reinforcement learning settings. We show that FACTORSIM outperforms existing methods in generating simulations regarding prompt alignment (e.g., accuracy), zero-shot transfer abilities, and human evaluation. We also demonstrate its effectiveness in generating robotic tasks.

Updated: 2024-11-11 08:16:40


Domains: cs.AI,cs.RO

Download: http://arxiv.org/abs/2409.17652v2

Machine vision-aware quality metrics for compressed image and video assessment

A main goal in developing video-compression algorithms is to enhance human-perceived visual quality while maintaining file size. But modern video-analysis efforts such as detection and recognition, which are integral to video surveillance and autonomous vehicles, involve so much data that they necessitate machine-vision processing with minimal human intervention. In such cases, the video codec must be optimized for machine vision. This paper explores the effects of compression on detection and recognition algorithms (objects, faces, and license plates) and introduces novel full-reference image/video-quality metrics for each task, tailored to machine vision. Experimental results indicate our proposed metrics correlate better with the machine-vision results for the respective tasks than do existing image/video-quality metrics.

Updated: 2024-11-11 08:07:34


Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.06776v1

Model Partition and Resource Allocation for Split Learning in Vehicular Edge Networks

The integration of autonomous driving technologies with vehicular networks presents significant challenges in privacy preservation, communication efficiency, and resource allocation. This paper proposes a novel U-shaped split federated learning (U-SFL) framework to address these challenges in vehicular edge networks. U-SFL enhances privacy protection by keeping both raw data and labels on the vehicular user (VU) side while enabling parallel processing across multiple vehicles. To optimize communication efficiency, we introduce a semantic-aware auto-encoder (SAE) that significantly reduces the dimensionality of transmitted data while preserving essential semantic information. Furthermore, we develop a deep reinforcement learning (DRL) based algorithm to solve the NP-hard problem of dynamic resource allocation and split point selection. Our comprehensive evaluation demonstrates that U-SFL achieves comparable classification performance to traditional split learning (SL) while substantially reducing data transmission volume and communication latency. The proposed DRL-based optimization algorithm shows good convergence in balancing latency, energy consumption, and learning performance.
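The U-shaped split with a bottleneck code can be sketched as follows: raw data and labels stay on the vehicle side, and only a low-dimensional code (the SAE idea) crosses the link to the server. The layer sizes and the plain linear "networks" are illustrative assumptions; the abstract does not specify the actual architecture.

```python
import random

random.seed(1)

def linear(w, x):
    # Apply an out x in weight matrix to a vector (no bias, for brevity).
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def rand_mat(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

W_head = rand_mat(8, 16)   # vehicle-side head: raw features -> hidden
W_enc  = rand_mat(3, 8)    # semantic auto-encoder: hidden -> 3-dim code (the split point)
W_srv  = rand_mat(8, 3)    # server-side middle section
W_tail = rand_mat(2, 8)    # vehicle-side tail produces the prediction; labels never leave

x = [random.uniform(0, 1) for _ in range(16)]   # raw sensor data, never transmitted
code = linear(W_enc, linear(W_head, x))          # only this 3-dim vector is sent uplink
y = linear(W_tail, linear(W_srv, code))          # server output returns to the vehicle tail
```

The communication saving comes from the 16-to-3 compression at the split point; moving the split (and hence the code size) is exactly the split-point-selection problem the DRL agent optimizes.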

Updated: 2024-11-11 07:59:13


Domains: cs.LG,cs.DC

Download: http://arxiv.org/abs/2411.06773v1

A Text Classification Model Combining Adversarial Training with Pre-trained Language Model and neural networks: A Case Study on Telecom Fraud Incident Texts

Front-line police officers often categorize all telecom-fraud cases reported through police calls into 14 subcategories to facilitate targeted prevention measures, such as precise public education. However, the associated data is characterized by its large volume, diverse information content, and variations in expression. Currently, there is a lack of efficient and accurate intelligent models to replace manual classification, which, while precise, is relatively inefficient. To address these challenges, this paper proposes a text classification model that combines adversarial training with a pre-trained language model and neural networks. The Linguistically-motivated Pre-trained Language Model extracts three types of language features and then utilizes the Fast Gradient Method algorithm to perturb the generated embedding layer. Subsequently, the Bidirectional Long Short-Term Memory and Convolutional Neural Network components extract contextual syntactic information and local semantic information, respectively. The model achieved 83.9% classification accuracy when trained on a portion of telecom-fraud case data provided by the operational department. The model established in this paper has been deployed in the operational department, freeing up a significant amount of manpower and improving the department's efficiency in combating telecom-fraud crimes. Furthermore, considering the universality of the model established in this paper, other application scenarios await further exploration.
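The Fast Gradient Method perturbation of the embedding layer mentioned above has a standard form: r_adv = epsilon * g / ||g||_2, added to the embeddings before a second forward pass. The sketch below uses illustrative embedding and gradient values.

```python
import math

def fgm_perturb(embedding, grad, epsilon=1.0):
    """FGM: shift the embedding by epsilon times the L2-normalized gradient
    of the loss w.r.t. the embedding, producing an adversarial example."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(embedding)          # zero gradient: nothing to perturb
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]

emb = [0.2, -0.5, 0.1]                  # one token's embedding (illustrative)
grad = [0.3, 0.0, -0.4]                 # d(loss)/d(embedding) from a backward pass
adv = fgm_perturb(emb, grad, epsilon=0.5)
```

Training then adds the loss on `adv` to the clean loss, which is what makes the classifier robust to small input variations in expression.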

Updated: 2024-11-11 07:52:38


Domains: cs.AI

Download: http://arxiv.org/abs/2411.06772v1

Sketched Adaptive Federated Deep Learning: A Sharp Convergence Analysis

Combining gradient compression methods (e.g., CountSketch, quantization) and adaptive optimizers (e.g., Adam, AMSGrad) is a desirable goal in federated learning (FL), with potential benefits on both fewer communication rounds and less per-round communication. In spite of the preliminary empirical success of sketched adaptive methods, existing convergence analyses show the communication cost to have a linear dependence on the ambient dimension, i.e., number of parameters, which is prohibitively high for modern deep learning models. In this work, we introduce specific sketched adaptive federated learning (SAFL) algorithms and, as our main contribution, provide theoretical convergence analyses in different FL settings with guarantees on communication cost depending only logarithmically (instead of linearly) on the ambient dimension. Unlike existing analyses, we show that the entry-wise sketching noise existent in the preconditioners and the first moments of SAFL can be implicitly addressed by leveraging the recently-popularized anisotropic curvatures in deep learning losses, e.g., fast decaying loss Hessian eigen-values. In the i.i.d. client setting of FL, we show that SAFL achieves asymptotic $O(1/\sqrt{T})$ convergence, and converges faster in the initial epochs. In the non-i.i.d. client setting, where non-adaptive methods lack convergence guarantees, we show that SACFL (SAFL with clipping) algorithms can provably converge in spite of the additional heavy-tailed noise. Our theoretical claims are supported by empirical studies on vision and language tasks, and in both fine-tuning and training-from-scratch regimes. Surprisingly, as a by-product of our analysis, the proposed SAFL methods are competitive with the state-of-the-art communication-efficient federated learning algorithms based on error feedback.
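The CountSketch primitive the paper pairs with adaptive optimizers compresses a d-dimensional gradient into a small table using a bucket hash and a sign hash, with an unbiased per-coordinate estimate on decompression. The hash choices and sizes below are illustrative.

```python
import random

def count_sketch(vec, width, seed=0):
    # Hash each coordinate to a bucket with a random sign; collisions add up.
    rng = random.Random(seed)
    bucket = [rng.randrange(width) for _ in vec]     # bucket hash h(i)
    sign = [rng.choice((-1, 1)) for _ in vec]        # sign hash s(i)
    table = [0.0] * width
    for i, v in enumerate(vec):
        table[bucket[i]] += sign[i] * v
    return table, bucket, sign

def unsketch(table, bucket, sign):
    # Unbiased estimate of each coordinate from its bucket: E[s(i)*T[h(i)]] = v_i.
    return [sign[i] * table[bucket[i]] for i in range(len(bucket))]

grad = [0.0] * 64
grad[7], grad[40] = 3.0, -1.5            # a sparse-ish gradient (illustrative)
table, bucket, sign = count_sketch(grad, width=16, seed=3)
approx = unsketch(table, bucket, sign)   # only the width-16 table crosses the network
```

Only the 16-entry table is communicated instead of 64 coordinates; the paper's analysis shows why such logarithmic-size sketches suffice inside the preconditioners and first moments.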

Updated: 2024-11-11 07:51:22


Domains: cs.LG

Download: http://arxiv.org/abs/2411.06770v1

PDC & DM-SFT: A Road for LLM SQL Bug-Fix Enhancing

Code Large Language Models (Code LLMs), such as Code Llama and DeepSeek-Coder, have demonstrated exceptional performance in code generation tasks. However, most existing models focus on generating correct code and often struggle with bug repair. We introduce a suite of methods to enhance LLMs' SQL bug-fixing abilities. The methods mainly consist of two parts: Progressive Dataset Construction (PDC) from scratch and Dynamic Mask Supervised Fine-tuning (DM-SFT). PDC proposes two data expansion methods, from breadth-first and depth-first perspectives respectively. DM-SFT introduces an efficient bug-fixing supervised learning approach that effectively reduces the total training steps and mitigates the "disorientation" in SQL code bug-fixing training. In our evaluation, the code LLMs trained with these two methods exceed all current best-performing models of much larger size.
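The abstract does not spell out the dynamic-mask rule, so the sketch below assumes one plausible reading: up-weight the supervised loss on tokens that differ between the buggy SQL and its fix, so training focuses on the edit rather than the unchanged boilerplate. The weighting rule, token alignment, and values are hypothetical.

```python
def dynamic_mask(buggy_tokens, fixed_tokens, hot=1.0, cold=0.1):
    """Hypothetical per-token loss weights: 'hot' where the fix differs from
    the buggy input, 'cold' where they agree (simple positional alignment)."""
    mask = []
    for i, tok in enumerate(fixed_tokens):
        same = i < len(buggy_tokens) and buggy_tokens[i] == tok
        mask.append(cold if same else hot)
    return mask

buggy = "SELECT name FORM users".split()   # typo: FORM
fixed = "SELECT name FROM users".split()
weights = dynamic_mask(buggy, fixed)       # loss concentrates on the fixed token
```

A real implementation would align tokens with a diff rather than by position, but the concentration-of-loss idea is the same.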

Updated: 2024-11-11 07:47:20


Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.06767v1

Research on an intelligent fault diagnosis method for nuclear power plants based on ETCN-SSA combined algorithm

Effective fault diagnosis methods are crucial for nuclear power professionals to diagnose faults in nuclear power plants (NPPs) efficiently and accurately. The performance of traditional methods is limited by their dependence on complex feature extraction and skilled expert knowledge, which can be time-consuming and subjective. This paper proposes a novel intelligent fault diagnosis method for NPPs that combines an enhanced temporal convolutional network (ETCN) with the sparrow search algorithm (SSA). ETCN utilizes a temporal convolutional network (TCN), a self-attention (SA) mechanism, and residual blocks to enhance performance. ETCN excels at extracting local features and capturing time series information, while SSA adaptively optimizes its hyperparameters for superior performance. The proposed method's performance is experimentally verified on a CPR1000 simulation dataset. Compared to other advanced intelligent fault diagnosis methods, the proposed one demonstrates superior performance across all evaluation metrics. This makes it a promising tool for NPP intelligent fault diagnosis, ultimately enhancing operational reliability.
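The causal, dilated convolution at the core of the TCN that ETCN builds on can be sketched in a few lines: each output depends only on current and past samples, with the dilation spacing the taps to widen the receptive field. The kernel and dilation below are illustrative.

```python
def causal_dilated_conv(series, kernel, dilation):
    """1-D causal convolution with dilation: output at time t uses samples
    at t, t - dilation, t - 2*dilation, ... (never future samples)."""
    out = []
    k = len(kernel)
    for t in range(len(series)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - (k - 1 - j) * dilation   # tap positions, past/current only
            if idx >= 0:
                acc += w * series[idx]
        out.append(acc)
    return out

signal = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]        # toy sensor reading (illustrative)
smoothed = causal_dilated_conv(signal, kernel=[0.5, 0.5], dilation=2)
```

Stacking such layers with exponentially growing dilation (1, 2, 4, ...) plus residual connections gives the long effective history that makes TCNs competitive on time-series diagnosis.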

Updated: 2024-11-11 07:43:12


Domains: cs.LG,cs.AI,eess.SP

Download: http://arxiv.org/abs/2411.06765v1

Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning

Vision Language Models (VLMs), pre-trained on large-scale image-text datasets, enable zero-shot predictions for unseen data but may underperform on specific unseen tasks. Continual learning (CL) can help VLMs effectively adapt to new data distributions without joint training, but it faces the challenges of catastrophic forgetting and generalization forgetting. Although significant progress has been achieved by distillation-based methods, they exhibit two severe limitations. One is that the widely adopted single-teacher paradigm fails to impart comprehensive knowledge; the other is that existing methods inadequately leverage the multimodal information in the original training dataset and instead rely on additional data for distillation, which increases computational and storage overhead. To mitigate both limitations, drawing on Knowledge Integration Theory (KIT), we propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods. MulKI achieves this through four stages: Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections. During these stages, we first leverage prototypes to align across modalities, eliciting cross-modal knowledge; we then add new knowledge by constructing fine-grained intra- and inter-modality relationships with the prototypes. After that, knowledge from the two teacher models is adaptively distinguished and re-weighted. Finally, we connect models within and across tasks, integrating preceding and new knowledge. Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks, showcasing its potential in adapting VLMs to evolving data distributions.

Updated: 2024-11-11 07:36:19


Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2411.06764v1

Precision Glass Thermoforming Assisted by Neural Networks

Glass, with its good processability, chemical inertness, and optical transparency, has been widely used in optical and aesthetic products, many of which require curve profiles with high precision. To meet the increasingly tight geometrical tolerances and fast product update rates, the traditional approach of developing a thermoforming process through trial and error can waste substantial time and resources and often ends in failure. Hence, there is a need to develop an efficient predictive model, replacing the costly simulations or experiments, to assist the design of precision glass thermoforming. In this work, we report a dimensionless back-propagation neural network (BPNN) that can adequately predict the form errors and thus compensate for these errors in mold design to achieve precision glass molding. Based on the precision molds, we also discuss the issue of error magnification, considering that cover glass for AR/VR glasses or smartphones, produced at extremely large scale, may require a lower level of mold machining accuracy. It is expected that this BPNN will also be implementable in the glass-manufacturing industry, i.e., trained using industrial data for precision mold designs.

Updated: 2024-11-11 07:34:21


Domains: cs.CE,cs.LG

Download: http://arxiv.org/abs/2411.06762v1

RoCar: A Relationship Network-based Evaluation Method for Large Language Models

Large language models (LLMs) have received increasing attention. However, due to the complexity of their capabilities, how to evaluate LLMs rationally remains an open problem. We propose the RoCar method, which uses defined basic schemas to randomly construct a task graph and generates natural language evaluation tasks from the task graph to evaluate the reasoning and memory abilities of LLMs, respectively. Because the task construction process is highly randomized, it is possible to ensure that none of the LLMs under test has directly learned the evaluation tasks, guaranteeing the fairness of the evaluation method.
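RoCar's random task-graph construction can be sketched as sampling relation edges from basic schemas and templating a question whose answer is read off the graph. The schemas, names, and question template below are illustrative assumptions, not the paper's actual schema set.

```python
import random

random.seed(7)

# Illustrative "basic schemas": entities and relation types to sample from.
people = ["Alice", "Bob", "Carol", "Dave"]
relations = ["friend", "colleague", "neighbor"]

# Randomly construct a small relationship (task) graph.
edges = []
for _ in range(3):
    a, b = random.sample(people, 2)            # two distinct entities
    edges.append((a, random.choice(relations), b))

# Template a natural-language evaluation task from one edge; the graph is
# the ground truth, so the answer is known without any model involvement.
a, rel, b = edges[0]
question = f"{a} is the {rel} of {b}. Who is the {rel} of {b}?"
answer = a
```

Because the graph is freshly randomized per evaluation, no fixed benchmark item can have leaked into a model's training data, which is the fairness argument in the abstract.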

Updated: 2024-11-11 07:27:03


Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2307.15997v2

FiSTECH: Financial Style Transfer to Enhance Creativity without Hallucinations in LLMs

Recent trends in Generative AI have moved toward fine-tuning foundational large language models (LLMs) to create domain-specific LLMs for automation and chatbot-like applications. Specialized applications for analytics-heavy domains such as financial report generation require specific writing styles that comprise compound and creative sentences with minimized hallucinations. In this work, we explore the self-corrective auto-regressive qualities of LLMs to learn creativity in writing styles with minimal prompting. We propose a novel two-stage fine-tuning (FT) strategy wherein, in the first stage, public-domain financial reports are used to train for writing styles while allowing the LLM to hallucinate. In the second stage, the examples of hallucinations are manually corrected and further used to fine-tune the LLM. The final trained LLM learns to generate specific financial report sections using minimal instructions and tabular data inputs while keeping fine-tuning costs low. Our proposed two-stage fine-tuning boosts the accuracy of financial question answering two-fold while reducing hallucinations by over 50%. The fine-tuned model also has lower perplexity; improved ROUGE, TER, and BLEU scores; higher creativity and knowledge density; and lower uncertainty and cross-entropy than base LLMs. Thus, the proposed framework can be generalized to train creativity in LLMs by first allowing them to hallucinate.

Updated: 2024-11-11 07:18:34


Domains: cs.CL,cs.AI,cs.CE

Download: http://arxiv.org/abs/2408.05365v3

Overview frequency principle/spectral bias in deep learning

Understanding deep learning is increasingly urgent as it penetrates further into industry and science. In recent years, a research line from Fourier analysis has shed light on this magical "black box" by showing a Frequency Principle (F-Principle, or spectral bias) in the training behavior of deep neural networks (DNNs): DNNs often fit functions from low to high frequency during training. The F-Principle was first demonstrated on one-dimensional synthetic data and then verified on high-dimensional real datasets. A series of subsequent works enhanced the validity of the F-Principle. This low-frequency implicit bias reveals the strength of neural networks in learning low-frequency functions as well as their deficiency in learning high-frequency functions. Such understanding inspires the design of DNN-based algorithms for practical problems, explains experimental phenomena emerging in various scenarios, and further advances the study of deep learning from the frequency perspective. We provide an (admittedly incomplete) overview of the F-Principle and propose some open problems for future research.
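A common way to measure the F-Principle is to track the Fourier modes of the residual between the target function and the network's current fit. The sketch below shows that measurement on a toy target whose low-frequency component is (by construction) already captured, leaving the error concentrated at the high-frequency mode; the target and "early fit" are illustrative.

```python
import cmath
import math

def mode_errors(target, fit, n_modes):
    """Magnitude of the first n_modes Fourier coefficients of target - fit,
    sampled on a uniform grid; large values mark poorly-fitted frequencies."""
    n = len(target)
    resid = [t - f for t, f in zip(target, fit)]
    errs = []
    for k in range(n_modes):
        coef = sum(r * cmath.exp(-2j * math.pi * k * i / n)
                   for i, r in enumerate(resid)) / n
        errs.append(abs(coef))
    return errs

n = 64
xs = [i / n for i in range(n)]
# Target: one low-frequency and one high-frequency sine component.
target = [math.sin(2 * math.pi * x) + 0.3 * math.sin(2 * math.pi * 8 * x) for x in xs]
# Stand-in for an early-training network output: low frequency captured first,
# which is exactly what the F-Principle predicts.
early_fit = [math.sin(2 * math.pi * x) for x in xs]
errs = mode_errors(target, early_fit, n_modes=10)   # error peaks at mode k = 8
```

Plotting `errs` over training epochs is how the cited works visualize the low-to-high-frequency fitting order.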

Updated: 2024-11-11 07:14:07


Domains: cs.LG

Download: http://arxiv.org/abs/2201.07395v3

KLCBL: An Improved Police Incident Classification Model

Police incident data is crucial for public security intelligence, yet grassroots agencies struggle with efficient classification due to manual inefficiency and automated system limitations, especially in telecom and online fraud cases. This research proposes a multichannel neural network model, KLCBL, integrating Kolmogorov-Arnold Networks (KAN), a linguistically enhanced text preprocessing approach (LERT), Convolutional Neural Network (CNN), and Bidirectional Long Short-Term Memory (BiLSTM) for police incident classification. Evaluated with real data, KLCBL achieved 91.9% accuracy, outperforming baseline models. The model addresses classification challenges, enhances police informatization, improves resource allocation, and offers broad applicability to other classification tasks.

Updated: 2024-11-11 07:02:23


Domains: cs.AI

Download: http://arxiv.org/abs/2411.06749v1

Neuromodulated Meta-Learning

Humans excel at adapting perceptions and actions to diverse environments, enabling efficient interaction with the external world. This adaptive capability relies on the biological nervous system (BNS), which activates different brain regions for distinct tasks. Meta-learning similarly trains machines to handle multiple tasks but relies on a fixed network structure, which is not as flexible as the BNS. To investigate the role of flexible network structure (FNS) in meta-learning, we conduct extensive empirical and theoretical analyses, finding that model performance is tied to structure, with no universally optimal pattern across tasks. This reveals the crucial role of FNS in meta-learning, enabling meta-learning to generate the optimal structure for each task and thereby maximizing its performance and learning efficiency. Motivated by this insight, we propose to define, measure, and model FNS in meta-learning. First, we define that an effective FNS should possess frugality, plasticity, and sensitivity. Then, to quantify FNS in practice, we present three measurements for these properties, collectively forming the structure constraint, with theoretical support. Building on this, we finally propose Neuromodulated Meta-Learning (NeuronML) to model FNS in meta-learning. It utilizes bi-level optimization to update both weights and structure under the structure constraint. Extensive theoretical and empirical evaluations demonstrate the effectiveness of NeuronML on various tasks. Code is publicly available at https://github.com/WangJingyao07/NeuronML.

Updated: 2024-11-11 06:54:25


Domains: cs.LG

Download: http://arxiv.org/abs/2411.06746v1

Methane projections from Canada's oil sands tailings using scientific deep learning reveal significant underestimation

Bitumen extraction for the production of synthetic crude oil in Canada's Athabasca Oil Sands industry has recently come under the spotlight for being a significant source of greenhouse gas emissions. A major cause of concern is methane, a greenhouse gas produced by the anaerobic biodegradation of hydrocarbons in oil sands residues, or tailings, stored in settling basins commonly known as oil sands tailings ponds. In order to determine the methane-emitting potential of these tailings ponds and to produce future methane projections, we use real-time weather data, mechanistic models developed from laboratory-controlled experiments, and industrial reports to train a physics-constrained machine learning model. Our trained model can successfully identify the directions of active ponds and estimate their emission levels, which are generally hard to obtain due to data sampling restrictions. We found that each active oil sands tailings pond could emit between 950 and 1500 tonnes of methane per year, whose environmental impact is equivalent to the carbon dioxide emissions of at least 6000 gasoline-powered vehicles. Although abandoned ponds are often presumed to have insignificant emissions, our findings indicate that these ponds could become active over time and potentially emit up to 1000 tonnes of methane each year. Averaging over all datasets used in model training, we estimate that emissions around major oil sands regions would need to be reduced by approximately 12% over a year to bring average methane concentrations back to 2005 levels.

Updated: 2024-11-11 06:37:09


Domains: stat.AP,cs.LG,stat.ML

Download: http://arxiv.org/abs/2411.06741v1

GenAI Arena: An Open Evaluation Platform for Generative Models

Generative AI has made remarkable strides to revolutionize fields such as image and video generation. These advancements are driven by innovative algorithms, architecture, and data. However, the rapid proliferation of generative models has highlighted a critical gap: the absence of trustworthy evaluation metrics. Current automatic assessments such as FID, CLIP, FVD, etc often fail to capture the nuanced quality and user satisfaction associated with generative outputs. This paper proposes an open platform GenAI-Arena to evaluate different image and video generative models, where users can actively participate in evaluating these models. By leveraging collective user feedback and votes, GenAI-Arena aims to provide a more democratic and accurate measure of model performance. It covers three tasks of text-to-image generation, text-to-video generation, and image editing respectively. Currently, we cover a total of 35 open-source generative models. GenAI-Arena has been operating for seven months, amassing over 9000 votes from the community. We describe our platform, analyze the data, and explain the statistical methods for ranking the models. To further promote the research in building model-based evaluation metrics, we release a cleaned version of our preference data for the three tasks, namely GenAI-Bench. We prompt the existing multi-modal models like Gemini, and GPT-4o to mimic human voting. We compute the accuracy by comparing the model voting with the human voting to understand their judging abilities. Our results show existing multimodal models are still lagging in assessing the generated visual content, even the best model GPT-4o only achieves an average accuracy of 49.19 across the three generative tasks. Open-source MLLMs perform even worse due to the lack of instruction-following and reasoning ability in complex vision scenarios.
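The abstract mentions statistical ranking from community votes without naming the method; an Elo update is one standard choice for turning pairwise votes into model ratings, sketched here as an assumption rather than the platform's stated algorithm. The model names and votes are illustrative.

```python
def elo_update(r_a, r_b, winner_a, k=32.0):
    """Standard Elo: move each rating toward the outcome by k times the
    difference between the actual and the expected score."""
    expect_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if winner_a else 0.0
    r_a2 = r_a + k * (score_a - expect_a)
    r_b2 = r_b + k * ((1.0 - score_a) - (1.0 - expect_a))
    return r_a2, r_b2

# Hypothetical models and vote stream; each vote is (model_a, model_b, a_won).
ratings = {"model_x": 1000.0, "model_y": 1000.0}
votes = [("model_x", "model_y", True),
         ("model_x", "model_y", True),
         ("model_x", "model_y", False)]
for a, b, a_won in votes:
    ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], a_won)
```

Elo conserves the total rating mass across each update, so a leaderboard built this way reflects only the relative win record, not the number of appearances.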

Updated: 2024-11-11 06:32:24

Categories: cs.AI,cs.CV

Download: http://arxiv.org/abs/2406.04485v4

OAEI-LLM: A Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching

Hallucinations of large language models (LLMs) commonly occur in domain-specific downstream tasks, with no exception in ontology matching (OM). The prevalence of using LLMs for OM raises the need for benchmarks to better understand LLM hallucinations. The OAEI-LLM dataset is an extended version of the Ontology Alignment Evaluation Initiative (OAEI) datasets that evaluate LLM-specific hallucinations in OM tasks. We outline the methodology used in dataset construction and schema extension, and provide examples of potential use cases.

Updated: 2024-11-11 06:26:39

Categories: cs.AI,cs.CL,cs.IR

Download: http://arxiv.org/abs/2409.14038v4

Dockformer: A transformer-based molecular docking paradigm for large-scale virtual screening

Molecular docking enables virtual screening of compound libraries to identify potential ligands that target proteins of interest, a crucial step in drug development; however, as the size of the compound library increases, so does the computational complexity of traditional docking models. Deep learning algorithms can provide data-driven research and development models to increase the speed of the docking process. Unfortunately, few models can achieve superior screening performance compared to that of traditional models. Therefore, a novel deep learning-based docking approach named Dockformer is introduced in this study. Dockformer leverages multimodal information to capture the geometric topology and structural knowledge of molecules and can directly generate binding conformations with the corresponding confidence measures in an end-to-end manner. The experimental results show that Dockformer achieves success rates of 90.53% and 82.71% on the PDBbind core set and PoseBusters benchmarks, respectively, with a more than 100-fold increase in inference speed, outperforming almost all state-of-the-art docking methods. In addition, the ability of Dockformer to identify the main protease inhibitors of coronaviruses is demonstrated in a real-world virtual screening scenario. Considering its high docking accuracy and screening efficiency, Dockformer can be regarded as a powerful and robust tool in the field of drug design.

Updated: 2024-11-11 06:25:13

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.06740v1

SPRING Lab IITM's submission to Low Resource Indic Language Translation Shared Task

We develop a robust translation model for four low-resource Indic languages: Khasi, Mizo, Manipuri, and Assamese. Our approach includes a comprehensive pipeline from data collection and preprocessing to training and evaluation, leveraging data from WMT task datasets, BPCC, PMIndia, and OpenLanguageData. To address the scarcity of bilingual data, we use back-translation techniques on monolingual datasets for Mizo and Khasi, significantly expanding our training corpus. We fine-tune the pre-trained NLLB 3.3B model for Assamese, Mizo, and Manipuri, achieving improved performance over the baseline. For Khasi, which is not supported by the NLLB model, we introduce special tokens and train the model on our Khasi corpus. Our training involves masked language modelling, followed by fine-tuning for English-to-Indic and Indic-to-English translations.

Updated: 2024-11-11 06:25:04

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.00727v2

BFA-YOLO: A balanced multiscale object detection network for building façade attachments detection

The detection of façade elements on buildings, such as doors, windows, balconies, air conditioning units, billboards, and glass curtain walls, is a critical step in automating the creation of Building Information Modeling (BIM). Yet, this field faces significant challenges, including the uneven distribution of façade elements, the presence of small objects, and substantial background noise, which hamper detection accuracy. To address these issues, we develop the BFA-YOLO model and the BFA-3D dataset in this study. The BFA-YOLO model is an advanced architecture designed specifically for analyzing multi-view images of façade attachments. It integrates three novel components: the Feature Balanced Spindle Module (FBSM), which tackles the issue of uneven object distribution; the Target Dynamic Alignment Task Detection Head (TDATH), which enhances the detection of small objects; and the Position Memory Enhanced Self-Attention Mechanism (PMESA), aimed at reducing the impact of background noise. These elements collectively enable BFA-YOLO to effectively address each challenge, thereby improving model robustness and detection precision. The BFA-3D dataset offers multi-view images with precise annotations across a wide range of façade attachment categories. This dataset is developed to address the limitations of existing façade detection datasets, which often feature a single perspective and insufficient category coverage. Through comparative analysis, BFA-YOLO demonstrated improvements of 1.8% and 2.9% in mAP$_{50}$ on the BFA-3D dataset and the public Façade-WHU dataset, respectively, when compared to the baseline YOLOv8 model. These results highlight the superior performance of BFA-YOLO in façade element detection and the advancement of intelligent BIM technologies.

Updated: 2024-11-11 06:23:21

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2409.04025v2

Beating Adversarial Low-Rank MDPs with Unknown Transition and Bandit Feedback

We consider regret minimization in low-rank MDPs with fixed transition and adversarial losses. Previous work has investigated this problem under either full-information loss feedback with unknown transitions (Zhao et al., 2024), or bandit loss feedback with known transitions (Foster et al., 2022). First, we improve the $poly(d, A, H)T^{5/6}$ regret bound of Zhao et al. (2024) to $poly(d, A, H)T^{2/3}$ for the full-information unknown-transition setting, where $d$ is the rank of the transitions, $A$ is the number of actions, $H$ is the horizon length, and $T$ is the number of episodes. Next, we initiate the study of the setting with bandit loss feedback and unknown transitions. Assuming that the loss has a linear structure, we propose both model-based and model-free algorithms achieving $poly(d, A, H)T^{2/3}$ regret, though they are computationally inefficient. We also propose oracle-efficient model-free algorithms with $poly(d, A, H)T^{4/5}$ regret. We show that the linear structure is necessary for the bandit case: without structure on the reward function, the regret has to scale polynomially with the number of states. This is contrary to the full-information case (Zhao et al., 2024), where the regret can be independent of the number of states even for unstructured reward functions.

Updated: 2024-11-11 06:19:33

Categories: cs.LG

Download: http://arxiv.org/abs/2411.06739v1

Deep graph kernel point processes

Point process models are widely used for continuous asynchronous event data, where each data point includes time and additional information called "marks", which can be locations, nodes, or event types. This paper presents a novel point process model for discrete event data over graphs, where event interactions occur within a latent graph structure. Our model builds upon the classic influence-kernel formulation of Hawkes's original work on self-exciting point processes to capture the influence of historical events on the occurrence of future events. The key idea is to represent the influence kernel with Graph Neural Networks (GNNs) to capture the underlying graph structure while harvesting the strong representation power of GNNs. Compared with prior works that focus on directly modeling the conditional intensity function using neural networks, our kernel representation captures repeated event-influence patterns more effectively by combining statistical and deep models, achieving better model estimation/learning efficiency and superior predictive performance. Our work significantly extends the existing deep spatio-temporal kernel for point process data, which is inapplicable to our setting because its observation space is Euclidean rather than a graph. We present comprehensive experiments on synthetic and real-world data to show the superior performance of the proposed approach against the state-of-the-art in predicting future events and uncovering the relational structure among data.
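The classic self-exciting formulation that this model builds on can be sketched in a few lines; the exponential kernel and the parameter values below are illustrative stand-ins for the paper's GNN-parameterized influence kernel:

```python
import math

def hawkes_intensity(t, history, mu=0.5, alpha=0.8, beta=1.5):
    """Conditional intensity lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)).

    mu:    baseline event rate
    alpha: influence magnitude of each past event
    beta:  decay rate of that influence
    """
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)

# With no history the intensity is just the baseline; each past event adds a
# decaying bump -- the "influence kernel" the paper replaces with a GNN.
print(hawkes_intensity(1.0, []))          # → 0.5
print(hawkes_intensity(1.0, [0.0, 0.5]))  # elevated by two past events
```

The paper's contribution is to parameterize this kernel over a latent graph rather than use a fixed exponential decay, but the intensity structure is the same.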

Updated: 2024-11-11 06:12:24

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2306.11313v4

Mr.Steve: Instruction-Following Agents in Minecraft with What-Where-When Memory

Significant advances have been made in developing general-purpose embodied AI in environments like Minecraft through the adoption of LLM-augmented hierarchical approaches. While these approaches, which combine high-level planners with low-level controllers, show promise, low-level controllers frequently become performance bottlenecks due to repeated failures. In this paper, we argue that the primary cause of failure in many low-level controllers is the absence of an episodic memory system. To address this, we introduce Mr.Steve (Memory Recall Steve-1), a novel low-level controller equipped with Place Event Memory (PEM), a form of episodic memory that captures what, where, and when information from episodes. This directly addresses the main limitation of the popular low-level controller, Steve-1. Unlike previous models that rely on short-term memory, PEM organizes spatial and event-based data, enabling efficient recall and navigation in long-horizon tasks. Additionally, we propose an Exploration Strategy and a Memory-Augmented Task Solving Framework, allowing agents to alternate between exploration and task-solving based on recalled events. Our approach significantly improves task-solving and exploration efficiency compared to existing methods. We will release our code and demos on the project page: https://sites.google.com/view/mr-steve.

Updated: 2024-11-11 06:04:53

Categories: cs.LG

Download: http://arxiv.org/abs/2411.06736v1

Multi-Modal Forecaster: Jointly Predicting Time Series and Textual Data

Current forecasting approaches are largely unimodal and ignore the rich textual data that often accompany the time series, due to the lack of well-curated multimodal benchmark datasets. In this work, we develop the TimeText Corpus (TTC), a carefully curated, time-aligned text and time-series dataset for multimodal forecasting. Our dataset is composed of sequences of numbers and text aligned to timestamps, and includes data from two different domains: climate science and healthcare. Our data is a significant contribution to the rare selection of available multimodal datasets. We also propose the Hybrid Multi-Modal Forecaster (Hybrid-MMF), a multimodal LLM that jointly forecasts both text and time series data using shared embeddings. However, contrary to our expectations, our Hybrid-MMF model does not outperform existing baselines in our experiments. This negative result highlights the challenges inherent in multimodal forecasting. Our code and data are available at https://github.com/Rose-STL-Lab/Multimodal_Forecasting.

Updated: 2024-11-11 06:04:15

Categories: cs.AI

Download: http://arxiv.org/abs/2411.06735v1

GSL-PCD: Improving Generalist-Specialist Learning with Point Cloud Feature-based Task Partitioning

Generalization in Deep Reinforcement Learning (DRL) across unseen environment variations often requires training over a diverse set of scenarios. Many existing DRL algorithms struggle with efficiency when handling numerous variations. The Generalist-Specialist Learning (GSL) framework addresses this by first training a generalist model on all variations, then creating specialists from the generalist's weights, each focusing on a subset of variations. The generalist then refines its learning with assistance from the specialists. However, random task partitioning in GSL can impede performance by assigning vastly different variations to the same specialist, often resulting in each specialist focusing on only one variation, which raises computational costs. To improve this, we propose Generalist-Specialist Learning with Point Cloud Feature-based Task Partitioning (GSL-PCD). Our approach clusters environment variations based on features extracted from object point clouds and uses balanced clustering with a greedy algorithm to assign similar variations to the same specialist. Evaluations on robotic manipulation tasks from the ManiSkill benchmark demonstrate that point cloud feature-based partitioning outperforms vanilla partitioning by 9.4%, with a fixed number of specialists, and reduces computational and sample requirements by 50% to achieve comparable performance.
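A minimal sketch of the balanced greedy assignment idea described above, assuming features have already been extracted from object point clouds and cluster centroids computed; the function name and the toy 2-D features are illustrative, not the paper's implementation:

```python
def balanced_greedy_assign(features, centroids, capacity):
    """Greedily assign each feature vector to the nearest centroid that still
    has room, so every specialist receives a similar number of variations.
    Assumes capacity * len(centroids) >= len(features)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    loads = [0] * len(centroids)
    assignment = []
    for f in features:
        # rank clusters by distance, take the closest one that is not yet full
        order = sorted(range(len(centroids)), key=lambda k: dist2(f, centroids[k]))
        k = next(k for k in order if loads[k] < capacity)
        loads[k] += 1
        assignment.append(k)
    return assignment

# two tight groups of environment variations, one specialist per group
feats = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
cents = [(0.0, 0.0), (5.0, 5.0)]
print(balanced_greedy_assign(feats, cents, capacity=3))  # → [0, 0, 0, 1, 1, 1]
```

The capacity cap is what keeps the partition balanced: without it, a greedy nearest-centroid rule could dump most variations on one specialist.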

Updated: 2024-11-11 06:03:42

Categories: cs.LG,cs.RO

Download: http://arxiv.org/abs/2411.06733v1

Verifiable Quantum Advantage without Structure

We show that the following hold, unconditionally unless otherwise stated, relative to a random oracle:

- There are NP search problems solvable by quantum polynomial-time machines but not by classical probabilistic polynomial-time machines.
- There exist functions that are one-way, and even collision resistant, against classical adversaries but are easily inverted quantumly. Similar separations hold for digital signatures and CPA-secure public key encryption (the latter requiring the assumption of a classically CPA-secure encryption scheme). Interestingly, the separation does not necessarily extend to other cryptographic objects such as PRGs.
- There are unconditional publicly verifiable proofs of quantumness with the minimal number of rounds of interaction: for uniform adversaries, the proofs are non-interactive, whereas for non-uniform adversaries the proofs are two-message public coin.
- Our results do not appear to contradict the Aaronson-Ambainis conjecture. Assuming this conjecture, there exist publicly verifiable certifiable randomness protocols, again with the minimal number of rounds of interaction.

By replacing the random oracle with a concrete cryptographic hash function such as SHA2, we obtain plausible Minicrypt instantiations of the above results. Previous analogous results all required substantial structure, either in terms of highly structured oracles and/or algebraic assumptions in Cryptomania and beyond.

Updated: 2024-11-11 05:57:41

Categories: quant-ph,cs.CC,cs.CR

Download: http://arxiv.org/abs/2204.02063v3

On the Principles of ReLU Networks with One Hidden Layer

A neural network with one hidden layer, or a two-layer network (not counting the input layer), is the simplest feedforward neural network, whose mechanism may be the basis of more general network architectures. However, even this type of simple architecture is a "black box"; that is, it remains unclear how to interpret the mechanism of the solutions obtained by the back-propagation algorithm and how to control the training process in a deterministic way. This paper systematically studies the first problem by constructing universal function-approximation solutions. It is shown, both theoretically and experimentally, that the training solution for one-dimensional input can be completely understood, and that for higher-dimensional input it can also be well interpreted to some extent. These results pave the way for thoroughly revealing the black box of two-layer ReLU networks and advance the understanding of deep ReLU networks.
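A two-layer (one-hidden-layer) ReLU network of the kind studied here computes a weighted sum of ReLU units. As a minimal illustration, three hand-picked units suffice to build a piecewise-linear "hat" function, the basic building block of universal function approximation in one dimension (the parameters are chosen for the example, not learned):

```python
def relu(x):
    return max(0.0, x)

def two_layer(x, units, bias=0.0):
    # one hidden layer: f(x) = bias + sum_i a_i * relu(w_i * x + b_i)
    return bias + sum(a * relu(w * x + b) for (a, w, b) in units)

# three units build a "hat" that rises on [0, 0.5], falls on [0.5, 1],
# and is zero outside: each (a, w, b) triple is one hidden neuron
hat = [(2.0, 1.0, 0.0), (-4.0, 1.0, -0.5), (2.0, 1.0, -1.0)]
print(two_layer(0.5, hat))                       # → 1.0 (peak)
print(two_layer(0.0, hat), two_layer(1.0, hat))  # → 0.0 0.0 (endpoints)
```

Summing shifted hats like this one is the standard constructive route to approximating any continuous function on an interval, which is the kind of explicit solution the paper analyzes.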

Updated: 2024-11-11 05:51:11

Categories: cs.LG,cs.AI,cs.NE,68T07(Primary), 41A15(Secondary),I.2.6; G.1.2

Download: http://arxiv.org/abs/2411.06728v1

PRAGA: Prototype-aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis

Spatial multi-modal omics technology, highlighted by Nature Methods as an advanced biological technique in 2023, plays a critical role in resolving biological regulatory processes with spatial context. Recently, graph neural networks based on K-nearest neighbor (KNN) graphs have gained prominence in spatial multi-modal omics methods due to their ability to model semantic relations between sequencing spots. However, the fixed KNN graph fails to capture the latent semantic relations hidden by the inevitable data perturbations of the biological sequencing process, resulting in the loss of semantic information. In addition, the common lack of spot annotations and class-number priors in practice further hinders the optimization of spatial multi-modal omics models. Here, we propose a novel framework for spatial multi-modal omics analysis, termed PRototype-Aware Graph Adaptive Aggregation (PRAGA). PRAGA constructs a dynamic graph to capture latent semantic relations and comprehensively integrate spatial information and feature semantics. The learnable graph structure can also denoise perturbations by learning cross-modal knowledge. Moreover, a dynamic prototype contrastive learning scheme is proposed, based on the dynamic adaptability of Bayesian Gaussian Mixture Models, to optimize the multi-modal omics representations under unknown biological priors. Quantitative and qualitative experiments on simulated and real datasets against 7 competing methods demonstrate the superior performance of PRAGA.

Updated: 2024-11-11 05:32:27

Categories: q-bio.GN,cs.LG

Download: http://arxiv.org/abs/2409.12728v3

Weakly Supervised Label Learning Flows

Supervised learning usually requires a large amount of labelled data. However, attaining ground-truth labels is costly for many tasks. Alternatively, weakly supervised methods learn with cheap weak signals that only approximately label some data. Many existing weakly supervised learning methods learn a deterministic function that estimates labels given the input data and weak signals. In this paper, we develop label learning flows (LLF), a general framework for weakly supervised learning problems. Our method is a generative model based on normalizing flows. The main idea of LLF is to optimize the conditional likelihoods of all possible labelings of the data within a constrained space defined by weak signals. We develop a training method for LLF that trains the conditional flow inversely and avoids estimating the labels. Once a model is trained, we can make predictions with a sampling algorithm. We apply LLF to three weakly supervised learning problems. Experiment results show that our method outperforms many baselines we compare against.

Updated: 2024-11-11 05:16:13

Categories: cs.LG

Download: http://arxiv.org/abs/2302.09649v2

Video Summarization: Towards Entity-Aware Captions

Existing popular video captioning benchmarks and models deal with generic captions devoid of specific person, place or organization named entities. In contrast, news videos present a challenging setting where the caption requires such named entities for meaningful summarization. As such, we propose the task of summarizing news video directly to entity-aware captions. We also release a large-scale dataset, VIEWS (VIdeo NEWS), to support research on this task. Further, we propose a method that augments visual information from videos with context retrieved from external world knowledge to generate entity-aware captions. We demonstrate the effectiveness of our approach on three video captioning models. We also show that our approach generalizes to existing news image captions dataset. With all the extensive experiments and insights, we believe we establish a solid basis for future research on this challenging task.

Updated: 2024-11-11 05:14:15

Categories: cs.CV,cs.AI,cs.CL,cs.MM

Download: http://arxiv.org/abs/2312.02188v2

Script-Strategy Aligned Generation: Aligning LLMs with Expert-Crafted Dialogue Scripts and Therapeutic Strategies for Psychotherapy

Chatbots or conversational agents (CAs) are increasingly used to improve access to digital psychotherapy. Many current systems rely on rigid, rule-based designs, heavily dependent on expert-crafted dialogue scripts for guiding therapeutic conversations. Although recent advances in large language models (LLMs) offer the potential for more flexible interactions, their lack of controllability and transparency poses significant challenges in sensitive areas like psychotherapy. In this work, we explored how aligning LLMs with expert-crafted scripts can enhance psychotherapeutic chatbot performance. Our comparative study showed that LLMs aligned with expert-crafted scripts through prompting and fine-tuning significantly outperformed both pure LLMs and rule-based chatbots, achieving a more effective balance between dialogue flexibility and adherence to therapeutic principles. Building on these findings, we proposed "Script-Strategy Aligned Generation (SSAG)", a flexible alignment approach that reduces reliance on fully scripted content while enhancing LLMs' therapeutic adherence and controllability. In a 10-day field study, SSAG demonstrated performance comparable to full script alignment and outperformed rule-based chatbots, empirically supporting SSAG as an efficient approach for aligning LLMs with domain expertise. Our work advances LLM applications in psychotherapy by providing a controllable, adaptable, and scalable solution for digital interventions, reducing reliance on expert effort. It also provides a collaborative framework for domain experts and developers to efficiently build expertise-aligned chatbots, broadening access to psychotherapy and behavioral interventions.

Updated: 2024-11-11 05:14:14

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2411.06723v1

Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models

Presenting users with diverse responses from foundation models is crucial for enhancing user experience and accommodating varying preferences. However, generating multiple high-quality and diverse responses without sacrificing accuracy remains a challenge, especially when using greedy sampling. In this work, we propose a novel framework, Synthesize-Partition-Adapt (SPA), that leverages the abundant synthetic data available in many domains to elicit diverse responses from foundation models. By leveraging signal provided by data attribution methods such as influence functions, SPA partitions data into subsets, each targeting unique aspects of the data, and trains multiple model adaptations optimized for these subsets. Experimental results demonstrate the effectiveness of our approach in diversifying foundation model responses while maintaining high quality, showcased through the HumanEval and MBPP tasks in the code generation domain and several tasks in the natural language understanding domain, highlighting its potential to enrich user experience across various applications.

Updated: 2024-11-11 05:13:21

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.06722v1

Real-time Monitoring and Analysis of Track and Field Athletes Based on Edge Computing and Deep Reinforcement Learning Algorithm

This research focuses on real-time monitoring and analysis of track and field athletes, addressing the limitations of traditional monitoring systems in terms of real-time performance and accuracy. We propose an IoT-optimized system that integrates edge computing and deep learning algorithms. Traditional systems often experience delays and reduced accuracy when handling complex motion data, whereas our method, by incorporating a SAC-optimized deep learning model within the IoT architecture, achieves efficient motion recognition and real-time feedback. Experimental results show that this system significantly outperforms traditional methods in response time, data processing accuracy, and energy efficiency, particularly excelling in complex track and field events. This research not only enhances the precision and efficiency of athlete monitoring but also provides new technical support and application prospects for sports science research.

Updated: 2024-11-11 05:12:15

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2411.06720v1

Shallow Signed Distance Functions for Kinematic Collision Bodies

We present learning-based implicit shape representations designed for real-time avatar collision queries arising in the simulation of clothing. Signed distance functions (SDFs) have been used for such queries for many years due to their computational efficiency. Recently, deep neural networks have been used for implicit shape representations (DeepSDFs) due to their ability to represent multiple shapes with modest memory requirements compared to traditional representations over dense grids. However, the computational expense of DeepSDFs prevents their use in real-time clothing simulation applications. We design a learning-based representation of SDFs for human avatars whose bodies change shape kinematically due to joint-based skinning. Rather than using a single DeepSDF for the entire avatar, we use a collection of extremely computationally efficient (shallow) neural networks that represent localized deformations arising from changes in body shape induced by the variation of a single joint. This requires a stitching process to combine each shallow SDF in the collection into one SDF representing the signed closest distance to the boundary of the entire body. To achieve this we augment each shallow SDF with an additional output that resolves whether the individual shallow SDF value refers to a closest point on the boundary of the body, or to a point on the interior of the body (but on the boundary of the individual shallow SDF). Our model is extremely fast and accurate, and we demonstrate its applicability with real-time simulation of garments driven by animated characters.

Updated: 2024-11-11 05:09:31

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2411.06719v1

Truth, beauty, and goodness in grand unification: a machine learning approach

We investigate the flavour sector of the supersymmetric $SU(5)$ Grand Unified Theory (GUT) model using machine learning techniques. The minimal $SU(5)$ model is known to predict fermion masses that disagree with observed values in nature. There are two well-known approaches to address this issue: one involves introducing a 45-representation Higgs field, while the other employs a higher-dimensional operator involving the 24-representation GUT Higgs field. We compare these two approaches by numerically optimising a loss function, defined as the ratio of determinants of mass matrices. Our findings indicate that the 24-Higgs approach achieves the observed fermion masses with smaller modifications to the original minimal $SU(5)$ model.
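The loss being optimised is a ratio of determinants of fermion mass matrices. A minimal sketch of that quantity for 3x3 matrices follows; the example matrices in the test are placeholders, not predictions of either Higgs approach:

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def det_ratio_loss(m_down, m_lepton):
    """Ratio of determinants of two mass matrices, the quantity optimised."""
    return det3(m_down) / det3(m_lepton)
```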

Updated: 2024-11-11 05:02:46

Categories: hep-ph,cs.LG,hep-th

Download: http://arxiv.org/abs/2411.06718v1

DiffSR: Learning Radar Reflectivity Synthesis via Diffusion Model from Satellite Observations

Weather radar data synthesis can fill in data for areas where ground observations are missing. Existing methods often employ reconstruction-based approaches with MSE loss to reconstruct radar data from satellite observation. However, such methods lead to over-smoothing, which hinders the generation of high-frequency details or high-value observation areas associated with convective weather. To address this issue, we propose a two-stage diffusion-based method called DiffSR. We first pre-train a reconstruction model on global-scale data to obtain radar estimation and then synthesize radar reflectivity by combining radar estimation results with satellite data as conditions for the diffusion model. Extensive experiments show that our method achieves state-of-the-art (SOTA) results, demonstrating the ability to generate high-frequency details and high-value areas.

Updated: 2024-11-11 04:50:34

Categories: eess.IV,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2411.06714v1

GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation

Time series foundation models excel in zero-shot forecasting, handling diverse tasks without explicit training. However, the advancement of these models has been hindered by the lack of comprehensive benchmarks. To address this gap, we introduce the General Time Series Forecasting Model Evaluation, GIFT-Eval, a pioneering benchmark aimed at promoting evaluation across diverse datasets. GIFT-Eval encompasses 23 datasets over 144,000 time series and 177 million data points, spanning seven domains, 10 frequencies, multivariate inputs, and prediction lengths ranging from short to long-term forecasts. To facilitate the effective pretraining and evaluation of foundation models, we also provide a non-leaking pretraining dataset containing approximately 230 billion data points. Additionally, we provide a comprehensive analysis of 17 baselines, which includes statistical models, deep learning models, and foundation models. We discuss each model in the context of various benchmark characteristics and offer a qualitative analysis that spans both deep learning and foundation models. We believe the insights from this analysis, along with access to this new standard zero-shot time series forecasting benchmark, will guide future developments in time series foundation models. Code, data, and the leaderboard can be found at https://github.com/SalesforceAIResearch/gift-eval .

Updated: 2024-11-11 04:48:24

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2410.10393v2

Ambient AI Scribing Support: Comparing the Performance of Specialized AI Agentic Architecture to Leading Foundational Models

This study compares Sporo Health's AI Scribe, a proprietary model fine-tuned for medical scribing, with various LLMs (GPT-4o, GPT-3.5, Gemma-9B, and Llama-3.2-3B) in clinical documentation. We analyzed de-identified patient transcripts from partner clinics, using clinician-provided SOAP notes as the ground truth. Each model generated SOAP summaries using zero-shot prompting, with performance assessed via recall, precision, and F1 scores. Sporo outperformed all models, achieving the highest recall (73.3%), precision (78.6%), and F1 score (75.3%) with the lowest performance variance. Statistically significant differences (p < 0.05) were found between Sporo and the other models, with post-hoc tests showing significant improvements over GPT-3.5, Gemma-9B, and Llama 3.2-3B. While Sporo outperformed GPT-4o by up to 10%, the difference was not statistically significant (p = 0.25). Clinical user satisfaction, measured with a modified PDQI-9 inventory, favored Sporo. Evaluations indicated Sporo's outputs were more accurate and relevant. This highlights the potential of Sporo's multi-agentic architecture to improve clinical workflows.

Updated: 2024-11-11 04:45:48

Categories: cs.AI

Download: http://arxiv.org/abs/2411.06713v1

Anytime Probabilistically Constrained Provably Convergent Online Belief Space Planning

Taking into account future risk is essential for an autonomously operating robot to find online not only the best action but also a safe one to execute. In this paper, we build upon the recently introduced formulation of probabilistic belief-dependent constraints. We present an anytime approach employing the Monte Carlo Tree Search (MCTS) method in continuous domains. Unlike previous approaches, our method assures safety at any time with respect to the currently expanded search tree, without relying on the convergence of the search. We prove convergence in probability with an exponential rate for a version of our algorithm and study the proposed techniques via extensive simulations. Even with a tiny number of tree queries, the best action found by our approach is much safer than the baseline. Moreover, our approach consistently finds a better action than the baseline in terms of the objective. This is because we revise the values and statistics maintained in the search tree and remove from them the contribution of the pruned actions.
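The statistics revision mentioned in the last sentence can be sketched: when an action is pruned, its visit count and mean value are subtracted back out of the parent node's running statistics. This is a simplified, illustrative view of the tree bookkeeping, not the paper's algorithm:

```python
def remove_pruned_child(parent_visits, parent_value, child_visits, child_value):
    """Remove a pruned child's contribution from a parent's running mean value.

    parent_value is the mean over parent_visits samples; child_value is the
    mean over the child_visits samples routed through the pruned action.
    """
    remaining = parent_visits - child_visits
    if remaining <= 0:
        return 0, 0.0
    total = parent_value * parent_visits - child_value * child_visits
    return remaining, total / remaining
```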

Updated: 2024-11-11 04:42:18

Categories: cs.AI,cs.RO

Download: http://arxiv.org/abs/2411.06711v1

Model Fusion through Bayesian Optimization in Language Model Fine-Tuning

Fine-tuning pre-trained models for downstream tasks is a widely adopted technique known for its adaptability and reliability across various domains. Despite its conceptual simplicity, fine-tuning entails several troublesome engineering choices, such as selecting hyperparameters and determining checkpoints from an optimization trajectory. To tackle the difficulty of choosing the best model, one effective solution is model fusion, which combines multiple models in a parameter space. However, we observe a large discrepancy between loss and metric landscapes during the fine-tuning of pre-trained language models. Building on this observation, we introduce a novel model fusion technique that optimizes both the desired metric and loss through multi-objective Bayesian optimization. In addition, to effectively select hyperparameters, we establish a two-stage procedure by integrating Bayesian optimization processes into our framework. Experiments across various downstream tasks show considerable performance improvements using our Bayesian optimization-guided method.
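Model fusion in parameter space, as used here, reduces to a weighted average of checkpoints; the multi-objective Bayesian optimiser then proposes the weights. A hedged sketch with checkpoints as flat parameter dictionaries (the weight values would come from the BO loop, not be fixed as in the test):

```python
def fuse(checkpoints, weights):
    """Weighted average of model checkpoints in parameter space.

    checkpoints: list of dicts mapping parameter name -> list of floats.
    weights: one non-negative weight per checkpoint; normalised here.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    fused = {}
    for name in checkpoints[0]:
        fused[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(norm, checkpoints))
            for i in range(len(checkpoints[0][name]))
        ]
    return fused
```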

Updated: 2024-11-11 04:36:58

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2411.06710v1

NutriBench: A Dataset for Evaluating Large Language Models on Nutrition Estimation from Meal Descriptions

Accurate nutrition estimation helps people make informed dietary choices and is essential in the prevention of serious health complications. We present NutriBench, the first publicly available natural language meal description nutrition benchmark. NutriBench consists of 11,857 meal descriptions generated from real-world global dietary intake data. The data is human-verified and annotated with macro-nutrient labels, including carbohydrates, proteins, fats, and calories. We conduct an extensive evaluation of NutriBench on the task of carbohydrate estimation, testing twelve leading Large Language Models (LLMs), including GPT-4o, Llama3.1, Qwen2, Gemma2, and OpenBioLLM models, using standard, Chain-of-Thought and Retrieval-Augmented Generation strategies. Additionally, we present a study involving professional nutritionists, finding that LLMs can provide more accurate and faster estimates. Finally, we perform a real-world risk assessment by simulating the effect of carbohydrate predictions on the blood glucose levels of individuals with diabetes. Our work highlights the opportunities and challenges of using LLMs for nutrition estimation, demonstrating their potential to aid professionals and laypersons and improve health outcomes. Our benchmark is publicly available at: https://mehak126.github.io/nutribench.html

Updated: 2024-11-11 04:17:30

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.12843v4

Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks

While LLMs excel at processing text in human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of these audio responses. These features are the aspects of spoken communication that do not involve the literal wording (lexical content) but convey meaning and nuance through how something is said. We present Beyond Text: an approach that improves LLM decision-making by integrating audio transcription along with a subset of these features that focus on affect and are most relevant in human-robot conversations. This approach not only achieves a 70.26% winning rate, outperforming existing LLMs by 22.16% to 48.30% (gemini-1.5-pro and gpt-3.5 respectively), but also enhances robustness against token-manipulation adversarial attacks, highlighted by a 22.44% smaller decrease ratio in winning rate than the text-only language model. 'Beyond Text' marks an advancement in social robot navigation and broader human-robot interactions, seamlessly integrating text-based guidance with human-audio-informed language models.

Updated: 2024-11-11 04:03:28

Categories: cs.AI,cs.RO

Download: http://arxiv.org/abs/2402.03494v3

Learning a Single Neuron Robustly to Distributional Shifts and Adversarial Label Noise

We study the problem of learning a single neuron with respect to the $L_2^2$-loss in the presence of adversarial distribution shifts, where the labels can be arbitrary, and the goal is to find a ``best-fit'' function. More precisely, given training samples from a reference distribution $\mathcal{p}_0$, the goal is to approximate the vector $\mathbf{w}^*$ which minimizes the squared loss with respect to the worst-case distribution that is close in $\chi^2$-divergence to $\mathcal{p}_{0}$. We design a computationally efficient algorithm that recovers a vector $ \hat{\mathbf{w}}$ satisfying $\mathbb{E}_{\mathcal{p}^*} (\sigma(\hat{\mathbf{w}} \cdot \mathbf{x}) - y)^2 \leq C \, \mathbb{E}_{\mathcal{p}^*} (\sigma(\mathbf{w}^* \cdot \mathbf{x}) - y)^2 + \epsilon$, where $C>1$ is a dimension-independent constant and $(\mathbf{w}^*, \mathcal{p}^*)$ is the witness attaining the min-max risk $\min_{\mathbf{w}~:~\|\mathbf{w}\| \leq W} \max_{\mathcal{p}} \mathbb{E}_{(\mathbf{x}, y) \sim \mathcal{p}} (\sigma(\mathbf{w} \cdot \mathbf{x}) - y)^2 - \nu \chi^2(\mathcal{p}, \mathcal{p}_0)$. Our algorithm follows a primal-dual framework and is designed by directly bounding the risk with respect to the original, nonconvex $L_2^2$ loss. From an optimization standpoint, our work opens new avenues for the design of primal-dual algorithms under structured nonconvexity.

Updated: 2024-11-11 03:43:52

Categories: cs.LG,cs.DS,math.OC,stat.ML

Download: http://arxiv.org/abs/2411.06697v1

SPIRIT: Low Power Seizure Prediction using Unsupervised Online-Learning and Zoom Analog Frontends

Early prediction of seizures and timely interventions are vital for improving patients' quality of life. While seizure prediction has been shown in software-based implementations, to enable timely warnings of upcoming seizures, prediction must be done on an edge device to reduce latency. Ideally, such devices must also be low-power and track long-term drifts to minimize maintenance from the user. This work presents SPIRIT: Stochastic-gradient-descent-based Predictor with Integrated Retraining and In situ accuracy Tuning. SPIRIT is a complete system-on-a-chip (SoC) integrating an unsupervised online-learning seizure prediction classifier with eight 14.4 uW, 0.057 mm2, 90.5 dB dynamic range, Zoom Analog Frontends. SPIRIT achieves, on average, 97.5%/96.2% sensitivity/specificity respectively, predicting seizures an average of 8.4 minutes before they occur. Through its online learning algorithm, prediction accuracy improves by up to 15%, and prediction times extend by up to 7x, without any external intervention. Its classifier consumes 17.2 uW and occupies 0.14 mm2, the lowest reported for a prediction classifier by >134x in power and >5x in area. SPIRIT is also at least 5.6x more energy efficient than the state-of-the-art.
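The classifier's core primitive, a stochastic-gradient-descent update, can be illustrated for a linear predictor under squared loss. This is only the update rule; the chip's feature extraction and unsupervised labelling are not modelled here, and the learning rate is an arbitrary example value:

```python
def sgd_step(weights, features, target, lr=0.1):
    """One stochastic-gradient-descent update of a linear predictor
    under squared loss: w <- w - lr * (w.x - y) * x."""
    pred = sum(w * x for w, x in zip(weights, features))
    err = pred - target
    return [w - lr * err * x for w, x in zip(weights, features)]
```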

Updated: 2024-11-11 03:37:16

Categories: eess.SP,cs.LG

Download: http://arxiv.org/abs/2409.04838v2

Solving the 2D Advection-Diffusion Equation using Fixed-Depth Symbolic Regression and Symbolic Differentiation without Expression Trees

This paper presents a novel method for solving the 2D advection-diffusion equation using fixed-depth symbolic regression and symbolic differentiation without expression trees. The method is applied to two cases with distinct initial and boundary conditions, demonstrating its accuracy and ability to find approximate solutions efficiently. This framework offers a promising, scalable solution for finding approximate solutions to differential equations, with the potential for future improvements in computational performance and applicability to more complex systems involving vector-valued objectives.

Updated: 2024-11-11 03:34:46

Categories: stat.CO,cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2411.00011v2

Autonomous Droplet Microfluidic Design Framework with Large Language Models

Droplet-based microfluidic devices have substantial promise as cost-effective alternatives to current assessment tools in biological research. Moreover, machine learning models that leverage tabular data, including input design parameters and their corresponding efficiency outputs, are increasingly utilised to automate the design process of these devices and to predict their performance. However, these models fail to fully leverage the data presented in the tables, neglecting crucial contextual information, including column headings and their associated descriptions. This study presents MicroFluidic-LLMs, a framework designed for processing and feature extraction, which effectively captures contextual information from tabular data formats. MicroFluidic-LLMs overcomes processing challenges by transforming the content into a linguistic format and leveraging pre-trained large language models (LLMs) for analysis. We evaluate our MicroFluidic-LLMs framework on 11 prediction tasks, covering aspects such as geometry, flow conditions, regimes, and performance, utilising a publicly available dataset on flow-focusing droplet microfluidics. We demonstrate that our MicroFluidic-LLMs framework can empower deep neural network models to be highly effective and straightforward while minimising the need for extensive data preprocessing. Moreover, the exceptional performance of deep neural network models, particularly when combined with advanced natural language processing models such as DistilBERT and GPT-2, reduces the mean absolute error in the droplet diameter and generation rate by nearly 5- and 7-fold, respectively, and enhances the regime classification accuracy by over 4%, compared with the performance reported in a previous study. This study lays the foundation for the huge potential applications of LLMs and machine learning in a wider spectrum of microfluidic applications.
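The central preprocessing idea — serialising a table row together with its column descriptions into text an LLM can consume — can be sketched as follows. The column names and descriptions are invented for illustration, not taken from the paper's dataset:

```python
def row_to_text(row, descriptions):
    """Serialise one tabular record into a sentence, keeping column context
    (headings and their descriptions) that plain tabular models discard."""
    parts = [
        f"{descriptions.get(col, col)} is {val}"
        for col, val in row.items()
    ]
    return "; ".join(parts) + "."

# Hypothetical design parameters for a flow-focusing device.
text = row_to_text(
    {"orifice_width_um": 100, "flow_rate_ul_min": 5.0},
    {"orifice_width_um": "orifice width in micrometres",
     "flow_rate_ul_min": "continuous-phase flow rate in uL/min"},
)
```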

Updated: 2024-11-11 03:20:53

Categories: cs.AI

Download: http://arxiv.org/abs/2411.06691v1

TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables

Deep models have demonstrated remarkable performance in time series forecasting. However, due to the partially-observed nature of real-world applications, solely focusing on the target of interest, so-called endogenous variables, is usually insufficient to guarantee accurate forecasting. Notably, a system is often recorded into multiple variables, where the exogenous variables can provide valuable external information for endogenous variables. Thus, unlike well-established multivariate or univariate forecasting paradigms that either treat all the variables equally or ignore exogenous information, this paper focuses on a more practical setting: time series forecasting with exogenous variables. We propose a novel approach, TimeXer, to ingest external information to enhance the forecasting of endogenous variables. With deftly designed embedding layers, TimeXer empowers the canonical Transformer with the ability to reconcile endogenous and exogenous information, where patch-wise self-attention and variate-wise cross-attention are used simultaneously. Moreover, global endogenous tokens are learned to effectively bridge the causal information underlying exogenous series into endogenous temporal patches. Experimentally, TimeXer achieves consistent state-of-the-art performance on twelve real-world forecasting benchmarks and exhibits notable generality and scalability. Code is available at this repository: https://github.com/thuml/TimeXer.

Updated: 2024-11-11 03:18:32

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2402.19072v4

Shedding Light on Problems with Hyperbolic Graph Learning

Recent papers in the graph machine learning literature have introduced a number of approaches for hyperbolic representation learning. The asserted benefits are improved performance on a variety of graph tasks, node classification and link prediction included. Claims have also been made about the geometric suitability of particular hierarchical graph datasets to representation in hyperbolic space. Despite these claims, our work makes a surprising discovery: when simple Euclidean models with comparable numbers of parameters are properly trained in the same environment, in most cases, they perform as well, if not better, than all introduced hyperbolic graph representation learning models, even on graph datasets previously claimed to be the most hyperbolic as measured by Gromov $\delta$-hyperbolicity (i.e., perfect trees). This observation gives rise to a simple question: how can this be? We answer this question by taking a careful look at the field of hyperbolic graph representation learning as it stands today, and find that a number of papers fail to diligently present baselines, make faulty modelling assumptions when constructing algorithms, and use misleading metrics to quantify geometry of graph datasets. We take a closer look at each of these three problems, elucidate the issues, perform an analysis of methods, and introduce a parametric family of benchmark datasets to ascertain the applicability of (hyperbolic) graph neural networks.
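Gromov δ-hyperbolicity, the metric the authors argue is used misleadingly, comes from the four-point condition: for every quadruple of points, the two largest of the three pairwise distance sums differ by at most 2δ, and trees achieve δ = 0. A brute-force sketch over a finite distance matrix:

```python
from itertools import combinations

def gromov_delta(d):
    """Brute-force Gromov delta of a finite metric given as a distance matrix.

    For points x, y, z, w, form the three sums d(x,y)+d(z,w), d(x,z)+d(y,w),
    d(x,w)+d(y,z); delta is half the largest gap between the top two sums
    over all quadruples. Tree metrics give delta = 0.
    """
    n = len(d)
    delta = 0.0
    for x, y, z, w in combinations(range(n), 4):
        sums = sorted([d[x][y] + d[z][w], d[x][z] + d[y][w], d[x][w] + d[y][z]])
        delta = max(delta, (sums[2] - sums[1]) / 2)
    return delta
```

On a path graph (a tree) this returns 0, while a 4-cycle already has δ = 1.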

Updated: 2024-11-11 03:12:41

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2411.06688v1

High-Frequency Enhanced Hybrid Neural Representation for Video Compression

Neural Representations for Videos (NeRV) have simplified the video codec process and achieved swift decoding speeds by encoding video content into a neural network, presenting a promising solution for video compression. However, existing work overlooks the crucial issue that videos reconstructed by these methods lack high-frequency details. To address this problem, this paper introduces a High-Frequency Enhanced Hybrid Neural Representation Network. Our method focuses on leveraging high-frequency information to improve the synthesis of fine details by the network. Specifically, we design a wavelet high-frequency encoder that incorporates Wavelet Frequency Decomposer (WFD) blocks to generate high-frequency feature embeddings. Next, we design the High-Frequency Feature Modulation (HFM) block, which leverages the extracted high-frequency embeddings to enhance the fitting process of the decoder. Finally, with the refined Harmonic decoder block and a Dynamic Weighted Frequency Loss, we further reduce the potential loss of high-frequency information. Experiments on the Bunny and UVG datasets demonstrate that our method outperforms other methods, showing notable improvements in detail preservation and compression performance.
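The frequency decomposition underlying the WFD block can be illustrated with a one-level Haar wavelet transform on a 1-D signal; the detail band is the high-frequency content the method tries to preserve. The paper operates on video feature maps, so this only shows the decomposition idea:

```python
import math

def haar_1level(signal):
    """One-level Haar wavelet transform: (approximation, detail) coefficients.

    The detail band carries the high-frequency content that the
    high-frequency encoder is designed to keep. Signal length must be even.
    """
    s = math.sqrt(2.0)
    approx = [(a + b) / s for a, b in zip(signal[::2], signal[1::2])]
    detail = [(a - b) / s for a, b in zip(signal[::2], signal[1::2])]
    return approx, detail
```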

Updated: 2024-11-11 03:04:46

Categories: cs.CV,cs.AI,eess.IV

Download: http://arxiv.org/abs/2411.06685v1

Towards Backwards-Compatible Data with Confounded Domain Adaptation

Most current domain adaptation methods address either covariate shift or label shift, but are not applicable where they occur simultaneously and are confounded with each other. Domain adaptation approaches which do account for such confounding are designed to adapt covariates to optimally predict a particular label whose shift is confounded with covariate shift. In this paper, we instead seek to achieve general-purpose data backwards compatibility. This would allow the adapted covariates to be used for a variety of downstream problems, including on pre-existing prediction models and on data analytics tasks. To do this we consider a modification of generalized label shift (GLS), which we call confounded shift. We present a novel framework for this problem, based on minimizing the expected divergence between the source and target conditional distributions, conditioning on possible confounders. Within this framework, we provide concrete implementations using the Gaussian reverse Kullback-Leibler divergence and the maximum mean discrepancy. Finally, we demonstrate our approach on synthetic and real datasets.
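One of the concrete instantiations minimises a Gaussian reverse Kullback-Leibler divergence; in the univariate Gaussian case this divergence has a simple closed form, which the sketch below computes (the multivariate and conditional versions used in the paper reduce to this shape per dimension):

```python
import math

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between univariate Gaussians q = N(mu_q, var_q) and
    p = N(mu_p, var_p):  0.5 * (log(var_p/var_q) + (var_q + (mu_q-mu_p)^2)/var_p - 1)."""
    return 0.5 * (math.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
```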

Updated: 2024-11-11 02:49:50

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2203.12720v3

WDMoE: Wireless Distributed Mixture of Experts for Large Language Models

Large Language Models (LLMs) have achieved significant success in various natural language processing tasks, but the role of wireless networks in supporting LLMs has not been thoroughly explored. In this paper, we propose a wireless distributed Mixture of Experts (WDMoE) architecture to enable collaborative deployment of LLMs across edge servers at the base station (BS) and mobile devices in wireless networks. Specifically, we decompose the MoE layer in LLMs by placing the gating network and the preceding neural network layer at BS, while distributing the expert networks among the devices. This deployment leverages the parallel inference capabilities of expert networks on mobile devices, effectively utilizing the limited computing and caching resources of these devices. Accordingly, we develop a performance metric for WDMoE-based LLMs, which accounts for both model capability and latency. To minimize the latency while maintaining accuracy, we jointly optimize expert selection and bandwidth allocation based on the performance metric. Moreover, we build a hardware testbed using NVIDIA Jetson kits to validate the effectiveness of WDMoE. Both theoretical simulations and practical hardware experiments demonstrate that the proposed method can significantly reduce the latency without compromising LLM performance.
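The deployment split described above — gating network at the base station, expert networks on devices — amounts to scoring experts centrally and dispatching only the top-k. An illustrative sketch (the gate scores are placeholders, and the paper additionally optimises expert selection jointly with bandwidth allocation):

```python
import math

def select_experts(gate_scores, k):
    """Pick the top-k experts by gate score; return (index, softmax weight) pairs.

    In a WDMoE-style split the gate runs at the base station, and only the
    selected experts, hosted on mobile devices, run their forward pass.
    """
    top = sorted(range(len(gate_scores)), key=lambda i: -gate_scores[i])[:k]
    exp = [math.exp(gate_scores[i]) for i in top]
    z = sum(exp)
    return [(i, e / z) for i, e in zip(top, exp)]
```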

Updated: 2024-11-11 02:48:00

Categories: cs.LG,cs.AI,cs.DC,cs.IT,math.IT

Download: http://arxiv.org/abs/2411.06681v1

What Should Baby Models Read? Exploring Sample-Efficient Data Composition on Model Performance

We explore the impact of pre-training data composition on the performance of small language models in a sample-efficient setting. Using datasets limited to 10 million words, we evaluate several dataset sources, including child-directed speech (CHILDES), classic books (Gutenberg), synthetic data (TinyStories), and a mix of these (Mix), across different model sizes ranging from 18 million to 705 million parameters. Our experiments show that smaller models (e.g., GPT2-97M, GPT2-705M, Llama-360M) perform better when trained on more complex and rich datasets like Gutenberg. Models trained on the CHILDES and TinyStories datasets underperformed across all model sizes. These findings suggest that the optimal dataset for sample-efficient training depends on the model size, and that neither child-directed speech nor simplified stories is optimal for language models of all sizes. We highlight the importance of considering both dataset composition and model capacity for effective sample-efficient language model training.

Updated: 2024-11-11 02:37:21

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.06672v1

Adversarial Detection with a Dynamically Stable System

Adversarial detection is designed to identify and reject maliciously crafted adversarial examples (AEs), which are generated to disrupt the classification of target models. Presently, various input transformation-based methods have been developed for adversarial example detection; these typically rely on empirical experience and prove unreliable against new attacks. To address this issue, we propose and construct a Dynamically Stable System (DSS), which can effectively distinguish adversarial examples from normal examples according to the stability of input examples. In particular, the generation of adversarial examples is considered as the perturbation process of a Lyapunov dynamic system, and we propose an example stability mechanism in which a novel control term is added in adversarial example generation to ensure that normal examples can achieve dynamic stability while adversarial examples cannot. Based on this mechanism, the DSS uses disruption and restoration actions to determine the stability of input examples and detects adversarial examples through changes in that stability. Compared with existing methods on three benchmark datasets (MNIST, CIFAR10, and CIFAR100), our evaluation results show that the proposed DSS achieves ROC-AUC values of 99.83%, 97.81% and 94.47%, surpassing the state-of-the-art (SOTA) values of 97.35%, 91.10% and 93.49% obtained by the other 7 methods.

Updated: 2024-11-11 02:16:17

Categories: cs.AI

Download: http://arxiv.org/abs/2411.06666v1

Bridge: A Unified Framework to Knowledge Graph Completion via Language Models and Knowledge Representation

Knowledge graph completion (KGC) is the task of inferring missing triples based on existing Knowledge Graphs (KGs). Both structural and semantic information are vital for successful KGC. However, existing methods only use either the structural knowledge from the KG embeddings or the semantic information from pre-trained language models (PLMs), leading to suboptimal model performance. Moreover, since PLMs are not trained on KGs, directly using PLMs to encode triples may be inappropriate. To overcome these limitations, we propose a novel framework called Bridge, which jointly encodes structural and semantic information of KGs. Specifically, we strategically encode entities and relations separately by PLMs to better utilize the semantic knowledge of PLMs and enable structured representation learning via a structural learning principle. Furthermore, to bridge the gap between KGs and PLMs, we employ a self-supervised representation learning method called BYOL to fine-tune PLMs with two different views of a triple. Unlike BYOL, which uses augmentation methods to create two semantically similar views of the same image and may thereby alter the semantic information, we strategically separate the triple into two parts to create different views, thus avoiding semantic alteration. Experiments demonstrate that Bridge outperforms the SOTA models on three benchmark datasets.

Updated: 2024-11-11 01:59:04

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2411.06660v1

An Efficient Memory Module for Graph Few-Shot Class-Incremental Learning

Incremental graph learning has gained significant attention for its ability to address the catastrophic forgetting problem in graph representation learning. However, traditional methods often rely on a large number of labels for node classification, which is impractical in real-world applications. This makes few-shot incremental learning on graphs a pressing need. Current methods typically require extensive training samples from meta-learning to build memory and perform intensive fine-tuning of GNN parameters, leading to high memory consumption and potential loss of previously learned knowledge. To tackle these challenges, we introduce Mecoin, an efficient method for building and maintaining memory. Mecoin employs Structured Memory Units to cache prototypes of learned categories, as well as Memory Construction Modules to update these prototypes for new categories through interactions between the nodes and the cached prototypes. Additionally, we have designed a Memory Representation Adaptation Module (MRaM) to store probabilities associated with each class prototype, reducing the need for parameter fine-tuning and lowering the forgetting rate. When a sample matches its corresponding class prototype, the relevant probabilities are retrieved from the MRaM. Knowledge is then distilled back into the GNN through a Graph Knowledge Distillation Module, preserving the model's memory. We analyze the effectiveness of Mecoin in terms of generalization error and explore the impact of different distillation strategies on model performance through experiments and VC-dimension analysis. Compared to other related works, Mecoin shows superior performance in accuracy and forgetting rate. Our code is publicly available at https://github.com/Arvin0313/Mecoin-GFSCIL.git .
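The prototype cache at the heart of such methods can be illustrated with a minimal nearest-prototype classifier: each class keeps a running-mean embedding, and new samples match the closest prototype (a generic sketch, not the paper's actual Structured Memory Unit):

```python
import numpy as np

class PrototypeMemory:
    """Cache one prototype (mean embedding) per class; classify by nearest prototype."""

    def __init__(self):
        self.protos = {}  # class label -> (mean vector, sample count)

    def update(self, label, z):
        # A running mean keeps the prototype cheap to maintain as samples arrive
        mean, n = self.protos.get(label, (np.zeros_like(z), 0))
        self.protos[label] = ((mean * n + z) / (n + 1), n + 1)

    def predict(self, z):
        # Nearest prototype in Euclidean distance
        return min(self.protos, key=lambda c: np.linalg.norm(z - self.protos[c][0]))
```

Because classification reads from the cache rather than from fine-tuned classifier weights, adding a new class is just a new dictionary entry, which is the intuition behind the low forgetting rate claimed above.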

Updated: 2024-11-11 01:53:14

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.06659v1

A Methodological Report on Anomaly Detection on Dynamic Knowledge Graphs

In this paper, we explore different approaches to anomaly detection on dynamic knowledge graphs, specifically in a Micro-services environment for Kubernetes applications. Our approach explores three dynamic knowledge graph representations: sequential data, hierarchical data and inter-service dependency data, with each representation incorporating increasingly complex structural information of dynamic knowledge graph. Different machine learning and deep learning models are tested on these representations. We empirically analyse their performance and propose an approach based on ensemble learning of these models. Our approach significantly outperforms the baseline on the ISWC 2024 Dynamic Knowledge Graph Anomaly Detection dataset, providing a robust solution for anomaly detection in dynamic complex data.

Updated: 2024-11-11 01:49:19

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2408.06121v3

Renaissance: Investigating the Pretraining of Vision-Language Encoders

In the past several years there has been an explosion of available models for vision-language tasks. Unfortunately, the literature still leaves open a number of questions related to best practices in designing and training such models. In this paper we seek to answer several questions related to the pretraining of vision-language encoders through meta-analysis. In our first set of experiments, we show that we can save significant compute at no cost to downstream performance, by freezing large parts of vision-language models during pretraining. In our second set of experiments we examine the effect of basing a VL transformer on a vision model versus a text model. Additionally, we introduce a VL modeling platform called Renaissance that we use to conduct all of the experiments. This program offers a great deal of flexibility in creating, training and evaluating transformer encoders for VL modeling. The source code for Renaissance can be found at https://github.com/bsu-slim/renaissance.

Updated: 2024-11-11 01:44:54

Categories: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2411.06657v1

Explore the Reasoning Capability of LLMs in the Chess Testbed

Reasoning is a central capability of human intelligence. In recent years, with the advent of large-scale datasets, pretrained large language models have emerged with new capabilities, including reasoning. However, these models still struggle with long-term, complex reasoning tasks, such as playing chess. Based on the observation that expert chess players employ a dual approach combining long-term strategic play with short-term tactical play along with language explanation, we propose improving the reasoning capability of large language models in chess by integrating annotated strategy and tactic. Specifically, we collect a dataset named MATE, which consists of 1 million chess positions with candidate moves annotated by chess experts for strategy and tactics. We finetune the LLaMA-3-8B model and compare it against state-of-the-art commercial language models in the task of selecting better chess moves. Our experiments show that our models perform better than GPT, Claude, and Gemini models. We find that language explanations can enhance the reasoning capability of large language models.

Updated: 2024-11-11 01:42:56

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2411.06655v1

Machine learning enabled velocity model building with uncertainty quantification

Accurately characterizing migration velocity models is crucial for a wide range of geophysical applications, from hydrocarbon exploration to monitoring of CO2 sequestration projects. Traditional velocity model building methods such as Full-Waveform Inversion (FWI) are powerful but often struggle with the inherent complexities of the inverse problem, including noise, limited bandwidth, receiver aperture and computational constraints. To address these challenges, we propose a scalable methodology that integrates generative modeling, in the form of Diffusion networks, with physics-informed summary statistics, making it suitable for complicated imaging problems including field datasets. By defining these summary statistics in terms of subsurface-offset image volumes for poor initial velocity models, our approach allows for computationally efficient generation of Bayesian posterior samples for migration velocity models that offer a useful assessment of uncertainty. To validate our approach, we introduce a battery of tests that measure the quality of the inferred velocity models, as well as the quality of the inferred uncertainties. With modern synthetic datasets, we reconfirm gains from using subsurface-image gathers as the conditioning observable. For complex velocity model building involving salt, we propose a new iterative workflow that refines amortized posterior approximations with salt flooding and demonstrate how the uncertainty in the velocity model can be propagated to the final product reverse time migrated images. Finally, we present a proof of concept on field datasets to show that our method can scale to industry-sized problems.

Updated: 2024-11-11 01:36:48

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2411.06651v1

Quantum Policy Gradient in Reproducing Kernel Hilbert Space

Parametrised quantum circuits offer expressive and data-efficient representations for machine learning. Due to quantum states residing in a high-dimensional complex Hilbert space, parametrised quantum circuits have a natural interpretation in terms of kernel methods. The representation of quantum circuits in terms of quantum kernels has been studied widely in quantum supervised learning, but has been overlooked in the context of quantum reinforcement learning. This paper proposes parametric and non-parametric policy gradient and actor-critic algorithms with quantum kernel policies in quantum environments. This approach, implemented with both numerical and analytical quantum policy gradient techniques, allows exploiting the many advantages of kernel methods, including available analytic forms for the gradient of the policy and tunable expressiveness. The proposed approach is suitable for vector-valued action spaces and each of the formulations demonstrates a quadratic reduction in query complexity compared to their classical counterparts. Two actor-critic algorithms, one based on stochastic policy gradient and one based on deterministic policy gradient (comparable to the popular DDPG algorithm), demonstrate additional query complexity reductions compared to quantum policy gradient algorithms under favourable conditions.

Updated: 2024-11-11 01:34:10

Categories: quant-ph,cs.LG

Download: http://arxiv.org/abs/2411.06650v1

A Novel Combined Data-Driven Approach for Electricity Theft Detection

The two-way flow of information and energy is an important feature of the Energy Internet. Data analytics is a powerful tool in the information flow that aims to solve practical problems using data mining techniques. As the problem of electricity theft via tampering with smart meters continues to grow, the abnormal behaviors of thieves become more diversified and more difficult to detect. Thus, a data analytics method for detecting various types of electricity theft is required. However, existing methods either require a labeled dataset or additional system information, which is difficult to obtain in reality, or have poor detection accuracy. In this paper, we combine two novel data mining techniques to solve the problem. One is the Maximum Information Coefficient (MIC), which can find the correlations between the non-technical loss (NTL) and a certain electricity behavior of the consumer; MIC can be used to precisely detect thefts whose load profiles appear normal in shape. The other is clustering by fast search and find of density peaks (CFSFDP), which finds the abnormal users among thousands of load profiles, making it well suited to detecting electricity thefts with arbitrary shapes. A framework for combining the advantages of the two techniques is then proposed. Numerical experiments on the Irish smart meter dataset demonstrate the good performance of the combined method.
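The density-peaks step can be sketched in a few lines: each load profile gets a local density rho (kernel-weighted neighbor count within a cutoff) and a distance delta to the nearest denser profile; profiles with small rho and large delta sit far from every cluster and are candidates for flagging. A minimal sketch (the cutoff `d_c` is an illustrative parameter, not a value from the paper):

```python
import numpy as np

def cfsfdp_scores(X, d_c=1.0):
    """Return (rho, delta): local density and distance to the nearest denser point."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Gaussian-kernel local density; subtract the self term exp(0) = 1
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = np.where(rho > rho[i])[0]
        # the globally densest point gets the largest distance by convention
        delta[i] = D[i, denser].min() if len(denser) else D[i].max()
    return rho, delta
```

Cluster centers show up with high rho and high delta, while isolated (potentially fraudulent) profiles show up with low rho and high delta.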

Updated: 2024-11-11 01:30:51

Categories: eess.SY,cs.LG,cs.SY,eess.SP

Download: http://arxiv.org/abs/2411.06649v1

Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data

When training deep neural networks, a model's generalization error is often observed to follow a power scaling law dependent on both the model size and the data size. Perhaps the best-known examples of such scaling laws are for transformer-based large language models, where networks with billions of parameters are trained on trillions of tokens of text. Yet, despite sustained widespread interest, a rigorous understanding of why transformer scaling laws exist is still missing. To answer this question, we establish novel statistical estimation and mathematical approximation theories for transformers when the input data are concentrated on a low-dimensional manifold. Our theory predicts a power law between the generalization error and both the training data size and the network size for transformers, where the power depends on the intrinsic dimension $d$ of the training data. Notably, the constructed model architecture is shallow, requiring only logarithmic depth in $d$. By leveraging low-dimensional data structures under a manifold hypothesis, we are able to explain transformer scaling laws in a way which respects the data geometry. Moreover, we test our theory against empirical observation by training LLMs on natural language datasets. We find the observed empirical scaling laws closely agree with our theoretical predictions. Taken together, these results rigorously show the intrinsic dimension of data to be a crucial quantity affecting transformer scaling laws in both theory and practice.
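A power law of the form E(N) = C * N**(-alpha) is typically verified by a linear fit in log-log space, since log E = log C - alpha * log N. A small sketch on synthetic data (illustrative only, not the paper's experiments):

```python
import numpy as np

def fit_power_law(N, E):
    """Fit E = C * N**(-alpha) by least squares on log E = log C - alpha * log N."""
    slope, intercept = np.polyfit(np.log(N), np.log(E), 1)
    return np.exp(intercept), -slope  # (C, alpha)
```

Applied to measured generalization errors at several data or model sizes, the recovered exponent alpha is the quantity the theory above ties to the intrinsic dimension of the data.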

Updated: 2024-11-11 01:05:28

Categories: cs.LG,cs.AI,cs.CL,stat.ML

Download: http://arxiv.org/abs/2411.06646v1

Accelerating optimization over the space of probability measures

The acceleration of gradient-based optimization methods is a subject of significant practical and theoretical importance, particularly within machine learning applications. While much attention has been directed towards optimizing within Euclidean space, the need to optimize over spaces of probability measures in machine learning motivates exploration of accelerated gradient methods in this context too. To this end, we introduce a Hamiltonian-flow approach analogous to momentum-based approaches in Euclidean space. We demonstrate that, in the continuous-time setting, algorithms based on this approach can achieve convergence rates of arbitrarily high order. We complement our findings with numerical examples.

Updated: 2024-11-11 01:02:49

Categories: math.OC,cs.LG

Download: http://arxiv.org/abs/2310.04006v4

Predicting Country Instability Using Bayesian Deep Learning and Random Forest

Country instability is a global issue, with unpredictably high levels of instability thwarting socio-economic growth and possibly causing a slew of negative consequences. As a result, uncertainty prediction models for a country are becoming increasingly important in the real world, and they are expanding to draw more input from 'big data' collections, as well as from the interconnectedness of global economies and social networks. This has culminated in massive volumes of qualitative data from outlets like television, print, digital, and social media, necessitating the use of artificial intelligence (AI) tools like machine learning to make sense of it all and promote predictive precision [1]. The Global Database of Events, Language, and Tone (GDELT Project) records broadcast, print, and web news in over 100 languages every second of every day, identifying the people, locations, organisations, counts, themes, outlets, and events that propel our global community and offering a free open platform for computation on the entire world. The main goal of our research is to investigate how, as our data grows more voluminous and fine-grained, we can conduct a more complex methodological analysis of political conflict. The GDELT dataset, which was released in 2012, is the first and potentially the most technologically sophisticated publicly accessible dataset on political conflict.

Updated: 2024-11-11 00:23:03

Categories: cs.AI,cs.SI

Download: http://arxiv.org/abs/2411.06639v1

Mixed Effects Deep Learning Autoencoder for interpretable analysis of single cell RNA Sequencing data

Single-cell RNA sequencing (scRNA-seq) data are often confounded due to technical or biological batch effects. Existing deep learning models aim to mitigate these effects but may inadvertently discard batch-specific information. We propose a Mixed Effects Deep Learning (MEDL) Autoencoder framework that separately models batch-invariant (fixed effects) and batch-specific (random effects) components. By decoupling fixed effects representing biological states from random effects capturing batch-specific variations, MEDL integrates both types of information into predictive models, minimizing information loss. This approach improves interpretability, enabling 2D visualizations that show how the same cell would appear across different batches and facilitating exploration of batch-specific variations. We applied MEDL to three datasets: Healthy Heart, Autism Spectrum Disorder (ASDc), and Acute Myeloid Leukemia (AML). In Healthy Heart, MEDL managed 147 batches, assessing its capacity to handle high batch numbers. In ASDc, MEDL captured donor heterogeneity between autistic and healthy individuals, while in AML, it distinguished heterogeneity in a complex setting with variable cell-type presence and malignant cells in diseased donors. These applications demonstrate MEDL's potential to capture fixed and random effects, improve visualization, and enhance predictive accuracy, offering a robust framework for cellular heterogeneity analysis across diverse datasets.

Updated: 2024-11-11 00:10:48

Categories: cs.LG,q-bio.GN

Download: http://arxiv.org/abs/2411.06635v1

Optimized Homomorphic Vector Permutation From New Decomposition Techniques

Homomorphic vector permutation is fundamental to privacy-preserving computations based on batch-encoded homomorphic encryption, underpinning nearly all homomorphic matrix operation algorithms and predominantly influencing their complexity. A potential approach to optimize this critical component lies in permutation decomposition, a technique we consider as not yet fully explored. In this paper, we enhance the efficiency of homomorphic permutations through novel decomposition techniques, thus advancing privacy-preserving computations. We start by estimating the ideal performance of decompositions on permutations and proposing an algorithm that searches for depth-1 ideal decomposition solutions. This enables us to ascertain the full-depth ideal decomposability of specific permutations in homomorphic matrix transposition (SIGSAC 18) and multiplication (CCSW 22), allowing these privacy-preserving computations to achieve asymptotic improvement in speed and rotation key reduction. We further devise a new method for computing arbitrary homomorphic permutations, aiming to approximate the performance of ideal decomposition, as permutations with weak structures are unlikely to be ideally factorized. Our design deviates from the conventional scope of permutation decomposition. It outperforms state-of-the-art techniques (EUROCRYPT 12, CRYPTO 14) with a speed-up of up to $2.27\times$ under the minimum requirement of rotation keys.
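In batch-encoded schemes, a vector permutation is typically realized as a sum of masked cyclic rotations: for each rotation amount r, a plaintext mask keeps exactly the slots whose source element sits r positions away, and decomposition techniques aim to reduce how many distinct rotation amounts (and hence rotation keys) are needed. A cleartext sketch of the masked-rotation formulation, with plain numpy standing in for ciphertext operations:

```python
import numpy as np

def permute_by_rotations(v, perm):
    """Compute w[i] = v[perm[i]] as a sum of masked cyclic rotations.

    Each rotation amount r contributes mask_r * rot(v, r); the count of
    distinct rotation amounts is what decomposition tries to minimize.
    Note: np.roll(v, -r)[i] == v[(i + r) % n].
    """
    n = len(v)
    out = np.zeros_like(v)
    for r in range(n):
        # slots i whose source index perm[i] sits r positions away (cyclically)
        mask = np.array([(perm[i] - i) % n == r for i in range(n)], dtype=v.dtype)
        if mask.any():
            out += mask * np.roll(v, -r)
    return out
```

In the homomorphic setting each `np.roll` is a costly keyed rotation of the ciphertext, so expressing a permutation with fewer distinct rotation amounts, or factoring it into a short sequence of cheaper permutations, directly reduces both runtime and the number of rotation keys.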

Updated: 2024-11-11 00:08:09

Categories: cs.CR

Download: http://arxiv.org/abs/2410.21840v3

Inductive Graph Few-shot Class Incremental Learning

Node classification with Graph Neural Networks (GNNs) under a fixed set of labels is well studied, in contrast to Graph Few-Shot Class Incremental Learning (GFSCIL), which involves learning a GNN classifier as graph nodes and classes grow sporadically over time. We introduce inductive GFSCIL, which continually learns novel classes with newly emerging nodes while maintaining performance on old classes without accessing previous data. This addresses the practical concern of transductive GFSCIL, which requires storing the entire graph with historical data. Compared to the transductive setting, the inductive setting exacerbates catastrophic forgetting because previous data is inaccessible during incremental training, in addition to the overfitting caused by label sparsity. Thus, we propose a novel method called Topology-based class Augmentation and Prototype calibration (TAP). Specifically, it first creates a triple-branch multi-topology class augmentation method to enhance model generalization ability. As each incremental session receives a disjoint subgraph with nodes of novel classes, the multi-topology class augmentation method helps replicate such a setting in the base session to boost backbone versatility. In incremental learning, given the limited number of novel class samples, we propose an iterative prototype calibration to improve the separation of class prototypes. Furthermore, as backbone fine-tuning causes feature distribution drift and prototypes of old classes degrade over time, we propose a prototype shift method for old classes to compensate for the drift. We showcase the proposed method on four datasets.

Updated: 2024-11-11 00:06:20

Categories: cs.LG

Download: http://arxiv.org/abs/2411.06634v1

By Xinhai (Sean) Zou.