X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention
We propose X-Portrait, an innovative conditional diffusion model tailored for generating expressive and temporally coherent portrait animation. Specifically, given a single portrait as the appearance reference, we aim to animate it with motion derived from a driving video, capturing both highly dynamic and subtle facial expressions along with wide-range head movements. At its core, we leverage the generative prior of a pre-trained diffusion model as the rendering backbone, while achieving fine-grained head pose and expression control with novel controlling signals within the framework of ControlNet. In contrast to conventional coarse explicit controls such as facial landmarks, our motion control module learns to interpret the dynamics directly from the original driving RGB inputs. The motion accuracy is further enhanced with a patch-based local control module that effectively directs attention to small-scale nuances such as eyeball position. Notably, to mitigate identity leakage from the driving signals, we train our motion control modules with scaling-augmented cross-identity images, ensuring maximized disentanglement from the appearance reference modules. Experimental results demonstrate the universal effectiveness of X-Portrait across a diverse range of facial portraits and expressive driving sequences, and showcase its proficiency in generating captivating portrait animations with consistently maintained identity characteristics.
Updated: 2024-03-27 23:57:47
Categories: cs.CV, cs.AI
Targeted collapse regularized autoencoder for anomaly detection: black hole at the center
Autoencoders have been extensively used in the development of recent anomaly detection techniques. The premise of their application is based on the notion that, after training the autoencoder on normal training data, anomalous inputs will exhibit a significant reconstruction error. Consequently, this enables a clear differentiation between normal and anomalous samples. In practice, however, it is observed that autoencoders can generalize beyond the normal class and achieve a small reconstruction error on some of the anomalous samples. To improve the performance, various techniques propose additional components and more sophisticated training procedures. In this work, we propose a remarkably straightforward alternative: instead of adding neural network components, involved computations, and cumbersome training, we complement the reconstruction loss with a computationally light term that regulates the norm of representations in the latent space. The simplicity of our approach minimizes the requirement for hyperparameter tuning and customization for new applications, which, paired with its permissive data modality constraint, enhances the potential for successful adoption across a broad range of applications. We test the method on various visual and tabular benchmarks and demonstrate that the technique matches and frequently outperforms more complex alternatives. We further demonstrate that implementing this idea in the context of state-of-the-art methods can further improve their performance. We also provide a theoretical analysis and numerical simulations that help demonstrate the underlying process that unfolds during training and how it helps with anomaly detection. This mitigates the black-box nature of autoencoder-based anomaly detection algorithms and offers an avenue for further investigation of advantages, failure cases, and potential new directions.
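The regularizer is described above only at a high level; a minimal sketch of the idea, assuming a mean-squared reconstruction error and an illustrative weight `lam` (both choices are ours, not specified in the abstract):

```python
import numpy as np

def regularized_ae_loss(x, x_hat, z, lam=0.1):
    """Reconstruction loss plus a computationally light penalty on the
    latent norm, pulling latent representations toward the origin.

    x, x_hat : (batch, features) inputs and reconstructions
    z        : (batch, latent_dim) latent codes
    lam      : regularization weight (illustrative value)
    """
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    norm_penalty = np.mean(np.linalg.norm(z, axis=1))  # mean latent norm
    return recon + lam * norm_penalty
```

At test time only the reconstruction error would be scored; the norm term shapes the latent space during training.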
Updated: 2024-03-27 23:54:26
Categories: cs.LG, cs.AI, cs.CV, q-bio.NC, stat.ML
Equity in Healthcare: Analyzing Disparities in Machine Learning Predictions of Diabetic Patient Readmissions
This study investigates how machine learning (ML) models can predict hospital readmissions for diabetic patients fairly and accurately across different demographics (age, gender, race). We compared models including Deep Learning, Generalized Linear Models, Gradient Boosting Machines (GBM), and Naive Bayes. GBM stood out with an F1-score of 84.3% and accuracy of 82.2%, accurately predicting readmissions across demographics. A fairness analysis was conducted across all the models. GBM minimized disparities in predictions, achieving balanced results across genders and races. It showed low False Discovery Rates (FDR) (6-7%) and False Positive Rates (FPR) (5%) for both genders. Additionally, FDRs remained low for racial groups, such as African Americans (8%) and Asians (7%). Similarly, FPRs were consistent (4%) across age groups, for both patients under 40 and those over 40, indicating its precision and ability to reduce bias. These findings emphasize the importance of choosing ML models carefully to ensure both accuracy and fairness for all patients. By showcasing the effectiveness of various models with fairness metrics, this study promotes personalized medicine and the need for fair ML algorithms in healthcare. This can ultimately reduce disparities and improve outcomes for diabetic patients of all backgrounds.
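The per-group FDR and FPR figures reported above can be computed with a few lines of array code; a generic sketch (function and variable names are illustrative, not the study's code):

```python
import numpy as np

def group_rates(y_true, y_pred, groups):
    """Per-group False Discovery Rate and False Positive Rate.

    FDR = FP / (FP + TP)  among predicted positives
    FPR = FP / (FP + TN)  among actual negatives
    """
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        m = groups == g
        tp = np.sum((y_pred[m] == 1) & (y_true[m] == 1))
        fp = np.sum((y_pred[m] == 1) & (y_true[m] == 0))
        tn = np.sum((y_pred[m] == 0) & (y_true[m] == 0))
        fdr = fp / (fp + tp) if (fp + tp) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        rates[g] = {"FDR": fdr, "FPR": fpr}
    return rates
```

Comparing these dictionaries across demographic groups is what surfaces the disparities discussed above.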
Updated: 2024-03-27 23:49:22
Categories: cs.LG
PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF
We show that physics-based simulations can be seamlessly integrated with NeRF to generate high-quality elastodynamics of real-world objects. Unlike existing methods, we discretize nonlinear hyperelasticity in a meshless way, obviating the necessity for intermediate auxiliary shape proxies like a tetrahedral mesh or voxel grid. A quadratic generalized moving least square (Q-GMLS) is employed to capture nonlinear dynamics and large deformation on the implicit model. Such meshless integration enables versatile simulations of complex and codimensional shapes. We adaptively place the least-square kernels according to the NeRF density field to significantly reduce the complexity of the nonlinear simulation. As a result, physically realistic animations can be conveniently synthesized using our method for a wide range of hyperelastic materials at an interactive rate. For more information, please visit our project page at https://fytalon.github.io/pienerf/.
Updated: 2024-03-27 23:49:07
Categories: cs.CV, cs.AI, cs.GR, cs.LG
MacGyver: Are Large Language Models Creative Problem Solvers?
We explore the creative problem-solving capabilities of modern LLMs in a novel constrained setting. To this end, we create MACGYVER, an automatically generated dataset consisting of over 1,600 real-world problems deliberately designed to trigger innovative usage of objects and necessitate out-of-the-box thinking. We then present our collection to both LLMs and humans to compare and contrast their problem-solving abilities. MACGYVER is challenging for both groups, but in unique and complementary ways. For instance, humans excel in tasks they are familiar with but struggle with domain-specific knowledge, leading to a higher variance. In contrast, LLMs, exposed to a variety of specialized knowledge, attempt broader problems but fail by proposing physically-infeasible actions. Finally, we provide a detailed error analysis of LLMs, and demonstrate the potential of enhancing their problem-solving ability with novel prompting techniques such as iterative step-wise reflection and divergent-convergent thinking. This work (1) introduces a fresh arena for intelligent agents focusing on intricate aspects of physical reasoning, planning, and unconventional thinking, which supplements the existing spectrum of machine intelligence; and (2) provides insight into the constrained problem-solving capabilities of both humans and AI.
Updated: 2024-03-27 23:43:54
Categories: cs.CL, cs.AI
Multi-modal Misinformation Detection: Approaches, Challenges and Opportunities
As social media platforms evolve from text-based forums into multi-modal environments, the nature of misinformation in social media is also transforming accordingly. Taking advantage of the fact that visual modalities such as images and videos are more favorable and attractive to users, and that textual content is sometimes skimmed carelessly, misinformation spreaders have recently targeted contextual connections between modalities, e.g., text and image. Hence, many researchers have developed automatic techniques for detecting possible cross-modal discordance in web-based content. We analyze, categorize, and identify existing approaches, in addition to the challenges and shortcomings they face, in order to unearth new research opportunities in the field of multi-modal misinformation detection.
Updated: 2024-03-27 23:27:58
Categories: cs.LG, cs.AI, cs.CV, cs.CY, cs.MM, cs.SI
Differentiable Turbulence: Closure as a partial differential equation constrained optimization
Deep learning is increasingly becoming a promising pathway to improving the accuracy of sub-grid scale (SGS) turbulence closure models for large eddy simulations (LES). We leverage the concept of differentiable turbulence, whereby an end-to-end differentiable solver is used in combination with physics-inspired choices of deep learning architectures to learn highly effective and versatile SGS models for two-dimensional turbulent flow. We perform an in-depth analysis of the inductive biases in the chosen architectures, finding that the inclusion of small-scale non-local features is most critical to effective SGS modeling, while large-scale features can improve pointwise accuracy of the a posteriori solution field. The velocity gradient tensor on the LES grid can be mapped directly to the SGS stress via decomposition of the inputs and outputs into isotropic, deviatoric, and anti-symmetric components. We see that the model can generalize to a variety of flow configurations, including higher and lower Reynolds numbers and different forcing conditions. We show that the differentiable physics paradigm is more successful than offline, a priori learning, and that hybrid solver-in-the-loop approaches to deep learning offer an ideal balance between computational efficiency, accuracy, and generalization. Our experiments provide physics-based recommendations for generalizable, deep-learning-based SGS closure modeling of turbulence.
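The input/output decomposition mentioned above is standard tensor algebra; a sketch of how a velocity gradient tensor splits into the three components (a generic implementation, not the paper's code):

```python
import numpy as np

def decompose_velocity_gradient(A):
    """Split a square velocity gradient tensor A into isotropic,
    deviatoric-symmetric, and anti-symmetric parts, so that
    A = iso + dev + anti."""
    d = A.shape[0]
    sym = 0.5 * (A + A.T)                 # symmetric (strain-rate) part
    anti = 0.5 * (A - A.T)                # anti-symmetric (rotation) part
    iso = (np.trace(A) / d) * np.eye(d)   # isotropic (dilatational) part
    dev = sym - iso                       # trace-free deviatoric part
    return iso, dev, anti
```

The deviatoric part is trace-free by construction, which is what makes the three components an unambiguous decomposition.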
Updated: 2024-03-27 23:15:33
Categories: physics.flu-dyn, cs.LG
Detecting Generative Parroting through Overfitting Masked Autoencoders
The advent of generative AI models has revolutionized digital content creation, yet it introduces challenges in maintaining copyright integrity due to generative parroting, where models mimic their training data too closely. Our research presents a novel approach to tackle this issue by employing an overfitted Masked Autoencoder (MAE) to detect such parroted samples effectively. We establish a detection threshold based on the mean loss across the training dataset, allowing for the precise identification of parroted content in modified datasets. Preliminary evaluations demonstrate promising results, suggesting our method's potential to ensure ethical use and enhance the legal compliance of generative models.
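A sketch of the thresholding step, assuming reconstruction losses are already computed and that memorized (parroted) samples reconstruct with unusually low loss under the overfitted MAE; the comparison direction is our assumption, as the abstract specifies only that the threshold is the mean training loss:

```python
import numpy as np

def parroting_flags(train_losses, query_losses):
    """Flag samples whose reconstruction loss falls below the mean loss
    over the training dataset of the overfitted Masked Autoencoder:
    near-memorized content reconstructs unusually well."""
    threshold = np.mean(train_losses)
    return np.asarray(query_losses) < threshold
```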
Updated: 2024-03-27 23:10:33
Categories: cs.LG, cs.AI
HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces
Neural radiance fields provide state-of-the-art view synthesis quality but tend to be slow to render. One reason is that they make use of volume rendering, thus requiring many samples (and model queries) per ray at render time. Although this representation is flexible and easy to optimize, most real-world objects can be modeled more efficiently with surfaces instead of volumes, requiring far fewer samples per ray. This observation has spurred considerable progress in surface representations such as signed distance functions, but these may struggle to model semi-opaque and thin structures. We propose a method, HybridNeRF, that leverages the strengths of both representations by rendering most objects as surfaces while modeling the (typically) small fraction of challenging regions volumetrically. We evaluate HybridNeRF against the challenging Eyeful Tower dataset along with other commonly used view synthesis datasets. When comparing to state-of-the-art baselines, including recent rasterization-based approaches, we reduce error rates by 15-30% while achieving real-time framerates (at least 36 FPS) for virtual-reality resolutions (2Kx2K).
Updated: 2024-03-27 22:58:34
Categories: cs.CV, cs.GR, cs.LG
LITA: Language Instructed Temporal-Localization Assistant
There has been tremendous progress in multimodal Large Language Models (LLMs). Recent works have extended these models to video input with promising instruction following capabilities. However, an important missing piece is temporal localization. These models cannot accurately answer the "When?" questions. We identify three key aspects that limit their temporal localization capabilities: (i) time representation, (ii) architecture, and (iii) data. We address these shortcomings by proposing Language Instructed Temporal-Localization Assistant (LITA) with the following features: (1) We introduce time tokens that encode timestamps relative to the video length to better represent time in videos. (2) We introduce SlowFast tokens in the architecture to capture temporal information at fine temporal resolution. (3) We emphasize temporal localization data for LITA. In addition to leveraging existing video datasets with timestamps, we propose a new task, Reasoning Temporal Localization (RTL), along with the dataset, ActivityNet-RTL, for learning and evaluating this task. Reasoning temporal localization requires both the reasoning and temporal localization of Video LLMs. LITA demonstrates strong performance on this challenging task, nearly doubling the temporal mean intersection-over-union (mIoU) of baselines. In addition, we show that our emphasis on temporal localization also substantially improves video-based text generation compared to existing Video LLMs, including a 36% relative improvement of Temporal Understanding. Code is available at: https://github.com/NVlabs/LITA
Updated: 2024-03-27 22:50:48
Categories: cs.CV, cs.AI
Computationally and Memory-Efficient Robust Predictive Analytics Using Big Data
In the current data-intensive era, big data has become a significant asset for Artificial Intelligence (AI), serving as a foundation for developing data-driven models and providing insight into various unknown fields. This study navigates through the challenges of data uncertainties, storage limitations, and predictive data-driven modeling using big data. We utilize Robust Principal Component Analysis (RPCA) for effective noise reduction and outlier elimination, and Optimal Sensor Placement (OSP) for efficient data compression and storage. The proposed OSP technique enables data compression without substantial information loss while simultaneously reducing storage needs. While RPCA offers an enhanced alternative to traditional Principal Component Analysis (PCA) for high-dimensional data management, the scope of this work extends its utilization, focusing on robust, data-driven modeling applicable to huge data sets in real-time. For that purpose, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network, are applied to model and predict data based on a low-dimensional subset obtained from OSP, leading to a substantial acceleration of the training phase. LSTMs are well suited for capturing long-term dependencies in time series data, making them particularly appropriate for predicting the future states of physical systems from historical data. All the presented algorithms are not only theorized but also simulated and validated using real thermal imaging data mapping a ship's engine.
Updated: 2024-03-27 22:39:08
Categories: cs.LG, cs.AI, eess.IV
Visualizing High-Dimensional Temporal Data Using Direction-Aware t-SNE
Many real-world data sets contain a temporal component or involve transitions from state to state. For exploratory data analysis, we can represent these high-dimensional data sets in two-dimensional maps, using embeddings of the data objects under exploration and representing their temporal relationships with directed edges. Most existing dimensionality reduction techniques, such as t-SNE and UMAP, do not take into account the temporal or relational nature of the data when constructing the embeddings, resulting in temporally cluttered visualizations that obscure potentially interesting patterns. To address this problem, we propose two complementary, direction-aware loss terms in the optimization function of t-SNE that emphasize the temporal aspects of the data, guiding the optimization and the resulting embedding to reveal temporal patterns that might otherwise go unnoticed. The Directional Coherence Loss (DCL) encourages nearby arrows connecting two adjacent time series points to point in the same direction, while the Edge Length Loss (ELL) penalizes arrows - which effectively represent time gaps in the visualized embedding - based on their length. Both loss terms are differentiable and can be easily incorporated into existing dimensionality reduction techniques. By promoting local directionality of the directed edges, our procedure produces more temporally meaningful and less cluttered visualizations. We demonstrate the effectiveness of our approach on a toy dataset and two real-world datasets.
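Simplified sketches of the two loss terms, assuming a 2-D embedding `Y` ordered in time; the paper applies the DCL to nearby arrows in the embedding rather than only temporally adjacent ones, and the squared-length form of the ELL is our choice:

```python
import numpy as np

def edge_length_loss(Y):
    """ELL (simplified): penalize long directed edges between
    consecutive embedded points."""
    edges = np.diff(Y, axis=0)                   # (n-1, 2) arrows
    return np.mean(np.sum(edges ** 2, axis=1))

def directional_coherence_loss(Y):
    """DCL (simplified): encourage adjacent arrows to point the same
    way by penalizing low cosine similarity between their directions."""
    edges = np.diff(Y, axis=0)
    u = edges / (np.linalg.norm(edges, axis=1, keepdims=True) + 1e-12)
    cos = np.sum(u[:-1] * u[1:], axis=1)         # cosine of adjacent arrows
    return np.mean(1.0 - cos)
```

Both terms are differentiable in `Y`, which is what allows them to be added to the t-SNE objective and optimized jointly.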
Updated: 2024-03-27 22:26:50
Categories: cs.LG, cs.HC
Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data
Large language models (LLMs) have demonstrated remarkable success in NLP tasks. However, there is a paucity of studies that attempt to evaluate their performances on social media-based health-related natural language processing tasks, which have traditionally been difficult to achieve high scores in. We benchmarked one supervised classic machine learning model based on Support Vector Machines (SVMs), three supervised pretrained language models (PLMs) based on RoBERTa, BERTweet, and SocBERT, and two LLM-based classifiers (GPT-3.5 and GPT-4), across 6 text classification tasks. We developed three approaches for leveraging LLMs for text classification: employing LLMs as zero-shot classifiers, using LLMs as annotators to annotate training data for supervised classifiers, and utilizing LLMs with few-shot examples for augmentation of manually annotated data. Our comprehensive experiments demonstrate that employing data augmentation using LLMs (GPT-4) with relatively small human-annotated data to train lightweight supervised classification models achieves superior results compared to training with human-annotated data alone. Supervised learners also outperform GPT-4 and GPT-3.5 in zero-shot settings. By leveraging this data augmentation strategy, we can harness the power of LLMs to develop smaller, more effective domain-specific NLP models. LLM-annotated data without human guidance for training lightweight supervised classification models is an ineffective strategy. However, the LLM, as a zero-shot classifier, shows promise in excluding false negatives and potentially reducing the human effort required for data annotation. Future investigations are imperative to explore optimal training data sizes and the optimal amounts of augmented data.
Updated: 2024-03-27 22:05:10
Categories: cs.CL, cs.AI, cs.LG
Conditions on Preference Relations that Guarantee the Existence of Optimal Policies
Learning from Preferential Feedback (LfPF) plays an essential role in training Large Language Models, as well as certain types of interactive learning agents. However, a substantial gap exists between the theory and application of LfPF algorithms. Current results guaranteeing the existence of optimal policies in LfPF problems assume that both the preferences and transition dynamics are determined by a Markov Decision Process. We introduce the Direct Preference Process, a new framework for analyzing LfPF problems in partially-observable, non-Markovian environments. Within this framework, we establish conditions that guarantee the existence of optimal policies by considering the ordinal structure of the preferences. We show that a decision-making problem can have optimal policies -- that are characterized by recursive optimality equations -- even when no reward function can express the learning goal. These findings underline the need to explore preference-based learning strategies which do not assume that preferences are generated by reward.
Updated: 2024-03-27 22:03:46
Categories: cs.LG
Visual Acuity Prediction on Real-Life Patient Data Using a Machine Learning Based Multistage System
In ophthalmology, intravitreal operative medication therapy (IVOM) is a widespread treatment for diseases related to age-related macular degeneration (AMD), diabetic macular edema (DME), as well as retinal vein occlusion (RVO). However, in real-world settings, patients often suffer from loss of vision on time scales of years despite therapy, whereas the prediction of visual acuity (VA) and the earliest possible detection of deterioration under real-life conditions are challenging due to heterogeneous and incomplete data. In this contribution, we present a workflow for the development of a research-compatible data corpus fusing different IT systems of the department of ophthalmology of a German maximum care hospital. The extensive data corpus allows predictive statements of the expected progression of a patient and his or her VA in each of the three diseases. For AMD, we found a significant deterioration of visual acuity over time. Within our proposed multistage system, we subsequently classify the VA progression into the three groups of therapy "winners", "stabilizers", and "losers" (WSL classification scheme). Our OCT biomarker classification using an ensemble of deep neural networks results in a classification accuracy (F1-score) of over 98%, enabling us to complete incomplete OCT documentations while allowing us to exploit them for a more precise VA modelling process. Our VA prediction requires at least four VA examinations and, optionally, OCT biomarkers from the same time period to predict the VA progression within a forecasted time frame, whereas our prediction is currently restricted to IVOM / no therapy. We achieve a final prediction accuracy of 69% in macro-average F1-score, which is in the same range as the ophthalmologists' scores of 57.8% and 50 ± 10.7% F1-score.
Updated: 2024-03-27 22:02:30
Categories: eess.IV, cs.CV, cs.IR, cs.LG
ADMarker: A Multi-Modal Federated Learning System for Monitoring Digital Biomarkers of Alzheimer's Disease
Alzheimer's Disease (AD) and related dementia are a growing global health challenge due to the aging population. In this paper, we present ADMarker, the first end-to-end system that integrates multi-modal sensors and new federated learning algorithms for detecting multidimensional AD digital biomarkers in natural living environments. ADMarker features a novel three-stage multi-modal federated learning architecture that can accurately detect digital biomarkers in a privacy-preserving manner. Our approach collectively addresses several major real-world challenges, such as limited data labels, data heterogeneity, and limited computing resources. We built a compact multi-modality hardware system and deployed it in a four-week clinical trial involving 91 elderly participants. The results indicate that ADMarker can accurately detect a comprehensive set of digital biomarkers with up to 93.8% accuracy and identify early AD with an average of 88.9% accuracy. ADMarker offers a new platform that can allow AD clinicians to characterize and track the complex correlation between multidimensional interpretable digital biomarkers, demographic factors of patients, and AD diagnosis in a longitudinal manner.
Updated: 2024-03-27 21:56:59
Categories: cs.LG
When SMILES have Language: Drug Classification using Text Classification Methods on Drug SMILES Strings
Complex chemical structures, like drugs, are usually defined by SMILES strings as a sequence of molecules and bonds. These SMILES strings are used in different complex machine learning-based drug-related research and representation works. Escaping from complex representation, in this work, we pose a single question: What if we treat drug SMILES as conventional sentences and engage in text classification for drug classification? Our experiments affirm the possibility with very competitive scores. The study explores the notion of viewing each atom and bond as sentence components, employing basic NLP methods to categorize drug types, proving that complex problems can also be solved with simpler perspectives. The data and code are available here: https://github.com/azminewasi/Drug-Classification-NLP.
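A bare-bones illustration of the idea using only the standard library: treat each SMILES string as text, featurize it with character n-grams, and classify by nearest neighbor over those profiles. This is a stand-in for the basic NLP classifiers evaluated in the paper (e.g., TF-IDF-style pipelines); all names and labels here are illustrative:

```python
from collections import Counter

def char_ngrams(smiles, n=2):
    """Tokenize a SMILES string like ordinary text: overlapping
    character n-grams, counted."""
    return Counter(smiles[i:i + n] for i in range(len(smiles) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram Counter profiles."""
    dot = sum(a[k] * b[k] for k in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def classify(smiles, labeled):
    """Predict the label of the most similar labeled SMILES example."""
    feats = char_ngrams(smiles)
    return max(labeled, key=lambda ex: cosine(feats, char_ngrams(ex[0])))[1]
```

For example, `classify("CCO", [("CCN", "amine"), ("c1ccccc1", "aromatic")])` picks the label of the bigram-closest training string.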
Updated: 2024-03-27 21:51:03
标题: 当SMILES有语言:使用文本分类方法对药物SMILES字符串进行药物分类
摘要: 复杂的化学结构,如药物,通常通过SMILES字符串定义为分子和键的序列。这些SMILES字符串被用于不同的基于复杂机器学习的药物相关研究和表示工作中。在这项工作中,我们避免复杂的表示,提出一个问题:如果我们将药物的SMILES视为常规句子,并进行文本分类以进行药物分类,会怎么样?我们的实验证实了这种可能性,并取得了非常有竞争力的分数。该研究探讨了将每个原子和键视为句子组成部分的概念,利用基本的自然语言处理方法对药物类型进行分类,证明了复杂问题也可以通过更简单的视角来解决。数据和代码可在此处获得:https://github.com/azminewasi/Drug-Classification-NLP.
更新时间: 2024-03-27 21:51:03
领域: q-bio.BM,cs.CL,cs.IR,cs.LG,stat.ML
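The core idea above, treating SMILES strings as ordinary sentences, can be sketched without any chemistry tooling. The snippet below is an illustrative toy, not the authors' code: it builds character-trigram count vectors for a few invented SMILES strings and classifies by nearest class centroid, a crude stand-in for the NLP pipelines the paper evaluates.

```python
from collections import Counter

def char_ngrams(smiles, n=3):
    """Split a SMILES string into overlapping character n-grams ("words")."""
    return [smiles[i:i + n] for i in range(len(smiles) - n + 1)]

def centroid(vectors):
    """Average a list of Counter vectors into one per-class profile."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {k: c / len(vectors) for k, c in total.items()}

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = sum(x * x for x in a.values()) ** 0.5
    nb = sum(x * x for x in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def train(examples):
    """examples: list of (smiles, label). Returns label -> centroid profile."""
    by_label = {}
    for smiles, label in examples:
        by_label.setdefault(label, []).append(Counter(char_ngrams(smiles)))
    return {label: centroid(vs) for label, vs in by_label.items()}

def predict(model, smiles):
    query = Counter(char_ngrams(smiles))
    return max(model, key=lambda label: cosine(query, model[label]))

# Invented toy data: aromatic rings vs. simple aliphatic chains.
train_set = [
    ("c1ccccc1O", "aromatic"), ("c1ccccc1N", "aromatic"),
    ("CCCCO", "aliphatic"), ("CCCCN", "aliphatic"),
]
model = train(train_set)
print(predict(model, "c1ccccc1C"))  # → aromatic (shares ring trigrams)
```

A real run of the paper's setting would swap the nearest-centroid step for a standard text classifier, but the point survives even here: the character statistics of a SMILES string carry enough signal to separate drug classes.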
CADGL: Context-Aware Deep Graph Learning for Predicting Drug-Drug Interactions
Examining Drug-Drug Interactions (DDIs) is a pivotal element in the process of drug development. DDIs occur when one drug's properties are affected by the inclusion of other drugs. Detecting favorable DDIs has the potential to pave the way for creating and advancing innovative medications applicable in practical settings. However, existing DDI prediction models continue to face challenges related to generalization in extreme cases, robust feature extraction, and real-life application possibilities. We aim to address these challenges by leveraging the effectiveness of context-aware deep graph learning, introducing a novel framework named CADGL. Based on a customized variational graph autoencoder (VGAE), we capture critical structural and physicochemical information using two context preprocessors for feature extraction from two different perspectives: local neighborhood and molecular context, in a heterogeneous graphical structure. Our customized VGAE consists of a graph encoder, a latent information encoder, and an MLP decoder. CADGL surpasses other state-of-the-art DDI prediction models, excelling in predicting clinically valuable novel DDIs, supported by rigorous case studies.
Updated: 2024-03-27 21:47:49
标题: CADGL:上下文感知深度图学习用于预测药物相互作用
摘要: 检查药物相互作用(DDIs)是药物开发过程中的关键元素。当一个药物的特性受到其他药物的影响时,就会发生DDIs。检测有利的DDIs有可能为创造和推进创新药物在实际环境中的应用铺平道路。然而,现有的DDI预测模型在极端情况下仍面临着与泛化、强大的特征提取和实际应用可能性相关的挑战。我们旨在通过利用上下文感知的深度图学习的有效性,通过引入一种名为CADGL的新型框架来解决这些挑战。基于自定义变分图自动编码器(VGAE),我们利用两个上下文预处理器从两个不同的角度提取特征:局部邻域和分子上下文,在异质图结构中捕获关键的结构和生理化学信息。我们的定制VGAE包括一个图编码器,一个潜在信息编码器和一个MLP解码器。CADGL超越了其他最先进的DDI预测模型,在预测临床有价值的新型DDIs方面表现出色,并得到了严格的案例研究支持。
更新时间: 2024-03-27 21:47:49
领域: cs.LG,cs.AI,cs.IR,q-bio.BM,q-bio.MN
Tightest Admissible Shortest Path
The shortest path problem in graphs is fundamental to AI. Nearly all variants of the problem and relevant algorithms that solve them ignore edge-weight computation time and its common relation to weight uncertainty. This implies that taking these factors into consideration can potentially lead to a performance boost in relevant applications. Recently, a generalized framework for weighted directed graphs was suggested, where edge-weight can be computed (estimated) multiple times, at increasing accuracy and run-time expense. We build on this framework to introduce the problem of finding the tightest admissible shortest path (TASP); a path with the tightest suboptimality bound on the optimal cost. This is a generalization of the shortest path problem to bounded uncertainty, where edge-weight uncertainty can be traded for computational cost. We present a complete algorithm for solving TASP, with guarantees on solution quality. Empirical evaluation supports the effectiveness of this approach.
Updated: 2024-03-27 21:46:41
标题: 最严格可接受的最短路径
摘要: 图中的最短路径问题对人工智能至关重要。几乎所有问题的变体和相关算法都忽略了边权重计算时间及其与权重不确定性的常见关系。这意味着考虑这些因素可能潜在地提高相关应用的性能。最近,提出了一个通用的加权有向图框架,其中边权重可以多次计算(估计),并且准确性和运行时间开销逐渐增加。我们基于这个框架引入了寻找最紧密可接受最短路径(TASP)的问题;这是一条具有最紧密次优性边界的最佳成本的路径。这是对最短路径问题到有界不确定性的泛化,其中边权重不确定性可以与计算成本进行交换。我们提出了一个完整的算法来解决TASP,并保证解决方案的质量。实证评估支持这种方法的有效性。
更新时间: 2024-03-27 21:46:41
领域: cs.DS,cs.AI,cs.DM
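The suboptimality certificate at the heart of the TASP abstract above can be illustrated with interval edge weights: running a standard shortest-path search once under the upper-bound weights and once under the lower-bound weights yields a ratio that bounds how far the returned path can be from the unknown optimum. This is a simplified sketch of the idea, not the paper's algorithm (which additionally chooses which edges to re-estimate to tighten the ratio); the graph and weights are invented.

```python
import heapq

def dijkstra(graph, weight, src, dst):
    """graph: node -> list of (neighbor, edge_id); weight: edge_id -> float."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, e in graph.get(u, []):
            nd = d + weight[e]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float("inf")

def suboptimality_bound(graph, lo, hi, src, dst):
    """Cost of the best path under upper-bound weights divided by a lower
    bound on the optimal cost: the returned path provably costs at most
    this factor times the (unknown) true optimum."""
    ub_cost = dijkstra(graph, hi, src, dst)
    lb_cost = dijkstra(graph, lo, src, dst)
    return ub_cost / lb_cost

# Invented toy graph: two routes from A to D with uncertain edge weights.
graph = {"A": [("B", "ab"), ("C", "ac")], "B": [("D", "bd")], "C": [("D", "cd")]}
lo = {"ab": 1.0, "bd": 1.0, "ac": 1.5, "cd": 1.5}
hi = {"ab": 2.0, "bd": 2.0, "ac": 2.0, "cd": 2.0}
print(suboptimality_bound(graph, lo, hi, "A", "D"))  # → 2.0
```

Re-estimating an edge at higher accuracy shrinks its [lo, hi] interval, which can only tighten this ratio; TASP's contribution is deciding which edges are worth that extra computation.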
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Reward finetuning has emerged as a promising approach to aligning foundation models with downstream objectives. Remarkable success has been achieved in the language domain by using reinforcement learning (RL) to maximize rewards that reflect human preference. However, in the vision domain, existing RL-based reward finetuning methods are limited by their instability in large-scale training, rendering them incapable of generalizing to complex, unseen prompts. In this paper, we propose Proximal Reward Difference Prediction (PRDP), enabling stable black-box reward finetuning for diffusion models for the first time on large-scale prompt datasets with over 100K prompts. Our key innovation is the Reward Difference Prediction (RDP) objective that has the same optimal solution as the RL objective while enjoying better training stability. Specifically, the RDP objective is a supervised regression objective that tasks the diffusion model with predicting the reward difference of generated image pairs from their denoising trajectories. We theoretically prove that the diffusion model that obtains perfect reward difference prediction is exactly the maximizer of the RL objective. We further develop an online algorithm with proximal updates to stably optimize the RDP objective. In experiments, we demonstrate that PRDP can match the reward maximization ability of well-established RL-based methods in small-scale training. Furthermore, through large-scale training on text prompts from the Human Preference Dataset v2 and the Pick-a-Pic v1 dataset, PRDP achieves superior generation quality on a diverse set of complex, unseen prompts whereas RL-based methods completely fail.
Updated: 2024-03-27 21:37:39
标题: PRDP:大规模奖励微调扩散模型的近端奖励差异预测
摘要: 奖励微调已成为一种有前途的方法,用于将基础模型与下游目标对齐。在语言领域,通过使用强化学习(RL)来最大化反映人类偏好的奖励,取得了显著的成功。然而,在视觉领域,现有基于RL的奖励微调方法受到大规模训练中不稳定性的限制,使它们无法泛化到复杂的、未见过的提示。在本文中,我们提出了Proximal Reward Difference Prediction(PRDP),首次在拥有100K个以上提示的大规模数据集上实现了对扩散模型的稳定黑盒奖励微调。我们的关键创新是奖励差异预测(RDP)目标,该目标与RL目标具有相同的最优解,同时具有更好的训练稳定性。具体来说,RDP目标是一个监督回归目标,要求扩散模型预测从其去噪轨迹中生成的图像对的奖励差异。我们在理论上证明,获得完美奖励差异预测的扩散模型恰好是RL目标的最大化器。我们进一步开发了一个带有近端更新的在线算法,以稳定地优化RDP目标。在实验中,我们证明PRDP在小规模训练中可以达到已建立的基于RL的方法的奖励最大化能力。此外,通过对Human Preference Dataset v2和Pick-a-Pic v1数据集中的文本提示进行大规模训练,PRDP在各种复杂、未见过的提示上实现了优越的生成质量,而基于RL的方法完全失败。
更新时间: 2024-03-27 21:37:39
领域: cs.LG,cs.AI
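The RDP objective described above is, at heart, a supervised regression loss: the model's predicted reward gap for a pair of generations should match the observed gap. Stripped of the diffusion machinery, a minimal scalar sketch (toy linear "reward head" and invented numbers, nothing like the paper's actual setup) looks like this:

```python
def rdp_loss(pred_diff, r1, r2):
    """Reward Difference Prediction as plain squared-error regression:
    predicted reward gap for a generated pair vs. the true gap."""
    return (pred_diff - (r1 - r2)) ** 2

# Toy: a linear score s(x) = w * x plays the role of the model's implicit
# reward; training drives s(x1) - s(x2) toward the observed r1 - r2.
w = 0.0
pairs = [((1.0, 3.0), (0.2, 0.9)), ((2.0, 5.0), (0.5, 1.4))]  # invented data
lr = 0.02
for _ in range(500):
    for (x1, x2), (r1, r2) in pairs:
        pred = w * x1 - w * x2
        # d/dw of (pred - (r1 - r2))^2 is 2 * (pred - target) * (x1 - x2)
        w -= lr * 2 * (pred - (r1 - r2)) * (x1 - x2)
print(round(w, 2))  # → 0.31, near the least-squares fit of the gaps
```

The paper's contribution is showing that the maximizer of the RL objective is exactly the model with zero reward-difference prediction error, plus a proximal (clipped) update scheme that keeps this regression stable at scale; neither refinement is reproduced here.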
Exploiting Symmetry in Dynamics for Model-Based Reinforcement Learning with Asymmetric Rewards
Recent work in reinforcement learning has leveraged symmetries in the model to improve sample efficiency in training a policy. A commonly used simplifying assumption is that the dynamics and reward both exhibit the same symmetry. However, in many real-world environments, the dynamical model exhibits symmetry independent of the reward model: the reward may not satisfy the same symmetries as the dynamics. In this paper, we investigate scenarios where only the dynamics are assumed to exhibit symmetry, extending the scope of problems in reinforcement learning and learning in control theory where symmetry techniques can be applied. We use Cartan's moving frame method to introduce a technique for learning dynamics which, by construction, exhibit specified symmetries. We demonstrate through numerical experiments that the proposed method learns a more accurate dynamical model.
Updated: 2024-03-27 21:31:46
标题: 在具有非对称奖励的基于模型的强化学习中利用动力学的对称性
摘要: 最近在强化学习领域的研究利用模型中的对称性来提高训练策略的样本效率。一个常用的简化假设是动态和奖励都表现出相同的对称性。然而,在许多现实世界的环境中,动态模型表现出独立于奖励模型的对称性:奖励可能不满足与动态相同的对称性。在本文中,我们研究了只假定动态表现出对称性的情况,扩展了强化学习和控制理论学习中可以应用对称技术的问题范围。我们使用Cartan的移动框架方法引入了一种学习动态的技术,通过构建,这些动态表现出指定的对称性。我们通过数值实验证明了所提出的方法学习了一个更准确的动态模型。
更新时间: 2024-03-27 21:31:46
领域: cs.LG,cs.AI,cs.RO,cs.SY,eess.SY
Towards LLM-RecSys Alignment with Textual ID Learning
Generative recommendation based on Large Language Models (LLMs) has transformed the traditional ranking-based recommendation style into a text-to-text generation paradigm. However, in contrast to standard NLP tasks that inherently operate on human vocabulary, current research in generative recommendations struggles to effectively encode recommendation items within the text-to-text framework using concise yet meaningful ID representations. To better align LLMs with recommendation needs, we propose IDGen, representing each item as a unique, concise, semantically rich, platform-agnostic textual ID using human language tokens. This is achieved by training a textual ID generator alongside the LLM-based recommender, enabling seamless integration of personalized recommendations into natural language generation. Notably, as user history is expressed in natural language and decoupled from the original dataset, our approach suggests the potential for a foundational generative recommendation model. Experiments show that our framework consistently surpasses existing models in sequential recommendation under the standard experimental setting. Then, we explore the possibility of training a foundation recommendation model with the proposed method on data collected from 19 different datasets and test its recommendation performance on 6 unseen datasets across different platforms under a completely zero-shot setting. The results show that the zero-shot performance of the pre-trained foundation model is comparable to or even better than that of some traditional recommendation models based on supervised training, showing the potential of the IDGen paradigm to serve as the foundation model for generative recommendation. Code and data are open-sourced at https://github.com/agiresearch/IDGenRec.
Updated: 2024-03-27 21:22:37
标题: 朝向LLM-RecSys与文本ID学习的对齐
摘要: 基于大型语言模型(LLMs)的生成式推荐已经将传统的基于排名的推荐风格转变为一种文本生成范式。然而,与标准的自然语言处理任务不同的是,当前在生成式推荐方面的研究很难有效地将推荐项编码到文本生成框架中,使用简洁而有意义的ID表示。为了更好地与推荐需求对齐LLMs,我们提出了IDGen,将每个项目表示为一个独特、简洁、语义丰富、跨平台的文本ID,使用人类语言标记。通过在基于LLM的推荐系统旁边训练文本ID生成器,实现个性化推荐无缝集成到自然语言生成中。值得注意的是,由于用户历史记录以自然语言表达并与原始数据集解耦,我们的方法提出了一个基础生成式推荐模型的潜力。实验表明,我们的框架在标准实验设置下一直优于现有的顺序推荐模型。然后,我们探讨了使用所提出的方法在收集自19个不同数据集的数据上训练基础推荐模型的可能性,并在6个不同平台上的完全零样本设置下测试了其推荐性能。结果显示,预训练的基础模型的零样本性能与甚至比一些基于监督训练的传统推荐模型更好,显示了IDGen范式作为生成式推荐基础模型的潜力。代码和数据开源在https://github.com/agiresearch/IDGenRec。
更新时间: 2024-03-27 21:22:37
领域: cs.IR,cs.AI,cs.CL,cs.LG
Meta-Learning with Generalized Ridge Regression: High-dimensional Asymptotics, Optimality and Hyper-covariance Estimation
Meta-learning involves training models on a variety of training tasks in a way that enables them to generalize well on new, unseen test tasks. In this work, we consider meta-learning within the framework of high-dimensional multivariate random-effects linear models and study generalized ridge-regression based predictions. The statistical intuition of using generalized ridge regression in this setting is that the covariance structure of the random regression coefficients could be leveraged to make better predictions on new tasks. Accordingly, we first characterize the precise asymptotic behavior of the predictive risk for a new test task when the data dimension grows proportionally to the number of samples per task. We next show that this predictive risk is optimal when the weight matrix in generalized ridge regression is chosen to be the inverse of the covariance matrix of random coefficients. Finally, we propose and analyze an estimator of the inverse covariance matrix of random regression coefficients based on data from the training tasks. As opposed to intractable MLE-type estimators, the proposed estimators could be computed efficiently as they could be obtained by solving (global) geodesically-convex optimization problems. Our analysis and methodology use tools from random matrix theory and Riemannian optimization. Simulation results demonstrate the improved generalization performance of the proposed method on new unseen test tasks within the considered framework.
Updated: 2024-03-27 21:18:43
标题: 元学习中的广义岭回归:高维渐近性、最优性和超协方差估计
摘要: 元学习涉及在各种训练任务上训练模型,使其能够在新的、未见过的测试任务上进行良好的泛化。在这项工作中,我们考虑将元学习置于高维多变量随机效应线性模型的框架内,并研究基于广义岭回归的预测。在这种情况下使用广义岭回归的统计直觉是,随机回归系数的协方差结构可以被利用来在新任务上进行更好的预测。因此,我们首先描述了当数据维度与每个任务的样本数量成比例增长时,新测试任务的预测风险的精确渐近行为。接着我们展示了当广义岭回归中的权重矩阵选择为随机系数的协方差矩阵的逆时,该预测风险是最优的。最后,我们提出并分析了一个基于训练任务数据的随机回归系数的逆协方差矩阵的估计量。与难以处理的MLE类型估计器不同,所提出的估计器可以高效计算,因为它们可以通过解决(全局)几何凸优化问题来获得。我们的分析和方法论使用了随机矩阵理论和黎曼优化的工具。模拟结果展示了在考虑的框架内,所提出的方法在新的未见过的测试任务上表现出改善的泛化性能。
更新时间: 2024-03-27 21:18:43
领域: math.ST,cs.LG,stat.ML,stat.TH
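The generalized ridge predictions referred to above have a simple closed form, which makes the abstract's optimality claim concrete. In notation introduced here for illustration (not necessarily the paper's): for a new task with design matrix $X$, responses $y$, ridge level $\lambda$, and weight matrix $\Omega$,

$$\hat{\beta}_{\Omega} = \left(X^\top X + \lambda\,\Omega\right)^{-1} X^\top y.$$

The optimality result then states that the asymptotic predictive risk of $\hat{\beta}_{\Omega}$ on a new task is minimized at $\Omega = \Sigma^{-1}$, the inverse covariance of the random regression coefficients, which is why the paper's final contribution is an efficiently computable, geodesically convex estimator of $\Sigma^{-1}$ from the training tasks.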
Thelxinoë: Recognizing Human Emotions Using Pupillometry and Machine Learning
In this study, we present a method for emotion recognition in Virtual Reality (VR) using pupillometry. We analyze pupil diameter responses to both visual and auditory stimuli via a VR headset and focus on extracting key features in the time-domain, frequency-domain, and time-frequency domain from VR generated data. Our approach utilizes feature selection to identify the most impactful features using Maximum Relevance Minimum Redundancy (mRMR). By applying a Gradient Boosting model, an ensemble learning technique using stacked decision trees, we achieve an accuracy of 98.8% with feature engineering, compared to 84.9% without it. This research contributes significantly to the Thelxinoë framework, aiming to enhance VR experiences by integrating multiple sensor data for realistic and emotionally resonant touch interactions. Our findings open new avenues for developing more immersive and interactive VR environments, paving the way for future advancements in virtual touch technology.
Updated: 2024-03-27 21:14:17
标题: Thelxinoë:利用瞳孔测量和机器学习识别人类情绪
摘要: 在这项研究中,我们提出了一种使用瞳孔测量技术在虚拟现实(VR)中进行情绪识别的方法。我们通过VR头显分析瞳孔直径对视觉和听觉刺激的反应,并专注于从VR生成的数据中提取时域、频域和时频域中的关键特征。我们的方法利用特征选择来通过最大相关性最小冗余度(mRMR)识别最具影响力的特征。通过应用梯度提升模型,一种使用堆叠决策树的集成学习技术,我们在特征工程的情况下实现了98.8%的准确率,而没有特征工程时为84.9%。这项研究对Thelxinoë框架做出了重大贡献,旨在通过整合多个传感器数据来增强VR体验,实现更真实和情感共鸣的触感交互。我们的发现为开发更具沉浸感和互动性的VR环境开辟了新途径,为虚拟触觉技术的未来进步铺平了道路。
更新时间: 2024-03-27 21:14:17
领域: cs.LG,cs.HC
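The mRMR feature selection mentioned above is a greedy procedure: at each step, pick the feature most relevant to the target while least redundant with the features already chosen. The sketch below uses absolute Pearson correlation as the (common) relevance/redundancy proxy on invented toy data; it is an illustration of the criterion, not the paper's implementation.

```python
def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb) if va and vb else 0.0

def mrmr(features, target, k):
    """Greedy mRMR: maximize |corr(f, target)| minus the mean
    |corr(f, already-chosen)| at every selection step."""
    chosen = []
    remaining = list(features)
    while remaining and len(chosen) < k:
        def score(name):
            relevance = abs(pearson(features[name], target))
            if not chosen:
                return relevance
            redundancy = sum(abs(pearson(features[name], features[c]))
                             for c in chosen) / len(chosen)
            return relevance - redundancy
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Invented toy: f1 predicts the target, f2 duplicates f1, f3 is uninformative.
target = [0, 1, 0, 1, 1, 0]
features = {
    "f1": [0.1, 0.9, 0.2, 0.8, 1.0, 0.0],
    "f2": [0.1, 0.9, 0.2, 0.8, 1.0, 0.0],   # exact redundant copy of f1
    "f3": [0.6, 0.6, 0.4, 0.4, 0.5, 0.5],   # uncorrelated with target and f1
}
print(mrmr(features, target, 2))  # → ['f1', 'f3']
```

Note how the redundancy penalty makes the selector skip the duplicate `f2` in favor of the independent `f3`, which is exactly the behavior that makes mRMR useful before a downstream model such as gradient boosting.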
The current state of security -- Insights from the German software industry
These days, software development and security go hand in hand. Numerous techniques and strategies are discussed in the literature that can be applied to guarantee the incorporation of security into the software development process. In this paper, the main ideas of secure software development discussed in the literature are outlined. Next, a dataset on implementation in practice is gathered through qualitative interview research involving 20 companies. Trends and correlations in this dataset are identified and contrasted with theoretical ideas from the literature. The results show that the organizations polled are placing an increasing focus on security. Although the techniques covered in the literature are being used in the real world, they are frequently not fully integrated into formal, standardized processes. The insights gained from our research lay the groundwork for future work, which can delve deeper into specific elements of these methods to enhance our understanding of their application in real-world scenarios.
Updated: 2024-03-27 21:13:09
标题: 当前安全状况——来自德国软件行业的见解
摘要: 这些天,软件开发和安全性是相辅相成的。文献中讨论了许多技术和策略,可以应用于确保安全性被纳入软件开发过程中。本文概述了文献中讨论的安全软件开发的主要思想。接下来,通过对涉及20家公司的定性访谈研究,收集了一个实践中的实施数据集。在这个数据集中发现了趋势和相关性,并与文献中的理论思想进行对比。结果显示,接受调查的组织正在越来越注重安全性。尽管文献中涵盖的技术在实际世界中被使用,但它们经常没有完全整合到正式的标准化流程中。我们研究所获得的见解为未来研究奠定了基础,这些研究可以深入探讨这些方法的具体要素,以增强我们对它们在实际情景中应用的理解。
更新时间: 2024-03-27 21:13:09
领域: cs.CR
Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond
Developing a universal model that can effectively harness heterogeneous resources and respond to a wide range of personalized needs has been a longstanding community aspiration. Our daily choices, especially in domains like fashion and retail, are substantially shaped by multi-modal data, such as pictures and textual descriptions. These modalities not only offer intuitive guidance but also cater to personalized user preferences. However, the predominant personalization approaches mainly focus on the ID or text-based recommendation problem, failing to comprehend the information spanning various tasks or modalities. In this paper, our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP), which effectively leverages multi-modal data while eliminating the complexities associated with task- and modality-specific customization. We argue that the advancements in foundational generative modeling have provided the flexibility and effectiveness necessary to achieve the objective. In light of this, we develop a generic and extensible personalization generative framework, that can handle a wide range of personalized needs including item recommendation, product search, preference prediction, explanation generation, and further user-guided image generation. Our methodology enhances the capabilities of foundational language models for personalized tasks by seamlessly ingesting interleaved cross-modal user history information, ensuring a more precise and customized experience for users. To train and evaluate the proposed multi-modal personalized tasks, we also introduce a novel and comprehensive benchmark covering a variety of user requirements. Our experiments on the real-world benchmark showcase the model's potential, outperforming competitive methods specialized for each task.
Updated: 2024-03-27 21:11:19
标题: 朝着统一的多模态个性化方向发展:为生成式推荐及更多领域构建大型视觉-语言模型
摘要: 开发一个可以有效利用异构资源并响应各种个性化需求的通用模型一直是社区的愿望。我们的日常选择,尤其是在时尚和零售领域,很大程度上受到多模态数据的影响,如图片和文本描述。这些模态不仅提供直观的指导,还迎合个性化用户偏好。然而,主流的个性化方法主要集中在基于ID或文本的推荐问题上,未能理解涉及各种任务或模态的信息。在本文中,我们的目标是建立一个统一的多模态个性化系统范式(UniMP),它有效利用多模态数据,同时消除与任务和模态特定定制相关的复杂性。我们认为,基础生成建模的进展已经提供了实现目标所需的灵活性和有效性。基于此,我们开发了一个通用和可扩展的个性化生成框架,可以处理包括项目推荐、产品搜索、偏好预测、解释生成和进一步用户引导的图像生成在内的各种个性化需求。我们的方法通过无缝地吸收交叉模态用户历史信息来增强基础语言模型的个性化任务能力,为用户提供更精确和定制化的体验。为了训练和评估所提出的多模态个性化任务,我们还引入了一个涵盖各种用户需求的新颖和全面的基准。我们在真实世界基准上的实验展示了模型的潜力,胜过了专门针对每个任务的竞争方法。
更新时间: 2024-03-27 21:11:19
领域: cs.IR,cs.AI,cs.CL,cs.MM
ReflectSumm: A Benchmark for Course Reflection Summarization
This paper introduces ReflectSumm, a novel summarization dataset specifically designed for summarizing students' reflective writing. The goal of ReflectSumm is to facilitate developing and evaluating novel summarization techniques tailored to real-world scenarios with little training data, with potential implications for the opinion summarization domain in general and the educational domain in particular. The dataset encompasses a diverse range of summarization tasks and includes comprehensive metadata, enabling the exploration of various research questions and supporting different applications. To showcase its utility, we conducted extensive evaluations using multiple state-of-the-art baselines. The results provide benchmarks for facilitating further research in this area.
Updated: 2024-03-27 21:10:07
标题: ReflectSumm:课程反思总结评估的基准
摘要: 这篇论文介绍了ReflectSumm,这是一个专门设计用于总结学生反思性写作的新颖总结数据集。ReflectSumm的目标是促进开发和评估针对现实情景和少量训练数据的新颖总结技术,这些技术在普遍观点总结领域以及教育领域可能产生重要影响。该数据集涵盖了多样化的总结任务,并包括全面的元数据,可探索各种研究问题并支持不同的应用。为展示其实用性,我们进行了广泛的评估,使用了多种最先进的基准。结果为促进该领域进一步研究提供了基准。
更新时间: 2024-03-27 21:10:07
领域: cs.CL,cs.AI
Sequential Inference of Hospitalization Electronic Health Records Using Probabilistic Models
In the dynamic hospital setting, decision support can be a valuable tool for improving patient outcomes. Data-driven inference of future outcomes is challenging in this dynamic setting, where long sequences such as laboratory tests and medications are updated frequently. This is due in part to heterogeneity of data types and mixed-sequence types contained in variable length sequences. In this work we design a probabilistic unsupervised model for multiple arbitrary-length sequences contained in hospitalization Electronic Health Record (EHR) data. The model uses a latent variable structure and captures complex relationships between medications, diagnoses, laboratory tests, and neurological assessments. It can be trained on original data, without requiring any lossy transformations or time binning. Inference algorithms are derived that use partial data to infer properties of the complete sequences, including their length and presence of specific values. We train this model on data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The results are evaluated against held-out data for predicting the length of sequences and presence of Intensive Care Unit (ICU) in hospitalization bed sequences. Our method outperforms a baseline approach, showing that in these experiments the trained model captures information in the sequences that is informative of their future values.
Updated: 2024-03-27 21:06:26
标题: 使用概率模型进行医院电子健康记录的顺序推断
摘要: 在动态的医院环境中,决策支持可以是改善患者结果的有价值工具。在这种动态设置中,通过数据驱动的推断未来结果是具有挑战性的,因为诸如实验室检测和药物等长序列经常更新。这部分是由于数据类型的异质性以及包含在可变长度序列中的混合序列类型。在这项工作中,我们设计了一个概率无监督模型,用于医院电子健康记录(EHR)数据中包含的多个任意长度序列。该模型使用潜变量结构,捕捉了药物、诊断、实验室检测和神经评估之间的复杂关系。它可以在原始数据上进行训练,而无需任何有损变换或时间分箱。推断算法被导出,使用部分数据来推断完整序列的属性,包括它们的长度和特定值的存在。我们在接受加利福尼亚北部凯撒永久综合医疗保健系统医疗护理的对象的数据上训练了该模型。结果针对留出数据进行评估,用于预测序列的长度和住院床位序列中是否有重症监护室(ICU)。我们的方法优于基线方法,表明在这些实验中,训练模型捕捉了序列中的信息,这些信息对其未来值具有信息价值。
更新时间: 2024-03-27 21:06:26
领域: q-bio.QM,cs.LG
Towards Sustainable SecureML: Quantifying Carbon Footprint of Adversarial Machine Learning
The widespread adoption of machine learning (ML) across various industries has raised sustainability concerns due to its substantial energy usage and carbon emissions. This issue becomes more pressing in adversarial ML, which focuses on enhancing model security against different network-based attacks. Implementing defenses in ML systems often necessitates additional computational resources and network security measures, exacerbating their environmental impacts. In this paper, we pioneer the first investigation into adversarial ML's carbon footprint, providing empirical evidence connecting greater model robustness to higher emissions. Addressing the critical need to quantify this trade-off, we introduce the Robustness Carbon Trade-off Index (RCTI). This novel metric, inspired by economic elasticity principles, captures the sensitivity of carbon emissions to changes in adversarial robustness. We demonstrate the RCTI through an experiment involving evasion attacks, analyzing the interplay between robustness against attacks, performance, and carbon emissions.
Updated: 2024-03-27 21:02:15
标题: 走向可持续的SecureML:量化对抗机器学习的碳足迹
摘要: 机器学习(ML)在各行各业的广泛应用引起了人们对其大量能源使用和碳排放的可持续性担忧。这个问题在对抗性ML中变得更加紧迫,它专注于提高模型对不同网络攻击的安全性。在ML系统中实施防御措施通常需要额外的计算资源和网络安全措施,加剧了它们对环境的影响。在本文中,我们首次对对抗性ML的碳足迹进行了调查,提供了实证证据,将更大的模型鲁棒性与更高的排放联系起来。为了解决量化这种权衡的迫切需求,我们引入了鲁棒性碳交易指数(RCTI)。这种新颖的度量标准受经济弹性原则启发,捕捉了碳排放对对抗性鲁棒性变化的敏感性。我们通过一个涉及规避攻击的实验来展示RCTI,分析了对抗攻击、性能和碳排放之间的相互作用。
更新时间: 2024-03-27 21:02:15
领域: cs.LG,cs.CR
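The abstract above says the RCTI is "inspired by economic elasticity principles" but does not spell out a formula; an elasticity-style definition, given here purely as an illustration (the paper's exact formula may differ), is the percentage change in emissions divided by the percentage change in robustness.

```python
def rcti(emissions_base, emissions_robust, robustness_base, robustness_robust):
    """Elasticity-style Robustness Carbon Trade-off Index: percentage change
    in carbon emissions per percentage change in adversarial robustness.
    (Illustrative definition, not necessarily the paper's exact formula.)"""
    pct_emissions = (emissions_robust - emissions_base) / emissions_base
    pct_robustness = (robustness_robust - robustness_base) / robustness_base
    return pct_emissions / pct_robustness

# Invented numbers: adversarial training lifts robust accuracy 0.50 -> 0.60
# while the training run's emissions grow from 100 to 140 kgCO2e.
print(rcti(100.0, 140.0, 0.50, 0.60))  # elasticity of 2: each 1% of
# robustness gained costs roughly 2% more carbon
```

An index above 1 signals that robustness gains are being bought at a disproportionate carbon cost, which is the trade-off the paper's evasion-attack experiments quantify.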
TopoNav: Topological Navigation for Efficient Exploration in Sparse Reward Environments
Autonomous robots exploring unknown environments face a significant challenge: navigating effectively without prior maps and with limited external feedback. This challenge intensifies in sparse reward environments, where traditional exploration techniques often fail. In this paper, we present TopoNav, a novel topological navigation framework that integrates active mapping, hierarchical reinforcement learning, and intrinsic motivation to enable efficient goal-oriented exploration and navigation in sparse-reward settings. TopoNav dynamically constructs a topological map of the environment, capturing key locations and pathways. A two-level hierarchical policy architecture, comprising a high-level graph traversal policy and low-level motion control policies, enables effective navigation and obstacle avoidance while maintaining focus on the overall goal. Additionally, TopoNav incorporates intrinsic motivation to guide exploration toward relevant regions and frontier nodes in the topological map, addressing the challenges of sparse extrinsic rewards. We evaluate TopoNav in both simulated and real-world off-road environments using a Clearpath Jackal robot, across three challenging navigation scenarios: goal-reaching, feature-based navigation, and navigation in complex terrains. We observe an increase in exploration coverage by 7-20%, in success rates by 9-19%, and reductions in navigation times by 15-36% across various scenarios, compared to state-of-the-art methods.
Updated: 2024-03-27 21:01:24
标题: TopoNav:稀疏奖励环境下高效探索的拓扑导航
摘要: 探索未知环境的自主机器人面临一个重大挑战:在没有先验地图且外部反馈有限的情况下有效导航。在稀疏奖励环境中,这一挑战更加严峻,传统的探索技术往往会失效。在本文中,我们提出了TopoNav,一种新颖的拓扑导航框架,它集成了主动建图、分层强化学习和内在动机,以在稀疏奖励设置中实现高效的目标导向探索和导航。TopoNav动态构建环境的拓扑地图,捕捉关键位置和路径。由高层图遍历策略和低层运动控制策略组成的两级分层策略架构,在保持对总体目标关注的同时实现了有效的导航和避障。此外,TopoNav引入内在动机来引导探索朝向拓扑地图中的相关区域和前沿节点,以应对稀疏外在奖励的挑战。我们使用Clearpath Jackal机器人在仿真和真实世界的越野环境中,针对三种具有挑战性的导航场景(到达目标、基于特征的导航和复杂地形中的导航)对TopoNav进行了评估。与最先进的方法相比,在各种场景中我们观察到探索覆盖率提高了7-20%,成功率提高了9-19%,导航时间减少了15-36%。
更新时间: 2024-03-27 21:01:24
领域: cs.RO,cs.LG
What's in a Prior? Learned Proximal Networks for Inverse Problems
Proximal operators are ubiquitous in inverse problems, commonly appearing as part of algorithmic strategies to regularize problems that are otherwise ill-posed. Modern deep learning models have been brought to bear for these tasks too, as in the framework of plug-and-play or deep unrolling, where they loosely resemble proximal operators. Yet, something essential is lost in employing these purely data-driven approaches: there is no guarantee that a general deep network represents the proximal operator of any function, nor is there any characterization of the function for which the network might provide some approximate proximal. This not only makes guaranteeing convergence of iterative schemes challenging but, more fundamentally, complicates the analysis of what has been learned by these networks about their training data. Herein we provide a framework to develop learned proximal networks (LPN), prove that they provide exact proximal operators for a data-driven nonconvex regularizer, and show how a new training strategy, dubbed proximal matching, provably promotes the recovery of the log-prior of the true data distribution. Such LPN provide general, unsupervised, expressive proximal operators that can be used for general inverse problems with convergence guarantees. We illustrate our results in a series of cases of increasing complexity, demonstrating that these models not only result in state-of-the-art performance, but provide a window into the resulting priors learned from data.
Updated: 2024-03-27 20:48:37
标题: 一个先验中包含什么?为逆问题学习的近端网络
摘要: 接近算子在逆问题中是无处不在的,通常作为算法策略的一部分,用于规范那些本质上不适定的问题。现代深度学习模型也被用于这些任务,例如插拔式或深度展开的框架,在这些框架中它们与接近算子有类似之处。然而,通过采用这些纯数据驱动的方法,会丢失一些关键信息:没有保证通用深度网络表示任何函数的接近算子,也没有表征该网络可能提供某种近似接近算子的函数。这不仅使得保证迭代方案的收敛性具有挑战性,更基本地,使得分析这些网络对其训练数据学习到了什么变得复杂。在这里,我们提供了一个框架来开发学习到的接近算子网络(LPN),证明它们为数据驱动的非凸正则化器提供了精确的接近算子,并展示一种名为接近匹配的新训练策略,可确保促进真实数据分布的对数先验的恢复。这样的LPN提供了通用的、无监督的、表达丰富的接近算子,可用于具有收敛性保证的一般逆问题。我们在一系列逐渐复杂的情况下说明了我们的结果,证明这些模型不仅实现了最先进的性能,而且提供了对从数据中学习到的先验的窗口。
更新时间: 2024-03-27 20:48:37
领域: cs.CV,cs.LG
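For readers less familiar with the terminology above: the proximal operator of a function f maps a point v to the minimizer of f(x) + 0.5*(x - v)^2, and the paper's LPN constrains a network so that it is guaranteed to be such an operator for some learned regularizer. The sketch below is only the classical special case, not the paper's method: the closed-form prox of the L1 penalty (soft-thresholding), verified numerically against a grid search over the defining objective.

```python
def soft_threshold(v, lam):
    """Closed-form proximal operator of f(x) = lam * |x|:
    prox_f(v) = argmin_x  lam * |x| + 0.5 * (x - v) ** 2."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def prox_objective(x, v, lam):
    return lam * abs(x) + 0.5 * (x - v) ** 2

# Numeric sanity check: the closed form matches a dense grid minimizer.
v, lam = 1.3, 0.5
best_grid = min((i / 1000.0 for i in range(-3000, 3001)),
                key=lambda x: prox_objective(x, v, lam))
print(soft_threshold(v, lam), round(best_grid, 3))  # → 0.8 0.8
```

A generic denoising network offers no such guarantee, which is the gap the abstract describes: LPN makes the "is this really a prox, and of what function?" question answerable by construction.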
HU at SemEval-2024 Task 8A: Can Contrastive Learning Learn Embeddings to Detect Machine-Generated Text?
This paper describes our system developed for SemEval-2024 Task 8, "Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection". Machine-generated texts have been one of the main concerns due to the use of large language models (LLMs) in fake text generation, phishing, cheating in exams, or even plagiarizing copyrighted materials. A lot of systems have been developed to detect machine-generated text. Nonetheless, the majority of these systems rely on the text-generating model. This limitation is impractical in real-world scenarios, as it is often impossible to know which specific model the user has used for text generation. In this work, we propose a single model based on contrastive learning, which uses approximately 40% of the baseline's parameters (149M vs. 355M) but shows comparable performance on the test dataset (21st out of 137 participants). Our key finding is that even without an ensemble of multiple models, a single base model can achieve comparable performance with the help of data augmentation and contrastive learning. Our code is publicly available at https://github.com/dipta007/SemEval24-Task8.
Updated: 2024-03-27 20:30:08
标题: HU在SemEval-2024任务8A中的表现:对比学习能否学习嵌入以检测机器生成的文本?
摘要: 这篇论文描述了我们为SemEval-2024任务8开发的系统,“多生成器、多领域和多语言黑盒机器生成文本检测”。由于大型语言模型(LLM)在虚假文本生成、网络钓鱼、考试作弊甚至剽窃版权材料中的使用,机器生成的文本一直是主要关注的焦点之一。已经开发了许多系统来检测机器生成的文本。然而,这些系统大多数依赖于文本生成模型。在现实世界的场景中,这种限制是不切实际的,因为往往不可能知道用户用于文本生成的特定模型。在这项工作中,我们提出了一种基于对比学习的单一模型,它使用基线参数的约40%(149M对355M),但在测试数据集上表现相当(137名参与者中的第21名)。我们的关键发现是,即使没有多个模型的集合,单个基础模型也可以在数据增强和对比学习的帮助下表现出可比性能。我们的代码可在https://github.com/dipta007/SemEval24-Task8 上公开获取。
更新时间: 2024-03-27 20:30:08
领域: cs.CL,cs.AI,cs.LG
Causal-StoNet: Causal Inference for High-Dimensional Complex Data
With the advancement of data science, the collection of increasingly complex datasets has become commonplace. In such datasets, the data dimension can be extremely high, and the underlying data generation process can be unknown and highly nonlinear. As a result, the task of making causal inference with high-dimensional complex data has become a fundamental problem in many disciplines, such as medicine, econometrics, and social science. However, the existing methods for causal inference are frequently developed under the assumption that the data dimension is low or that the underlying data generation process is linear or approximately linear. To address these challenges, this paper proposes a novel causal inference approach for dealing with high-dimensional complex data. The proposed approach is based on deep learning techniques, including sparse deep learning theory and stochastic neural networks, that have been developed in recent literature. By using these techniques, the proposed approach can address both the high dimensionality and unknown data generation process in a coherent way. Furthermore, the proposed approach can also be used when missing values are present in the datasets. Extensive numerical studies indicate that the proposed approach outperforms existing ones.
Updated: 2024-03-27 20:27:31
标题: 因果-StoNet:高维复杂数据的因果推断
摘要: 随着数据科学的发展,收集越来越复杂的数据集已经变得司空见惯。在这种数据集中,数据维度可能非常高,而潜在的数据生成过程可能是未知的且高度非线性的。因此,在许多领域,如医学、计量经济学和社会科学,利用高维复杂数据进行因果推断的任务已经成为一个基本问题。然而,现有的因果推断方法通常是基于数据维度较低或潜在数据生成过程是线性或近似线性的假设而开发的。为了解决这些挑战,本文提出了一种针对高维复杂数据的新型因果推断方法。提出的方法基于最近文献中开发的深度学习技术,包括稀疏深度学习理论和随机神经网络。通过使用这些技术,提出的方法可以以一种连贯的方式处理高维度和未知数据生成过程。此外,当数据集中存在缺失值时,提出的方法也可以使用。广泛的数值研究表明,提出的方法优于现有方法。
更新时间: 2024-03-27 20:27:31
领域: stat.ML,cs.LG
Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models
The recent explosion in the capabilities of large language models has led to a wave of interest in how best to prompt a model to perform a given task. While it may be tempting to simply choose a prompt based on average performance on a validation set, this can lead to a deployment where unexpectedly poor responses are generated, especially for the worst-off users. To mitigate this prospect, we propose Prompt Risk Control, a lightweight framework for selecting a prompt based on rigorous upper bounds on families of informative risk measures. We offer methods for producing bounds on a diverse set of metrics, including quantities that measure worst-case responses and disparities in generation quality across the population of users. In addition, we extend the underlying statistical bounding techniques to accommodate the possibility of distribution shifts in deployment. Experiments on applications such as open-ended chat, medical question summarization, and code generation highlight how such a framework can foster responsible deployment by reducing the risk of the worst outcomes.
Updated: 2024-03-27 20:20:22
Categories: cs.LG,cs.AI,cs.CL
COVID-19 detection from pulmonary CT scans using a novel EfficientNet with attention mechanism
Manual analysis and diagnosis of COVID-19 through the examination of Computed Tomography (CT) images of the lungs can be time-consuming and error-prone, especially given the high volume of patients and the numerous images per patient. We therefore address the need for automating this task by developing a new deep learning model-based pipeline. Our motivation was sparked by the CVPR Workshop on "Domain Adaptation, Explainability and Fairness in AI for Medical Image Analysis", more specifically, the "COVID-19 Diagnosis Competition (DEF-AI-MIA COV19D)" under the same Workshop. This challenge provides an opportunity to assess our proposed pipeline for COVID-19 detection from CT scan images. The pipeline incorporates the original EfficientNet, but with an added Attention Mechanism: EfficientNet-AM. Unlike traditional pipelines, which rely on a pre-processing step, ours takes the selected raw input images without any such step, except for an image-selection step that simply reduces the number of CT images required for training and/or testing. Moreover, our pipeline is computationally efficient: for example, it does not incorporate a decoder for segmenting the lungs, nor does it combine different backbones or pair an RNN with a backbone, as past pipelines did. Nevertheless, our pipeline still outperforms all approaches presented by other teams in last year's instance of the same challenge, at least on the validation subset of the competition dataset.
Updated: 2024-03-27 20:10:05
Categories: eess.IV,cs.CV,cs.LG
Dealing with Imbalanced Classes in Bot-IoT Dataset
With the rapidly spreading usage of Internet of Things (IoT) devices, a network intrusion detection system (NIDS) plays an important role in detecting and protecting various types of attacks in the IoT network. To evaluate the robustness of the NIDS in the IoT network, the existing work proposed a realistic botnet dataset in the IoT network (Bot-IoT dataset) and applied it to machine learning-based anomaly detection. This dataset contains imbalanced normal and attack packets because the number of normal packets is much smaller than that of attack ones. The nature of imbalanced data may make it difficult to identify the minority class correctly. In this thesis, to address the class imbalance problem in the Bot-IoT dataset, we propose a binary classification method with synthetic minority over-sampling techniques (SMOTE). The proposed classifier aims to detect attack packets and overcome the class imbalance problem using the SMOTE algorithm. Through numerical results, we demonstrate the proposed classifier's fundamental characteristics and the impact of imbalanced data on its performance.
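The thesis uses the standard SMOTE algorithm (available in practice via the imbalanced-learn library); the sketch below is a minimal NumPy re-implementation of SMOTE's core interpolation step, not the author's code: each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: for each synthetic sample, pick a random
    minority point, pick one of its k nearest minority neighbours, and
    interpolate at a random position on the segment between them."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self-neighbours
    neighbours = np.argsort(d, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)
        b = neighbours[a, rng.integers(min(k, n - 1))]
        lam = rng.random()                 # position along the segment
        synthetic[i] = X_min[a] + lam * (X_min[b] - X_min[a])
    return synthetic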
Updated: 2024-03-27 20:09:59
Categories: cs.CR,cs.AI
Robustness and Visual Explanation for Black Box Image, Video, and ECG Signal Classification with Reinforcement Learning
We present a generic Reinforcement Learning (RL) framework optimized for crafting adversarial attacks on different model types spanning from ECG signal analysis (1D), image classification (2D), and video classification (3D). The framework focuses on identifying sensitive regions and inducing misclassifications with minimal distortions and various distortion types. The novel RL method outperforms state-of-the-art methods for all three applications, proving its efficiency. Our RL approach produces superior localization masks, enhancing interpretability for image classification and ECG analysis models. For applications such as ECG analysis, our platform highlights critical ECG segments for clinicians while ensuring resilience against prevalent distortions. This comprehensive tool aims to bolster both resilience with adversarial training and transparency across varied applications and data types.
Updated: 2024-03-27 20:07:39
Categories: cs.LG,cs.AI,cs.CR,cs.CV,cs.MA
LTL learning on GPUs
Linear temporal logic (LTL) is widely used in industrial verification. LTL formulae can be learned from traces. Scaling LTL formula learning is an open problem. We implement the first GPU-based LTL learner using a novel form of enumerative program synthesis. The learner is sound and complete. Our benchmarks indicate that it handles traces at least 2048 times more numerous, and on average at least 46 times faster than existing state-of-the-art learners. This is achieved with, among others, novel branch-free LTL semantics that has $O(\log n)$ time complexity, where $n$ is trace length, while previous implementations are $O(n^2)$ or worse (assuming bitwise boolean operations and shifts by powers of 2 have unit costs -- a realistic assumption on modern processors).
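The paper's GPU implementation is far more involved, but the branch-free $O(\log n)$ idea can be illustrated in a few lines: encode, for one proposition, the set of time steps at which it holds as bits of an integer, and compute temporal operators with shift-and-or doubling instead of an $O(n^2)$ scan. The encoding below (bit $i$ = trace step $i$) is an assumption for illustration.

```python
def eventually(phi, n):
    """F phi on a finite trace of length n: bit i of the result is set
    iff phi holds at some step j >= i. Computed as a suffix-OR with
    O(log n) shift-or steps, with no branches over trace positions."""
    s = 1
    while s < n:
        phi |= phi >> s   # bit i absorbs bits i .. i+s
        s <<= 1
    return phi & ((1 << n) - 1)

def globally(phi, n):
    """G phi = not F (not phi), computed on the same bitvectors."""
    mask = (1 << n) - 1
    return eventually(~phi & mask, n) ^ mask
```

For example, on a trace of length 5 where the proposition holds only at step 3 (`0b01000`), `eventually` sets bits 0 through 3, exactly the positions from which step 3 is still in the future.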
Updated: 2024-03-27 20:00:00
Categories: cs.PL,cs.AI,68,D.3
TextCraftor: Your Text Encoder Can be Image Quality Controller
Diffusion-based text-to-image generative models, e.g., Stable Diffusion, have revolutionized the field of content generation, enabling significant advancements in areas like image editing and video synthesis. Despite their formidable capabilities, these models are not without their limitations. It is still challenging to synthesize an image that aligns well with the input text, and multiple runs with carefully crafted prompts are required to achieve satisfactory results. To mitigate these limitations, numerous studies have endeavored to fine-tune the pre-trained diffusion models, i.e., UNet, utilizing various technologies. Yet, amidst these efforts, a pivotal question of text-to-image diffusion model training has remained largely unexplored: Is it possible and feasible to fine-tune the text encoder to improve the performance of text-to-image diffusion models? Our findings reveal that, instead of replacing the CLIP text encoder used in Stable Diffusion with other large language models, we can enhance it through our proposed fine-tuning approach, TextCraftor, leading to substantial improvements in quantitative benchmarks and human assessments. Interestingly, our technique also empowers controllable image generation through the interpolation of different text encoders fine-tuned with various rewards. We also demonstrate that TextCraftor is orthogonal to UNet finetuning, and can be combined to further improve generative quality.
Updated: 2024-03-27 19:52:55
Categories: cs.CV,cs.AI,cs.LG
A Systematic Evaluation of Euclidean Alignment with Deep Learning for EEG Decoding
Electroencephalography (EEG) signals are frequently used for various Brain-Computer Interface (BCI) tasks. While Deep Learning (DL) techniques have shown promising results, they are hindered by the substantial data requirements. By leveraging data from multiple subjects, transfer learning enables more effective training of DL models. A technique that is gaining popularity is Euclidean Alignment (EA) due to its ease of use, low computational complexity, and compatibility with Deep Learning models. However, few studies evaluate its impact on the training performance of shared and individual DL models. In this work, we systematically evaluate the effect of EA combined with DL for decoding BCI signals. We used EA to train shared models with data from multiple subjects and evaluated its transferability to new subjects. Our experimental results show that it improves decoding in the target subject by 4.33% and decreases convergence time by more than 70%. We also trained individual models for each subject to use as a majority-voting ensemble classifier. In this scenario, using EA improved the 3-model ensemble accuracy by 3.7%. However, when compared to the shared model with EA, the ensemble accuracy was 3.62% lower.
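Euclidean Alignment itself is a short, well-known procedure: whiten each subject's trials by the inverse square root of that subject's mean spatial covariance, so that aligned data from different subjects share an identity reference covariance. A minimal NumPy sketch (the array shapes are assumptions, not the paper's exact preprocessing):

```python
import numpy as np

def euclidean_alignment(trials):
    """Euclidean Alignment sketch: whiten a subject's EEG trials by the
    inverse square root of their mean spatial covariance, so that the
    average covariance of the aligned trials is the identity.
    trials: array of shape (n_trials, n_channels, n_samples)."""
    covs = np.stack([x @ x.T / x.shape[1] for x in trials])
    r_bar = covs.mean(axis=0)
    # inverse matrix square root via eigendecomposition (r_bar is SPD)
    vals, vecs = np.linalg.eigh(r_bar)
    r_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return np.stack([r_inv_sqrt @ x for x in trials])
```

After alignment, trials from multiple subjects can be pooled to train a shared deep model, which is the setting the paper evaluates.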
Updated: 2024-03-27 19:47:06
Categories: eess.SP,cs.AI,cs.LG,I.5.1; I.6.3; I.2.6
"Sorry, Come Again?" Prompting -- Enhancing Comprehension and Diminishing Hallucination with [PAUSE]-injected Optimal Paraphrasing
Hallucination has emerged as the most vulnerable aspect of contemporary Large Language Models (LLMs). In this paper, we introduce the Sorry, Come Again (SCA) prompting, aimed to avoid LLM hallucinations by enhancing comprehension through: (i) optimal paraphrasing and (ii) injecting [PAUSE] tokens to delay LLM generation. First, we provide an in-depth analysis of linguistic nuances: formality, readability, and concreteness of prompts for 21 LLMs, and elucidate how these nuances contribute to hallucinated generation. Prompts with lower readability, formality, or concreteness pose comprehension challenges for LLMs, similar to those faced by humans. In such scenarios, an LLM tends to speculate and generate content based on its imagination (associative memory) to fill these information gaps. Although these speculations may occasionally align with factual information, their accuracy is not assured, often resulting in hallucination. Recent studies reveal that an LLM often neglects the middle sections of extended prompts, a phenomenon termed as lost in the middle. While a specific paraphrase may suit one LLM, the same paraphrased version may elicit a different response from another LLM. Therefore, we propose an optimal paraphrasing technique to identify the most comprehensible paraphrase of a given prompt, evaluated using Integrated Gradient (and its variations) to guarantee that the LLM accurately processes all words. While reading lengthy sentences, humans often pause at various points to better comprehend the meaning read thus far. We have fine-tuned an LLM with injected [PAUSE] tokens, allowing the LLM to pause while reading lengthier prompts. This has brought several key contributions: (i) determining the optimal position to inject [PAUSE], (ii) determining the number of [PAUSE] tokens to be inserted, and (iii) introducing reverse proxy tuning to fine-tune the LLM for [PAUSE] insertion.
Updated: 2024-03-27 19:45:09
Categories: cs.CL,cs.AI
A Survey on Large Language Models from Concept to Implementation
Recent advancements in Large Language Models (LLMs), particularly those built on Transformer architectures, have significantly broadened the scope of natural language processing (NLP) applications, transcending their initial use in chatbot technology. This paper investigates the multifaceted applications of these models, with an emphasis on the GPT series. This exploration focuses on the transformative impact of artificial intelligence (AI) driven tools in revolutionizing traditional tasks like coding and problem-solving, while also paving new paths in research and development across diverse industries. From code interpretation and image captioning to facilitating the construction of interactive systems and advancing computational domains, Transformer models exemplify a synergy of deep learning, data analysis, and neural network design. This survey provides an in-depth look at the latest research in Transformer models, highlighting their versatility and the potential they hold for transforming diverse application sectors, thereby offering readers a comprehensive understanding of the current and future landscape of Transformer-based LLMs in practical applications.
Updated: 2024-03-27 19:35:41
Categories: cs.CL,cs.AI,cs.IT,cs.LG,math.IT
Integration of Graph Neural Network and Neural-ODEs for Tumor Dynamic Prediction
In anti-cancer drug development, a major scientific challenge is disentangling the complex relationships between high-dimensional genomics data from patient tumor samples, the corresponding tumor's organ of origin, the drug targets associated with given treatments and the resulting treatment response. Furthermore, to realize the aspirations of precision medicine in identifying and adjusting treatments for patients depending on the therapeutic response, there is a need for building tumor dynamic models that can integrate both longitudinal tumor size as well as multimodal, high-content data. In this work, we take a step towards enhancing personalized tumor dynamic predictions by proposing a heterogeneous graph encoder that utilizes a bipartite Graph Convolutional Neural network (GCN) combined with Neural Ordinary Differential Equations (Neural-ODEs). We applied the methodology to a large collection of patient-derived xenograft (PDX) data, spanning a wide variety of treatments (as well as their combinations) on tumors that originated from a number of different organs. We first show that the methodology is able to discover a tumor dynamic model that significantly improves upon an empirical model which is in current use. Additionally, we show that the graph encoder is able to effectively utilize multimodal data to enhance tumor predictions. Our findings indicate that the methodology holds significant promise and offers potential applications in pre-clinical settings.
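As a hedged illustration of the Neural-ODE half of the proposed architecture (the bipartite GCN encoder is omitted), the sketch below integrates a learned vector field dv/dt = f_theta(v, z) over a measurement grid with fixed-step Euler. In the paper, z would come from the graph encoder over multimodal data and f_theta would be trained on tumor-size trajectories; here both are toy stand-ins.

```python
import numpy as np

def mlp(params, x):
    """Tiny one-hidden-layer network standing in for f_theta."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2

def neural_ode_euler(params, v0, z, t_grid):
    """Neural-ODE sketch: the tumor state v evolves as dv/dt = f(v, z),
    where z is a (graph-encoder-derived) embedding of treatment and
    genomics. Integrated with fixed-step Euler for clarity; adaptive
    solvers would be used in practice."""
    v, traj = v0, [v0]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        dv = mlp(params, np.concatenate([v, z]))
        v = v + (t1 - t0) * dv
        traj.append(v)
    return np.stack(traj)
```

Training would fit `params` so that the integrated trajectory matches the observed longitudinal tumor sizes for each PDX model.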
Updated: 2024-03-27 19:34:21
Categories: cs.LG
LORD: Large Models based Opposite Reward Design for Autonomous Driving
Reinforcement learning (RL) based autonomous driving has emerged as a promising alternative to data-driven imitation learning approaches. However, crafting effective reward functions for RL poses challenges due to the complexity of defining and quantifying good driving behaviors across diverse scenarios. Recently, large pretrained models have gained significant attention as zero-shot reward models for tasks specified with desired linguistic goals. However, desired linguistic goals for autonomous driving, such as "drive safely", are ambiguous and incomprehensible to pretrained models. On the other hand, undesired linguistic goals like "collision" are more concrete and tractable. In this work, we introduce LORD, a novel large-model-based opposite reward design that uses undesired linguistic goals to enable the efficient use of large pretrained models as zero-shot reward models. Through extensive experiments, our proposed framework shows its efficiency in leveraging the power of large pretrained models to achieve safe and enhanced autonomous driving. Moreover, the proposed approach shows improved generalization capabilities, as it outperforms counterpart methods across diverse and challenging driving scenarios.
Updated: 2024-03-27 19:30:06
Categories: cs.RO,cs.AI,cs.LG
Using Quantum Computing to Infer Dynamic Behaviors of Biological and Artificial Neural Networks
The exploration of new problem classes for quantum computation is an active area of research. An essentially unexplored topic is the use of quantum algorithms and computing to explore and ask questions about the functional dynamics of neural networks. This is a component of the still-nascent topic of applying quantum computing to the modeling and simulation of biological and artificial neural networks. In this work, we show how a carefully constructed set of conditions can use two foundational quantum algorithms, Grover and Deutsch-Jozsa, in such a way that the output measurements admit an interpretation guaranteeing we can infer whether a simple representation of a neural network (which applies to both biological and artificial networks) has the potential, after some period of time, to continue sustaining dynamic activity, or whether the dynamics are guaranteed to stop, either through 'epileptic' dynamics or quiescence.
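For intuition about one of the two primitives, here is a small classical simulation of the Deutsch-Jozsa test in its phase-oracle form: the probability of measuring the all-zeros state is 1 when the Boolean function is constant and 0 when it is balanced. This illustrates the algorithm itself, not the paper's specific encoding of network dynamics.

```python
import numpy as np
from itertools import product

def deutsch_jozsa_zero_prob(f, n):
    """Classical simulation of the Deutsch-Jozsa circuit on n qubits:
    returns the probability of measuring |0...0>, which is 1 if f is
    constant and 0 if f is balanced."""
    xs = list(product([0, 1], repeat=n))
    # state after the first Hadamard layer and the phase oracle (-1)^f(x)
    amps = np.array([(-1) ** f(x) for x in xs]) / np.sqrt(2 ** n)
    # amplitude of |0...0> after the final Hadamard layer is the mean
    amp0 = amps.sum() / np.sqrt(2 ** n)
    return abs(amp0) ** 2
```

A single oracle query thus distinguishes the two cases, which is the kind of promise-problem structure the paper's conditions are built to exploit.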
Updated: 2024-03-27 19:16:56
Categories: quant-ph,cs.AI,q-bio.NC
A State-of-the-practice Release-readiness Checklist for Generative AI-based Software Products
This paper investigates the complexities of integrating Large Language Models (LLMs) into software products, with a focus on the challenges encountered for determining their readiness for release. Our systematic review of grey literature identifies common challenges in deploying LLMs, ranging from pre-training and fine-tuning to user experience considerations. The study introduces a comprehensive checklist designed to guide practitioners in evaluating key release readiness aspects such as performance, monitoring, and deployment strategies, aiming to enhance the reliability and effectiveness of LLM-based applications in real-world settings.
Updated: 2024-03-27 19:02:56
Categories: cs.SE,cs.AI
Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision-Language Models
Online user-generated content games (UGCGs) are increasingly popular among children and adolescents for social interaction and more creative online entertainment. However, they pose a heightened risk of exposure to explicit content, raising growing concerns for the online safety of children and adolescents. Despite these concerns, few studies have addressed the issue of illicit image-based promotions of unsafe UGCGs on social media, which can inadvertently attract young users. This challenge arises from the difficulty of obtaining comprehensive training data for UGCG images and the unique nature of these images, which differ from traditional unsafe content. In this work, we take the first step towards studying the threat of illicit promotions of unsafe UGCGs. We collect a real-world dataset comprising 2,924 images that display diverse sexually explicit and violent content used to promote UGCGs by their game creators. Our in-depth studies reveal a new understanding of this problem and the urgent need for automatically flagging illicit UGCG promotions. We additionally create a cutting-edge system, UGCG-Guard, designed to aid social media platforms in effectively identifying images used for illicit UGCG promotions. This system leverages recently introduced large vision-language models (VLMs) and employs a novel conditional prompting strategy for zero-shot domain adaptation, along with chain-of-thought (CoT) reasoning for contextual identification. UGCG-Guard achieves outstanding results, with an accuracy rate of 94% in detecting these images used for the illicit promotion of such games in real-world scenarios.
Updated: 2024-03-27 19:02:13
Categories: cs.CY,cs.CL,cs.LG,cs.SI
A Python library for efficient computation of molecular fingerprints
Machine learning solutions are very popular in the field of chemoinformatics, where they have numerous applications, such as novel drug discovery or molecular property prediction. Molecular fingerprints are algorithms commonly used for vectorizing chemical molecules as a part of preprocessing in this kind of solution. However, despite their popularity, there are no libraries that implement them efficiently for large datasets, utilizing modern, multicore architectures. On top of that, most of them do not provide the user with an intuitive interface, or one that would be compatible with other machine learning tools. In this project, we created a Python library that computes molecular fingerprints efficiently and delivers an interface that is comprehensive and enables the user to easily incorporate the library into their existing machine learning workflow. The library enables the user to perform computation on large datasets using parallelism. Because of that, it is possible to perform tasks such as hyperparameter tuning in a reasonable time. We describe the tools used in the implementation of the library and assess its time performance on example benchmark datasets. Additionally, we show that using molecular fingerprints we can achieve results comparable to state-of-the-art ML solutions even with very simple models.
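The design described here (a parallel batch interface over per-molecule fingerprint functions) can be sketched as follows. The hash-based fingerprint is a toy stand-in, not a real Morgan/ECFP implementation, and the thread-pool map stands in for whatever parallel backend the library actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

def hashed_fingerprint(smiles, n_bits=2048, radius=3):
    """Toy fingerprint (NOT a real chemistry fingerprint): hashes every
    substring of the SMILES string up to a given length into a fixed-size
    bit vector, standing in for an RDKit-style fingerprint function."""
    bits = [0] * n_bits
    for r in range(1, radius + 1):
        for i in range(len(smiles) - r + 1):
            h = int(hashlib.md5(smiles[i:i + r].encode()).hexdigest(), 16)
            bits[h % n_bits] = 1
    return bits

def fingerprints_parallel(smiles_list, n_workers=4, **kwargs):
    """Batch-API sketch: compute fingerprints for a dataset in parallel,
    preserving input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        return list(ex.map(lambda s: hashed_fingerprint(s, **kwargs),
                           smiles_list))
```

The resulting bit vectors drop directly into any scikit-learn-style workflow, which is the interface compatibility the abstract emphasizes.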
Updated: 2024-03-27 19:02:09
Categories: q-bio.QM,cs.LG
Reshaping Free-Text Radiology Notes Into Structured Reports With Generative Transformers
BACKGROUND: Radiology reports are typically written in a free-text format, making clinical information difficult to extract and use. Recently the adoption of structured reporting (SR) has been recommended by various medical societies thanks to the advantages it offers, e.g. standardization, completeness and information retrieval. We propose a pipeline to extract information from free-text radiology reports, that fits with the items of the reference SR registry proposed by a national society of interventional and medical radiology, focusing on CT staging of patients with lymphoma. METHODS: Our work aims to leverage the potential of Natural Language Processing (NLP) and Transformer-based models to deal with automatic SR registry filling. With the availability of 174 radiology reports, we investigate a rule-free generative Question Answering approach based on a domain-specific version of T5 (IT5). Two strategies (batch-truncation and ex-post combination) are implemented to comply with the model's context length limitations. Performance is evaluated in terms of strict accuracy, F1, and format accuracy, and compared with the widely used GPT-3.5 Large Language Model. A 5-point Likert scale questionnaire is used to collect human-expert feedback on the similarity between medical annotations and generated answers. RESULTS: The combination of fine-tuning and batch splitting allows IT5 to achieve notable results; it performs on par with GPT-3.5 albeit its size being a thousand times smaller in terms of parameters. Human-based assessment scores show a high correlation (Spearman's correlation coefficients>0.88, p-values<0.001) with AI performance metrics (F1) and confirm the superior ability of LLMs (i.e., GPT-3.5, 175B of parameters) in generating plausible human-like statements.
Updated: 2024-03-27 18:38:39
Categories: cs.CL,cs.AI,I.2.7; J.3
Measuring Political Bias in Large Language Models: What Is Said and How It Is Said
We propose to measure political bias in LLMs by analyzing both the content and style of their generated content regarding political issues. Existing benchmarks and measures focus on gender and racial biases. However, political bias exists in LLMs and can lead to polarization and other harms in downstream applications. In order to provide transparency to users, we advocate that there should be fine-grained and explainable measures of political biases generated by LLMs. Our proposed measure looks at different political issues such as reproductive rights and climate change, at both the content (the substance of the generation) and the style (the lexical polarity) of such bias. We measured the political bias in eleven open-sourced LLMs and showed that our proposed framework is easily scalable to other topics and is explainable.
Updated: 2024-03-27 18:22:48
Categories: cs.CL,cs.AI
Reasoning over the Behaviour of Objects in Video-Clips for Adverb-Type Recognition
In this work, following the intuition that adverbs describing scene-sequences are best identified by reasoning over high-level concepts of object-behavior, we propose the design of a new framework that reasons over object-behaviours extracted from raw-video-clips to recognize the clip's corresponding adverb-types. Importantly, while previous works for general scene adverb-recognition assume knowledge of the clips underlying action-types, our method is directly applicable in the more general problem setting where the action-type of a video-clip is unknown. Specifically, we propose a novel pipeline that extracts human-interpretable object-behaviour-facts from raw video clips and propose novel symbolic and transformer based reasoning methods that operate over these extracted facts to identify adverb-types. Experiment results demonstrate that our proposed methods perform favourably against the previous state-of-the-art. Additionally, to support efforts in symbolic video-processing, we release two new datasets of object-behaviour-facts extracted from raw video clips - the MSR-VTT-ASP and ActivityNet-ASP datasets.
Updated: 2024-03-27 18:17:46
Categories: cs.CV,cs.AI,cs.SC
SMOF: Streaming Modern CNNs on FPGAs with Smart Off-Chip Eviction
Convolutional Neural Networks (CNNs) have demonstrated their effectiveness in numerous vision tasks. However, their high processing requirements necessitate efficient hardware acceleration to meet the application's performance targets. In the space of FPGAs, streaming-based dataflow architectures are often adopted by users, as significant performance gains can be achieved through layer-wise pipelining and reduced off-chip memory access by retaining data on-chip. However, modern topologies, such as the UNet, YOLO, and X3D models, utilise long skip connections, requiring significant on-chip storage and thus limiting the performance achieved by such system architectures. This paper addresses the above limitation by introducing weight and activation eviction mechanisms to off-chip memory along the computational pipeline, taking into account the available compute and memory resources. The proposed mechanism is incorporated into an existing toolflow, expanding the design space by utilising off-chip memory as a buffer. This enables the mapping of such modern CNNs to devices with limited on-chip memory under the streaming architecture design approach. SMOF has demonstrated the capacity to deliver competitive and, in some cases, state-of-the-art performance across a spectrum of computer vision tasks, achieving up to 10.65x throughput improvement compared to previous works.
Updated: 2024-03-27 18:12:24
Categories: cs.AR,cs.CV,cs.LG
CPR: Retrieval Augmented Generation for Copyright Protection
Retrieval Augmented Generation (RAG) is emerging as a flexible and robust technique to adapt models to private user data without training, to handle credit attribution, and to allow efficient machine unlearning at scale. However, RAG techniques for image generation may lead to parts of the retrieved samples being copied in the model's output. To reduce the risk of leaking private information contained in the retrieved set, we introduce Copy-Protected generation with Retrieval (CPR), a new method for RAG with strong copyright protection guarantees in a mixed-private setting for diffusion models. CPR allows the output of diffusion models to be conditioned on a set of retrieved images, while also guaranteeing that uniquely identifiable information about those examples is not exposed in the generated outputs. In particular, it does so by sampling from a mixture of a public (safe) distribution and a private (user) distribution, merging their diffusion scores at inference. We prove that CPR satisfies Near Access Freeness (NAF), which bounds the amount of information an attacker may be able to extract from the generated images. We provide two algorithms for copyright protection, CPR-KL and CPR-Choose. Unlike previously proposed rejection-sampling-based NAF methods, our methods enable efficient copyright-protected sampling with a single run of backward diffusion. We show that our method can be applied to any pre-trained conditional diffusion model, such as Stable Diffusion or unCLIP. In particular, we empirically show that applying CPR on top of unCLIP improves quality and text-to-image alignment of the generated results (81.4 to 83.17 on the TIFA benchmark), while enabling credit attribution, copyright protection, and deterministic, constant-time unlearning.
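The score-merging idea behind CPR admits a compact toy illustration. The sketch below is my own construction, not the authors' code: the 1-D Gaussian "public" and "private" distributions, the mixing weight `w`, and the plain Langevin sampler are all illustrative assumptions standing in for merging diffusion scores at inference.

```python
import math
import random

def gaussian_score(mu, sigma):
    """Score (gradient of the log density) of N(mu, sigma^2)."""
    return lambda x: -(x - mu) / sigma**2

def merged_score(score_pub, score_priv, w):
    """Convex combination of two scores; sampling with it targets the
    geometric mixture p_pub^(1-w) * p_priv^w (a toy stand-in for
    merging diffusion scores at inference)."""
    return lambda x: (1 - w) * score_pub(x) + w * score_priv(x)

def langevin_sample(score, steps=2000, step=0.01, seed=0):
    """Unadjusted Langevin dynamics: drift along the score plus noise."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(steps):
        x += step * score(x) + math.sqrt(2 * step) * rng.gauss(0, 1)
    return x
```

For two unit-variance Gaussians centred at 0 and 4 with w = 0.5, the merged score is that of N(2, 1), so samples concentrate between the two modes rather than copying either source.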
Updated: 2024-03-27 18:09:55
Categories: cs.CR,cs.AI,cs.CV
PLOT-TAL -- Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization
This paper introduces a novel approach to temporal action localization (TAL) in few-shot learning. Our work addresses the inherent limitations of conventional single-prompt learning methods that often lead to overfitting due to the inability to generalize across varying contexts in real-world videos. Recognizing the diversity of camera views, backgrounds, and objects in videos, we propose a multi-prompt learning framework enhanced with optimal transport. This design allows the model to learn a set of diverse prompts for each action, capturing general characteristics more effectively and distributing the representation to mitigate the risk of overfitting. Furthermore, by employing optimal transport theory, we efficiently align these prompts with action features, optimizing for a comprehensive representation that adapts to the multifaceted nature of video data. Our experiments demonstrate significant improvements in action localization accuracy and robustness in few-shot settings on the standard challenging datasets of THUMOS-14 and EpicKitchens100, highlighting the efficacy of our multi-prompt optimal transport approach in overcoming the challenges of conventional few-shot TAL methods.
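The entropic optimal-transport machinery invoked for aligning prompts with action features is standard; a minimal Sinkhorn solver (a generic sketch of the usual entropic-regularised OT formulation, not the authors' implementation) looks like:

```python
import math

def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropic-regularised optimal transport via Sinkhorn iterations.
    Returns a transport plan T with row sums ~ a and column sums ~ b
    that (approximately) minimises <T, cost> - eps * entropy(T)."""
    n, m = len(cost), len(cost[0])
    # Gibbs kernel of the cost matrix.
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale to match the row and column marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

On a 2x2 problem where matching prompt i to feature i is cheap, the plan concentrates its mass on the diagonal while respecting the uniform marginals.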
Updated: 2024-03-27 18:08:14
Categories: cs.CV,cs.LG
A Geometric Explanation of the Likelihood OOD Detection Paradox
Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making likelihood-based OOD detection unreliable. Our primary observation is that high-likelihood regions will not be generated if they contain minimal probability mass. We demonstrate how this seeming contradiction of large densities yet low probability mass can occur around data confined to low-dimensional manifolds. We also show that this scenario can be identified through local intrinsic dimension (LID) estimation, and propose a method for OOD detection which pairs the likelihoods and LID estimates obtained from a pre-trained DGM. Our method can be applied to normalizing flows and score-based diffusion models, and obtains results which match or surpass state-of-the-art OOD detection benchmarks using the same DGM backbones. Our code is available at https://github.com/layer6ai-labs/dgm_ood_detection.
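Local intrinsic dimension has standard estimators; the sketch below uses the Levina-Bickel maximum-likelihood estimator on k-nearest-neighbour distances (a generic illustration of the LID side of the method, not the paper's model-based estimator), plus a toy decision rule pairing LID with a likelihood, where both thresholds are illustrative assumptions.

```python
import math

def lid_mle(x, data, k=10):
    """Levina-Bickel MLE of local intrinsic dimension at point x,
    estimated from distances to its k nearest neighbours in `data`."""
    dists = sorted(math.dist(x, p) for p in data)
    knn = [d for d in dists if d > 0][:k]
    r_k = knn[-1]
    s = sum(math.log(r_k / r) for r in knn[:-1])
    return (k - 1) / s if s > 0 else float("inf")

def looks_ood(log_likelihood, lid, ll_threshold, lid_threshold):
    """Toy version of the paired test: a high likelihood alone is not
    trusted; a sample is flagged OOD when its likelihood is high but
    its estimated LID is unusually low (mass on a simpler manifold)."""
    return log_likelihood >= ll_threshold and lid <= lid_threshold
```

On synthetic data the estimator behaves as expected: points sampled along a line in the plane yield LID near 1, while points filling a 2-D grid yield LID near 2.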
Updated: 2024-03-27 18:02:49
Categories: cs.LG,cs.AI,cs.CV,stat.ML
Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
Continual learning aims to learn from a stream of continuously arriving data with minimum forgetting of previously learned knowledge. While previous works have explored the effectiveness of leveraging the generalizable knowledge from pre-trained models in continual learning, existing parameter-efficient fine-tuning approaches focus on the use of a predetermined or task-wise set of adapters or prompts. However, these approaches still suffer from forgetting due to task interference on jointly used parameters or restricted flexibility. The reliance on a static model architecture may lead to the allocation of excessive parameters that are not essential or, conversely, inadequate adaptation for downstream tasks, given that the scale and distribution of incoming data are unpredictable in continual learning. We propose Self-Expansion of pre-trained models with Modularized Adaptation (SEMA), a novel fine-tuning approach which automatically decides to reuse or add adapter modules on demand in continual learning, depending on whether drastic distribution shift that could not be handled by existing modules is detected at different representation levels. We design each adapter module to consist of an adapter and a representation descriptor, specifically, implemented as an autoencoder. The representation descriptor functions as a distributional shift indicator during training and triggers adapter expansion. For better usage of the adapters, an expandable weighting router is learned jointly for mixture of adapter outputs. By comparing with vision-transformer-based continual learning adaptation methods, we demonstrate that the proposed framework outperforms the state-of-the-art without memory rehearsal.
Updated: 2024-03-27 17:59:21
Categories: cs.LG,cs.CV
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
In this work, we introduce Mini-Gemini, a simple and effective framework enhancing multi-modality Vision Language Models (VLMs). Despite the advancements in VLMs facilitating basic visual dialog and reasoning, a performance gap persists compared to advanced models like GPT-4 and Gemini. We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation. To enhance visual tokens, we propose to utilize an additional visual encoder for high-resolution refinement without increasing the visual token count. We further construct a high-quality dataset that promotes precise image comprehension and reasoning-based generation, expanding the operational scope of current VLMs. In general, Mini-Gemini further mines the potential of VLMs and empowers current frameworks with image understanding, reasoning, and generation simultaneously. Mini-Gemini supports a series of dense and MoE Large Language Models (LLMs) from 2B to 34B. It is demonstrated to achieve leading performance in several zero-shot benchmarks and even surpasses the developed private models. Code and models are available at https://github.com/dvlab-research/MiniGemini.
Updated: 2024-03-27 17:59:04
Categories: cs.CV,cs.AI,cs.CL
Long-form factuality in large language models
Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE). SAFE utilizes an LLM to break down a long-form response into a set of individual facts and to evaluate the accuracy of each fact using a multi-step reasoning process comprising sending search queries to Google Search and determining whether a fact is supported by the search results. Furthermore, we propose extending F1 score as an aggregated metric for long-form factuality. To do so, we balance the percentage of supported facts in a response (precision) with the percentage of provided facts relative to a hyperparameter representing a user's preferred response length (recall). Empirically, we demonstrate that LLM agents can achieve superhuman rating performance - on a set of ~16k individual facts, SAFE agrees with crowdsourced human annotators 72% of the time, and on a random subset of 100 disagreement cases, SAFE wins 76% of the time. At the same time, SAFE is more than 20 times cheaper than human annotators. We also benchmark thirteen language models on LongFact across four model families (Gemini, GPT, Claude, and PaLM-2), finding that larger language models generally achieve better long-form factuality. LongFact, SAFE, and all experimental code are available at https://github.com/google-deepmind/long-form-factuality.
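The extended-F1 aggregation described here can be sketched directly. This is a minimal reading of the metric as summarised in the abstract (precision = fraction of supported facts; recall = supported facts relative to a preferred-length hyperparameter K, capped at 1); the paper's exact definition may differ in edge-case handling.

```python
def f1_at_k(supported: int, provided: int, k: int) -> float:
    """Harmonic mean of factual precision and length-calibrated recall.

    supported: number of facts in the response judged supported.
    provided:  total number of facts the response contains.
    k:         hyperparameter - how many supported facts a user's
               preferred response length would contain.
    """
    if provided == 0 or supported == 0:
        return 0.0
    precision = supported / provided      # fraction of facts that hold up
    recall = min(supported / k, 1.0)      # capped once k supported facts are met
    return 2 * precision * recall / (precision + recall)
```

A response with 10 supported facts out of 10, against K = 10, scores 1.0; padding a response with unsupported facts lowers precision and thus the score.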
Updated: 2024-03-27 17:48:55
Categories: cs.CL,cs.AI,cs.LG
A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs
Large communication costs are a critical bottleneck in training state-of-the-art neural networks on distributed systems. This paper introduces AxoNN, a novel four-dimensional (4D) parallelization approach, inspired by Agarwal's algorithm for matrix multiplication, for parallelizing tensor computations in deep learning. AxoNN employs two key strategies to minimize communication overhead. First, we optimize communication by overlapping expensive collective operations (reduce-scatter, all-gather, all-reduce) with computation. Our experiments with a 20-billion parameter transformer model demonstrate that these optimizations deliver a nearly 53% improvement. Second, we present an analytical model to assist users in identifying communication-minimizing configurations within the vast search space defined by our 4D algorithm. This model empowers practitioners by simplifying the tuning process for their specific training workloads. When training an 80-billion parameter model on 1024 GPUs of Perlmutter, AxoNN surpasses Megatron-LM, a state-of-the-art framework, by a significant 26%. Additionally, it achieves 57% of the theoretical peak FLOP/s.
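The first strategy, overlapping collectives with computation, is a general pipelining pattern. The toy single-process sketch below uses sleeps as stand-ins for a collective and a GEMM; it is illustrative only, not AxoNN's CUDA/NCCL implementation, and the chunk sizes and timings are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def compute(chunk):
    time.sleep(0.05)   # stand-in for a GEMM on one chunk
    return chunk * 2

def all_gather(chunk):
    time.sleep(0.05)   # stand-in for a collective on the interconnect
    return chunk

def overlapped(chunks):
    """Launch the collective for chunk i+1 while computing on chunk i,
    so communication hides behind computation instead of adding to it."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as comm:
        fut = comm.submit(all_gather, chunks[0])
        for i in range(len(chunks)):
            data = fut.result()                 # wait only for chunk i's comm
            if i + 1 < len(chunks):
                fut = comm.submit(all_gather, chunks[i + 1])  # overlap next comm
            results.append(compute(data))
    return results
```

With n chunks the overlapped schedule costs roughly (n+1) steps instead of 2n for the serial comm-then-compute version.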
Updated: 2024-03-27 17:47:56
Categories: cs.LG,cs.AI,cs.DC,cs.PF
A Picture is Worth 500 Labels: A Case Study of Demographic Disparities in Local Machine Learning Models for Instagram and TikTok
Mobile apps have embraced user privacy by moving their data processing to the user's smartphone. Advanced machine learning (ML) models, such as vision models, can now locally analyze user images to extract insights that drive several functionalities. Capitalizing on this new processing model of locally analyzing user images, we analyze two popular social media apps, TikTok and Instagram, to reveal (1) what insights vision models in both apps infer about users from their image and video data and (2) whether these models exhibit performance disparities with respect to demographics. As vision models provide signals for sensitive technologies like age verification and facial recognition, understanding potential biases in these models is crucial for ensuring that users receive equitable and accurate services. We develop a novel method for capturing and evaluating ML tasks in mobile apps, overcoming challenges like code obfuscation, native code execution, and scalability. Our method comprises ML task detection, ML pipeline reconstruction, and ML performance assessment, specifically focusing on demographic disparities. We apply our methodology to TikTok and Instagram, revealing significant insights. For TikTok, we find issues in age and gender prediction accuracy, particularly for minors and Black individuals. In Instagram, our analysis uncovers demographic disparities in the extraction of over 500 visual concepts from images, with evidence of spurious correlations between demographic features and certain concepts.
Updated: 2024-03-27 17:46:14
Categories: cs.LG,cs.CR,cs.CY,K.4.2; C.4; D.2.2
Shifting to Machine Supervision: Annotation-Efficient Semi and Self-Supervised Learning for Automatic Medical Image Segmentation and Classification
Advancements in clinical treatment are increasingly constrained by the limitations of supervised learning techniques, which depend heavily on large volumes of annotated data. The annotation process is not only costly but also demands substantial time from clinical specialists. Addressing this issue, we introduce the S4MI (Self-Supervision and Semi-Supervision for Medical Imaging) pipeline, a novel approach that leverages advancements in self-supervised and semi-supervised learning. These techniques engage in auxiliary tasks that do not require labeling, thus simplifying the scaling of machine supervision compared to fully-supervised methods. Our study benchmarks these techniques on three distinct medical imaging datasets to evaluate their effectiveness in classification and segmentation tasks. Notably, we observed that self-supervised learning significantly surpassed the performance of supervised methods in the classification of all evaluated datasets. Remarkably, the semi-supervised approach demonstrated superior outcomes in segmentation, outperforming fully-supervised methods while using 50% fewer labels across all datasets. In line with our commitment to contributing to the scientific community, we have made the S4MI code openly accessible, allowing for broader application and further development of these methods.
Updated: 2024-03-27 17:41:50
Categories: cs.CV,cs.AI
Capability-aware Prompt Reformulation Learning for Text-to-Image Generation
Text-to-image generation systems have emerged as revolutionary tools in the realm of artistic creation, offering unprecedented ease in transforming textual prompts into visual art. However, the efficacy of these systems is intricately linked to the quality of user-provided prompts, which often poses a challenge to users unfamiliar with prompt crafting. This paper addresses this challenge by leveraging user reformulation data from interaction logs to develop an automatic prompt reformulation model. Our in-depth analysis of these logs reveals that user prompt reformulation is heavily dependent on the individual user's capability, resulting in significant variance in the quality of reformulation pairs. To effectively use this data for training, we introduce the Capability-aware Prompt Reformulation (CAPR) framework. CAPR innovatively integrates user capability into the reformulation process through two key components: the Conditional Reformulation Model (CRM) and Configurable Capability Features (CCF). CRM reformulates prompts according to a specified user capability, as represented by CCF. The CCF, in turn, offers the flexibility to tune and guide the CRM's behavior. This enables CAPR to effectively learn diverse reformulation strategies across various user capacities and to simulate high-capability user reformulation during inference. Extensive experiments on standard text-to-image generation benchmarks showcase CAPR's superior performance over existing baselines and its remarkable robustness on unseen systems. Furthermore, comprehensive analyses validate the effectiveness of different components. CAPR can facilitate user-friendly interaction with text-to-image systems and make advanced artistic creation more achievable for a broader range of users.
Updated: 2024-03-27 17:41:16
Categories: cs.CL,cs.AI,cs.CV,cs.IR
CrystalBox: Future-Based Explanations for Input-Driven Deep RL Systems
We present CrystalBox, a novel, model-agnostic, posthoc explainability framework for Deep Reinforcement Learning (DRL) controllers in the large family of input-driven environments which includes computer systems. We combine the natural decomposability of reward functions in input-driven environments with the explanatory power of decomposed returns. We propose an efficient algorithm to generate future-based explanations across both discrete and continuous control environments. Using applications such as adaptive bitrate streaming and congestion control, we demonstrate CrystalBox's capability to generate high-fidelity explanations. We further illustrate its higher utility across three practical use cases: contrastive explanations, network observability, and guided reward design, as opposed to prior explainability techniques that identify salient features.
Updated: 2024-03-27 17:38:27
Categories: cs.LG,cs.NI,cs.SY,eess.SY
Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
Large Language Models exhibit robust problem-solving capabilities for diverse tasks. However, most LLM-based agents are designed as specific task solvers with sophisticated prompt engineering, rather than agents capable of learning and evolving through interactions. These task solvers require manually crafted prompts to encode task rules and regulate LLM behaviors, leaving them inherently unable to address complex dynamic scenarios, e.g., large interactive games. In light of this, we propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization that can learn a wealth of expertise from interactive experiences and progressively elevate its behavioral policy. Specifically, it involves a dynamic belief generation and reflection process for policy evolution. Rather than action-level reflection, Agent-Pro iteratively reflects on past trajectories and beliefs, fine-tuning its irrational beliefs for a better policy. Moreover, a depth-first search is employed for policy optimization, ensuring continual enhancement in policy payoffs. Agent-Pro is evaluated across two games, Blackjack and Texas Hold'em, outperforming vanilla LLMs and specialized models. Our results show Agent-Pro can learn and evolve in complex and dynamic scenes, which also benefits numerous LLM-based applications.
Updated: 2024-03-27 17:34:57
Categories: cs.AI,cs.CL
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
The reflection capacity of Large Language Models (LLMs) has garnered extensive attention. Post-hoc prompting strategies, e.g., Reflexion and Self-Refine, refine an LLM's response based on self-evaluated or external feedback. However, recent research indicates that without external feedback, an LLM's intrinsic reflection is unstable. Our investigation unveils that the key bottleneck is the quality of the self-evaluated feedback. We find LLMs often exhibit overconfidence or high randomness when self-evaluating, offering stubborn or inconsistent feedback, which causes poor reflection. To remedy this, we advocate Self-Contrast: it adaptively explores diverse solving perspectives tailored to the request, contrasts the differences, and summarizes these discrepancies into a checklist which can be used to re-examine and eliminate discrepancies. Our method endows the LLM with diverse perspectives to alleviate stubborn biases. Moreover, the discrepancies indicate potential errors or inherent uncertainties that the LLM often overlooks. Reflecting upon these can catalyze more accurate and stable reflection. Experiments conducted on a series of reasoning and translation tasks with different LLMs underscore the effectiveness and generality of our strategy.
Updated: 2024-03-27 17:24:47
Categories: cs.CL,cs.AI
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
We establish rigorous benchmarks for visual perception robustness. Synthetic images such as ImageNet-C, ImageNet-9, and Stylized ImageNet provide specific types of evaluation over synthetic corruptions, backgrounds, and textures, yet those robustness benchmarks are restricted to specified variations and have low synthetic quality. In this work, we introduce generative models as a data source for synthesizing hard images that benchmark deep models' robustness. Leveraging diffusion models, we are able to generate images with more diversified backgrounds, textures, and materials than any prior work; we term this benchmark ImageNet-D. Experimental results show that ImageNet-D causes a significant accuracy drop for a range of vision models, from the standard ResNet visual classifier to the latest foundation models like CLIP and MiniGPT-4, reducing their accuracy by up to 60%. Our work suggests that diffusion models can be an effective source to test vision models. The code and dataset are available at https://github.com/chenshuang-zhang/imagenet_d.
Updated: 2024-03-27 17:23:39
Categories: cs.CV,cs.AI,cs.LG
Generalization Bounds: Perspectives from Information Theory and PAC-Bayes
A fundamental question in theoretical machine learning is generalization. Over the past decades, the PAC-Bayesian approach has been established as a flexible framework to address the generalization capabilities of machine learning algorithms, and design new ones. Recently, it has garnered increased interest due to its potential applicability for a variety of learning algorithms, including deep neural networks. In parallel, an information-theoretic view of generalization has developed, wherein the relation between generalization and various information measures has been established. This framework is intimately connected to the PAC-Bayesian approach, and a number of results have been independently discovered in both strands. In this monograph, we highlight this strong connection and present a unified treatment of PAC-Bayesian and information-theoretic generalization bounds. We present techniques and results that the two perspectives have in common, and discuss the approaches and interpretations that differ. In particular, we demonstrate how many proofs in the area share a modular structure, through which the underlying ideas can be intuited. We pay special attention to the conditional mutual information (CMI) framework; analytical studies of the information complexity of learning algorithms; and the application of the proposed methods to deep learning. This monograph is intended to provide a comprehensive introduction to information-theoretic generalization bounds and their connection to PAC-Bayes, serving as a foundation from which the most recent developments are accessible. It is aimed broadly towards researchers with an interest in generalization and theoretical machine learning.
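As one representative example of the family of bounds the monograph unifies, the classical McAllester-style PAC-Bayesian bound (in Maurer's refined form; stated here from memory as background, not quoted from the monograph) reads:

```latex
% With probability at least 1 - \delta over an i.i.d. sample S of size n,
% simultaneously for all posteriors \rho over hypotheses, given a prior \pi:
\mathbb{E}_{h \sim \rho}\!\left[ L_{\mathcal{D}}(h) \right]
\;\le\;
\mathbb{E}_{h \sim \rho}\!\left[ \hat{L}_{S}(h) \right]
+ \sqrt{ \frac{ \mathrm{KL}(\rho \,\|\, \pi) + \ln\!\frac{2\sqrt{n}}{\delta} }{ 2n } }
```

Here $L_{\mathcal{D}}$ is the population risk, $\hat{L}_{S}$ the empirical risk, and the KL term is the information cost of moving from the prior to the posterior; the information-theoretic bounds discussed in the monograph replace this term with quantities such as the (conditional) mutual information between the sample and the learned hypothesis.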
Updated: 2024-03-27 17:07:47
Categories: cs.LG,cs.AI,cs.IT,math.IT,math.ST,stat.ML,stat.TH
Decoupled Data Consistency with Diffusion Purification for Image Restoration
Diffusion models have recently gained traction as a powerful class of deep generative priors, excelling in a wide range of image restoration tasks due to their exceptional ability to model data distributions. To solve image restoration problems, many existing techniques achieve data consistency by incorporating additional likelihood gradient steps into the reverse sampling process of diffusion models. However, the additional gradient steps pose a challenge for real-world practical applications as they incur a large computational overhead, thereby increasing inference time. They also present additional difficulties when using accelerated diffusion model samplers, as the number of data consistency steps is limited by the number of reverse sampling steps. In this work, we propose a novel diffusion-based image restoration solver that addresses these issues by decoupling the reverse process from the data consistency steps. Our method involves alternating between a reconstruction phase to maintain data consistency and a refinement phase that enforces the prior via diffusion purification. Our approach demonstrates versatility, making it highly adaptable for efficient problem-solving in latent space. Additionally, it reduces the necessity for numerous sampling steps through the integration of consistency models. The efficacy of our approach is validated through comprehensive experiments across various image restoration tasks, including image denoising, deblurring, inpainting, and super-resolution.
Updated: 2024-03-27 17:06:10
Categories: eess.IV,cs.AI,cs.CV,cs.LG,eess.SP
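The alternating scheme described above — a reconstruction phase for data consistency, decoupled from a refinement phase that enforces the prior via purification — can be sketched for a linear inverse problem. This is a minimal sketch under assumed details: the soft-threshold denoiser in the usage below is a hypothetical stand-in for a trained diffusion purifier.

```python
import numpy as np

def data_consistency(x, A, y, step, iters=5):
    # Reconstruction phase: plain gradient descent on ||Ax - y||^2,
    # decoupled from any diffusion sampling schedule.
    for _ in range(iters):
        x = x - step * A.T @ (A @ x - y)
    return x

def purify(x, denoiser, sigma, rng):
    # Refinement phase: perturb with forward noise, then denoise,
    # pulling the iterate back toward the prior.
    return denoiser(x + sigma * rng.standard_normal(x.shape))

def restore(y, A, denoiser, rounds=20, sigma=0.05, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    step = 1.0 / np.linalg.norm(A, ord=2) ** 2  # safe gradient step size
    x = A.T @ y  # crude initialization
    for _ in range(rounds):
        x = data_consistency(x, A, y, step)
        x = purify(x, denoiser, sigma, rng)
    return data_consistency(x, A, y, step)
```

With a sparse ground truth, a soft-threshold denoiser such as `lambda v: np.sign(v) * np.maximum(np.abs(v) - 0.05, 0.0)` plays the role of the prior, and the residual drops steadily across rounds.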
Superior Parallel Big Data Clustering through Competitive Stochastic Sample Size Optimization in Big-means
This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to create a scalable variant designed for big data applications. It addresses scalability and computation time challenges typically faced with traditional techniques. The algorithm adjusts sample sizes dynamically for each worker during execution, optimizing performance. Data from these sample sizes are continually analyzed, facilitating the identification of the most efficient configuration. By incorporating a competitive element among workers using different sample sizes, efficiency within the Big-means algorithm is further stimulated. In essence, the algorithm balances computational time and clustering quality by employing a stochastic, competitive sampling strategy in a parallel computing setting.
Updated: 2024-03-27 17:05:03
Categories: cs.LG,cs.AI,cs.DC,cs.IR
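The competitive sampling idea can be sketched in a few lines: workers cluster samples of different sizes starting from the current best centers, and the full-data objective decides which sample size wins each round. This is an assumed simplification — a sequential loop stands in for the parallel workers, and the win-counting rule is illustrative rather than the paper's exact scheme.

```python
import numpy as np

def kmeans(data, centers, iters=10):
    # Lloyd's algorithm starting from the given centers.
    for _ in range(iters):
        d = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(len(centers)):
            pts = data[labels == k]
            if len(pts):
                centers[k] = pts.mean(0)
    return centers

def inertia(data, centers):
    return ((data[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(1).sum()

def competitive_big_means(data, k, sample_sizes, rounds=5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    best = data[rng.choice(len(data), k, replace=False)].copy()
    wins = {s: 0 for s in sample_sizes}
    for _ in range(rounds):
        results = []
        for s in sample_sizes:  # each "worker" clusters its own random sample
            sample = data[rng.choice(len(data), s, replace=False)]
            c = kmeans(sample, best.copy())
            results.append((inertia(data, c), s, c))
        f, s, c = min(results, key=lambda t: (t[0], t[1]))
        if f < inertia(data, best):
            best = c
        wins[s] += 1  # track which sample size is currently most efficient
    return best, wins
```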
CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning
Deep Reinforcement Learning (RL) has demonstrated impressive results in solving complex robotic tasks such as quadruped locomotion. Yet, current solvers fail to produce efficient policies respecting hard constraints. In this work, we advocate for integrating constraints into robot learning and present Constraints as Terminations (CaT), a novel constrained RL algorithm. Departing from classical constrained RL formulations, we reformulate constraints through stochastic terminations during policy learning: any violation of a constraint triggers a probability of terminating potential future rewards the RL agent could attain. We propose an algorithmic approach to this formulation, by minimally modifying widely used off-the-shelf RL algorithms in robot learning (such as Proximal Policy Optimization). Our approach leads to excellent constraint adherence without introducing undue complexity and computational overhead, thus mitigating barriers to broader adoption. Through empirical evaluation on the real quadruped robot Solo crossing challenging obstacles, we demonstrate that CaT provides a compelling solution for incorporating constraints into RL frameworks. Videos and code are available at https://constraints-as-terminations.github.io.
Updated: 2024-03-27 17:03:31
Categories: cs.RO,cs.LG
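The core reformulation — a constraint violation triggers a probability of terminating the future rewards the agent could attain — can be mimicked with a small step wrapper. The saturating map from violation to probability below is an assumed shaping, not necessarily the paper's exact choice.

```python
import numpy as np

def termination_probability(violations, p_max=0.5):
    # violations <= 0 mean the constraints are satisfied; positive
    # values are mapped to a bounded termination probability.
    v = np.maximum(np.asarray(violations, dtype=float), 0.0)
    return p_max * (v / (1.0 + v)).max()

def cat_step(reward, done, violations, rng):
    # With the violation-dependent probability, terminate the episode
    # early, cutting off all potential future rewards.
    if rng.random() < termination_probability(violations):
        return reward, True
    return reward, done
```

Because the wrapper only edits the `done` flag, it drops into an off-the-shelf algorithm such as PPO without touching the optimizer itself, which matches the abstract's minimal-modification claim.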
ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition
Place recognition is an important task for robots and autonomous cars to localize themselves and close loops in pre-built maps. While single-modal sensor-based methods have shown satisfactory performance, cross-modal place recognition, which retrieves images from a point-cloud database, remains a challenging problem. Current cross-modal methods transform images into 3D points using depth estimation for modality conversion, an approach that is usually computationally intensive and needs expensive labeled data for depth supervision. In this work, we introduce a fast and lightweight framework to encode images and point clouds into place-distinctive descriptors. We propose an effective Field of View (FoV) transformation module to convert point clouds into a modality analogous to images. This module eliminates the necessity for depth estimation and helps subsequent modules achieve real-time performance. We further design a non-negative factorization-based encoder to extract mutually consistent semantic features between point clouds and images. This encoder yields more distinctive global descriptors for retrieval. Experimental results on the KITTI dataset show that our proposed methods achieve state-of-the-art performance while running in real time. Additional evaluation on the HAOMO dataset covering a 17 km trajectory further shows the practical generalization capabilities. We have released the implementation of our methods as open source at: https://github.com/haomo-ai/ModaLink.git.
Updated: 2024-03-27 17:01:10
Categories: cs.CV,cs.AI,cs.RO
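A minimal version of the FoV idea — projecting a LiDAR point cloud onto an image-like 2D grid without any depth estimation — might look like the following. The grid resolution and vertical field of view are assumed values, not ModaLink's actual parameters.

```python
import numpy as np

def pointcloud_to_range_image(points, h=32, w=256, fov_up=10.0, fov_down=-30.0):
    # Spherical projection: each 3D point lands in a (row, col) cell of a
    # 2D range image, giving the point-cloud branch an image-like modality.
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-9
    yaw = np.arctan2(y, x)                    # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)                  # elevation angle
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    u = ((yaw + np.pi) / (2 * np.pi) * w).astype(int) % w
    v = ((fu - pitch) / (fu - fd) * h).astype(int).clip(0, h - 1)
    img = np.zeros((h, w))
    img[v, u] = r                             # store range per occupied cell
    return img
```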
FedSN: A Novel Federated Learning Framework over LEO Satellite Networks
Recently, a large number of Low Earth Orbit (LEO) satellites have been launched and deployed successfully in space by commercial companies, such as SpaceX. Thanks to the multimodal sensors carried by LEO satellites, they serve not only for communication but also for various machine learning applications, such as space modulation recognition, remote sensing image classification, etc. However, the ground station (GS) may be incapable of downloading such a large volume of raw sensing data for centralized model training due to the limited contact time with LEO satellites (e.g., 5 minutes). Therefore, federated learning (FL) has emerged as a promising solution to address this problem via on-device training. Unfortunately, to enable FL on LEO satellites, we still face three critical challenges: i) heterogeneous computing and memory capabilities, ii) limited uplink rate, and iii) model staleness. To this end, we propose FedSN as a general FL framework to tackle the above challenges and fully explore data diversity on LEO satellites. Specifically, we first present a novel sub-structure scheme to enable heterogeneous local model training considering different computing, memory, and communication constraints on LEO satellites. Additionally, we propose a pseudo-synchronous model aggregation strategy to dynamically schedule model aggregation for compensating for model staleness. To further demonstrate the effectiveness of FedSN, we evaluate it using space modulation recognition and remote sensing image classification tasks by leveraging data from real-world satellite networks. Extensive experimental results demonstrate that the FedSN framework achieves higher accuracy and lower computing and communication overhead than state-of-the-art benchmarks, and confirm the effectiveness of each component in FedSN.
Updated: 2024-03-27 16:56:23
Categories: cs.LG,cs.AI,cs.DC
Detection of subclinical atherosclerosis by image-based deep learning on chest x-ray
Aims. To develop a deep-learning based system for recognition of subclinical atherosclerosis on a plain frontal chest x-ray. Methods and Results. A deep-learning algorithm to predict coronary artery calcium (CAC) score (the AI-CAC model) was developed on 460 chest x-rays (80% training cohort, 20% internal validation cohort) of primary prevention patients (58.4% male, median age 63 [51-74] years) with available paired chest x-ray and chest computed tomography (CT) indicated for any clinical reason and performed within 3 months. The CAC score calculated on chest CT was used as ground truth. The model was validated on a temporally-independent cohort of 90 patients from the same institution (external validation). The diagnostic accuracy of the AI-CAC model assessed by the area under the curve (AUC) was the primary outcome. Overall, the median AI-CAC score was 35 (0-388) and 28.9% of patients had no AI-CAC. The AUC of the AI-CAC model to identify a CAC>0 was 0.90 in the internal validation cohort and 0.77 in the external validation cohort. Sensitivity was consistently above 92% in both cohorts. In the overall cohort (n=540), among patients with AI-CAC=0, a single ASCVD event occurred after 4.3 years. Patients with AI-CAC>0 had significantly higher Kaplan-Meier estimates for ASCVD events (13.5% vs. 3.4%, log-rank=0.013). Conclusion. The AI-CAC model seems to accurately detect subclinical atherosclerosis on chest x-ray with elevated sensitivity, and to predict ASCVD events with elevated negative predictive value. Adoption of the AI-CAC model to refine CV risk stratification or as an opportunistic screening tool requires prospective evaluation.
Updated: 2024-03-27 16:56:14
Categories: cs.CV,cs.AI,cs.LG
Simplified Diffusion Schrödinger Bridge
This paper introduces a novel theoretical simplification of the Diffusion Schrödinger Bridge (DSB) that facilitates its unification with Score-based Generative Models (SGMs), addressing the limitations of DSB in complex data generation and enabling faster convergence and enhanced performance. By employing SGMs as an initial solution for DSB, our approach capitalizes on the strengths of both frameworks, ensuring a more efficient training process and improving the performance of SGM. We also propose a reparameterization technique that, despite theoretical approximations, practically improves the network's fitting capabilities. Our extensive experimental evaluations confirm the effectiveness of the simplified DSB, demonstrating its significant improvements. We believe the contributions of this work pave the way for advanced generative modeling. The code is available at https://github.com/checkcrab/SDSB.
Updated: 2024-03-27 16:49:35
Categories: cs.LG,cs.CV
Nonlinear Control Allocation: A Learning Based Approach
Modern aircraft are designed with redundant control effectors to cater for fault tolerance and maneuverability requirements. This leads to aircraft being over-actuated and requires control allocation schemes to distribute the control commands among control effectors. Traditionally, optimization-based control allocation schemes are used; however, for nonlinear allocation problems, these methods require large computational resources. In this work, an artificial neural network (ANN) based nonlinear control allocation scheme is proposed. The proposed scheme is composed of learning the inverse of the control effectiveness map through ANN, and then implementing it as an allocator instead of solving an online optimization problem. Stability conditions are presented for closed-loop systems incorporating the allocator, and computational challenges are explored with piece-wise linear effectiveness functions and ANN-based allocators. To demonstrate the efficacy of the proposed scheme, it is compared with a standard quadratic programming-based method for control allocation.
Updated: 2024-03-27 16:45:26
Categories: eess.SY,cs.AI,cs.SY,math.OC
Preventing Arbitrarily High Confidence on Far-Away Data in Point-Estimated Discriminative Neural Networks
Discriminatively trained, deterministic neural networks are the de facto choice for classification problems. However, even though they achieve state-of-the-art results on in-domain test sets, they tend to be overconfident on out-of-distribution (OOD) data. For instance, ReLU networks - a popular class of neural network architectures - have been shown to almost always yield high confidence predictions when the test data are far away from the training set, even when they are trained with OOD data. We overcome this problem by adding a term to the output of the neural network that corresponds to the logit of an extra class, which we design to dominate the logits of the original classes as we move away from the training data. This technique provably prevents arbitrarily high confidence on far-away test data while maintaining a simple discriminative point-estimate training. Evaluation on various benchmarks demonstrates strong performance against competitive baselines on both far-away and realistic OOD data.
Updated: 2024-03-27 16:44:22
Categories: cs.LG
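The extra-class mechanism can be illustrated with a distance-based logit: far from the training data, the appended logit dominates the softmax and starves the original classes of confidence mass. The distance-to-nearest-training-point design below is one simple instantiation for illustration, not the paper's exact construction.

```python
import numpy as np

def augmented_softmax(logits, x, train_points, alpha=1.0):
    # Append an extra logit that grows with the distance to the nearest
    # training point; far from the data it dominates, so the original
    # classes cannot receive arbitrarily high confidence.
    d = np.linalg.norm(train_points - x, axis=1).min()
    z = np.concatenate([logits, [alpha * d]])
    e = np.exp(z - z.max())
    return (e / e.sum())[:-1]  # probabilities left for the original classes
```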
Understanding the Learning Dynamics of Alignment with Human Feedback
Aligning large language models (LLMs) with human intentions has become a critical task for safely deploying models in real-world systems. While existing alignment approaches have seen empirical success, theoretically understanding how these methods affect model behavior remains an open question. Our work provides an initial attempt to theoretically analyze the learning dynamics of human preference alignment. We formally show how the distribution of preference datasets influences the rate of model updates and provide rigorous guarantees on the training accuracy. Our theory also reveals an intricate phenomenon where the optimization is prone to prioritizing certain behaviors with higher preference distinguishability. We empirically validate our findings on contemporary LLMs and alignment tasks, reinforcing our theoretical insights and shedding light on considerations for future alignment approaches. Disclaimer: This paper contains potentially offensive text; reader discretion is advised.
Updated: 2024-03-27 16:39:28
Categories: cs.LG,cs.AI
Usage-Specific Survival Modeling Based on Operational Data and Neural Networks
Accurate predictions of when a component will fail are crucial when planning maintenance, and by modeling the distribution of these failure times, survival models have shown to be particularly useful in this context. The presented methodology is based on conventional neural network-based survival models that are trained using data that is continuously gathered and stored at specific times, called snapshots. An important property of this type of training data is that it can contain more than one snapshot from a specific individual, which means that standard maximum-likelihood training cannot be directly applied since the data is not independent. However, the paper shows that if the data is in a specific format where all snapshot times are the same for all individuals, called homogeneously sampled, maximum-likelihood training can be applied and produces desirable results. In many cases, the data is not homogeneously sampled, and in this case, it is proposed to resample the data to make it homogeneously sampled. How densely the dataset is sampled turns out to be an important parameter; it should be chosen large enough to produce good results, but this also increases the size of the dataset, which makes training slow. To reduce the number of samples needed during training, the paper also proposes a technique to, instead of resampling the dataset once before the training starts, randomly resample the dataset at the start of each epoch during the training. The proposed methodology is evaluated on both a simulated dataset and an experimental dataset of starter battery failures. The results show that if the data is homogeneously sampled the methodology works as intended and produces accurate survival models. The results also show that randomly resampling the dataset on each epoch is an effective way to reduce the size of the training data.
Updated: 2024-03-27 16:32:32
Categories: cs.LG,cs.SY,eess.SY,stat.ML
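The per-epoch resampling trick — drawing a fresh homogeneously sampled dataset at the start of every epoch instead of once before training — can be sketched as follows. The half-window candidate rule and the `fit_one_epoch` callback are assumed details for illustration.

```python
import numpy as np

def resample_epoch(snapshots, grid, rng):
    # snapshots: per-individual (times, features) pairs from operational data.
    # For every individual, pick one snapshot near each common grid time,
    # choosing at random among candidates within half a grid step.
    half = 0.5 * (grid[1] - grid[0])
    batch = []
    for times, feats in snapshots:
        rows = []
        for t in grid:
            idx = np.where(np.abs(times - t) <= half)[0]
            if len(idx) == 0:
                idx = np.array([np.abs(times - t).argmin()])  # fall back to nearest
            rows.append(feats[rng.choice(idx)])
        batch.append(np.stack(rows))
    return np.stack(batch)  # (individuals, grid points, features)

def train(snapshots, grid, epochs, rng, fit_one_epoch):
    for _ in range(epochs):
        # A fresh homogeneous sample each epoch keeps the per-epoch dataset
        # small while the dense raw snapshots are still covered over time.
        fit_one_epoch(resample_epoch(snapshots, grid, rng))
```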
Nonlinear model reduction for operator learning
Operator learning provides methods to approximate mappings between infinite-dimensional function spaces. Deep operator networks (DeepONets) are a notable architecture in this field. Recently, an extension of DeepONet based on model reduction and neural networks, proper orthogonal decomposition (POD)-DeepONet, has been able to outperform other architectures in terms of accuracy for several benchmark tests. We extend this idea towards nonlinear model order reduction by proposing an efficient framework that combines neural networks with kernel principal component analysis (KPCA) for operator learning. Our results demonstrate the superior performance of KPCA-DeepONet over POD-DeepONet.
Updated: 2024-03-27 16:24:26
Categories: cs.LG,cs.NA,math.NA
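The nonlinear reduction step that replaces POD can be written in a few lines of NumPy: kernel PCA eigendecomposes the centered Gram matrix instead of the data covariance. This is a generic KPCA sketch with an assumed RBF kernel and bandwidth, not the paper's full KPCA-DeepONet pipeline.

```python
import numpy as np

def rbf_gram(X, Y, gamma):
    d = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def kpca_fit(X, n_components, gamma=1.0):
    # Kernel PCA: eigendecompose the double-centered Gram matrix and
    # keep the leading components as a nonlinear latent space.
    n = len(X)
    K = rbf_gram(X, X, gamma)
    J = np.eye(n) - np.ones((n, n)) / n
    w, V = np.linalg.eigh(J @ K @ J)
    w, V = w[::-1][:n_components], V[:, ::-1][:, :n_components]
    return {"X": X, "gamma": gamma, "K": K,
            "V": V / np.sqrt(np.maximum(w, 1e-12))}

def kpca_transform(model, Y):
    # Center the test kernel against the training Gram matrix, then
    # project onto the leading kernel principal components.
    Ky = rbf_gram(Y, model["X"], model["gamma"])
    Kyc = Ky - Ky.mean(1, keepdims=True) - model["K"].mean(0) + model["K"].mean()
    return Kyc @ model["V"]
```

In a KPCA-DeepONet-style setup, a network would then be trained to map branch/trunk inputs into this latent space rather than into POD coefficients.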
Enhancing Manufacturing Quality Prediction Models through the Integration of Explainability Methods
This research presents a method that utilizes explainability techniques to amplify the performance of machine learning (ML) models in forecasting the quality of milling processes, as demonstrated in this paper through a manufacturing use case. The methodology entails the initial training of ML models, followed by a fine-tuning phase where irrelevant features identified through explainability methods are eliminated. This procedural refinement results in performance enhancements, paving the way for potential reductions in manufacturing costs and a better understanding of the trained ML models. This study highlights the usefulness of explainability techniques in both explaining and optimizing predictive models in the manufacturing realm.
Updated: 2024-03-27 16:21:24
Categories: cs.AI,cs.CV,cs.CY,cs.LG
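The train → explain → prune → retrain loop can be illustrated with permutation importance, one common explainability method (the paper's exact choice may differ); ordinary least squares stands in for the quality-prediction model.

```python
import numpy as np

def fit(X, y):
    # Stand-in quality model: ordinary least squares.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def mse(X, y, w):
    return ((X @ w - y) ** 2).mean()

def permutation_importance(X, y, w, rng, repeats=5):
    # Explainability step: how much does shuffling a feature hurt the model?
    base = mse(X, y, w)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            imp[j] += (mse(Xp, y, w) - base) / repeats
    return imp

def prune_and_refit(X, y, rng, threshold=1e-3):
    # Fine-tuning phase: drop features flagged as irrelevant, then retrain.
    w = fit(X, y)
    keep = permutation_importance(X, y, w, rng) > threshold
    return keep, fit(X[:, keep], y)
```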
Probabilistic Model Checking of Stochastic Reinforcement Learning Policies
We introduce a method to verify stochastic reinforcement learning (RL) policies. This approach is compatible with any RL algorithm as long as the algorithm and its corresponding environment collectively adhere to the Markov property. In this setting, the future state of the environment should depend solely on its current state and the action executed, independent of any previous states or actions. Our method integrates a verification technique, referred to as model checking, with RL, leveraging a Markov decision process, a trained RL policy, and a probabilistic computation tree logic (PCTL) formula to build a formal model that can be subsequently verified via the model checker Storm. We demonstrate our method's applicability across multiple benchmarks, comparing it to baseline methods called deterministic safety estimates and naive monolithic model checking. Our results show that our method is suited to verify stochastic RL policies.
Updated: 2024-03-27 16:15:21
Categories: cs.AI
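The pipeline's core — induce a Markov chain from the MDP and the stochastic policy, then evaluate a PCTL reachability property — can be mimicked in NumPy on a toy model; in the actual approach the formal model would be handed to the Storm model checker instead.

```python
import numpy as np

def induced_chain(mdp, policy):
    # mdp[s][a]: dict mapping next state -> transition probability.
    # policy[s][a]: probability that the stochastic policy picks action a.
    n = len(mdp)
    P = np.zeros((n, n))
    for s, actions in enumerate(mdp):
        for a, nexts in enumerate(actions):
            for t, p in nexts.items():
                P[s, t] += policy[s][a] * p
    return P

def reachability(P, goal, iters=1000):
    # Fixed-point iteration for the PCTL query P=? [ F goal ].
    x = np.zeros(len(P))
    x[goal] = 1.0
    for _ in range(iters):
        x = P @ x
        x[goal] = 1.0
    return x
```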
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing different attributes and generate them individually. Motivated by it, we propose NaturalSpeech 3, a TTS system with novel factorized diffusion models to generate natural speech in a zero-shot way. Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt. With this factorization design, NaturalSpeech 3 can effectively and efficiently model intricate speech with disentangled subspaces in a divide-and-conquer way. Experiments show that NaturalSpeech 3 outperforms the state-of-the-art TTS systems on quality, similarity, prosody, and intelligibility, and achieves on-par quality with human recordings. Furthermore, we achieve better performance by scaling to 1B parameters and 200K hours of training data.
Updated: 2024-03-27 16:14:34
Categories: eess.AS,cs.AI,cs.CL,cs.LG,cs.SD
Testing Resource Isolation for System-on-Chip Architectures
Ensuring resource isolation at the hardware level is a crucial step towards more security inside the Internet of Things. Even though there is still no generally accepted technique to generate appropriate tests, it became clear that tests should be generated at the system level. In this paper, we illustrate the modeling aspects in test generation for resource isolation, namely modeling the behavior and expressing the intended test scenario. We present both aspects using the industrial standard PSS and an academic approach based on conformance testing.
Updated: 2024-03-27 16:11:23
Categories: cs.AR,cs.CR,cs.SE
Semi-Supervised Learning for Deep Causal Generative Models
Developing models that can answer questions of the form "How would $x$ change if $y$ had been $z$?" is fundamental for advancing medical image analysis. Training causal generative models that address such counterfactual questions, though, currently requires that all relevant variables have been observed and that corresponding labels are available in training data. However, clinical data may not have complete records for all patients and state of the art causal generative models are unable to take full advantage of this. We thus develop, for the first time, a semi-supervised deep causal generative model that exploits the causal relationships between variables to maximise the use of all available data. We explore this in the setting where each sample is either fully labelled or fully unlabelled, as well as the more clinically realistic case of having different labels missing for each sample. We leverage techniques from causal inference to infer missing values and subsequently generate realistic counterfactuals, even for samples with incomplete labels.
Updated: 2024-03-27 16:06:37
Categories: cs.LG,cs.AI,cs.CV,stat.ML
Input Convex Lipschitz RNN: A Fast and Robust Approach for Engineering Tasks
Computational efficiency and non-adversarial robustness are critical factors in real-world engineering applications. Yet, conventional neural networks often fall short in addressing both simultaneously, or even separately. Drawing insights from natural physical systems and existing literature, it is known that an input convex architecture enhances computational efficiency, while a Lipschitz-constrained architecture bolsters non-adversarial robustness. By leveraging the strengths of convexity and Lipschitz continuity, we develop a novel network architecture, termed Input Convex Lipschitz Recurrent Neural Networks. This model is explicitly designed for fast and robust optimization-based tasks and outperforms existing recurrent units across a spectrum of engineering tasks in terms of computational efficiency and non-adversarial robustness, including real-world solar irradiance prediction for Solar PV system planning at LHT Holdings in Singapore and real-time Model Predictive Control optimization for a nonlinear chemical reactor.
Updated: 2024-03-27 16:06:34
Categories: cs.LG,cs.CE,cs.SY,eess.SY
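One way to combine the two properties in a recurrent cell — nonnegative recurrent weights with a convex activation for input convexity, and spectral normalization for a Lipschitz bound — is sketched below. The exact parameterization in the paper may differ; this is a minimal assumed construction.

```python
import numpy as np

def spectral_normalize(W, L=1.0, iters=50):
    # Power iteration to estimate the largest singular value, then
    # rescale so the operator norm is at most L.
    u = np.ones(W.shape[1])
    for _ in range(iters):
        v = W @ u; v /= np.linalg.norm(v) + 1e-12
        u = W.T @ v; u /= np.linalg.norm(u) + 1e-12
    sigma = v @ W @ u
    return W * min(1.0, L / max(sigma, 1e-12))

class ICLRNNCell:
    def __init__(self, n_in, n_hidden, L=1.0, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        # Nonnegative recurrent weights keep the hidden map input-convex
        # (a nonnegative combination of convex ReLU features is convex);
        # spectral normalization enforces the Lipschitz bound L.
        self.U = spectral_normalize(np.abs(rng.standard_normal((n_hidden, n_hidden))), L)
        self.W = spectral_normalize(rng.standard_normal((n_hidden, n_in)), L)

    def step(self, h, x):
        return np.maximum(self.U @ h + self.W @ x, 0.0)
```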
Statistical testing of random number generators and their improvement using randomness extraction
Random number generators (RNGs) are notoriously hard to build and test, especially in a cryptographic setting. Although one cannot conclusively determine the quality of an RNG by testing the statistical properties of its output alone, running numerical tests is both a powerful verification tool and the only universally applicable method. In this work, we present and make available a comprehensive statistical testing environment (STE) that is based on existing statistical test suites. The STE can be parameterised to run lightweight (i.e. fast) all the way to intensive testing, which goes far beyond what is required by certification bodies. With it, we benchmark the statistical properties of several RNGs, comparing them against each other. We then present and implement a variety of post-processing methods, in the form of randomness extractors, which improve the RNG's output quality under different sets of assumptions and analyse their impact through numerical testing with the STE.
Updated: 2024-03-27 16:05:02
Categories: cs.CR,quant-ph
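As one classical example of the post-processing the paper studies, a von Neumann extractor removes bias from an independent-but-biased bit stream; the extractors implemented in such work are typically more sophisticated (e.g., seeded two-universal hashing), so this is only the simplest member of the family.

```python
def von_neumann_extract(bits):
    # Look at non-overlapping pairs of input bits: 01 -> 0, 10 -> 1,
    # and discard 00 / 11. For i.i.d. bits with any fixed bias, the
    # surviving outputs are exactly uniform.
    out = []
    for a, b in zip(bits[::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out
```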
Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
Large Vision-Language Models (LVLMs) are increasingly adept at generating contextually detailed and coherent responses from visual inputs. However, their application in multimodal decision-making and open-ended generation is hindered by a notable rate of hallucinations, where generated text inaccurately represents the visual contents. To address this issue, this paper introduces the Instruction Contrastive Decoding (ICD) method, a novel approach designed to reduce hallucinations during LVLM inference. Our method is inspired by our observation that what we call disturbance instructions significantly exacerbate hallucinations in multimodal fusion modules. ICD contrasts distributions from standard and instruction disturbance, thereby increasing alignment uncertainty and effectively subtracting hallucinated concepts from the original distribution. Through comprehensive experiments on discriminative benchmarks (POPE and MME) and a generative benchmark (LLaVa-Bench), we demonstrate that ICD significantly mitigates both object-level and attribute-level hallucinations. Moreover, our method not only addresses hallucinations but also significantly enhances the general perception and recognition capabilities of LVLMs.
Updated: 2024-03-27 16:04:47
Categories: cs.CV,cs.AI,cs.CL,cs.MM
ChatGPT Needs SPADE (Sustainability, PrivAcy, Digital divide, and Ethics) Evaluation: A Review
ChatGPT is another large language model (LLM), widely available to consumers on their devices, and thanks to its performance and ability to converse effectively, it has gained huge popularity in both the research and industrial communities. Recently, many studies have been published on the effectiveness, efficiency, integration, and sentiments of ChatGPT and other LLMs. In contrast, this study focuses on the important aspects that are mostly overlooked, i.e., sustainability, privacy, digital divide, and ethics, and suggests that not only ChatGPT but every subsequent entry in the category of conversational bots should undergo Sustainability, PrivAcy, Digital divide, and Ethics (SPADE) evaluation. This paper discusses in detail the issues and concerns raised over ChatGPT in line with the aforementioned characteristics. We also briefly discuss the recent EU AI Act in light of the SPADE evaluation. We support our hypothesis with preliminary data collection and visualizations, along with hypothesized facts. We also suggest mitigations and recommendations for each of the concerns. Furthermore, we suggest some policies and recommendations for the EU AI Act concerning ethics, digital divide, and sustainability.
Updated: 2024-03-27 16:03:32
Categories: cs.CY,cs.AI,cs.CL,cs.LG
Empowering Data Mesh with Federated Learning
The evolution of data architecture has seen the rise of data lakes, aiming to solve the bottlenecks of data management and promote intelligent decision-making. However, this centralized architecture is limited by the proliferation of data sources and the growing demand for timely analysis and processing. A new data paradigm, Data Mesh, is proposed to overcome these challenges. Data Mesh treats domains as a first-class concern by distributing the data ownership from the central team to each data domain, while keeping the federated governance to monitor domains and their data products. Many multi-million dollar organizations like Paypal, Netflix, and Zalando have already transformed their data analysis pipelines based on this new architecture. In this decentralized architecture where data is locally preserved by each domain team, traditional centralized machine learning is incapable of conducting effective analysis across multiple domains, especially for security-sensitive organizations. To this end, we introduce a pioneering approach that incorporates Federated Learning into Data Mesh. To the best of our knowledge, this is the first open-source applied work that represents a critical advancement toward the integration of federated learning methods into the Data Mesh paradigm, underscoring the promising prospects for privacy-preserving and decentralized data analysis strategies within Data Mesh architecture.
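As a minimal illustration of the underlying idea (not the authors' open-source implementation), each domain team trains locally and shares only parameters, which a coordinator combines, e.g. FedAvg-style; the domain names, parameter vectors, and sample counts are hypothetical.

```python
import numpy as np

def fed_avg(domain_params, domain_sizes):
    """FedAvg-style aggregation: average per-domain parameters weighted by
    local sample counts; raw data never leaves the domain team."""
    total = float(sum(domain_sizes))
    return sum(p * (n / total) for p, n in zip(domain_params, domain_sizes))

# Three hypothetical domain teams sharing only locally trained parameters.
w_a = np.array([1.0, 0.0])
w_b = np.array([0.0, 1.0])
w_c = np.array([1.0, 1.0])
global_w = fed_avg([w_a, w_b, w_c], domain_sizes=[100, 100, 200])
print(global_w)  # -> [0.75 0.75]
```

The federated governance layer of Data Mesh would then distribute `global_w` back to the domains for the next local training round.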
Updated: 2024-03-27 16:01:00
Categories: cs.LG,cs.DC
SAT-NGP : Unleashing Neural Graphics Primitives for Fast Relightable Transient-Free 3D reconstruction from Satellite Imagery
Current stereo-vision pipelines produce highly accurate 3D reconstructions when using multiple pairs or triplets of satellite images. However, these pipelines are sensitive to the changes between images that can occur as a result of multi-date acquisitions. Such variations are mainly due to variable shadows, reflections, and transient objects (cars, vegetation). To take such changes into account, Neural Radiance Fields (NeRF) have recently been applied to multi-date satellite imagery. However, neural methods are very compute-intensive, taking dozens of hours to learn, compared with minutes for standard stereo-vision pipelines. Following the ideas of Instant Neural Graphics Primitives, we propose to use an efficient sampling strategy and multi-resolution hash encoding to accelerate the learning. Our model, Satellite Neural Graphics Primitives (SAT-NGP), decreases the learning time to 15 minutes while maintaining the quality of the 3D reconstruction.
Updated: 2024-03-27 15:58:25
Categories: cs.CV,cs.AI
Conditional Wasserstein Distances with Applications in Bayesian OT Flow Matching
In inverse problems, many conditional generative models approximate the posterior measure by minimizing a distance between the joint measure and its learned approximation. While this approach also controls the distance between the posterior measures in the case of the Kullback--Leibler divergence, this does not in general hold true for the Wasserstein distance. In this paper, we introduce a conditional Wasserstein distance via a set of restricted couplings that equals the expected Wasserstein distance of the posteriors. Interestingly, the dual formulation of the conditional Wasserstein-1 flow resembles losses in the conditional Wasserstein GAN literature in a quite natural way. We derive theoretical properties of the conditional Wasserstein distance and characterize the corresponding geodesics and velocity fields as well as the flow ODEs. Subsequently, we propose to approximate the velocity fields by relaxing the conditional Wasserstein distance. Based on this, we propose an extension of OT Flow Matching for solving Bayesian inverse problems and demonstrate its numerical advantages on an inverse problem and on class-conditional image generation.
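A sketch, in our own notation (the paper's exact definitions may differ), of the restricted-coupling construction described above: couplings of the two joint measures are required to be almost surely diagonal in $y$, so the resulting distance disintegrates into the expected posterior Wasserstein distance.

```latex
% Couplings that may not move mass across different values of y:
\Gamma_Y(P_{Y,X}, Q_{Y,X}) \;=\;
  \bigl\{\, \pi \in \Gamma(P_{Y,X}, Q_{Y,X}) \;:\; y_1 = y_2 \ \ \pi\text{-a.s.} \,\bigr\}

% Conditional Wasserstein distance and its disintegration:
W_{p,Y}^{p}(P_{Y,X}, Q_{Y,X})
  \;:=\; \inf_{\pi \in \Gamma_Y(P_{Y,X}, Q_{Y,X})}
         \int \lVert x_1 - x_2 \rVert^{p} \,\mathrm{d}\pi
  \;=\; \mathbb{E}_{y \sim P_Y}\!\left[
          W_p^{p}\bigl(P_{X \mid Y=y},\, Q_{X \mid Y=y}\bigr) \right]
```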
Updated: 2024-03-27 15:54:55
Categories: cs.LG,math.OC
Incorporating simulated spatial context information improves the effectiveness of contrastive learning models
Visual learning often occurs in a specific context, where an agent acquires skills through exploration and tracking of its location in a consistent environment. The historical spatial context of the agent provides a similarity signal for self-supervised contrastive learning. We present a unique approach, termed Environmental Spatial Similarity (ESS), that complements existing contrastive learning methods. Using images from simulated, photorealistic environments as an experimental setting, we demonstrate that ESS outperforms traditional instance discrimination approaches. Moreover, sampling additional data from the same environment substantially improves accuracy and provides new augmentations. ESS allows remarkable proficiency in room classification and spatial prediction tasks, especially in unfamiliar environments. This learning paradigm has the potential to enable rapid visual learning in agents operating in new environments with unique visual characteristics. Potentially transformative applications span from robotics to space exploration. Our proof of concept demonstrates improved efficiency over methods that rely on extensive, disconnected datasets.
Updated: 2024-03-27 15:49:52
Categories: cs.CV,cs.AI
Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling
Motivated by applications in large-scale and multi-agent reinforcement learning, we study the non-asymptotic performance of stochastic approximation (SA) schemes with delayed updates under Markovian sampling. While the effect of delays has been extensively studied for optimization, the manner in which they interact with the underlying Markov process to shape the finite-time performance of SA remains poorly understood. In this context, our first main contribution is to show that under time-varying bounded delays, the delayed SA update rule guarantees exponentially fast convergence of the \emph{last iterate} to a ball around the SA operator's fixed point. Notably, our bound is \emph{tight} in its dependence on both the maximum delay $\tau_{max}$, and the mixing time $\tau_{mix}$. To achieve this tight bound, we develop a novel inductive proof technique that, unlike various existing delayed-optimization analyses, relies on establishing uniform boundedness of the iterates. As such, our proof may be of independent interest. Next, to mitigate the impact of the maximum delay on the convergence rate, we provide the first finite-time analysis of a delay-adaptive SA scheme under Markovian sampling. In particular, we show that the exponent of convergence of this scheme gets scaled down by $\tau_{avg}$, as opposed to $\tau_{max}$ for the vanilla delayed SA rule; here, $\tau_{avg}$ denotes the average delay across all iterations. Moreover, the adaptive scheme requires no prior knowledge of the delay sequence for step-size tuning. Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms, including TD learning, Q-learning, and stochastic gradient descent under Markovian sampling.
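The vanilla delayed SA rule analyzed above can be sketched in a few lines; the contractive operator, step size, and delay sequence below are toy choices for illustration, not the paper's setting (and the noiseless fixed-point map stands in for Markovian samples).

```python
import numpy as np

def delayed_sa(T, x0, alpha, delays):
    """Delayed stochastic-approximation rule: the update at step t uses a
    stale iterate x_{t - tau_t}:  x_{t+1} = x_t + alpha * (T(x_stale) - x_stale)."""
    xs = [x0]
    for t, tau in enumerate(delays):
        stale = xs[max(0, t - tau)]
        xs.append(xs[-1] + alpha * (T(stale) - stale))
    return xs[-1]

# Contractive operator with fixed point x* = 2, time-varying delays tau_t <= 3.
rng = np.random.default_rng(0)
T = lambda x: 0.5 * x + 1.0
delays = rng.integers(0, 4, size=500)
x_final = delayed_sa(T, x0=0.0, alpha=0.1, delays=delays)
print(round(x_final, 4))  # -> 2.0 (last iterate converges despite staleness)
```

The delay-adaptive variant discussed in the abstract would additionally adjust `alpha` based on observed delays rather than tuning it for the worst case.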
Updated: 2024-03-27 15:48:29
Categories: cs.LG,cs.AI,cs.MA,cs.SY,eess.SY,math.OC
Contrastive Learning with Orthonormal Anchors (CLOA)
This study focuses on addressing the instability issues prevalent in contrastive learning, specifically examining the InfoNCE loss function and its derivatives. We reveal a critical observation that these loss functions exhibit a restrictive behavior, leading to a convergence phenomenon where embeddings tend to merge into a singular point. This "over-fusion" effect detrimentally affects classification accuracy in subsequent supervised-learning tasks. Through theoretical analysis, we demonstrate that embeddings, when equalized or confined to a rank-1 linear subspace, represent a local minimum for InfoNCE. In response to this challenge, our research introduces an innovative strategy that leverages the same or fewer labeled data than typically used in the fine-tuning phase. The loss we proposed, Orthonormal Anchor Regression Loss, is designed to disentangle embedding clusters, significantly enhancing the distinctiveness of each embedding while simultaneously ensuring their aggregation into dense, well-defined clusters. Our method demonstrates remarkable improvements with just a fraction of the conventional label requirements, as evidenced by our results on CIFAR10 and CIFAR100 datasets.
Updated: 2024-03-27 15:48:16
Categories: cs.LG,cs.AI
Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning
Semi-supervised learning (SSL) methods assume that labeled data, unlabeled data and test data are from the same distribution. Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in labeled data (inliers). Most previous works focused on outlier detection via binary classifiers, which suffer from insufficient scalability and inability to distinguish different types of uncertainty. In this paper, we propose a novel framework, Adaptive Negative Evidential Deep Learning (ANEDL) to tackle these limitations. Concretely, we first introduce evidential deep learning (EDL) as an outlier detector to quantify different types of uncertainty, and design different uncertainty metrics for self-training and inference. Furthermore, we propose a novel adaptive negative optimization strategy, making EDL more tailored to the unlabeled dataset containing both inliers and outliers. As demonstrated empirically, our proposed method outperforms existing state-of-the-art methods across four datasets.
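How EDL separates "types of uncertainty" can be illustrated with a generic Dirichlet-based head (a standard EDL sketch, not ANEDL's specific uncertainty metrics or adaptive negative optimization); the logit values are invented.

```python
import numpy as np

def edl_uncertainty(logits):
    """Generic evidential-deep-learning head: non-negative evidence e_k
    parameterizes a Dirichlet with alpha_k = e_k + 1; the vacuity
    u = K / sum(alpha) is high exactly when total evidence is low."""
    evidence = np.maximum(logits, 0.0)   # ReLU evidence
    alpha = evidence + 1.0
    strength = alpha.sum()
    prob = alpha / strength              # expected class probabilities
    vacuity = len(alpha) / strength      # epistemic uncertainty in (0, 1]
    return prob, vacuity

# A confident inlier vs. an evidence-free outlier (vacuity 1.0).
p_in, u_in = edl_uncertainty(np.array([9.0, 1.0, 0.0, 0.0]))
p_out, u_out = edl_uncertainty(np.array([0.0, 0.0, 0.0, 0.0]))
print(round(u_in, 3), round(u_out, 3))  # -> 0.286 1.0
```

An outlier detector can threshold the vacuity, while conflicting (high-evidence, spread-out) predictions are flagged by a different metric.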
Updated: 2024-03-27 15:44:25
Categories: cs.LG,cs.AI,cs.CV
Annolid: Annotate, Segment, and Track Anything You Need
Annolid is a deep learning-based software package designed for the segmentation, labeling, and tracking of research targets within video files, focusing primarily on animal behavior analysis. Based on state-of-the-art instance segmentation methods, Annolid now harnesses the Cutie video object segmentation model to achieve resilient, markerless tracking of multiple animals from single annotated frames, even in environments in which they may be partially or entirely concealed by environmental features or by one another. Our integration of Segment Anything and Grounding-DINO strategies additionally enables the automatic masking and segmentation of recognizable animals and objects by text command, removing the need for manual annotation. Annolid's comprehensive approach to object segmentation flexibly accommodates a broad spectrum of behavior analysis applications, enabling the classification of diverse behavioral states such as freezing, digging, pup huddling, and social interactions in addition to the tracking of animals and their body parts.
Updated: 2024-03-27 15:41:23
Categories: cs.CV,cs.AI
InceptionTime vs. Wavelet -- A comparison for time series classification
Neural networks were used to classify infrasound data, and two different approaches were compared. The first is based on direct classification of the time series data, using a custom implementation of the InceptionTime network. For the second, we generated 2D images of the wavelet transformation of the signals, which were subsequently classified using a ResNet implementation. With appropriate hyperparameter settings, both achieve a classification accuracy above 90%, with the direct approach reaching 95.2%.
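A minimal sketch of the image-generation stage of the second approach, assuming a plain complex Morlet wavelet and numpy only (the paper's exact transform, wavelet, and parameters are not specified here): the 1-D signal becomes a 2-D time-scale magnitude image that a ResNet can classify.

```python
import numpy as np

def morlet_scalogram(signal, scales, w0=6.0):
    """Turn a 1-D signal into a 2-D time-scale image: convolve with scaled
    complex Morlet wavelets and keep the magnitude, one row per scale."""
    rows = []
    for s in scales:
        t = np.arange(-4 * s, 4 * s + 1)
        wavelet = np.exp(1j * w0 * t / s) * np.exp(-0.5 * (t / s) ** 2) / np.sqrt(s)
        rows.append(np.abs(np.convolve(signal, wavelet, mode="same")))
    return np.stack(rows)

# A 10 Hz tone sampled at 100 Hz lights up the rows whose wavelet centre
# frequency (~ w0 * fs / (2 * pi * s)) matches, i.e. around scale s ~ 10.
fs, f = 100, 10
sig = np.sin(2 * np.pi * f * np.arange(fs) / fs)
img = morlet_scalogram(sig, scales=range(1, 13))
print(img.shape)  # -> (12, 100)
```

Stacking scalograms like `img` per signal yields the 2D inputs for the ResNet branch of the comparison.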
Updated: 2024-03-27 15:34:27
Categories: cs.LG,I.5.4; J.2
Sample Representativeness in Multivariate Symmetric Uncertainty for the Selection of Attributes
In this work, we analyze the behavior of the multivariate symmetric uncertainty (MSU) measure through the use of statistical simulation techniques under various mixes of informative and non-informative randomly generated features. Experiments show how the number of attributes, their cardinalities, and the sample size affect the MSU. In this thesis, based on the observed results, we propose a heuristic condition that preserves good quality in the MSU under different combinations of these three factors, providing a useful new criterion to help drive the process of dimension reduction.
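For n = 2 the symmetric uncertainty is SU = 2 I(X;Y) / (H(X) + H(Y)); one common multivariate generalization, used here as an illustrative assumption about the MSU's form rather than the thesis's exact definition, normalizes the total correlation.

```python
import numpy as np
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a discrete variable given as a sequence."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def msu(columns):
    """Multivariate symmetric uncertainty as normalized total correlation:
    MSU = n/(n-1) * (sum_i H(X_i) - H(X_1,...,X_n)) / sum_i H(X_i).
    For n = 2 this reduces to SU = 2*I(X;Y) / (H(X) + H(Y))."""
    n = len(columns)
    marginal_sum = sum(entropy(c) for c in columns)
    joint = entropy(list(zip(*columns)))
    return (n / (n - 1)) * (marginal_sum - joint) / marginal_sum

x = [0, 0, 1, 1, 0, 1, 0, 1]
print(round(msu([x, x]), 3))           # identical features -> 1.0
print(round(msu([x, [0, 1] * 4]), 3))  # weakly related features -> small
```

Simulating such columns with varying attribute counts, cardinalities, and sample sizes is exactly the kind of experiment the abstract describes.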
Updated: 2024-03-27 15:29:08
Categories: cs.IT,cs.LG,math.IT,math.ST,stat.TH
TransFusion: Contrastive Learning with Transformers
This paper proposes a novel framework, TransFusion, designed to make the process of contrastive learning more analytical and explainable. TransFusion consists of attention blocks in which the softmax is replaced by ReLU, and the final block's weighted-sum operation is truncated to leave the adjacency matrix as the output. The model is trained by minimizing the Jensen-Shannon divergence between its output and the target affinity matrix, which indicates whether each pair of samples belongs to the same or different classes. The main contribution of TransFusion lies in defining a theoretical limit for answering two fundamental questions in the field: the maximum level of data augmentation and the minimum batch size required for effective contrastive learning. Furthermore, experimental results indicate that TransFusion successfully extracts features that isolate clusters from complex real-world data, leading to improved classification accuracy in downstream tasks.
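A toy sketch of the modified attention block and training signal, with randomly initialized weights and an invented two-class target affinity (not the paper's trained model): ReLU in place of softmax leaves a non-negative attention map that can be read as an adjacency matrix and compared to the target via Jensen-Shannon divergence.

```python
import numpy as np

def relu_attention(x, wq, wk):
    """TransFusion-style block: the softmax is replaced by ReLU, so the
    attention map can be read directly as a (non-negative) adjacency matrix."""
    q, k = x @ wq, x @ wk
    return np.maximum(q @ k.T / np.sqrt(q.shape[1]), 0.0)

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two flattened, normalised maps."""
    p = p.ravel() / p.sum()
    q = q.ravel() / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))                   # 6 samples, 4-dim features
wq, wk = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
adj = relu_attention(x, wq, wk)               # predicted adjacency, 6 x 6
target = np.kron(np.eye(2), np.ones((3, 3)))  # two classes of three samples
loss = js_divergence(adj + 1e-6, target)      # training signal to minimise
print(adj.shape, loss >= 0.0)  # -> (6, 6) True
```

Training would backpropagate `loss` into the block weights; here everything is frozen just to show the data flow.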
Updated: 2024-03-27 15:24:54
Categories: cs.LG,cs.AI
NL-ITI: Optimizing Probing and Intervention for Improvement of ITI Method
Large Language Models (LLMs) are prone to returning false information, which constitutes one of the major challenges in the AI field. In our work, we explore the paradigm introduced by Inference-Time Intervention (ITI). In the first stage, it identifies the attention heads that contain the highest amount of the desired type of knowledge (e.g., truthfulness). Afterwards, during inference, the LLM's activations are shifted for the chosen subset of attention heads. We further improve the ITI framework by introducing nonlinear probing and multi-token intervention: Non-Linear ITI (NL-ITI). NL-ITI is tested on diverse multiple-choice benchmarks, including TruthfulQA, on which we report around 14% MC1 metric improvement with respect to the baseline ITI results. NL-ITI also achieves encouraging results on other test sets: on the Business Ethics subdomain of MMLU, around 18% MC1 improvement over the baseline LLaMA2-7B. Additionally, NL-ITI performs better while being less invasive to the behavior of the LLM (as measured by Kullback-Leibler divergence).
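The activation-shifting step can be sketched as follows; the tensor shapes, head index, and probe direction are hypothetical, and the shift-every-token behavior is our simplified reading of the multi-token variant.

```python
import numpy as np

def intervene(activations, head_dirs, alpha):
    """ITI-style shift: move chosen attention heads' activations along a
    learned probe direction; here the shift is applied at every token
    position, mimicking a multi-token intervention.
    activations: (tokens, heads, head_dim); head_dirs: {head index: direction}."""
    out = activations.copy()
    for h, d in head_dirs.items():
        out[:, h, :] += alpha * d
    return out

rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 8, 16))   # 4 tokens, 8 heads, head dim 16
direction = np.zeros(16)
direction[0] = 1.0                   # hypothetical "truthful" probe direction
shifted = intervene(acts, {3: direction}, alpha=2.0)
print(np.allclose(shifted[:, 3, 0], acts[:, 3, 0] + 2.0))  # -> True
```

Only the selected head moves; all other activations are untouched, which is what keeps the intervention minimally invasive.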
Updated: 2024-03-27 15:22:16
Categories: cs.CL,cs.LG
Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language
Relation extraction is essential for extracting and understanding biographical information in the context of digital humanities and related subjects. There is a growing interest in the community to build datasets capable of training machine learning models to extract relationships. However, annotating such datasets can be expensive and time-consuming, in addition to being limited to English. This paper applies guided distant supervision to create a large biographical relationship extraction dataset for German. Our dataset, composed of more than 80,000 instances for nine relationship types, is the largest biographical German relationship extraction dataset. We also create a manually annotated dataset with 2000 instances to evaluate the models and release it together with the dataset compiled using guided distant supervision. We train several state-of-the-art machine learning models on the automatically created dataset and release them as well. Furthermore, we experiment with multilingual and cross-lingual experiments that could benefit many low-resource languages.
Updated: 2024-03-27 15:15:16
Categories: cs.CL,cs.LG
Fact Checking Beyond Training Set
Evaluating the veracity of everyday claims is time-consuming and in some cases requires domain expertise. We empirically demonstrate that the commonly used fact checking pipeline, known as the retriever-reader, suffers from performance deterioration when it is trained on labeled data from one domain and used in another. Afterwards, we delve into each component of the pipeline and propose novel algorithms to address this problem. We propose an adversarial algorithm to make the retriever component robust against distribution shift. Our core idea is to initially train a bi-encoder on the labeled source data and then adversarially train two separate document and claim encoders using unlabeled target data. We then focus on the reader component and propose to train it such that it is insensitive to the order of claims and evidence documents. Our empirical evaluations support the hypothesis that such a reader shows higher robustness against distribution shift. To our knowledge, there is no publicly available multi-topic fact checking dataset. Thus, we propose a simple automatic method to re-purpose two well-known fact checking datasets. We then construct eight fact checking scenarios from these datasets and compare our model to a set of strong baseline models, including recent domain adaptation models that use GPT4 for generating synthetic data.
Updated: 2024-03-27 15:15:14
Categories: cs.CL,cs.LG
Aiming for Relevance
Vital signs are crucial in intensive care units (ICUs). They are used to track the patient's state and to identify clinically significant changes. Predicting vital sign trajectories is valuable for early detection of adverse events. However, conventional machine learning metrics like RMSE often fail to capture the true clinical relevance of such predictions. We introduce novel vital sign prediction performance metrics that align with clinical contexts, focusing on deviations from clinical norms, overall trends, and trend deviations. These metrics are derived from empirical utility curves obtained in a previous study through interviews with ICU clinicians. We validate the metrics' usefulness using simulated and real clinical datasets (MIMIC and eICU). Furthermore, we employ these metrics as loss functions for neural networks, resulting in models that excel in predicting clinically significant events. This research paves the way for clinically relevant machine learning model evaluation and optimization, promising to improve ICU patient care. 10 pages, 9 figures.
Updated: 2024-03-27 15:11:07
Categories: cs.LG,cs.AI,cs.HC,stat.ML
SeSaMe: A Framework to Simulate Self-Reported Ground Truth for Mental Health Sensing Studies
Advances in mobile and wearable technologies have enabled the potential to passively monitor a person's mental, behavioral, and affective health. These approaches typically rely on longitudinal collection of self-reported outcomes, e.g., depression, stress, and anxiety, to train machine learning (ML) models. However, the need to continuously self-report adds a significant burden on the participants, often resulting in attrition, missing labels, or insincere responses. In this work, we introduce the Scale Scores Simulation using Mental Models (SeSaMe) framework to alleviate participants' burden in digital mental health studies. By leveraging pre-trained large language models (LLMs), SeSaMe enables the simulation of participants' responses on psychological scales. In SeSaMe, researchers can prompt LLMs with information on participants' internal behavioral dispositions, enabling LLMs to construct mental models of participants to simulate their responses on psychological scales. We demonstrate an application of SeSaMe, where we use GPT-4 to simulate responses on one scale using responses from another as behavioral information. We also evaluate the alignment between human and SeSaMe-simulated responses to psychological scales. Then, we present experiments to inspect the utility of SeSaMe-simulated responses as ground truth in training ML models by replicating established depression and anxiety screening tasks from a previous study. Our results indicate SeSaMe to be a promising approach, but its alignment may vary across scales and specific prediction objectives. We also observed that model performance with simulated data was on par with using the real data for training in most evaluation scenarios. We conclude by discussing the potential implications of SeSaMe in addressing some challenges researchers face with ground-truth collection in passive sensing studies.
Updated: 2024-03-27 15:08:31
Categories: cs.HC,cs.AI,cs.CY
Neural Network-Based Piecewise Survival Models
In this paper, a family of neural network-based survival models is presented. The models are specified by piecewise definitions of the hazard function and the density function on a partitioning of time; both constant and linear piecewise definitions are presented, resulting in a family of four models. The models can be seen as an extension of the commonly used discrete-time and piecewise exponential models, thereby adding flexibility to this set of standard models. Using a simulated dataset, the models are shown to perform well compared to the highly expressive, state-of-the-art energy-based model, while requiring only a fraction of the computation time.
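For the piecewise-constant variant, the hazard is constant on each interval of the partition, so the survival function has a closed form; the knots and hazard values below are toy constants standing in for per-interval network outputs.

```python
import numpy as np

def survival(t, knots, hazards):
    """Survival function for a piecewise-constant hazard: S(t) = exp(-H(t)),
    where the cumulative hazard H(t) integrates hazards[j] over
    [knots[j], knots[j+1])."""
    edges = np.append(knots, np.inf)
    cumulative = 0.0
    for j, h in enumerate(hazards):
        lo, hi = edges[j], edges[j + 1]
        if t <= lo:
            break
        cumulative += h * (min(t, hi) - lo)
    return float(np.exp(-cumulative))

# Hazard 0.1 on [0, 5) and 0.3 on [5, inf), as toy network outputs.
knots, hazards = [0.0, 5.0], [0.1, 0.3]
print(round(survival(4.0, knots, hazards), 4))   # -> 0.6703  (= exp(-0.4))
print(round(survival(10.0, knots, hazards), 4))  # -> 0.1353  (= exp(-0.5 - 1.5))
```

The piecewise-linear variant replaces the constant `h` on each interval with a linear segment, changing the integral but not the overall structure.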
Updated: 2024-03-27 15:08:00
Categories: stat.ML,cs.LG,cs.SY,eess.SY
INEXA: Interactive and Explainable Process Model Abstraction Through Object-Centric Process Mining
Process events are recorded by multiple information systems at different granularity levels. Based on the resulting event logs, process models are discovered at different granularity levels as well. Events stored at a fine-grained granularity level, for example, may prevent the discovered process model from being displayed, due to the high number of resulting model elements. The discovered process model of a real-world manufacturing process, for example, consists of 1,489 model elements and over 2,000 arcs. Existing process model abstraction techniques could help reduce the size of the model, but would disconnect it from the underlying event log. Existing event abstraction techniques support neither the analysis of mixed granularity levels nor the interactive exploration of a suitable granularity level. To enable the exploration of discovered process models at different granularity levels, we propose INEXA, an interactive, explainable process model abstraction method that keeps the link to the event log. As a starting point, INEXA aggregates large process models to a "displayable" size, e.g., for the manufacturing use case, to a process model with 58 model elements. Then, the process analyst can explore granularity levels interactively, while applied abstractions are automatically traced in the event log for explainability.
Updated: 2024-03-27 15:03:33
Categories: cs.AI
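The bookkeeping INEXA describes, relabelling fine-grained activities to a coarser level while tracing each applied abstraction back to the event log, can be sketched as follows (the flat event-log layout and the mapping are invented for illustration; the actual method works on object-centric logs):

```python
def abstract_log(events, mapping):
    """Relabel fine-grained activities and record the applied abstractions.

    events:  list of (case_id, activity) pairs;
    mapping: fine-grained activity -> coarse activity.
    Returns the abstracted log plus a trace linking each changed event back
    to its original activity, which is what keeps the abstraction explainable.
    """
    abstracted, trace = [], []
    for i, (case, act) in enumerate(events):
        coarse = mapping.get(act, act)      # unmapped activities pass through
        abstracted.append((case, coarse))
        if coarse != act:
            trace.append((i, act, coarse))  # event index, original, abstraction
    return abstracted, trace
```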
Byzantine-resilient Federated Learning With Adaptivity to Data Heterogeneity
This paper deals with federated learning (FL) in the presence of malicious Byzantine attacks and data heterogeneity. A novel Robust Average Gradient Algorithm (RAGA) is proposed, which leverages the geometric median for aggregation and can freely select the round number for local updating. Different from most existing resilient approaches, which perform convergence analysis based on a strongly-convex loss function or homogeneously distributed dataset, we conduct convergence analysis for not only strongly-convex but also non-convex loss functions over heterogeneous datasets. According to our theoretical analysis, as long as the fraction of the dataset from malicious users is less than half, RAGA can achieve convergence at rate $\mathcal{O}({1}/{T^{2/3- \delta}})$, where $T$ is the iteration number and $\delta \in (0, 2/3)$, for non-convex loss functions, and at a linear rate for strongly-convex loss functions. Moreover, a stationary point or the globally optimal solution is proved to be obtainable as data heterogeneity vanishes. Experimental results corroborate the robustness of RAGA to Byzantine attacks and verify the advantage of RAGA over baselines in convergence performance under various intensities of Byzantine attacks on heterogeneous datasets.
Updated: 2024-03-27 14:57:54
Categories: cs.LG,cs.AI,cs.CR
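RAGA's aggregation step replaces the mean of client gradients with the geometric median, which a minority of Byzantine gradients cannot drag far away. A pure-Python sketch using the standard Weiszfeld iteration (the paper does not prescribe this particular solver, and gradients are modelled as plain vectors):

```python
import math

def geometric_median(points, iters=100, eps=1e-9):
    """Approximate the geometric median of d-dimensional points (Weiszfeld)."""
    d = len(points[0])
    y = [sum(p[i] for p in points) / len(points) for i in range(d)]  # init: mean
    for _ in range(iters):
        num, den = [0.0] * d, 0.0
        for p in points:
            dist = math.sqrt(sum((p[i] - y[i]) ** 2 for i in range(d))) or eps
            w = 1.0 / dist                  # far-away points get small weight
            den += w
            for i in range(d):
                num[i] += w * p[i]
        y = [num[i] / den for i in range(d)]
    return y
```

With three honest zero gradients and one Byzantine outlier at (100, 100), the mean lands at (25, 25) while the geometric median stays near the honest points.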
Demystifying Misconceptions in Social Bots Research
Research on social bots aims at advancing knowledge and providing solutions to one of the most debated forms of online manipulation. Yet, social bot research is plagued by widespread biases, hyped results, and misconceptions that set the stage for ambiguities, unrealistic expectations, and seemingly irreconcilable findings. Overcoming such issues is instrumental towards ensuring reliable solutions and reaffirming the validity of the scientific method. In this contribution, we review some recent results in social bots research, highlighting and revising factual errors as well as methodological and conceptual biases. More importantly, we demystify common misconceptions, addressing fundamental points on how social bots research is discussed. Our analysis surfaces the need to discuss research about online disinformation and manipulation in a rigorous, unbiased, and responsible way. This article bolsters such effort by identifying and refuting common fallacious arguments used by both proponents and opponents of social bots research, as well as providing directions toward sound methodologies for future research in the field.
Updated: 2024-03-27 14:48:48
Categories: cs.SI,cs.AI,cs.CY,cs.LG
LCANets++: Robust Audio Classification using Multi-layer Neural Networks with Lateral Competition
Audio classification aims at recognizing audio signals, including speech commands or sound events. However, current audio classifiers are susceptible to perturbations and adversarial attacks. In addition, real-world audio classification tasks often suffer from limited labeled data. To help bridge these gaps, previous work developed neuro-inspired convolutional neural networks (CNNs) with sparse coding via the Locally Competitive Algorithm (LCA) in the first layer (i.e., LCANets) for computer vision. LCANets learn in a combination of supervised and unsupervised learning, reducing dependency on labeled samples. Motivated by the fact that the auditory cortex is also sparse, we extend LCANets to audio recognition tasks and introduce LCANets++, which are CNNs that perform sparse coding in multiple layers via LCA. We demonstrate that LCANets++ are more robust than standard CNNs and LCANets against perturbations, e.g., background noise, as well as black-box and white-box attacks, e.g., evasion and fast gradient sign method (FGSM) attacks.
Updated: 2024-03-27 14:47:41
Categories: cs.SD,cs.CR,cs.LG,eess.AS
Transformers-based architectures for stroke segmentation: A review
Stroke remains a significant global health concern, necessitating precise and efficient diagnostic tools for timely intervention and improved patient outcomes. The emergence of deep learning methodologies has transformed the landscape of medical image analysis. Recently, Transformers, initially designed for natural language processing, have exhibited remarkable capabilities in various computer vision applications, including medical image analysis. This comprehensive review aims to provide an in-depth exploration of the cutting-edge Transformer-based architectures applied in the context of stroke segmentation. It commences with an exploration of stroke pathology, imaging modalities, and the challenges associated with accurate diagnosis and segmentation. Subsequently, the review delves into the fundamental ideas of Transformers, offering detailed insights into their architectural intricacies and the underlying mechanisms that empower them to effectively capture complex spatial information within medical images. The existing literature is systematically categorized and analyzed, discussing various approaches that leverage Transformers for stroke segmentation. A critical assessment is provided, highlighting the strengths and limitations of these methods, including considerations of performance and computational efficiency. Additionally, this review explores potential avenues for future research and development.
Updated: 2024-03-27 14:42:08
Categories: eess.IV,cs.CV,cs.LG
Fusion approaches for emotion recognition from speech using acoustic and text-based features
In this paper, we study different approaches for classifying emotions from speech using acoustic and text-based features. We propose to obtain contextualized word embeddings with BERT to represent the information contained in speech transcriptions and show that this results in better performance than using Glove embeddings. We also propose and compare different strategies to combine the audio and text modalities, evaluating them on IEMOCAP and MSP-PODCAST datasets. We find that fusing acoustic and text-based systems is beneficial on both datasets, though only subtle differences are observed across the evaluated fusion approaches. Finally, for IEMOCAP, we show the large effect that the criteria used to define the cross-validation folds have on results. In particular, the standard way of creating folds for this dataset results in a highly optimistic estimation of performance for the text-based system, suggesting that some previous works may overestimate the advantage of incorporating transcriptions.
Updated: 2024-03-27 14:40:25
Categories: cs.LG,cs.SD,eess.AS
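One of the simplest fusion strategies compared in work of this kind is score-level (late) fusion, a weighted average of each modality's class posteriors. A sketch (the equal weighting and two-modality setup are illustrative; the paper evaluates several such strategies):

```python
def late_fusion(audio_probs, text_probs, w_audio=0.5):
    """Weighted average of per-class posteriors from two modalities."""
    assert len(audio_probs) == len(text_probs)
    return [w_audio * a + (1.0 - w_audio) * t
            for a, t in zip(audio_probs, text_probs)]

def predict(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)
```

A confident text model can flip an uncertain acoustic decision: fusing [0.6, 0.4] with [0.1, 0.9] yields [0.35, 0.65], so the fused prediction follows the text modality.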
First Experiences with the Identification of People at Risk for Diabetes in Argentina using Machine Learning Techniques
Detecting Type 2 Diabetes (T2D) and Prediabetes (PD) is a real challenge for medicine due to the absence of pathogenic symptoms and the lack of known associated risk factors. Even though some proposals for machine learning models enable the identification of people at risk, the nature of the condition makes it so that a model suitable for one population may not necessarily be suitable for another. In this article, the development and assessment of predictive models to identify people at risk for T2D and PD specifically in Argentina are discussed. First, the database was thoroughly preprocessed and three specific datasets were generated, considering a compromise between the number of records and the amount of available variables. After applying five different classification models, the results show that some of these models performed very well on two of the datasets. In particular, RF, DT, and ANN demonstrated great classification power, with good values for the metrics under consideration. Given the lack of this type of tool in Argentina, this work represents the first step towards the development of more sophisticated models.
Updated: 2024-03-27 14:38:02
Categories: cs.LG
Nested Dirichlet models for unsupervised attack pattern detection in honeypot data
Cyber-systems are under near-constant threat from intrusion attempts. Attack types vary, but each attempt typically has a specific underlying intent, and the perpetrators are typically groups of individuals with similar objectives. Clustering attacks appearing to share a common intent is very valuable to threat-hunting experts. This article explores Dirichlet distribution topic models for clustering terminal session commands collected from honeypots, which are special network hosts designed to entice malicious attackers. The main practical implications of clustering the sessions are two-fold: finding similar groups of attacks, and identifying outliers. A range of statistical models are considered, adapted to the structures of command-line syntax. In particular, concepts of primary and secondary topics, and then session-level and command-level topics, are introduced into the models to improve interpretability. The proposed methods are further extended in a Bayesian nonparametric fashion to allow unboundedness in the vocabulary size and the number of latent intents. The methods are shown to discover an unusual MIRAI variant which attempts to take over existing cryptocurrency coin-mining infrastructure, not detected by traditional topic-modelling approaches.
Updated: 2024-03-27 14:30:59
Categories: cs.CR,stat.AP
Structure Guided Large Language Model for SQL Generation
Generating accurate Structured Query Language (SQL) is a long-standing problem, especially in matching users' semantic queries with structured databases and then generating structured SQL. Existing models typically input queries and database schemas into the LLM and rely on the LLM to perform semantic-structure matching and generate structured SQL. However, such solutions overlook the structural information within user queries and databases, which can be utilized to enhance the generation of structured SQL. This oversight can lead to inaccurate or unexecutable SQL generation. To fully exploit the structure, we propose a structure-to-SQL framework, which leverages the inherent structure information to improve the SQL generation of LLMs. Specifically, we introduce our Structure Guided SQL~(SGU-SQL) generation model. SGU-SQL first links user queries and databases in a structure-enhanced manner. It then decomposes complicated linked structures with grammar trees to guide the LLM to generate the SQL step by step. Extensive experiments on two benchmark datasets illustrate that SGU-SQL can outperform sixteen SQL generation baselines.
Updated: 2024-03-27 14:30:44
Categories: cs.DB,cs.AI,cs.CL
Scalable Lipschitz Estimation for CNNs
Estimating the Lipschitz constant of deep neural networks is of growing interest as it is useful for informing on generalisability and adversarial robustness. Convolutional neural networks (CNNs) in particular, underpin much of the recent success in computer vision related applications. However, although existing methods for estimating the Lipschitz constant can be tight, they have limited scalability when applied to CNNs. To tackle this, we propose a novel method to accelerate Lipschitz constant estimation for CNNs. The core idea is to divide a large convolutional block via a joint layer and width-wise partition, into a collection of smaller blocks. We prove an upper-bound on the Lipschitz constant of the larger block in terms of the Lipschitz constants of the smaller blocks. Through varying the partition factor, the resulting method can be adjusted to prioritise either accuracy or scalability and permits parallelisation. We demonstrate an enhanced scalability and comparable accuracy to existing baselines through a range of experiments.
Updated: 2024-03-27 14:28:44
Categories: cs.LG
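The fact this line of work builds on is that the Lipschitz constant of a composition is at most the product of the per-block constants, each of which, for a linear layer, is the spectral norm. A pure-Python sketch for small dense matrices (the paper's contribution is the joint layer- and width-wise partition for convolutional blocks, which this does not reproduce):

```python
import math

def spectral_norm(mat, iters=200):
    """Largest singular value via power iteration on A^T A (lists of lists)."""
    rows, cols = len(mat), len(mat[0])
    v = [1.0] * cols                        # deterministic start vector
    for _ in range(iters):
        av = [sum(mat[r][c] * v[c] for c in range(cols)) for r in range(rows)]
        atav = [sum(mat[r][c] * av[r] for r in range(rows)) for c in range(cols)]
        nrm = math.sqrt(sum(x * x for x in atav))
        if nrm == 0.0:
            return 0.0                      # zero operator
        v = [x / nrm for x in atav]
    av = [sum(mat[r][c] * v[c] for c in range(cols)) for r in range(rows)]
    return math.sqrt(sum(x * x for x in av))

def lipschitz_upper_bound(layers):
    """Product of per-layer spectral norms bounds the composed map's constant."""
    bound = 1.0
    for mat in layers:
        bound *= spectral_norm(mat)
    return bound
```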
Spikewhisper: Temporal Spike Backdoor Attacks on Federated Neuromorphic Learning over Low-power Devices
Federated neuromorphic learning (FedNL) leverages event-driven spiking neural networks and federated learning frameworks to effectively execute intelligent analysis tasks over large numbers of distributed low-power devices, but is also vulnerable to poisoning attacks. The threat of backdoor attacks on traditional deep neural networks typically comes from time-invariant data. However, in FedNL, unknown threats may be hidden in time-varying spike signals. In this paper, we start to explore a novel vulnerability of FedNL-based systems with the concept of time division multiplexing, termed Spikewhisper, which allows attackers to evade detection as much as possible, as multiple malicious clients can imperceptibly poison with different triggers at different timeslices. In particular, the stealthiness of Spikewhisper is derived from the time-domain divisibility of global triggers, in which each malicious client pastes only one local trigger to a certain timeslice in the neuromorphic sample, and also the polarity and motion of each local trigger can be configured by attackers. Extensive experiments based on two different neuromorphic datasets demonstrate that the attack success rate of Spikewhisper is higher than that of temporally centralized attacks. Besides, it is validated that the effect of Spikewhisper is sensitive to the trigger duration.
Updated: 2024-03-27 14:25:02
Categories: cs.CR,cs.AI,eess.SP
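The attack's key mechanic is that each malicious client pastes its local trigger only into one timeslice of the event stream. A toy sketch on binary spike frames (the frame layout, trigger shape, and slicing are invented for illustration and are not the paper's exact formulation):

```python
def paste_trigger(frames, trigger, slice_start, slice_len):
    """Overwrite trigger pixels, but only inside one timeslice.

    frames:  list of per-timestep binary spike vectors (input is not mutated);
    trigger: {pixel_index: 0 or 1} pattern for one local trigger.
    """
    poisoned = [list(f) for f in frames]
    for t in range(slice_start, min(slice_start + slice_len, len(frames))):
        for idx, val in trigger.items():
            poisoned[t][idx] = val
    return poisoned
```

Frames outside the chosen timeslice are left untouched, which is what makes several clients with different slices hard to spot jointly.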
RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos
Procedure Planning in instructional videos entails generating a sequence of action steps based on visual observations of the initial and target states. Despite the rapid progress in this task, there remain several critical challenges to be solved: (1) Adaptive procedures: Prior works hold an unrealistic assumption that the number of action steps is known and fixed, leading to non-generalizable models in real-world scenarios where the sequence length varies. (2) Temporal relation: Understanding the step temporal relation knowledge is essential in producing reasonable and executable plans. (3) Annotation cost: Annotating instructional videos with step-level labels (i.e., timestamp) or sequence-level labels (i.e., action category) is demanding and labor-intensive, limiting its generalizability to large-scale datasets. In this work, we propose a new and practical setting, called adaptive procedure planning in instructional videos, where the procedure length is not fixed or pre-determined. To address these challenges we introduce the Retrieval-Augmented Planner (RAP) model. Specifically, for adaptive procedures, RAP adaptively determines the conclusion of actions using an auto-regressive model architecture. For temporal relation, RAP establishes an external memory module to explicitly retrieve the most relevant state-action pairs from the training videos and revises the generated procedures. To tackle the high annotation cost, RAP utilizes a weakly-supervised learning manner to expand the training dataset to other task-relevant, unannotated videos by generating pseudo labels for action steps. Experiments on CrossTask and COIN benchmarks show the superiority of RAP over traditional fixed-length models, establishing it as a strong baseline solution for adaptive procedure planning.
Updated: 2024-03-27 14:22:40
Categories: cs.CV,cs.AI,cs.RO
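The memory module's retrieval step, finding the stored state most similar to the current one and reusing its paired action, can be sketched as a plain nearest-neighbour lookup (Euclidean distance and a flat memory list are simplifying assumptions; the model retrieves state-action pairs with learned features):

```python
import math

def retrieve(memory, query):
    """Return the (state, action) pair whose state is closest to the query.

    memory: list of (state, action) with states as equal-length tuples.
    """
    def dist(s):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, query)))
    return min(memory, key=lambda pair: dist(pair[0]))
```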
Heterogeneous Peridynamic Neural Operators: Discover Biotissue Constitutive Law and Microstructure From Digital Image Correlation Measurements
Human tissues are highly organized structures with specific collagen fiber arrangements varying from point to point. The effects of such heterogeneity play an important role for tissue function, and hence it is critical to discover and understand the distribution of such fiber orientations from experimental measurements, such as digital image correlation data. To this end, we introduce the heterogeneous peridynamic neural operator (HeteroPNO) approach, for data-driven constitutive modeling of heterogeneous anisotropic materials. The goal is to learn both a nonlocal constitutive law together with the material microstructure, in the form of a heterogeneous fiber orientation field, from loading field-displacement field measurements. To this end, we propose a two-phase learning approach. Firstly, we learn a homogeneous constitutive law in the form of a neural network-based kernel function and a nonlocal bond force, to capture complex homogeneous material responses from data. Then, in the second phase we reinitialize the learnt bond force and the kernel function, and train them together with a fiber orientation field for each material point. Owing to the state-based peridynamic skeleton, our HeteroPNO-learned material models are objective and have the balance of linear and angular momentum guaranteed. Moreover, the effects from heterogeneity and nonlinear constitutive relationship are captured by the kernel function and the bond force respectively, enabling physical interpretability. As a result, our HeteroPNO architecture can learn a constitutive model for a biological tissue with anisotropic heterogeneous response undergoing a large deformation regime. Moreover, the framework is capable of providing displacement and stress field predictions for new and unseen loading instances.
Updated: 2024-03-27 14:20:11
Categories: cond-mat.mtrl-sci,cs.LG
Homogeneous Tokenizer Matters: Homogeneous Visual Tokenizer for Remote Sensing Image Understanding
The tokenizer, as one of the fundamental components of large models, has long been overlooked or even misunderstood in visual tasks. One key factor of the great comprehension power of the large language model is that natural language tokenizers utilize meaningful words or subwords as the basic elements of language. In contrast, mainstream visual tokenizers, represented by patch-based methods such as Patch Embed, rely on meaningless rectangular patches as basic elements of vision, which cannot serve as effectively as words or subwords in language. Starting from the essence of the tokenizer, we defined semantically independent regions (SIRs) for vision. We designed a simple HOmogeneous visual tOKenizer: HOOK. HOOK mainly consists of two modules: the Object Perception Module (OPM) and the Object Vectorization Module (OVM). To achieve homogeneity, the OPM splits the image into 4*4 pixel seeds and then utilizes the attention mechanism to perceive SIRs. The OVM employs cross-attention to merge seeds within the same SIR. To achieve adaptability, the OVM defines a variable number of learnable vectors as cross-attention queries, allowing for the adjustment of token quantity. We conducted experiments on the NWPU-RESISC45, WHU-RS19 classification dataset, and GID5 segmentation dataset for sparse and dense tasks. The results demonstrate that the visual tokens obtained by HOOK correspond to individual objects, which demonstrates homogeneity. HOOK outperformed Patch Embed by 6% and 10% in the two tasks and achieved state-of-the-art performance compared to the baselines used for comparison. Compared to Patch Embed, which requires more than one hundred tokens for one image, HOOK requires only 6 and 8 tokens for sparse and dense tasks, respectively, resulting in efficiency improvements of 1.5 to 2.8 times. The code is available at https://github.com/GeoX-Lab/Hook.
Updated: 2024-03-27 14:18:09
Categories: cs.CV,cs.AI
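The first step of the OPM, splitting the image grid into 4*4 pixel seeds before any attention is applied, is straightforward to sketch (single-channel nested lists stand in for real image tensors):

```python
def split_into_seeds(img, seed_size=4):
    """Partition an H x W grid into non-overlapping seed_size x seed_size patches.

    img: list of H rows, each a list of W pixel values; H and W are assumed
    divisible by seed_size. Patches are returned in row-major order.
    """
    h, w = len(img), len(img[0])
    seeds = []
    for r0 in range(0, h, seed_size):
        for c0 in range(0, w, seed_size):
            seeds.append([img[r][c0:c0 + seed_size]
                          for r in range(r0, r0 + seed_size)])
    return seeds
```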
The Impact of Uniform Inputs on Activation Sparsity and Energy-Latency Attacks in Computer Vision
Resource efficiency plays an important role for machine learning nowadays. The energy and decision latency are two critical aspects to ensure a sustainable and practical application. Unfortunately, the energy consumption and decision latency are not robust against adversaries. Researchers have recently demonstrated that attackers can compute and submit so-called sponge examples at inference time to increase the energy consumption and decision latency of neural networks. In computer vision, the proposed strategy crafts inputs with less activation sparsity which could otherwise be used to accelerate the computation. In this paper, we analyze the mechanism how these energy-latency attacks reduce activation sparsity. In particular, we find that input uniformity is a key enabler. A uniform image, that is, an image with mostly flat, uniformly colored surfaces, triggers more activations due to a specific interplay of convolution, batch normalization, and ReLU activation. Based on these insights, we propose two new simple, yet effective strategies for crafting sponge examples: sampling images from a probability distribution and identifying dense, yet inconspicuous inputs in natural datasets. We empirically examine our findings in a comprehensive evaluation with multiple image classification models and show that our attack achieves the same sparsity effect as prior sponge-example methods, but at a fraction of computation effort. We also show that our sponge examples transfer between different neural networks. Finally, we discuss applications of our findings for good, improving efficiency by increasing sparsity.
Updated: 2024-03-27 14:11:23
Categories: cs.CR,cs.CV,cs.LG
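The interplay the paper identifies can be seen in a one-channel toy: batch normalization maps a perfectly uniform input to a constant, so a positive learned shift beta pushes every unit past the ReLU and activation sparsity collapses to zero, while a varied input still leaves some units below zero. A sketch normalizing over a channel's spatial positions, with an illustrative beta (this is a simplification of how BN statistics are computed in a real CNN):

```python
import math

def bn_relu(xs, gamma=1.0, beta=0.5, eps=1e-5):
    """Normalize the given positions, shift by beta, then apply ReLU."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    scale = gamma / math.sqrt(var + eps)
    return [max(0.0, scale * (x - m) + beta) for x in xs]

def sparsity(acts):
    """Fraction of activations that the ReLU zeroed out."""
    return sum(1 for a in acts if a == 0.0) / len(acts)
```

A uniform patch [7, 7, 7, 7] normalizes to all zeros, so every output equals beta and sparsity is 0, whereas the varied patch [0, 0, 10, 10] normalizes to roughly +/-1 and half the units are zeroed.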
One flow to correct them all: improving simulations in high-energy physics with a single normalising flow and a switch
Simulated events are key ingredients in almost all high-energy physics analyses. However, imperfections in the simulation can lead to sizeable differences between the observed data and simulated events. The effects of such mismodelling on relevant observables must be corrected either effectively via scale factors, with weights or by modifying the distributions of the observables and their correlations. We introduce a correction method that transforms one multidimensional distribution (simulation) into another one (data) using a simple architecture based on a single normalising flow with a boolean condition. We demonstrate the effectiveness of the method on a physics-inspired toy dataset with non-trivial mismodelling of several observables and their correlations.
Updated: 2024-03-27 14:03:41
Categories: hep-ph,cs.LG,hep-ex,physics.data-an
Recurrent Action Transformer with Memory
Recently, the use of transformers in offline reinforcement learning has become a rapidly developing area. This is due to their ability to treat the agent's trajectory in the environment as a sequence, thereby reducing the policy learning problem to sequence modeling. In environments where the agent's decisions depend on past events, it is essential to capture both the event itself and the decision point in the context of the model. However, the quadratic complexity of the attention mechanism limits the potential for context expansion. One solution to this problem is to enhance transformers with memory mechanisms. In this paper, we propose the Recurrent Action Transformer with Memory (RATE) - a model that incorporates recurrent memory. To evaluate our model, we conducted extensive experiments on both memory-intensive environments (VizDoom-Two-Color, T-Maze) and classic Atari games and MuJoCo control environments. The results show that the use of memory can significantly improve performance in memory-intensive environments while maintaining or improving results in classic environments. We hope that our findings will stimulate research on memory mechanisms for transformers applicable to offline reinforcement learning.
Updated: 2024-03-27 14:02:58
Categories: cs.LG,cs.AI
Generative Pre-Training of Time-Series Data for Unsupervised Fault Detection in Semiconductor Manufacturing
This paper introduces TRACE-GPT, which stands for Time-seRies Anomaly-detection with Convolutional Embedding and Generative Pre-trained Transformers. TRACE-GPT is designed to pre-train on univariate time-series sensor data and detect faults on unlabeled datasets in semiconductor manufacturing. In the semiconductor industry, classifying abnormal time-series sensor data from normal data is important because it is directly related to wafer defects. However, small, unlabeled, and even mixed training data without enough anomalies make classification tasks difficult. In this research, we capture features of time-series data with temporal convolutional embedding and a Generative Pre-trained Transformer (GPT) to classify abnormal sequences from normal sequences using cross entropy loss. We prove that our model shows better performance than previous unsupervised models with both an open dataset, the University of California Riverside (UCR) time-series classification archive, and the process log of our Chemical Vapor Deposition (CVD) equipment. Our model has the highest F1 score at Equal Error Rate (EER) across all datasets and is only 0.026 below the supervised state-of-the-art baseline on the open dataset.
Updated: 2024-03-27 14:02:57
Subjects: cs.LG, cs.AI
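The classification rule described above, scoring each sequence by the cross-entropy of a generative next-step predictor, can be illustrated as follows (the "confident" and "uniform" predictors are made-up stand-ins for a trained GPT backbone):

```python
import numpy as np

# Illustrative sketch (not the paper's model): score each sequence by the
# average cross-entropy of a next-step predictor and flag high-loss
# sequences as anomalous.

def cross_entropy_score(probs, targets):
    # probs: (T, V) predicted distribution per step; targets: (T,) true ids
    eps = 1e-12
    return -np.mean(np.log(probs[np.arange(len(targets)), targets] + eps))

rng = np.random.default_rng(1)
V, T = 5, 20
targets = rng.integers(0, V, size=T)

# A predictor that has learned the normal pattern assigns the true next
# step high probability; on an anomalous sequence it is close to uniform.
confident = np.full((T, V), 0.02)
confident[np.arange(T), targets] = 0.92
uniform = np.full((T, V), 1.0 / V)

normal_score = cross_entropy_score(confident, targets)
anomaly_score = cross_entropy_score(uniform, targets)
print(normal_score < anomaly_score)  # True: anomalies incur higher loss
```

Thresholding such scores (e.g. at the Equal Error Rate point the abstract reports) turns the unsupervised loss into a binary fault detector.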
MisGUIDE : Defense Against Data-Free Deep Learning Model Extraction
The rise of Machine Learning as a Service (MLaaS) has led to the widespread deployment of machine learning models trained on diverse datasets. These models are employed for predictive services through APIs, raising concerns about the security and confidentiality of the models due to emerging vulnerabilities in prediction APIs. Of particular concern are model cloning attacks, where individuals with limited data and no knowledge of the training dataset manage to replicate a victim model's functionality through black-box query access. This commonly entails generating adversarial queries with which to probe the victim model, thereby creating a labeled dataset. This paper proposes "MisGUIDE", a two-step defense framework for Deep Learning models that disrupts the adversarial sample generation process by providing a probabilistic response when a query is deemed out-of-distribution (OOD). The first step employs a Vision Transformer-based framework to identify OOD queries, while the second step perturbs the response for such queries, introducing a probabilistic loss function to misguide the attackers. The aim of the proposed defense method is to reduce the accuracy of the cloned model while maintaining accuracy on authentic queries. Extensive experiments conducted on two benchmark datasets demonstrate that the proposed framework significantly enhances resistance against state-of-the-art data-free model extraction in black-box settings.
Updated: 2024-03-27 13:59:21
Subjects: cs.CR (Under Review)
On Optimizing Hyperparameters for Quantum Neural Networks
The increasing capabilities of Machine Learning (ML) models go hand in hand with an immense amount of data and computational power required for training. Therefore, training is usually outsourced to HPC facilities, where we have started to experience the limits of scaling conventional HPC hardware anticipated by Moore's law. Despite heavy parallelization and optimization efforts, current state-of-the-art ML models require weeks of training, which is associated with an enormous $CO_2$ footprint. Quantum Computing, and specifically Quantum Machine Learning (QML), can offer significant theoretical speed-ups and enhanced expressive power. However, training QML models requires tuning various hyperparameters, which is a nontrivial task, and suboptimal choices can severely affect the trainability and performance of the models. In this study, we identify the most impactful hyperparameters and collect data about the performance of QML models. We compare different configurations and provide researchers with performance data and concrete suggestions for hyperparameter selection.
Updated: 2024-03-27 13:59:09
Subjects: cs.LG, cs.ET
SteinGen: Generating Fidelitous and Diverse Graph Samples
Generating graphs that preserve characteristic structures while promoting sample diversity can be challenging, especially when the number of graph observations is small. Here, we tackle the problem of graph generation from only one observed graph. The classical approach of graph generation from parametric models relies on the estimation of parameters, which can be inconsistent or expensive to compute due to intractable normalisation constants. Generative modelling based on machine learning techniques to generate high-quality graph samples avoids parameter estimation but usually requires abundant training samples. Our proposed generating procedure, SteinGen, which is phrased in the setting of graphs as realisations of exponential random graph models, combines ideas from Stein's method and MCMC by employing Markovian dynamics which are based on a Stein operator for the target model. SteinGen uses the Glauber dynamics associated with an estimated Stein operator to generate a sample, and re-estimates the Stein operator from the sample after every sampling step. We show that on a class of exponential random graph models this novel "estimation and re-estimation" generation strategy yields high distributional similarity (high fidelity) to the original data, combined with high sample diversity.
Updated: 2024-03-27 13:59:05
Subjects: stat.ML, cs.LG
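The "estimation and re-estimation" loop can be sketched for the simplest (edge-count only) exponential random graph model, where the Glauber conditional for a dyad is sigmoid(theta) and theta has a closed-form estimate from the current edge density. This illustrates the mechanism only, not SteinGen's actual Stein-operator construction:

```python
import numpy as np

# Edge-only ERGM sketch: resample dyads with Glauber dynamics, re-estimating
# the parameter from the current sample after every step.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def estimate_theta(adj):
    # Logit of the edge density is the MLE for the edge-only ERGM.
    n = adj.shape[0]
    density = adj[np.triu_indices(n, 1)].mean()
    density = np.clip(density, 1e-3, 1 - 1e-3)
    return np.log(density / (1 - density))

rng = np.random.default_rng(2)
n = 12
observed = (rng.random((n, n)) < 0.3)
observed = np.triu(observed, 1)
observed = (observed + observed.T).astype(float)   # one observed graph

adj = observed.copy()
theta = estimate_theta(adj)
for _ in range(200):                       # Glauber sweeps over random dyads
    i, j = rng.choice(n, size=2, replace=False)
    adj[i, j] = adj[j, i] = float(rng.random() < sigmoid(theta))
    theta = estimate_theta(adj)            # re-estimate after each step

print(observed.mean(), adj.mean())  # observed vs. generated edge density
```

Because theta tracks the evolving sample, the generated graph stays close to the observed density (fidelity) while individual edges are resampled (diversity), which is the trade-off the paper formalizes via Stein operators.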
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
Large Language Models (LLMs) are now commonplace in conversation applications. However, their risks of misuse for generating harmful responses have raised serious societal concerns and spurred recent research on LLM conversation safety. Therefore, in this survey, we provide a comprehensive overview of recent studies, covering three critical aspects of LLM conversation safety: attacks, defenses, and evaluations. Our goal is to provide a structured summary that enhances understanding of LLM conversation safety and encourages further investigation into this important subject. For easy reference, we have categorized all the studies mentioned in this survey according to our taxonomy, available at: https://github.com/niconi19/LLM-conversation-safety.
Updated: 2024-03-27 13:55:14
Subjects: cs.CL, cs.AI, cs.CY, cs.LG
Bootstrapping Guarantees: Stability and Performance Analysis for Dynamic Encrypted Control
Encrypted dynamic controllers that operate for an unlimited time have been a challenging subject of research. The fundamental difficulty is the accumulation of errors and scaling factors in the internal state during operation. Bootstrapping, a technique commonly employed in fully homomorphic cryptosystems, can be used to avoid overflows in the controller state but can potentially introduce significant numerical errors. In this paper, we analyze dynamic encrypted control with explicit consideration of bootstrapping. By recognizing the bootstrapping errors occurring in the controller's state as an uncertainty in the robust control framework, we can provide stability and performance guarantees for the whole encrypted control system. Further, the conservatism of the stability and performance test is reduced by using a lifted version of the control system.
Updated: 2024-03-27 13:52:41
Subjects: eess.SY, cs.CR, cs.SY, math.OC
Physics-Informed Graph Neural Networks for Water Distribution Systems
Water distribution systems (WDS) are an integral part of critical infrastructure and pivotal to urban development. As 70% of the world's population will likely live in urban environments by 2050, efficient simulation and planning tools for WDS play a crucial role in reaching the UN's sustainable development goal (SDG) 6 - "Clean water and sanitation for all". In this realm, we propose a novel and efficient machine learning emulator, more precisely, a physics-informed deep learning (DL) model, for hydraulic state estimation in WDS. Using a recursive approach, our model only needs a few graph convolutional neural network (GCN) layers and employs an innovative algorithm based on message passing. Unlike conventional machine learning tasks, the model uses hydraulic principles to infer two additional hydraulic state features in the process of reconstructing the available ground truth feature in an unsupervised manner. To the best of our knowledge, this is the first DL approach to emulate the popular hydraulic simulator EPANET while utilizing no additional information. Like most DL models and unlike the hydraulic simulator, our model demonstrates vastly faster emulation times that do not increase drastically with the size of the WDS. Moreover, we achieve high accuracy on the ground truth and results very similar to the hydraulic simulator, as demonstrated through experiments on five real-world WDS datasets.
Updated: 2024-03-27 13:51:26
Subjects: cs.LG, cs.AI
PDNNet: PDN-Aware GNN-CNN Heterogeneous Network for Dynamic IR Drop Prediction
IR drop on the power delivery network (PDN) is closely related to the PDN's configuration and cell current consumption. As integrated circuit (IC) designs grow larger, dynamic IR drop simulation becomes computationally unaffordable, and machine learning based IR drop prediction has been explored as a promising solution. Although CNN-based methods have been adapted to the IR drop prediction task in several works, the shortcoming of overlooking the PDN configuration is non-negligible. In this paper, we consider not only how to properly represent the cell-PDN relation, but also how to model IR drop following its physical nature in the feature aggregation procedure. Thus, we propose a novel graph structure, PDNGraph, to unify the representations of the PDN structure and the fine-grained cell-PDN relation. We further propose a dual-branch heterogeneous network, PDNNet, incorporating two parallel GNN-CNN branches to favorably capture the above features during the learning process. Several key designs are presented to make dynamic IR drop prediction highly effective and interpretable. To our knowledge, this is the first work to apply a graph structure to deep-learning-based dynamic IR drop prediction. Experiments show that PDNNet outperforms state-of-the-art CNN-based methods with up to a 39.3% reduction in prediction error and achieves a 545x speedup compared to the commercial tool, which demonstrates the superiority of our method.
Updated: 2024-03-27 13:50:13
Subjects: cs.LG, cs.AI
No-Regret Learning in Bilateral Trade via Global Budget Balance
Bilateral trade models the problem of intermediating between two rational agents -- a seller and a buyer -- both characterized by a private valuation for an item they want to trade. We study the online learning version of the problem, in which at each time step a new seller and buyer arrive and the learner has to set prices for them without any knowledge about their (adversarially generated) valuations. In this setting, known impossibility results rule out the existence of no-regret algorithms when budget balance has to be enforced at each time step. In this paper, we introduce the notion of \emph{global budget balance}, which only requires the learner to fulfill budget balance over the entire time horizon. Under this natural relaxation, we provide the first no-regret algorithms for adversarial bilateral trade under various feedback models. First, we show that in the full-feedback model, the learner can guarantee $\tilde O(\sqrt{T})$ regret against the best fixed prices in hindsight, and that this bound is optimal up to poly-logarithmic terms. Second, we provide a learning algorithm guaranteeing a $\tilde O(T^{3/4})$ regret upper bound with one-bit feedback, which we complement with a $\Omega(T^{5/7})$ lower bound that holds even in the two-bit feedback model. Finally, we introduce and analyze an alternative benchmark that is provably stronger than the best fixed prices in hindsight and is inspired by the literature on bandits with knapsacks.
Updated: 2024-03-27 13:44:21
Subjects: cs.GT, cs.LG
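The relaxation and benchmark described above can be stated compactly. The notation here is assumed, not taken from the paper: $p_t^{(s)}$ and $p_t^{(b)}$ are the prices posted to the seller and buyer at step $t$, and $\mathrm{GFT}$ denotes gain from trade.

```latex
% Classical per-round budget balance (what the impossibility results assume):
p_t^{(b)} \;\ge\; p_t^{(s)} \qquad \text{for every } t.

% Global budget balance (the paper's relaxation): deficits at individual
% rounds are allowed as long as the horizon-level balance is non-negative:
\sum_{t=1}^{T} \left( p_t^{(b)} - p_t^{(s)} \right) \;\ge\; 0.

% Regret is measured against the best fixed price pair in hindsight:
R_T \;=\; \max_{(p,\,q)} \sum_{t=1}^{T} \mathrm{GFT}_t(p, q)
      \;-\; \mathbb{E}\!\left[\, \sum_{t=1}^{T} \mathrm{GFT}_t\bigl(p_t^{(s)}, p_t^{(b)}\bigr) \right].
```

Under global budget balance the paper obtains $R_T = \tilde O(\sqrt{T})$ with full feedback and $\tilde O(T^{3/4})$ with one-bit feedback, against the $\Omega(T^{5/7})$ lower bound stated in the abstract.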
Shapley Values-Powered Framework for Fair Reward Split in Content Produced by GenAI
It is evident that, currently, generative models are surpassed in quality by human professionals. However, with the advancements in Artificial Intelligence, this gap will narrow, leading to scenarios where individuals who have dedicated years of their lives to mastering a skill become obsolete due to their high costs, which are inherently linked to the time they require to complete a task -- a task that AI could accomplish in minutes or seconds. To avoid future social upheavals, we must, even now, contemplate how to fairly assess the contributions of such individuals in training generative models and how to compensate them for the reduction or complete loss of their incomes. In this work, we propose a method to structure collaboration between model developers and data providers. To achieve this, we employ Shapley Values to quantify the contribution of artist(s) in an image generated by the Stable Diffusion-v1.5 model and to equitably allocate the reward among them.
Updated: 2024-03-27 13:42:25
Subjects: cs.CV, cs.AI, 91A12, 68T05, 91B32, I.2.6; I.3.3; I.2.0; J.5; J.7
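The reward split can be illustrated with an exact Shapley computation on a tiny coalition game. The characteristic function v below is a made-up stand-in for "value contributed by this subset of artists to a generated image"; in the paper that value comes from the Stable Diffusion-v1.5 pipeline.

```python
from itertools import permutations

# Exact Shapley values: average each player's marginal contribution over all
# join orders of the coalition.
def shapley_values(players, v):
    values = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            values[p] += v(frozenset(coalition)) - before
    return {p: values[p] / len(perms) for p in players}

# Example: two artists whose styles combine super-additively.
v = lambda S: {frozenset(): 0.0,
               frozenset({"A"}): 1.0,
               frozenset({"B"}): 2.0,
               frozenset({"A", "B"}): 4.0}[S]

phi = shapley_values(["A", "B"], v)
print(phi)  # shares: A -> 1.5, B -> 2.5; they sum to v({A, B}) = 4.0
```

By construction the shares are efficient (they sum to the grand-coalition value) and symmetric, which is what makes Shapley values attractive for a fair reward split; the exact enumeration above scales only to a handful of contributors, so larger settings need sampling approximations.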
Noise-Robust Keyword Spotting through Self-supervised Pretraining
Voice assistants are now widely available, and to activate them a keyword spotting (KWS) algorithm is used. Modern KWS systems are mainly trained using supervised learning methods and require a large amount of labelled data to achieve a good performance. Leveraging unlabelled data through self-supervised learning (SSL) has been shown to increase the accuracy in clean conditions. This paper explores how SSL pretraining such as Data2Vec can be used to enhance the robustness of KWS models in noisy conditions, which is under-explored. Models of three different sizes are pretrained using different pretraining approaches and then fine-tuned for KWS. These models are then tested and compared to models trained using two baseline supervised learning methods, one being standard training using clean data and the other one being multi-style training (MTR). The results show that pretraining and fine-tuning on clean data is superior to supervised learning on clean data across all testing conditions, and superior to supervised MTR for testing conditions of SNR above 5 dB. This indicates that pretraining alone can increase the model's robustness. Finally, it is found that using noisy data for pretraining models, especially with the Data2Vec-denoising approach, significantly enhances the robustness of KWS models in noisy conditions.
Updated: 2024-03-27 13:42:14
Subjects: eess.AS, cs.LG, cs.SD, 68T10, I.2.6
Intelligent Learning Rate Distribution to reduce Catastrophic Forgetting in Transformers
Pretraining language models on large text corpora is a common practice in natural language processing. Fine-tuning of these models is then performed to achieve the best results on a variety of tasks. In this paper, we investigate the problem of catastrophic forgetting in transformer neural networks and question the common practice of fine-tuning with a flat learning rate for the entire network in this context. We perform a hyperparameter optimization process to find learning rate distributions that are better than a flat learning rate. We combine the learning rate distributions thus found and show that they generalize to better performance with respect to the problem of catastrophic forgetting. We validate these learning rate distributions with a variety of NLP benchmarks from the GLUE dataset.
Updated: 2024-03-27 13:40:09
Subjects: cs.CL, cs.AI, cs.LG
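One simple, hand-crafted member of the family of learning-rate distributions the paper searches over is layer-wise decay, where earlier (more general) layers get smaller rates than later ones. The sketch below is illustrative only; the base rate and decay factor are not values from the paper.

```python
# Layer-wise learning-rate decay: one way to reduce catastrophic forgetting
# is to update the bottom layers more conservatively than the top ones.
def layerwise_learning_rates(n_layers, base_lr=2e-5, decay=0.9):
    # Index 0 = bottom layer (embeddings/first block), n_layers-1 = top block.
    return [base_lr * decay ** (n_layers - 1 - i) for i in range(n_layers)]

lrs = layerwise_learning_rates(4)
print(lrs)  # smallest rate at the bottom layer, base_lr at the top
```

In a framework like PyTorch, such a list would typically be passed to the optimizer as per-layer parameter groups; the paper's contribution is searching for distributions that beat both this heuristic and a flat rate.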
Few-Shot Detection of Machine-Generated Text using Style Representations
The advent of instruction-tuned language models that convincingly mimic human writing poses a significant risk of abuse. However, such abuse may be counteracted with the ability to detect whether a piece of text was composed by a language model rather than a human author. Some previous approaches to this problem have relied on supervised methods by training on corpora of confirmed human- and machine- written documents. Unfortunately, model under-specification poses an unavoidable challenge for neural network-based detectors, making them brittle in the face of data shifts, such as the release of newer language models producing still more fluent text than the models used to train the detectors. Other approaches require access to the models that may have generated a document in question, which is often impractical. In light of these challenges, we pursue a fundamentally different approach not relying on samples from language models of concern at training time. Instead, we propose to leverage representations of writing style estimated from human-authored text. Indeed, we find that features effective at distinguishing among human authors are also effective at distinguishing human from machine authors, including state-of-the-art large language models like Llama-2, ChatGPT, and GPT-4. Furthermore, given a handful of examples composed by each of several specific language models of interest, our approach affords the ability to predict which model generated a given document. The code and data to reproduce our experiments are available at https://github.com/LLNL/LUAR/tree/main/fewshot_iclr2024.
Updated: 2024-03-27 13:38:35
Subjects: cs.CL, cs.LG
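The few-shot decision rule can be sketched as nearest-centroid attribution in a style-embedding space. The random vectors below are stand-ins for real style representations, which are the paper's actual machinery; only the nearest-centroid step is shown.

```python
import numpy as np

# Few-shot attribution: represent each candidate author (human or a specific
# LLM) by the centroid of a handful of style embeddings, then attribute a
# query document to the nearest centroid by cosine similarity.

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(3)
d = 16
# Fake "style directions" for two candidate authors.
human_dir, llm_dir = rng.normal(size=d), rng.normal(size=d)

def embed(direction):
    # Noisy sample of a style; a real system would run a style encoder here.
    return direction + 0.1 * rng.normal(size=d)

centroids = {
    "human": np.mean([embed(human_dir) for _ in range(5)], axis=0),
    "llm":   np.mean([embed(llm_dir) for _ in range(5)], axis=0),
}
query = embed(llm_dir)   # a document actually written by the LLM
pred = max(centroids, key=lambda k: cosine(query, centroids[k]))
print(pred)
```

Because no sample from the candidate LLMs is needed at training time, only a few examples at attribution time, the same centroids can be rebuilt whenever a new model of interest appears.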
ABScribe: Rapid Exploration & Organization of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language Models
Exploring alternative ideas by rewriting text is integral to the writing process. State-of-the-art Large Language Models (LLMs) can simplify writing variation generation. However, current interfaces pose challenges for simultaneous consideration of multiple variations: creating new variations without overwriting text can be difficult, and pasting them sequentially can clutter documents, increasing workload and disrupting writers' flow. To tackle this, we present ABScribe, an interface that supports rapid, yet visually structured, exploration and organization of writing variations in human-AI co-writing tasks. With ABScribe, users can swiftly modify variations using LLM prompts, which are auto-converted into reusable buttons. Variations are stored adjacently within text fields for rapid in-place comparisons using mouse-over interactions on a popup toolbar. Our user study with 12 writers shows that ABScribe significantly reduces task workload (d = 1.20, p < 0.001), enhances user perceptions of the revision process (d = 2.41, p < 0.001) compared to a popular baseline workflow, and provides insights into how writers explore variations using LLMs.
Updated: 2024-03-27 13:38:00
Subjects: cs.HC, cs.AI, cs.LG
Neural Architecture Search for Sentence Classification with BERT
Pre-training of language models on large text corpora is common practice in Natural Language Processing. Subsequently, fine-tuning of these models is performed to achieve the best results on a variety of tasks. In this paper we question the common practice of only adding a single output layer as a classification head on top of the network. We perform an AutoML search to find architectures that outperform the current single layer at only a small compute cost. We validate our classification architecture on a variety of NLP benchmarks from the GLUE dataset.
Updated: 2024-03-27 13:25:43
Subjects: cs.AI
Efficient Heatmap-Guided 6-Dof Grasp Detection in Cluttered Scenes
Fast and robust object grasping in clutter is a crucial component of robotics. Most current works resort to the whole observed point cloud for 6-Dof grasp generation, ignoring the guidance information that can be extracted from global semantics, thus limiting high-quality grasp generation and real-time performance. In this work, we show that the efficiency of the widely used heatmaps has been underestimated for 6-Dof grasp generation. Therefore, we propose an effective local grasp generator that uses grasp heatmaps as guidance and infers in a global-to-local, semantic-to-point manner. Specifically, Gaussian encoding and a grid-based strategy are applied to predict grasp heatmaps as guidance, aggregating local points into graspable regions and providing global semantic information. Further, a novel non-uniform anchor sampling mechanism is designed to improve grasp accuracy and diversity. Benefiting from high-efficiency encoding in the image space and focusing on points in local graspable regions, our framework can perform high-quality grasp detection in real time and achieve state-of-the-art results. In addition, real robot experiments demonstrate the effectiveness of our method with a success rate of 94% and a clutter completion rate of 100%. Our code is available at https://github.com/THU-VCLab/HGGD.
Updated: 2024-03-27 13:24:58
Subjects: cs.RO, cs.AI, cs.CV
Attention-aware semantic relevance predicting Chinese sentence reading
In recent years, several influential computational models and metrics have been proposed to predict how humans comprehend and process sentences. One particularly promising approach is contextual semantic similarity. Inspired by the attention algorithm in Transformer and human memory mechanisms, this study proposes an "attention-aware" approach for computing contextual semantic relevance. This new approach takes into account the different contributions of contextual parts and the expectation effect, allowing it to incorporate contextual information fully. The attention-aware approach also facilitates the simulation and evaluation of existing reading models. The resulting "attention-aware" metrics of semantic relevance can more accurately predict fixation durations in Chinese reading tasks recorded in an eye-tracking corpus than those calculated by existing approaches. The study's findings further provide strong support for the presence of semantic preview benefits in Chinese naturalistic reading. Furthermore, the attention-aware metrics of semantic relevance, being memory-based, possess high interpretability from both linguistic and cognitive standpoints, making them a valuable computational tool for modeling eye movements in reading and for gaining further insight into the process of language comprehension. Our approach underscores the potential of these metrics to advance our understanding of how humans comprehend and process language.
Updated: 2024-03-27 13:22:38
Subjects: cs.CL, cs.LG
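One way to picture an "attention-aware" relevance score is as a distance-weighted average of the target word's similarity to its context, a crude stand-in for the paper's attention and expectation weighting. The vectors below are random placeholders for real word embeddings.

```python
import numpy as np

# Attention-aware contextual relevance sketch: similarities to context words
# are weighted, with nearer context words receiving larger weights.

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def attention_aware_relevance(target_vec, context_vecs, decay=0.8):
    # Nearest context words (end of the list) get the largest weights.
    weights = np.array([decay ** (len(context_vecs) - 1 - i)
                        for i in range(len(context_vecs))])
    weights /= weights.sum()
    sims = np.array([cosine(target_vec, c) for c in context_vecs])
    return float(weights @ sims)

rng = np.random.default_rng(4)
d = 32
context = [rng.normal(size=d) for _ in range(4)]
target_related = context[-1] + 0.1 * rng.normal(size=d)  # echoes nearby context
target_unrelated = rng.normal(size=d)

r1 = attention_aware_relevance(target_related, context)
r2 = attention_aware_relevance(target_unrelated, context)
print(round(r1, 3), round(r2, 3))  # the supported word typically scores higher
```

In the paper, scores of this general shape are then regressed against fixation durations from the eye-tracking corpus.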
skscope: Fast Sparsity-Constrained Optimization in Python
Applying iterative solvers to sparsity-constrained optimization (SCO) requires tedious mathematical deduction and careful programming/debugging that hinder these solvers' broad impact. In this paper, the library skscope is introduced to overcome such an obstacle. With skscope, users can solve the SCO by just programming the objective function. The convenience of skscope is demonstrated through two examples, where sparse linear regression and trend filtering are addressed with just four lines of code. More importantly, skscope's efficient implementation allows state-of-the-art solvers to quickly attain the sparse solution regardless of the high dimensionality of the parameter space. Numerical experiments reveal that the available solvers in skscope can achieve up to an 80x speedup over the competing relaxation solutions obtained via the benchmarked convex solver. skscope is published on the Python Package Index (PyPI) and Conda, and its source code is available at: https://github.com/abess-team/skscope.
Updated: 2024-03-27 13:17:15
Subjects: stat.ML, cs.LG, stat.CO
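skscope's own API is not reproduced here; instead, the sketch below illustrates the sparsity-constrained problem it solves (minimize an objective subject to at most s nonzero parameters) via plain iterative hard thresholding on the sparse linear regression example the abstract mentions. Step size and iteration count are illustrative choices.

```python
import numpy as np

# Iterative hard thresholding (IHT) for min ||y - X b||^2 s.t. ||b||_0 <= s:
# alternate a gradient step with keeping only the s largest coefficients.
def iht_sparse_linear_regression(X, y, sparsity, step=None, iters=300):
    n, p = X.shape
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant
    beta = np.zeros(p)
    for _ in range(iters):
        beta = beta + step * X.T @ (y - X @ beta)      # gradient step
        keep = np.argsort(np.abs(beta))[-sparsity:]    # hard threshold
        pruned = np.zeros(p)
        pruned[keep] = beta[keep]
        beta = pruned
    return beta

rng = np.random.default_rng(5)
n, p, s = 100, 20, 3
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[[2, 7, 11]] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.01 * rng.normal(size=n)

beta_hat = iht_sparse_linear_regression(X, y, sparsity=s)
print(np.nonzero(beta_hat)[0])   # indices of the recovered support
```

skscope packages solvers of this family (and stronger ones) behind a solve-the-objective interface with automatic differentiation, which is what makes the four-line usage in the abstract possible.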
A Path Towards Legal Autonomy: An interoperable and explainable approach to extracting, transforming, loading and computing legal information using large language models, expert systems and Bayesian networks
Legal autonomy - the lawful activity of artificial intelligence agents - can be achieved in one of two ways. It can be achieved either by imposing constraints on AI actors such as developers, deployers and users, and on AI resources such as data, or by imposing constraints on the range and scope of the impact that AI agents can have on the environment. The latter approach involves encoding extant rules concerning AI driven devices into the software of AI agents controlling those devices (e.g., encoding rules about limitations on zones of operations into the agent software of an autonomous drone device). This is a challenge since the effectivity of such an approach requires a method of extracting, loading, transforming and computing legal information that would be both explainable and legally interoperable, and that would enable AI agents to reason about the law. In this paper, we sketch a proof of principle for such a method using large language models (LLMs), expert legal systems known as legal decision paths, and Bayesian networks. We then show how the proposed method could be applied to extant regulation in matters of autonomous cars, such as the California Vehicle Code.
Updated: 2024-03-27 13:12:57
Subjects: cs.AI, cs.CL, cs.CY, cs.LO
A Novel Behavior-Based Recommendation System for E-commerce
The majority of existing recommender systems rely on user ratings, which are limited by the lack of user collaboration and the sparsity problem. To address these issues, this study proposes a behavior-based recommender system that leverages customers' natural behaviors, such as browsing and clicking, on e-commerce platforms. The proposed recommendation system involves clustering active customers, determining neighborhoods, collecting similar users, calculating product reputation based on similar users, and recommending high-reputation products. To overcome the complexity of customer behaviors and traditional clustering methods, an unsupervised clustering approach based on product categories is developed to enhance the recommendation methodology. This study makes notable contributions in several aspects. Firstly, a groundbreaking behavior-based recommendation methodology is developed, incorporating customer behavior to generate accurate and tailored recommendations leading to improved customer satisfaction and engagement. Secondly, an original unsupervised clustering method, focusing on product categories, enables more precise clustering and facilitates accurate recommendations. Finally, an approach to determine neighborhoods for active customers within clusters is established, ensuring grouping of customers with similar behavioral patterns to enhance recommendation accuracy and relevance. The proposed recommendation methodology and clustering method contribute to improved recommendation performance, offering valuable insights for researchers and practitioners in the field of e-commerce recommendation systems. Additionally, the proposed method outperforms benchmark methods in experiments conducted using a behavior dataset from the well-known e-commerce site Alibaba.
Updated: 2024-03-27 13:12:41
Domains: cs.IR,cs.AI,cs.HC
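The reputation step described above can be sketched as scoring products by the weighted behaviors of an active customer's neighborhood and ranking them. The event format, action weights, and neighbor set below are assumptions for illustration, not the paper's actual formulas:

```python
from collections import defaultdict


def product_reputation(events, neighbors, weights=None):
    """Score products by how often a customer's behavioral neighbors
    interacted with them (hypothetical weights: browse=1, click=2, purchase=3)."""
    weights = weights or {"browse": 1, "click": 2, "purchase": 3}
    scores = defaultdict(float)
    for user, product, action in events:
        if user in neighbors:                       # only similar users count
            scores[product] += weights.get(action, 0)
    # highest-reputation products first
    return sorted(scores.items(), key=lambda kv: -kv[1])


events = [
    ("u1", "p1", "purchase"), ("u2", "p1", "click"),
    ("u2", "p2", "browse"), ("u3", "p2", "purchase"),
]
ranking = product_reputation(events, neighbors={"u1", "u2"})
```

With `u3` outside the neighborhood, only `u1`'s and `u2`'s behavior contributes, so `p1` outranks `p2`.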
Theoretical Bound-Guided Hierarchical VAE for Neural Image Codecs
Recent studies reveal a significant theoretical link between variational autoencoders (VAEs) and rate-distortion theory, notably in utilizing VAEs to estimate the theoretical upper bound of the information rate-distortion function of images. Such estimated theoretical bounds substantially exceed the performance of existing neural image codecs (NICs). To narrow this gap, we propose a theoretical bound-guided hierarchical VAE (BG-VAE) for NIC. The proposed BG-VAE leverages the theoretical bound to guide the NIC model towards enhanced performance. We implement the BG-VAE using Hierarchical VAEs and demonstrate its effectiveness through extensive experiments. Along with advanced neural network blocks, we provide a versatile, variable-rate NIC that outperforms existing methods when considering both rate-distortion performance and computational complexity. The code is available at BG-VAE.
Updated: 2024-03-27 13:11:34
Domains: eess.IV,cs.LG
Language Plays a Pivotal Role in the Object-Attribute Compositional Generalization of CLIP
Vision-language models, such as CLIP, have shown promising Out-of-Distribution (OoD) generalization under various types of distribution shifts. Recent studies attempted to investigate the leading cause of this capability. In this work, we follow the same path, but focus on a specific type of OoD data - images with novel compositions of attribute-object pairs - and study whether such models can successfully classify those images into composition classes. We carefully designed an authentic image test dataset called ImageNet-AO, consisting of attributes for objects that are unlikely to be encountered in the CLIP training sets. We found that CLIPs trained with large datasets such as OpenAI CLIP, LAION-400M, and LAION-2B show orders-of-magnitude improvement in effective compositional OoD generalization compared to both supervised models and CLIPs trained with smaller datasets, such as CC-12M and YFCC-15M. Our results provide evidence that the scale and diversity of training data and language supervision play a key role in unlocking the compositional generalization abilities of vision-language models.
Updated: 2024-03-27 12:59:44
Domains: cs.CV,cs.CL,cs.LG
CroSel: Cross Selection of Confident Pseudo Labels for Partial-Label Learning
Partial-label learning (PLL) is an important weakly supervised learning problem, which allows each training example to have a candidate label set instead of a single ground-truth label. Identification-based methods have been widely explored to tackle label ambiguity issues in PLL, which regard the true label as a latent variable to be identified. However, identifying the true labels accurately and completely remains challenging, causing noise in pseudo labels during model training. In this paper, we propose a new method called CroSel, which leverages historical predictions from the model to identify true labels for most training examples. First, we introduce a cross selection strategy, which enables two deep models to select true labels of partially labeled data for each other. Besides, we propose a novel consistency regularization term called co-mix to avoid sample waste and tiny noise caused by false selection. In this way, CroSel can pick out the true labels of most examples with high precision. Extensive experiments demonstrate the superiority of CroSel, which consistently outperforms previous state-of-the-art methods on benchmark datasets. Additionally, our method achieves over 90% accuracy and quantity for selecting true labels on CIFAR-type datasets under various settings.
Updated: 2024-03-27 12:53:12
Domains: cs.LG,cs.AI,I.2
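The cross selection idea can be sketched as follows: each model's confident prediction, restricted to the example's candidate label set, becomes supervision for the other model. The confidence threshold and masking scheme here are illustrative assumptions; CroSel additionally uses historical predictions and the co-mix regularizer, which this sketch omits:

```python
import numpy as np


def cross_select(probs_a, probs_b, candidates, threshold=0.9):
    """probs_*: (n, c) softmax outputs of two models;
    candidates: per-example candidate label sets.
    Returns confident pseudo labels selected by A (to train B) and by B (to train A)."""
    def select(probs):
        chosen = {}
        for i, cand in enumerate(candidates):
            # zero out probabilities outside the candidate set
            masked = np.where(np.isin(np.arange(probs.shape[1]), list(cand)),
                              probs[i], 0.0)
            label = int(masked.argmax())
            if masked[label] >= threshold:          # keep only confident picks
                chosen[i] = label
        return chosen
    return select(probs_a), select(probs_b)


probs_a = np.array([[0.95, 0.03, 0.02], [0.40, 0.35, 0.25]])
probs_b = np.array([[0.10, 0.85, 0.05], [0.05, 0.92, 0.03]])
cands = [{0, 1}, {1, 2}]
labels_for_b, labels_for_a = cross_select(probs_a, probs_b, cands)
```

Each model thus only ever trains on labels the *other* model is confident about, which is what keeps falsely selected labels from reinforcing themselves.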
Improving Line Search Methods for Large Scale Neural Network Training
In recent studies, line search methods have shown significant improvements in the performance of traditional stochastic gradient descent techniques, eliminating the need for a specific learning rate schedule. In this paper, we identify existing issues in state-of-the-art line search methods, propose enhancements, and rigorously evaluate their effectiveness. We test these methods on larger datasets and more complex data domains than before. Specifically, we improve the Armijo line search by integrating the momentum term from ADAM in its search direction, enabling efficient large-scale training, a task that was previously prone to failure using Armijo line search methods. Our optimization approach outperforms both the previous Armijo implementation and tuned learning rate schedules for Adam. Our evaluation focuses on Transformers and CNNs in the domains of NLP and image data. Our work is publicly available as a Python package, which provides a hyperparameter-free PyTorch optimizer.
Updated: 2024-03-27 12:50:27
Domains: cs.LG,cs.AI
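A minimal backtracking Armijo line search along an arbitrary descent direction (such as an ADAM momentum direction, as the paper proposes) looks like this; the constants are textbook defaults, not the paper's settings:

```python
import numpy as np


def armijo_step(f, grad, x, direction, eta0=1.0, c=1e-4, beta=0.5, max_back=30):
    """Shrink the step size until the sufficient-decrease (Armijo) condition
    f(x + eta*d) <= f(x) + c*eta*<grad f(x), d> holds."""
    g = grad(x)
    fx = f(x)
    eta = eta0
    for _ in range(max_back):
        if f(x + eta * direction) <= fx + c * eta * (g @ direction):
            break
        eta *= beta                    # backtrack
    return x + eta * direction, eta


f = lambda x: 0.5 * x @ x              # simple quadratic bowl
grad = lambda x: x
x0 = np.array([2.0, -2.0])
x1, eta = armijo_step(f, grad, x0, -grad(x0))
```

On the quadratic the full step `eta0 = 1.0` is immediately accepted and lands at the minimizer; on harder objectives the loop backtracks, which is what removes the need for a hand-tuned learning rate schedule.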
Efficient Algorithms for Regularized Nonnegative Scale-invariant Low-rank Approximation Models
Regularized nonnegative low-rank approximations such as sparse Nonnegative Matrix Factorization or sparse Nonnegative Tucker Decomposition are an important branch of dimensionality reduction models with enhanced interpretability. However, from a practical perspective, the choice of regularizers and regularization coefficients, as well as the design of efficient algorithms, is challenging because of the multifactor nature of these models and the lack of theory to back these choices. This paper aims at improving upon these issues. By studying a more general model called the Homogeneous Regularized Scale-Invariant, we prove that the scale-invariance inherent to low-rank approximation models causes an implicit regularization with both unexpected beneficial and detrimental effects. This observation allows to better understand the effect of regularization functions in low-rank approximation models, to guide the choice of the regularization hyperparameters, and to design balancing strategies to enhance the convergence speed of dedicated optimization algorithms. Some of these results were already known but restricted to specific instances of regularized low-rank approximations. We also derive a generic Majorization Minimization algorithm that handles many regularized nonnegative low-rank approximations, with convergence guarantees. We showcase our contributions on sparse Nonnegative Matrix Factorization, ridge-regularized Canonical Polyadic decomposition and sparse Nonnegative Tucker Decomposition.
Updated: 2024-03-27 12:49:14
Domains: cs.LG,cs.NA,math.NA,math.OC
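For the special case of sparse NMF, a majorization-minimization scheme reduces to the classic multiplicative updates below, which monotonically decrease the penalized objective; the paper's algorithm covers a much broader family of regularized models, so treat this as the simplest instance only:

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((20, 15))
r, lam, eps = 4, 0.1, 1e-9
W = rng.random((20, r)) + eps
H = rng.random((r, 15)) + eps

# Multiplicative (MM-derived) updates for
#   min ||V - WH||_F^2 + lam * ||H||_1   s.t. W, H >= 0
loss = []
for _ in range(100):
    H *= (W.T @ V) / (W.T @ W @ H + lam + eps)   # L1 penalty enters denominator
    W *= (V @ H.T) / (W @ H @ H.T + eps)
    loss.append(np.linalg.norm(V - W @ H) ** 2 + lam * np.abs(H).sum())
```

Nonnegativity is preserved automatically because every factor in the updates is nonnegative, and the objective decreases at each step by the MM construction.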
CT-3DFlow : Leveraging 3D Normalizing Flows for Unsupervised Detection of Pathological Pulmonary CT scans
Unsupervised pathology detection can be implemented by training a model on healthy data only and measuring the deviation from the training set upon inference, for example with CNN-based feature extraction and one-class classifiers, or reconstruction-score-based methods such as AEs, GANs and Diffusion models. Normalizing Flows (NF) have the ability to directly learn the probability distribution of training examples through an invertible architecture. We leverage this property in a novel 3D NF-based model named CT-3DFlow, specifically tailored for patient-level pulmonary pathology detection in chest CT data. Our model is trained unsupervised on healthy 3D pulmonary CT patches, and detects deviations from its log-likelihood distribution as anomalies. We aggregate patch-level likelihood values from a patient's CT scan to provide a patient-level 'normal'/'abnormal' prediction. Out-of-distribution detection performance is evaluated using expert annotations on a separate chest CT test dataset, outperforming other state-of-the-art methods.
Updated: 2024-03-27 12:44:57
Domains: eess.IV,cs.CV,cs.LG
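Patient-level scoring from patch likelihoods might look like the sketch below: a flow trained on healthy patches assigns each patch a negative log-likelihood, and a high quantile of those values is thresholded. The quantile aggregation and the threshold are assumptions for illustration; the abstract does not specify the paper's exact aggregation rule:

```python
import numpy as np


def patient_prediction(patch_nll, threshold, q=0.95):
    """Aggregate per-patch negative log-likelihoods (from a flow trained on
    healthy patches) into one patient-level flag. Hypothetical rule: flag the
    patient if a high quantile of patch NLLs exceeds a calibrated threshold."""
    score = float(np.quantile(patch_nll, q))
    return ("abnormal" if score > threshold else "normal"), score


healthy = np.array([1.0, 1.1, 0.9, 1.2, 1.0])
diseased = np.array([1.0, 1.1, 4.5, 5.0, 1.2])   # a few out-of-distribution patches
label_h, _ = patient_prediction(healthy, threshold=2.0)
label_d, _ = patient_prediction(diseased, threshold=2.0)
```

Using a high quantile rather than the mean keeps a few strongly anomalous patches from being diluted by the many normal patches in a scan.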
Distributed Maximum Consensus over Noisy Links
We introduce a distributed algorithm, termed noise-robust distributed maximum consensus (RD-MC), for estimating the maximum value within a multi-agent network in the presence of noisy communication links. Our approach entails redefining the maximum consensus problem as a distributed optimization problem, allowing a solution using the alternating direction method of multipliers. Unlike existing algorithms that rely on multiple sets of noise-corrupted estimates, RD-MC employs a single set, enhancing both robustness and efficiency. To further mitigate the effects of link noise and improve robustness, we apply moving averaging to the local estimates. Through extensive simulations, we demonstrate that RD-MC is significantly more robust to communication link noise compared to existing maximum-consensus algorithms.
Updated: 2024-03-27 12:39:16
Domains: cs.DC,cs.LG,eess.SP
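A toy version of noise-robust max consensus: each agent tracks a smoothed estimate of every neighbor's value and keeps the running max. This is only a simplified stand-in for RD-MC's ADMM-based formulation (here exponential averaging replaces the paper's moving averaging, and the graph, noise level, and iteration count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
values = np.array([3.0, 7.0, 5.0, 9.0, 4.0])                   # private agent values
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}  # a path graph

x = values.copy()
smoothed = {i: {j: x[j] for j in nb} for i, nb in neighbors.items()}
for _ in range(15):
    for i, nb in neighbors.items():
        for j in nb:
            msg = x[j] + rng.normal(0.0, 0.01)                 # noisy link
            smoothed[i][j] = 0.5 * smoothed[i][j] + 0.5 * msg  # averaging damps noise
    x = np.array([max([x[i]] + list(smoothed[i].values())) for i in neighbors])
```

Averaging the received messages before taking the max is what keeps the link noise from inflating the consensus value over many iterations, the failure mode of naive max consensus on noisy links.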
Faster Convergence for Transformer Fine-tuning with Line Search Methods
Recent works have shown that line search methods greatly increase performance of traditional stochastic gradient descent methods on a variety of datasets and architectures [1], [2]. In this work we succeed in extending line search methods to the novel and highly popular Transformer architecture and dataset domains in natural language processing. More specifically, we combine the Armijo line search with the Adam optimizer and extend it by subdividing the networks architecture into sensible units and perform the line search separately on these local units. Our optimization method outperforms the traditional Adam optimizer and achieves significant performance improvements for small data sets or small training budgets, while performing equal or better for other tested cases. Our work is publicly available as a python package, which provides a hyperparameter-free pytorch optimizer that is compatible with arbitrary network architectures.
Updated: 2024-03-27 12:35:23
Domains: cs.LG,cs.AI
Machine Learning Optimized Orthogonal Basis Piecewise Polynomial Approximation
Piecewise Polynomials (PPs) are utilized in several engineering disciplines, like trajectory planning, to approximate position profiles given in the form of a set of points. While the approximation target along with domain-specific requirements, like Ck-continuity, can be formulated as a system of equations and a result can be computed directly, such closed-form solutions possess limited flexibility with respect to polynomial degrees, polynomial bases or adding further domain-specific requirements. Sufficiently complex optimization goals soon call for the use of numerical methods, like gradient descent. Since gradient descent lies at the heart of training Artificial Neural Networks (ANNs), modern Machine Learning (ML) frameworks like TensorFlow come with a set of gradient-based optimizers potentially suitable for a wide range of optimization problems beyond the training task for ANNs. Our approach is to utilize the versatility of PP models and combine it with the potential of modern ML optimizers for the use in function approximation in 1D trajectory planning in the context of electronic cam design. We utilize available optimizers of the ML framework TensorFlow directly, outside of the scope of ANNs, to optimize model parameters of our PP model. In this paper, we show how an orthogonal polynomial basis contributes to improving approximation and continuity optimization performance. Utilizing Chebyshev polynomials of the first kind, we develop a novel regularization approach enabling clearly improved convergence behavior. We show that, using this regularization approach, Chebyshev basis performs better than power basis for all relevant optimizers in the combined approximation and continuity optimization setting and demonstrate usability of the presented approach within the electronic cam domain.
Updated: 2024-03-27 12:28:02
Domains: cs.LG
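The orthogonal-basis idea in its simplest form: fit one segment with a Chebyshev expansion by least squares. The paper instead optimizes full piecewise models with TensorFlow's gradient-based optimizers and a continuity regularizer; this one-segment NumPy fit only shows why the Chebyshev basis is well conditioned for the approximation part:

```python
import numpy as np

# Fit a degree-5 Chebyshev expansion to samples of sin(3x) on [-1, 1].
x = np.linspace(-1.0, 1.0, 50)
y = np.sin(3.0 * x)
A = np.polynomial.chebyshev.chebvander(x, 5)     # Chebyshev design matrix
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
residual = float(np.max(np.abs(A @ coeffs - y)))
```

Because the target is odd and the grid is symmetric, the even-degree coefficients come out numerically zero, and the degree-5 fit already tracks sin(3x) to a few thousandths; the same fit in the power basis yields a much worse-conditioned normal system as the degree grows.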
Challenging Common Paradigms in Multi-Task Learning
While multi-task learning (MTL) has gained significant attention in recent years, its underlying mechanisms remain poorly understood. Recent methods did not yield consistent performance improvements over single task learning (STL) baselines, underscoring the importance of gaining more profound insights about challenges specific to MTL. In our study, we challenge paradigms in MTL in the context of STL: First, the impact of the choice of optimizer has only been mildly investigated in MTL. We show the pivotal role of common STL tools such as the Adam optimizer in MTL empirically in various experiments. To further investigate Adam's effectiveness, we theoretically derive a partial loss-scale invariance under mild assumptions. Second, the notion of gradient conflicts has often been phrased as a specific problem in MTL. We delve into the role of gradient conflicts in MTL and compare it to STL. For angular gradient alignment we find no evidence that this is a unique problem in MTL. We emphasize differences in gradient magnitude as the main distinguishing factor. Lastly, we compare the transferability of features learned through MTL and STL on common image corruptions, and find light evidence that MTL can lead to superior transferability. Overall, we find surprising similarities between STL and MTL suggesting to consider methods from both fields in a broader context.
Updated: 2024-03-27 12:24:17
Domains: cs.LG,cs.AI,cs.CV
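The two quantities the study contrasts, angular alignment and magnitude gap between task gradients, are cheap to compute; a negative cosine is the usual definition of a "gradient conflict", while the paper argues the magnitude ratio is the more telling statistic. A minimal sketch on toy gradients:

```python
import numpy as np


def gradient_conflict(g1, g2):
    """Return (cosine of the angle, norm ratio) between two task gradients."""
    cos = float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))
    ratio = float(np.linalg.norm(g1) / np.linalg.norm(g2))
    return cos, ratio


g_task1 = np.array([1.0, 2.0, -1.0])
g_task2 = np.array([-2.0, 1.0, 0.5])
cos, ratio = gradient_conflict(g_task1, g_task2)
```

In practice `g1` and `g2` would be the flattened per-task gradients of the shared parameters at one training step.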
Direct mineral content prediction from drill core images via transfer learning
Deep subsurface exploration is important for mining, oil and gas industries, as well as in the assessment of geological units for the disposal of chemical or nuclear waste, or the viability of geothermal energy systems. Typically, detailed examinations of subsurface formations or units are performed on cuttings or core materials extracted during drilling campaigns, as well as on geophysical borehole data, which provide detailed information about the petrophysical properties of the rocks. Depending on the volume of rock samples and the analytical program, the laboratory analysis and diagnostics can be very time-consuming. This study investigates the potential of utilizing machine learning, specifically convolutional neural networks (CNN), to assess the lithology and mineral content solely from analysis of drill core images, aiming to support and expedite the subsurface geological exploration. The paper outlines a comprehensive methodology, encompassing data preprocessing, machine learning methods, and transfer learning techniques. The outcome reveals a remarkable 96.7% accuracy in the classification of drill core segments into distinct formation classes. Furthermore, a CNN model was trained for the evaluation of mineral content using a learning data set from multidimensional log analysis data (silicate, total clay, carbonate). When benchmarked against laboratory XRD measurements on samples from the cores, both the advanced multidimensional log analysis model and the neural network approach developed here provide equally good performance. This work demonstrates that deep learning and particularly transfer learning can support extracting petrophysical properties, including mineral content and formation classification, from drill core images, thus offering a road map for enhancing model performance and data set quality in image-based analysis of drill cores.
Updated: 2024-03-27 12:15:22
Domains: cs.CV,cs.LG,eess.IV
Solving a Real-World Package Delivery Routing Problem Using Quantum Annealers
Research focused on the conjunction between quantum computing and routing problems has been very prolific in recent years. Most of the works revolve around classical problems such as the Traveling Salesman Problem or the Vehicle Routing Problem. Even though working on these problems is valuable, it is also undeniable that their academic-oriented nature falls short of real-world requirements. The main objective of this research is to present a solving method for realistic instances, avoiding problem relaxations or technical shortcuts. Instead, a quantum-classical hybrid solver has been developed, coined Q4RPD, that considers a set of real constraints such as a heterogeneous fleet of vehicles, priority deliveries, and capacities characterized by two values: weight and dimensions of the packages. Q4RPD resorts to the Leap Constrained Quadratic Model Hybrid Solver of D-Wave. To demonstrate the application of Q4RPD, an experimentation composed of six different instances has been conducted, aiming to serve as illustrative examples.
Updated: 2024-03-27 12:13:42
Domains: cs.ET,cs.AI
Learning in PINNs: Phase transition, total diffusion, and generalization
We investigate the learning dynamics of fully-connected neural networks through the lens of gradient signal-to-noise ratio (SNR), examining the behavior of first-order optimizers like Adam in non-convex objectives. By interpreting the drift/diffusion phases in the information bottleneck theory, focusing on gradient homogeneity, we identify a third phase termed "total diffusion", characterized by equilibrium in the learning rates and homogeneous gradients. This phase is marked by an abrupt SNR increase, uniform residuals across the sample space and the most rapid training convergence. We propose a residual-based re-weighting scheme to accelerate this diffusion in quadratic loss functions, enhancing generalization. We also explore the information compression phenomenon, pinpointing a significant saturation-induced compression of activations at the total diffusion phase, with deeper layers experiencing negligible information loss. Supported by experimental data on physics-informed neural networks (PINNs), which underscore the importance of gradient homogeneity due to their PDE-based sample inter-dependence, our findings suggest that recognizing phase transitions could refine ML optimization strategies for improved generalization.
Updated: 2024-03-27 12:10:30
Domains: cs.LG
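One simple reading of the gradient SNR used above is the ratio of the batch-mean gradient to its per-sample standard deviation, averaged over parameters; homogeneous gradients give a high SNR, conflicting per-sample gradients a low one. The exact definition and reduction used in the paper may differ, so treat this as an illustrative sketch:

```python
import numpy as np


def gradient_snr(per_sample_grads):
    """|mean| / std per parameter over the batch axis, then averaged.
    per_sample_grads: array of shape (batch, n_params)."""
    g = np.asarray(per_sample_grads)
    mean = g.mean(axis=0)
    std = g.std(axis=0) + 1e-12        # avoid division by zero
    return float(np.mean(np.abs(mean) / std))


homogeneous = np.array([[1.0, -2.0], [1.1, -2.1], [0.9, -1.9]])   # aligned samples
noisy = np.array([[1.0, -2.0], [-1.0, 2.0], [0.2, -0.3]])         # conflicting samples
snr_hi = gradient_snr(homogeneous)
snr_lo = gradient_snr(noisy)
```

An abrupt jump of this quantity during training is the signature the paper associates with entering the total-diffusion phase.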
Impact of Employing Weather Forecast Data as Input to the Estimation of Evapotranspiration by Deep Neural Network Models
Reference Evapotranspiration (ET0) is a key parameter for designing smart irrigation scheduling, since it is related by a coefficient to the water needs of a crop. The United Nations Food and Agriculture Organization proposed a standard method for ET0 computation (FAO56-PM), based on the parameterization of the Penman-Monteith equation, that is widely adopted in the literature. To compute ET0 using the FAO56-PM method, four main weather parameters are needed: temperature, humidity, wind, and solar radiation (SR). One way to make daily ET0 estimations for future days is to use freely available weather forecast services (WFSs), where many meteorological parameters are estimated up to the next 15 days. A problem with this method is that currently, SR is not provided as a free forecast parameter on most of those online services or, normally, such forecasts present a financial cost penalty. For this reason, several ET0 estimation models using machine and deep learning were developed and presented in the literature, that use as input features a reduced set of carefully selected weather parameters, that are compatible with common freely available WFSs. However, most studies on this topic have only evaluated model performance using data from weather stations (WSs), without considering the effect of using weather forecast data. In this study, the performance of the authors' previous models is evaluated when using weather forecast data from two online WFSs, in the following scenarios: (i) direct ET0 estimation by an ANN model, and (ii) estimate SR by an ANN model, and then use that estimation for ET0 computation, using the FAO56-PM method. Employing data collected from two WFSs and a WS located in Vale do Lobo, Portugal, the latter approach achieved the best result, with a coefficient of determination (R2) ranging between 0.893 and 0.667, when considering forecasts up to 15 days.
Updated: 2024-03-27 12:01:51
Domains: cs.AI,cs.LG
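The FAO56-PM equation referenced above is standard and is directly computable once the weather inputs are known. The sketch below follows the usual FAO-56 parameterization for the saturation vapour pressure, the slope of the vapour pressure curve, and the psychrometric constant; the example input values are arbitrary:

```python
import math


def fao56_pm_et0(t_mean, rn, g, u2, ea, pressure=101.3):
    """Daily reference evapotranspiration (mm/day), FAO-56 Penman-Monteith.
    t_mean: mean air temperature (deg C); rn, g: net radiation and soil heat
    flux (MJ m-2 day-1); u2: wind speed at 2 m (m/s); ea: actual vapour
    pressure (kPa); pressure: atmospheric pressure (kPa)."""
    es = 0.6108 * math.exp(17.27 * t_mean / (t_mean + 237.3))   # saturation vp
    delta = 4098.0 * es / (t_mean + 237.3) ** 2                 # slope of vp curve
    gamma = 0.000665 * pressure                                 # psychrometric const
    num = (0.408 * delta * (rn - g)
           + gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea))
    den = delta + gamma * (1.0 + 0.34 * u2)
    return num / den


et0 = fao56_pm_et0(t_mean=20.0, rn=13.5, g=0.0, u2=2.0, ea=1.4)
```

Note that `rn` depends on solar radiation, which is exactly the input the study replaces with an ANN estimate when SR forecasts are unavailable.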
Synthesizing EEG Signals from Event-Related Potential Paradigms with Conditional Diffusion Models
Data scarcity in the brain-computer interface field can be alleviated through the use of generative models, specifically diffusion models. While diffusion models have previously been successfully applied to electroencephalogram (EEG) data, existing models lack flexibility w.r.t. sampling or require alternative representations of the EEG data. To overcome these limitations, we introduce a novel approach to conditional diffusion models that utilizes classifier-free guidance to directly generate subject-, session-, and class-specific EEG data. In addition to commonly used metrics, domain-specific metrics are employed to evaluate the specificity of the generated samples. The results indicate that the proposed model can generate EEG data that resembles real data for each subject, session, and class.
Updated: 2024-03-27 11:58:45
Domains: cs.LG,cs.AI,eess.SP,I.2.6; G.3; I.5.4; J.3
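Classifier-free guidance, which the model uses to condition generation on subject, session, and class, combines the conditional and unconditional noise predictions with the standard rule below; this is shown on toy arrays rather than an actual EEG diffusion model, and the guidance weight is arbitrary:

```python
import numpy as np


def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: steer the denoiser's noise prediction toward
    the conditional branch with guidance weight w (w=1 recovers the
    conditional prediction, w>1 amplifies the conditioning signal)."""
    return eps_uncond + w * (eps_cond - eps_uncond)


eps_u = np.array([0.1, 0.2])    # prediction with the condition dropped
eps_c = np.array([0.3, 0.0])    # prediction with (subject, session, class) condition
guided = cfg_combine(eps_u, eps_c, w=2.0)
```

During training the condition is randomly dropped so a single network learns both branches; at sampling time the two predictions are combined at every denoising step.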
Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
Transformers have been successfully applied in the field of video-based 3D human pose estimation. However, the high computational costs of these video pose transformers (VPTs) make them impractical on resource-constrained devices. In this paper, we present a plug-and-play pruning-and-recovering framework, called Hourglass Tokenizer (HoT), for efficient transformer-based 3D human pose estimation from videos. Our HoT begins with pruning pose tokens of redundant frames and ends with recovering full-length tokens, resulting in a few pose tokens in the intermediate transformer blocks and thus improving the model efficiency. To effectively achieve this, we propose a token pruning cluster (TPC) that dynamically selects a few representative tokens with high semantic diversity while eliminating the redundancy of video frames. In addition, we develop a token recovering attention (TRA) to restore the detailed spatio-temporal information based on the selected tokens, thereby expanding the network output to the original full-length temporal resolution for fast inference. Extensive experiments on two benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrate that our method can achieve both high efficiency and estimation accuracy compared to the original VPT models. For instance, applying to MotionBERT and MixSTE on Human3.6M, our HoT can save nearly 50% FLOPs without sacrificing accuracy and nearly 40% FLOPs with only 0.2% accuracy drop, respectively. Code and models are available at https://github.com/NationalGAILab/HoT.
Updated: 2024-03-27 11:43:28
Domains: cs.CV,cs.AI,cs.LG
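The prune-then-recover pattern can be mocked up as follows. HoT's actual TPC selects semantically diverse tokens via clustering and its TRA learns the recovery with attention; this sketch uses top-k scores and nearest-kept-frame copying purely to show the shape of the pipeline:

```python
import numpy as np


def prune_tokens(tokens, scores, k):
    """Keep the k highest-scoring frame tokens, preserving temporal order
    (a crude stand-in for HoT's token pruning cluster)."""
    keep = np.sort(np.argsort(scores)[-k:])
    return tokens[keep], keep


def recover_tokens(pruned, keep, n_frames):
    """Copy each frame from its nearest kept frame to restore full temporal
    length (HoT instead learns this recovery with attention)."""
    full = np.empty((n_frames, pruned.shape[1]))
    kept = list(keep)
    for t in range(n_frames):
        nearest = keep[np.abs(keep - t).argmin()]
        full[t] = pruned[kept.index(nearest)]
    return full


tokens = np.arange(12, dtype=float).reshape(6, 2)    # 6 frames, feature dim 2
scores = np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7])
pruned, keep = prune_tokens(tokens, scores, k=3)
full = recover_tokens(pruned, keep, n_frames=6)
```

The intermediate transformer blocks only ever see the 3 kept tokens, which is where the FLOP savings come from; the recovery step restores the full-length output needed for per-frame pose estimates.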
LinkPrompt: Natural and Universal Adversarial Attacks on Prompt-based Language Models
Prompt-based learning is a new language model training paradigm that adapts the Pre-trained Language Models (PLMs) to downstream tasks, which revitalizes the performance benchmarks across various natural language processing (NLP) tasks. Instead of using a fixed prompt template to fine-tune the model, some research demonstrates the effectiveness of searching for the prompt via optimization. Such prompt optimization process of prompt-based learning on PLMs also gives insight into generating adversarial prompts to mislead the model, raising concerns about the adversarial vulnerability of this paradigm. Recent studies have shown that universal adversarial triggers (UATs) can be generated to alter not only the predictions of the target PLMs but also the prediction of corresponding Prompt-based Fine-tuning Models (PFMs) under the prompt-based learning paradigm. However, UATs found in previous works are often unreadable tokens or characters and can be easily distinguished from natural texts with adaptive defenses. In this work, we consider the naturalness of the UATs and develop LinkPrompt, an adversarial attack algorithm to generate UATs by a gradient-based beam search algorithm that not only effectively attacks the target PLMs and PFMs but also maintains the naturalness among the trigger tokens. Extensive results demonstrate the effectiveness of LinkPrompt, as well as the transferability of UATs generated by LinkPrompt to open-sourced Large Language Model (LLM) Llama2 and API-accessed LLM GPT-3.5-turbo.
Updated: 2024-03-27 11:37:58
Subjects: cs.CL, cs.AI
LLatrieval: LLM-Verified Retrieval for Verifiable Generation
Verifiable generation aims to let the large language model (LLM) generate text with supporting documents, which enables the user to flexibly verify the answer and makes the LLM's output more reliable. Retrieval plays a crucial role in verifiable generation. Specifically, the retrieved documents not only supplement knowledge to help the LLM generate correct answers, but also serve as supporting evidence for the user to verify the LLM's output. However, the widely used retrievers become the bottleneck of the entire pipeline and limit the overall performance. Their capabilities are usually inferior to those of LLMs, since they often have far fewer parameters than the large language model and have not been demonstrated to scale well to the size of LLMs. If the retriever does not correctly find the supporting documents, the LLM cannot generate the correct and verifiable answer, which overshadows the LLM's remarkable abilities. To address these limitations, we propose LLatrieval (Large Language Model Verified Retrieval), in which the LLM updates the retrieval result until it verifies that the retrieved documents can sufficiently support answering the question. Thus, the LLM can iteratively provide feedback to retrieval and facilitate the retrieval result to fully support verifiable generation. Experiments show that LLatrieval significantly outperforms extensive baselines and achieves state-of-the-art results.
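The verify-then-update loop can be sketched as follows. This is a minimal sketch of the idea, not the paper's pipeline: `retrieve`, `verify`, and `refine_query` are hypothetical callables standing in for the retriever, the LLM sufficiency check, and the LLM's feedback to retrieval.

```python
def llatrieval_loop(question, retrieve, verify, refine_query, max_iters=3):
    """Iteratively retrieve documents until the LLM verifier judges
    them sufficient to support answering the question, feeding LLM
    feedback back into the retrieval query each round."""
    query = question
    docs = retrieve(query)
    for _ in range(max_iters):
        if verify(question, docs):   # LLM judges the documents sufficient
            break
        query = refine_query(question, docs)  # LLM feedback to retrieval
        docs = retrieve(query)
    return docs
```

The iteration cap keeps the loop bounded even when the verifier never accepts the documents.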
Updated: 2024-03-27 11:36:46
Subjects: cs.CL, cs.AI, cs.IR
Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds
3D synthetic-to-real unsupervised domain adaptive segmentation is crucial to annotating new domains. Self-training is a competitive approach for this task, but its performance is limited by different sensor sampling patterns (i.e., variations in point density) and incomplete training strategies. In this work, we propose a density-guided translator (DGT), which translates point density between domains, and integrates it into a two-stage self-training pipeline named DGT-ST. First, in contrast to existing works that simultaneously conduct data generation and feature/output alignment within unstable adversarial training, we employ the non-learnable DGT to bridge the domain gap at the input level. Second, to provide a well-initialized model for self-training, we propose a category-level adversarial network in stage one that utilizes the prototype to prevent negative transfer. Finally, by leveraging the designs above, a domain-mixed self-training method with source-aware consistency loss is proposed in stage two to narrow the domain gap further. Experiments on two synthetic-to-real segmentation tasks (SynLiDAR $\rightarrow$ semanticKITTI and SynLiDAR $\rightarrow$ semanticPOSS) demonstrate that DGT-ST outperforms state-of-the-art methods, achieving 9.4$\%$ and 4.3$\%$ mIoU improvements, respectively. Code is available at \url{https://github.com/yuan-zm/DGT-ST}.
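The non-learnable "translate point density between domains" step can be illustrated crudely as subsampling the dense synthetic cloud toward the sparser real domain's point count. This is only an assumption-laden sketch: the actual DGT models LiDAR sampling patterns, not plain random subsampling, and the function name is hypothetical.

```python
import random

def translate_density(points, target_size, seed=0):
    """Toy, non-learnable density translation in the spirit of DGT:
    randomly subsample a dense (synthetic) point cloud so its point
    count matches the sparser (real) target domain."""
    rng = random.Random(seed)
    if len(points) <= target_size:
        return list(points)           # already at or below target density
    return rng.sample(points, target_size)
```

Because the step is non-learnable, it can bridge the domain gap at the input level without the instability of adversarial training.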
Updated: 2024-03-27 11:28:57
Subjects: cs.CV, cs.AI
CoBOS: Constraint-Based Online Scheduler for Human-Robot Collaboration
Assembly processes involving humans and robots are challenging scenarios because the individual activities and access to shared workspace have to be coordinated. Fixed robot programs leave no room to diverge from a fixed protocol. Working within such a process can be stressful for the user and lead to ineffective behavior or failure. We propose CoBOS, a novel approach to online constraint-based scheduling within a reactive execution control framework based on behavior trees. This allows the robot to adapt to uncertain events such as delayed activity completions and activity selection (by the human). The user experiences less stress, as the robotic coworkers adapt their behavior to best complement the human-selected activities and complete the common task. In addition to improved working conditions, our algorithm increases efficiency, even in highly uncertain scenarios. We evaluate our algorithm in a probabilistic simulation study comprising 56000 experiments, where it outperforms all baselines by a margin of 4-10%. Initial real-robot experiments using a Franka Emika Panda robot and human tracking based on HTC Vive VR gloves look promising.
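The reactive scheduling idea can be sketched as a single decision step: pick the next robot activity whose constraints are satisfied, skipping whatever the human has claimed. This is a heavily simplified sketch under assumed data structures, not CoBOS's constraint solver or behavior-tree integration.

```python
def next_robot_activity(pending, done, preconditions, human_claimed):
    """Reactive selection sketch: return the first pending robot
    activity whose precondition activities are all done and which
    the human has not claimed; re-calling this after every event
    mimics online re-scheduling."""
    for act in pending:
        if act in human_claimed:
            continue  # respect the human's activity selection
        if all(p in done for p in preconditions.get(act, [])):
            return act
    return None  # nothing schedulable right now; wait for an event
```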
Updated: 2024-03-27 11:18:01
Subjects: cs.RO, cs.AI
SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model
There are five types of trajectory prediction tasks: deterministic, stochastic, domain adaptation, momentary observation, and few-shot. These associated tasks are defined by various factors, such as the length of input paths, data splits, and pre-processing methods. Interestingly, even though they commonly take sequential coordinates of observations as input and infer future paths in the same coordinates as output, designing a specialized architecture for each task is still necessary; applied to any other task, such an architecture suffers from generality issues and sub-optimal performance. In this paper, we propose SingularTrajectory, a diffusion-based universal trajectory prediction framework that reduces the performance gap across the five tasks. The core of SingularTrajectory is to unify a variety of human dynamics representations across the associated tasks. To do this, we first build a Singular space that projects all types of motion patterns from each task into one embedding space. We next propose an adaptive anchor that operates in the Singular space. Unlike traditional fixed-anchor methods, which sometimes yield unacceptable paths, our adaptive anchor corrects anchors that are initially placed in wrong locations, based on a traversability map. Finally, we adopt a diffusion-based predictor to further enhance the prototype paths through a cascaded denoising process. Our unified framework ensures generality across various benchmark settings such as input modality and trajectory length. Extensive experiments on five public benchmarks demonstrate that SingularTrajectory substantially outperforms existing models, highlighting its effectiveness in estimating general dynamics of human movements. Code is publicly available at https://github.com/inhwanbae/SingularTrajectory .
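The adaptive-anchor correction can be sketched as snapping anchors that fall on non-traversable cells to the nearest traversable cell. This is a toy illustration under an assumed grid representation; the paper's anchors live in the learned Singular space, and `correct_anchors` is a hypothetical name.

```python
def correct_anchors(anchors, traversable):
    """Snap each 2D anchor that lands on a non-traversable cell to the
    nearest traversable cell (squared Euclidean distance), leaving
    already-valid anchors untouched."""
    cells = sorted(traversable)  # assumes at least one traversable cell
    def snap(p):
        if p in traversable:
            return p
        return min(cells, key=lambda c: (c[0] - p[0]) ** 2 + (c[1] - p[1]) ** 2)
    return [snap(a) for a in anchors]
```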
Updated: 2024-03-27 11:11:08
Subjects: cs.CV, cs.LG, cs.RO
CoRAST: Towards Foundation Model-Powered Correlated Data Analysis in Resource-Constrained CPS and IoT
Foundation models (FMs) emerge as a promising solution for harnessing distributed and diverse environmental data, leveraging prior knowledge to understand the complicated temporal and spatial correlations within heterogeneous datasets. Unlike distributed learning frameworks such as federated learning, which often struggle with multimodal data, FMs can transform diverse inputs into embeddings. This process facilitates the integration of information from various modalities and the application of prior learning to new domains. However, deploying FMs in resource-constrained edge systems poses significant challenges. To this end, we introduce CoRAST, a novel learning framework that utilizes FMs for enhanced analysis of distributed, correlated heterogeneous data. Using a server-based FM, CoRAST can exploit existing environment information to extract temporal, spatial, and cross-modal correlations among sensor data. This enables CoRAST to offer context-aware insights for localized client tasks through FM-powered global representation learning. Our evaluation on a real-world weather dataset demonstrates CoRAST's ability to exploit correlated heterogeneous data through environmental representation learning, reducing forecast errors by up to 50.3% compared to the baselines.
Updated: 2024-03-27 11:11:06
Subjects: cs.LG, cs.AI
Deep Limit Order Book Forecasting
We exploit cutting-edge deep learning methodologies to explore the predictability of high-frequency Limit Order Book mid-price changes for a heterogeneous set of stocks traded on the NASDAQ exchange. In so doing, we release `LOBFrame', an open-source code base to efficiently process large-scale Limit Order Book data and quantitatively assess state-of-the-art deep learning models' forecasting capabilities. Our results are twofold. We demonstrate that the stocks' microstructural characteristics influence the efficacy of deep learning methods and that their high forecasting power does not necessarily correspond to actionable trading signals. We argue that traditional machine learning metrics fail to adequately assess the quality of forecasts in the Limit Order Book context. As an alternative, we propose an innovative operational framework that evaluates predictions' practicality by focusing on the probability of accurately forecasting complete transactions. This work offers academics and practitioners an avenue to make informed and robust decisions on the application of deep learning techniques, their scope and limitations, effectively exploiting emergent statistical properties of the Limit Order Book.
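The prediction target in such studies is typically the direction of the mid-price move over a short horizon. The sketch below shows one common way to derive up/flat/down labels from best bid and ask series; it is an illustrative assumption, not LOBFrame's actual labeling code, and `eps` is an assumed flat-move threshold.

```python
def midprice_labels(bids, asks, horizon, eps=1e-9):
    """Label high-frequency mid-price moves at a fixed horizon:
    +1 if the mid-price rises, -1 if it falls, 0 if it stays flat
    (within eps)."""
    mids = [(b + a) / 2 for b, a in zip(bids, asks)]
    labels = []
    for t in range(len(mids) - horizon):
        d = mids[t + horizon] - mids[t]
        labels.append(1 if d > eps else (-1 if d < -eps else 0))
    return labels
```

The paper's point is precisely that accuracy on such labels does not by itself translate into actionable trading signals.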
Updated: 2024-03-27 11:11:02
Subjects: q-fin.TR, cs.LG
Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction
Language models have demonstrated impressive ability in context understanding and generative performance. Inspired by the recent success of language foundation models, in this paper we propose LMTraj (Language-based Multimodal Trajectory predictor), which recasts the trajectory prediction task as a sort of question-answering problem. Departing from traditional numerical regression models, which treat the trajectory coordinate sequence as continuous signals, we consider them as discrete signals like text prompts. Specifically, we first transform the input space of the trajectory coordinates into natural language space. Here, the entire time-series trajectories of pedestrians are converted into a text prompt, and scene images are described as text information through image captioning. The transformed numerical and image data are then wrapped into the question-answering template for use in a language model. Next, to guide the language model in understanding and reasoning about high-level knowledge, such as scene context and social relationships between pedestrians, we introduce an auxiliary multi-task question-answering objective. We then train a numerical tokenizer on the prompt data, encouraging the tokenizer to separate the integer and decimal parts well, and leverage it to capture correlations between consecutive numbers in the language model. Lastly, we train the language model using the numerical tokenizer and all of the question-answer prompts. Here, we propose a beam-search-based most-likely prediction and a temperature-based multimodal prediction to implement both deterministic and stochastic inference. Applying our LMTraj, we show that the language-based model can be a powerful pedestrian trajectory predictor and outperforms existing numerical-based predictor methods. Code is publicly available at https://github.com/inhwanbae/LMTrajectory .
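The coordinate-to-prompt transformation can be sketched in a few lines. The template wording below is invented for illustration; the paper's actual prompt format and its dedicated numerical tokenizer are not reproduced here.

```python
def trajectory_to_prompt(track, scene_caption):
    """Serialize a pedestrian's observed (x, y) coordinates and a scene
    caption into a single text prompt, treating the trajectory as a
    discrete text signal rather than a continuous one."""
    coords = " ".join(f"({x:.2f}, {y:.2f})" for x, y in track)
    return (f"Scene: {scene_caption} "
            f"Observed trajectory: {coords} "
            f"Question: what are the next coordinates?")
```

Fixing the decimal precision keeps the integer and fractional parts regular, which is what the paper's numerical tokenizer is then trained to split cleanly.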
Updated: 2024-03-27 11:06:44
Subjects: cs.CL, cs.CV, cs.LG, cs.RO
FRESCO: Federated Reinforcement Energy System for Cooperative Optimization
The rise of renewable energy is creating new dynamics in the energy grid that promise a cleaner and more participative grid, where technology plays a crucial part in providing the flexibility required to achieve the vision of the next-generation grid. This work presents FRESCO, a framework that aims to ease the implementation of energy markets using a hierarchical control architecture of reinforcement learning agents trained with federated learning. The core concept we are proving is that having greedy agents subject to changing conditions from a higher-level agent creates a cooperative setup that allows all individual objectives to be fulfilled. This paper presents a general overview of the framework, the current progress, and some insights obtained from recent results.
Updated: 2024-03-27 11:00:53
Subjects: cs.LG
Insights into the Lottery Ticket Hypothesis and Iterative Magnitude Pruning
The lottery ticket hypothesis for deep neural networks emphasizes the importance of the initialization used to re-train the sparser networks obtained via the iterative magnitude pruning process. An explanation of why the specific initialization proposed by the lottery ticket hypothesis tends to work better in terms of generalization (and training) performance has been lacking. Moreover, the underlying principles of iterative magnitude pruning, such as pruning the smaller-magnitude weights and the role of the iterative process, lack full understanding and explanation. In this work, we attempt to provide insights into these phenomena by empirically studying the volume/geometry and loss-landscape characteristics of the solutions obtained at various stages of the iterative magnitude pruning process.
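One round of iterative magnitude pruning with rewinding can be sketched on a flat weight list. This is a minimal sketch of the standard procedure the abstract refers to (prune the smallest-magnitude surviving weights, then reset survivors to their original initialization); it is not the paper's experimental code.

```python
def imp_round(weights, init_weights, mask, prune_frac):
    """One round of iterative magnitude pruning with rewinding:
    prune the smallest-magnitude surviving weights, then reset the
    survivors to their initial values (the 'lottery ticket' re-train
    starting point)."""
    survivors = sorted(abs(w) for w, m in zip(weights, mask) if m)
    k = int(len(survivors) * prune_frac)
    threshold = survivors[k] if k < len(survivors) else float("inf")
    new_mask = [m and abs(w) >= threshold for w, m in zip(weights, mask)]
    rewound = [w0 if m else 0.0 for w0, m in zip(init_weights, new_mask)]
    return rewound, new_mask
```

Repeating `imp_round` after each re-training pass yields the progressively sparser networks studied at the various stages mentioned above.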
Updated: 2024-03-27 10:47:24
Subjects: cs.LG
Generalized Policy Learning for Smart Grids: FL TRPO Approach
The smart grid domain requires bolstering the capabilities of existing energy management systems. Federated Learning (FL) aligns with this goal, as it demonstrates a remarkable ability to train models on heterogeneous datasets while maintaining data privacy, making it suitable for smart grid applications, which often involve disparate data distributions and interdependencies among features that hinder the suitability of linear models. This paper introduces a framework that combines FL with Trust Region Policy Optimization (FL TRPO), aiming to reduce energy-associated emissions and costs. Our approach reveals latent interconnections and employs personalized encoding methods to capture unique insights, understanding the relationships between features and optimal strategies, allowing our model to generalize to previously unseen data. Experimental results validate the robustness of our approach, affirming its proficiency in effectively learning policy models for smart grid challenges.
Updated: 2024-03-27 10:47:06
Subjects: cs.LG
AIC-UNet: Anatomy-informed Cascaded UNet for Robust Multi-Organ Segmentation
Imposing key anatomical features, such as the number of organs, their shapes, sizes, and relative positions, is crucial for building a robust multi-organ segmentation model. Current attempts to incorporate anatomical features include broadening effective receptive fields (ERF) size with resource- and data-intensive modules such as self-attention or introducing organ-specific topology regularizers, which may not scale to multi-organ segmentation problems where inter-organ relation also plays a huge role. We introduce a new approach to impose anatomical constraints on any existing encoder-decoder segmentation model by conditioning model prediction with learnable anatomy prior. More specifically, given an abdominal scan, a part of the encoder spatially warps a learnable prior to align with the given input scan using thin plate spline (TPS) grid interpolation. The warped prior is then integrated during the decoding phase to guide the model for more anatomy-informed predictions. Code is available at \url{https://anonymous.4open.science/r/AIC-UNet-7048}.
Updated: 2024-03-27 10:46:24
Subjects: cs.CV, cs.LG, eess.IV
Global Vegetation Modeling with Pre-Trained Weather Transformers
Accurate vegetation models can produce further insights into the complex interaction between vegetation activity and ecosystem processes. Previous research has established that long-term trends and short-term variability of temperature and precipitation affect vegetation activity. Motivated by the recent success of Transformer-based Deep Learning models for medium-range weather forecasting, we adapt the publicly available pre-trained FourCastNet to model vegetation activity while accounting for the short-term dynamics of climate variability. We investigate how the learned global representation of the atmosphere's state can be transferred to model the normalized difference vegetation index (NDVI). Our model globally estimates vegetation activity at a resolution of 0.25° while relying only on meteorological data. We demonstrate that leveraging pre-trained weather models improves the NDVI estimates compared to learning an NDVI model from scratch. Additionally, we compare our results to other recent data-driven NDVI modeling approaches from machine learning and ecology literature. We further provide experimental evidence on how much data and training time is necessary to turn FourCastNet into an effective vegetation model. Code and models will be made available upon publication.
Updated: 2024-03-27 10:45:16
Subjects: cs.LG
Collaborative Active Learning in Conditional Trust Environment
In this paper, we investigate collaborative active learning, a paradigm in which multiple collaborators explore a new domain by leveraging their combined machine learning capabilities without disclosing their existing data and models. Instead, the collaborators share prediction results from the new domain and newly acquired labels. This collaboration offers several advantages: (a) it addresses privacy and security concerns by eliminating the need for direct model and data disclosure; (b) it enables the use of different data sources and insights without direct data exchange; and (c) it promotes cost-effectiveness and resource efficiency through shared labeling costs. To realize these benefits, we introduce a collaborative active learning framework designed to fulfill the aforementioned objectives. We validate the effectiveness of the proposed framework through simulations. The results demonstrate that collaboration leads to higher AUC scores compared to independent efforts, highlighting the framework's ability to overcome the limitations of individual models. These findings support the use of collaborative approaches in active learning, emphasizing their potential to enhance outcomes through collective expertise and shared resources. Our work provides a foundation for further research on collaborative active learning and its practical applications in various domains where data privacy, cost efficiency, and model performance are critical considerations.
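Since collaborators share only predictions (never data or models), sample selection for joint labeling can be driven purely by disagreement among those predictions. The sketch below uses a crude vote-count disagreement measure as a stand-in; the paper's actual framework and its selection criterion are not reproduced here, and all names are hypothetical.

```python
def collaborative_query(unlabeled_ids, collaborator_predictions):
    """Pick the unlabeled sample on which the collaborators' shared
    predictions disagree most (here: number of distinct predicted
    labels, a crude proxy for vote entropy), so its labeling cost is
    spent where collective uncertainty is highest."""
    def disagreement(i):
        votes = [preds[i] for preds in collaborator_predictions]
        return len(set(votes))
    return max(unlabeled_ids, key=disagreement)
```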
Updated: 2024-03-27 10:40:27
Subjects: cs.LG
U-Sketch: An Efficient Approach for Sketch to Image Diffusion Models
Diffusion models have demonstrated remarkable performance in text-to-image synthesis, producing realistic and high resolution images that faithfully adhere to the corresponding text-prompts. Despite their great success, they still fall behind in sketch-to-image synthesis tasks, where in addition to text-prompts, the spatial layout of the generated images has to closely follow the outlines of certain reference sketches. Employing an MLP latent edge predictor to guide the spatial layout of the synthesized image by predicting edge maps at each denoising step has been recently proposed. Despite yielding promising results, the pixel-wise operation of the MLP does not take into account the spatial layout as a whole, and demands numerous denoising iterations to produce satisfactory images, leading to time inefficiency. To this end, we introduce U-Sketch, a framework featuring a U-Net type latent edge predictor, which is capable of efficiently capturing both local and global features, as well as spatial correlations between pixels. Moreover, we propose the addition of a sketch simplification network that offers the user the choice of preprocessing and simplifying input sketches for enhanced outputs. The experimental results, corroborated by user feedback, demonstrate that our proposed U-Net latent edge predictor leads to more realistic results, that are better aligned with the spatial outlines of the reference sketches, while drastically reducing the number of required denoising steps and, consequently, the overall execution time.
Updated: 2024-03-27 10:26:42
Subjects: cs.CV, cs.AI, cs.LG
SemRoDe: Macro Adversarial Training to Learn Representations That are Robust to Word-Level Attacks
Language models (LMs) are indispensable tools for natural language processing tasks, but their vulnerability to adversarial attacks remains a concern. While current research has explored adversarial training techniques, their improvements to defend against word-level attacks have been limited. In this work, we propose a novel approach called Semantic Robust Defence (SemRoDe), a Macro Adversarial Training strategy to enhance the robustness of LMs. Drawing inspiration from recent studies in the image domain, we investigate and later confirm that in a discrete data setting such as language, adversarial samples generated via word substitutions do indeed belong to an adversarial domain exhibiting a high Wasserstein distance from the base domain. Our method learns a robust representation that bridges these two domains. We hypothesize that if samples were not projected into an adversarial domain, but instead to a domain with minimal shift, it would improve attack robustness. We align the domains by incorporating a new distance-based objective. With this, our model is able to learn more generalized representations by aligning the model's high-level output features and therefore better handling unseen adversarial samples. This method can be generalized across word embeddings, even when they share minimal overlap at both vocabulary and word-substitution levels. To evaluate the effectiveness of our approach, we conduct experiments on BERT and RoBERTa models on three datasets. The results demonstrate promising state-of-the-art robustness.
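The distance-based alignment objective can be illustrated with a much simpler stand-in than the Wasserstein distance the paper measures: penalizing the gap between mean feature vectors of clean and word-substituted samples. This is an assumption-laden sketch of the general idea, not SemRoDe's actual loss.

```python
def alignment_loss(base_feats, adv_feats):
    """Squared distance between the mean feature vectors of the base
    (clean) domain and the adversarial (word-substituted) domain --
    a simple stand-in for a distance-based domain-alignment term."""
    dim = len(base_feats[0])
    mean = lambda fs, d: sum(f[d] for f in fs) / len(fs)
    return sum((mean(base_feats, d) - mean(adv_feats, d)) ** 2
               for d in range(dim))
```

Minimizing such a term alongside the task loss pulls the two domains together in feature space, which is the macro-level alignment the abstract describes.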
Updated: 2024-03-27 10:24:25
Subjects: cs.CL, cs.LG
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
To help the open-source community have a better understanding of Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B parameters and trained on up to over 1T tokens. Our investigation confirms that MoE-based LLMs can offer a more favorable cost-effectiveness trade-off than dense LLMs, highlighting the potential effectiveness for future LLM development. One more important contribution of this study is an in-depth analysis of the routing mechanisms within our OpenMoE models, leading to three significant findings: Context-Independent Specialization, Early Routing Learning, and Drop-towards-the-End. We discovered that routing decisions in MoE models are predominantly based on token IDs, with minimal context relevance. The token-to-expert assignments are determined early in the pre-training phase and remain largely unchanged. This imperfect routing can result in performance degradation, particularly in sequential tasks like multi-turn conversations, where tokens appearing later in a sequence are more likely to be dropped. Finally, we rethink our design based on the above-mentioned observations and analysis. To facilitate future MoE LLM development, we propose potential strategies for mitigating the issues we found and further improving off-the-shelf MoE LLM designs.
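The "drop-towards-the-end" finding follows mechanically from capacity-limited routing, which can be sketched in a few lines. This is a generic illustration of expert capacity in MoE layers, with a fixed per-token-ID preference table standing in for the near context-independent router the paper observes; it is not OpenMoE's implementation.

```python
def route_with_capacity(token_ids, prefer, capacity):
    """Route tokens in sequence order to their preferred expert; each
    expert accepts at most `capacity` tokens, so later tokens that
    overflow a full expert are dropped -- the 'drop-towards-the-end'
    effect."""
    load, kept, dropped = {}, [], []
    for pos, tok in enumerate(token_ids):
        e = prefer[tok]                      # context-independent choice
        if load.get(e, 0) < capacity:
            load[e] = load.get(e, 0) + 1
            kept.append(pos)
        else:
            dropped.append(pos)              # capacity exhausted
    return kept, dropped
```

Because the dropped positions are always the later ones, sequential workloads such as multi-turn conversation suffer most, matching the paper's analysis.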
Updated: 2024-03-27 10:21:24
Subjects: cs.CL, cs.AI, cs.DC, cs.LG
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Models such as GPT-4 and Med-PaLM 2 have demonstrated impressive performance on a wide variety of biomedical NLP tasks. However, these models have hundreds of billions of parameters, are computationally expensive to run, require users to send their input data over the internet, and are trained on unknown data sources. Can smaller, more targeted models compete? To address this question, we build and release BioMedLM, a 2.7 billion parameter GPT-style autoregressive model trained exclusively on PubMed abstracts and full articles. When fine-tuned, BioMedLM can produce strong multiple-choice biomedical question-answering results competitive with much larger models, such as achieving a score of 57.3% on MedMCQA (dev) and 69.0% on the MMLU Medical Genetics exam. BioMedLM can also be fine-tuned to produce useful answers to patient questions on medical topics. This demonstrates that smaller models can potentially serve as transparent, privacy-preserving, economical and environmentally friendly foundations for particular NLP applications, such as in biomedicine. The model is available on the Hugging Face Hub: https://huggingface.co/stanford-crfm/BioMedLM.
Updated: 2024-03-27 10:18:21
Categories: cs.CL,cs.AI
VIGraph: Generative Self-supervised Learning for Class-Imbalanced Node Classification
Class imbalance in graph data presents significant challenges for node classification. While existing methods, such as SMOTE-based approaches, partially mitigate this issue, they still exhibit limitations in constructing imbalanced graphs. Generative self-supervised learning (SSL) methods, exemplified by graph autoencoders (GAEs), offer a promising solution by directly generating minority nodes from the data itself, yet their potential remains underexplored. In this paper, we delve into the shortcomings of SMOTE-based approaches in the construction of imbalanced graphs. Furthermore, we introduce VIGraph, a simple yet effective generative SSL approach that relies on the Variational GAE as the fundamental model. VIGraph strictly adheres to the concept of imbalance when constructing imbalanced graphs and innovatively leverages the variational inference (VI) ability of Variational GAE to generate nodes for minority classes. VIGraph introduces comprehensive training strategies, including cross-view contrastive learning at the decoding phase to capture semantic knowledge, adjacency matrix reconstruction to preserve graph structure, and alignment strategy to ensure stable training. VIGraph can generate high-quality nodes directly usable for classification, eliminating the need to integrate the generated nodes back to the graph as well as additional retraining found in SMOTE-based methods. We conduct extensive experiments, results from which demonstrate the superiority and generality of our approach.
Updated: 2024-03-27 10:12:31
Categories: cs.LG,cs.AI
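For context, the SMOTE-style interpolation that the paper argues against can be sketched in a few lines; VIGraph instead generates minority nodes with a Variational GAE. The vectors below are illustrative:

```python
import random

def smote_sample(x, neighbor, rng):
    """SMOTE-style synthesis: place a new minority sample at a random
    point on the segment between a sample and one of its neighbors."""
    lam = rng.random()  # interpolation weight in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]

rng = random.Random(0)
x, n = [0.0, 0.0], [1.0, 2.0]
s = smote_sample(x, n, rng)
# the synthetic point always lies between the two inputs, coordinate-wise
assert all(min(a, b) <= v <= max(a, b) for v, a, b in zip(s, x, n))
```

Interpolated points like `s` must then be wired back into the graph and the model retrained, which is exactly the overhead VIGraph's generative approach avoids.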
The Topos of Transformer Networks
The transformer neural network has significantly out-shined all other neural network architectures as the engine behind large language models. We provide a theoretical analysis of the expressivity of the transformer architecture through the lens of topos theory. From this viewpoint, we show that many common neural network architectures, such as the convolutional, recurrent and graph convolutional networks, can be embedded in a pretopos of piecewise-linear functions, but that the transformer necessarily lives in its topos completion. In particular, this suggests that the two network families instantiate different fragments of logic: the former are first order, whereas transformers are higher-order reasoners. Furthermore, we draw parallels with architecture search and gradient descent, integrating our analysis in the framework of cybernetic agents.
Updated: 2024-03-27 10:06:33
Categories: cs.LG,math.CT
Modelling the Impact of Quantum Circuit Imperfections on Networks and Computer Applications
Post Quantum and Quantum Cryptography schemes are feasible quantum computer applications for 7G networks. These schemes could possibly replace existing schemes. These algorithms have been compromised by advances in quantum search algorithms run on quantum computers, such as Shor's algorithm. Shor's algorithm is a quantum algorithm for finding the prime factors of an integer, which is the basis of the existing schemes. This has become an available quantum computer application, putting the use of the ESA algorithm at risk. Our recent paper provides a detailed survey of the work on post quantum and quantum cryptography algorithms with focus on their applicability in 7G networks. Since that paper focuses on the cryptography algorithms, as a follow-up, in this paper we provide a new framework for quantum network optimization and survey in detail the work on enabling technologies (quantum hardware) for the practical implementation of these algorithms, including the most important segments of quantum hardware in 7G. As always in engineering practice, practical solutions are a compromise between the performance and complexity of the implementation. For this reason, as the main contribution, the paper presents a network and computer applications optimization framework that includes implementation imperfections. These tools should be useful in optimizing future-generation practical computer system design. After that, a comprehensive survey of the existing work on quantum hardware is presented, pointing out the sources of these imperfections. This enables us to make a fair assessment of how much investment into quantum hardware improvements contributes to the performance enhancement of the overall system. In this way a decision can be made on proper partitioning between the investment in hardware and system level complexity.
Updated: 2024-03-27 10:00:35
Categories: cs.CR,quant-ph
A2V: A Semi-Supervised Domain Adaptation Framework for Brain Vessel Segmentation via Two-Phase Training Angiography-to-Venography Translation
We present a semi-supervised domain adaptation framework for brain vessel segmentation from different image modalities. Existing state-of-the-art methods focus on a single modality, despite the wide range of available cerebrovascular imaging techniques. This can lead to significant distribution shifts that negatively impact the generalization across modalities. By relying on annotated angiographies and a limited number of annotated venographies, our framework accomplishes image-to-image translation and semantic segmentation, leveraging a disentangled and semantically rich latent space to represent heterogeneous data and perform image-level adaptation from source to target domains. Moreover, we reduce the typical complexity of cycle-based architectures and minimize the use of adversarial training, which allows us to build an efficient and intuitive model with stable training. We evaluate our method on magnetic resonance angiographies and venographies. While achieving state-of-the-art performance in the source domain, our method attains a Dice score coefficient in the target domain that is only 8.9% lower, highlighting its promising potential for robust cerebrovascular image segmentation across different modalities.
Updated: 2024-03-27 09:51:15
Categories: eess.IV,cs.CV,cs.LG
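For reference, the Dice score coefficient used to report the 8.9% target-domain gap can be sketched for flattened binary masks (a simplification of the usual 2-D/3-D segmentation computation):

```python
def dice_score(pred, target):
    """Dice score coefficient: 2|A ∩ B| / (|A| + |B|) over binary masks.

    pred and target are flat lists of 0/1 labels; two empty masks are
    treated as a perfect match by convention.
    """
    inter = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * inter / total

pred   = [1, 1, 0, 0, 1]
target = [1, 0, 0, 0, 1]
print(dice_score(pred, target))  # 2*2 / (3+2) = 0.8
```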
A Channel-ensemble Approach: Unbiased and Low-variance Pseudo-labels is Critical for Semi-supervised Classification
Semi-supervised learning (SSL) is a practical challenge in computer vision. Pseudo-label (PL) methods, e.g., FixMatch and FreeMatch, achieve State-Of-The-Art (SOTA) performance in SSL. These approaches employ a threshold-to-pseudo-label (T2L) process to generate PLs by truncating the confidence scores of unlabeled data predicted by the self-training method. However, self-trained models typically yield biased and high-variance predictions, especially in scenarios where only a little labeled data is supplied. To address this issue, we propose a lightweight channel-based ensemble method to effectively consolidate multiple inferior PLs into a theoretically guaranteed unbiased and low-variance one. Importantly, our approach can be readily extended to any SSL framework, such as FixMatch or FreeMatch. Experimental results demonstrate that our method significantly outperforms state-of-the-art techniques on CIFAR10/100 in terms of effectiveness and efficiency.
Updated: 2024-03-27 09:49:37
Categories: cs.CV,cs.AI
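The core averaging idea can be sketched as follows; the paper's channel construction and its unbiasedness guarantee are more involved, and the threshold and probabilities here are illustrative:

```python
def ensemble_pseudo_label(channel_probs, threshold=0.95):
    """Average per-channel class-probability vectors, then apply the
    threshold-to-pseudo-label (T2L) step: keep the label only if the
    averaged confidence clears the threshold (else return None)."""
    k = len(channel_probs)
    mean = [sum(p[c] for p in channel_probs) / k
            for c in range(len(channel_probs[0]))]
    conf = max(mean)
    label = mean.index(conf)
    return (label, conf) if conf >= threshold else (None, conf)

# three noisy channels that mostly agree on class 1
channels = [[0.02, 0.97, 0.01],
            [0.05, 0.94, 0.01],
            [0.01, 0.98, 0.01]]
label, conf = ensemble_pseudo_label(channels)
print(label)  # class 1, kept because the averaged confidence is high
```

Averaging across channels shrinks the variance of any single noisy prediction before the confidence cut-off is applied.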
Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning
Offline pretraining with a static dataset followed by online fine-tuning (offline-to-online, or OtO) is a paradigm well matched to a real-world RL deployment process. In this scenario, we aim to find the best-performing policy within a limited budget of online interactions. Previous work in the OtO setting has focused on correcting for bias introduced by the policy-constraint mechanisms of offline RL algorithms. Such constraints keep the learned policy close to the behavior policy that collected the dataset, but we show this can unnecessarily limit policy performance if the behavior policy is far from optimal. Instead, we forgo constraints and frame OtO RL as an exploration problem that aims to maximize the benefit of online data-collection. We first study the major online RL exploration methods based on intrinsic rewards and UCB in the OtO setting, showing that intrinsic rewards add training instability through reward-function modification, and UCB methods are myopic and it is unclear which learned-component's ensemble to use for action selection. We then introduce an algorithm for planning to go out-of-distribution (PTGOOD) that avoids these issues. PTGOOD uses a non-myopic planning procedure that targets exploration in relatively high-reward regions of the state-action space unlikely to be visited by the behavior policy. By leveraging concepts from the Conditional Entropy Bottleneck, PTGOOD encourages data collected online to provide new information relevant to improving the final deployment policy without altering rewards. We show empirically in several continuous control tasks that PTGOOD significantly improves agent returns during online fine-tuning and avoids the suboptimal policy convergence that many of our baselines exhibit in several environments.
Updated: 2024-03-27 09:48:34
Categories: cs.LG
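The UCB-style exploration the paper critiques as myopic can be sketched in its simplest bandit form (a simplification of the RL setting studied; the constant `c` and the statistics below are illustrative):

```python
import math

def ucb_action(means, counts, t, c=1.0):
    """Pick the action maximizing mean + c * sqrt(ln t / n).

    Unvisited actions (n == 0) score infinity and are tried first.
    The choice only looks one step ahead -- the myopia PTGOOD's
    non-myopic planner is designed to avoid.
    """
    best, best_score = None, float("-inf")
    for a, (m, n) in enumerate(zip(means, counts)):
        score = float("inf") if n == 0 else m + c * math.sqrt(math.log(t) / n)
        if score > best_score:
            best, best_score = a, score
    return best

# an under-explored arm (n=1) can beat a higher-mean, well-explored arm
print(ucb_action(means=[0.9, 0.5], counts=[100, 1], t=101))  # 1
```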
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
Stimulated by the sophisticated reasoning capabilities of recent Large Language Models (LLMs), a variety of strategies for bridging video modality have been devised. A prominent strategy involves Video Language Models (VideoLMs), which train a learnable interface with video data to connect advanced vision encoders with LLMs. Recently, an alternative strategy has surfaced, employing readily available foundation models, such as VideoLMs and LLMs, across multiple stages for modality bridging. In this study, we introduce a simple yet novel strategy where only a single Vision Language Model (VLM) is utilized. Our starting point is the plain insight that a video comprises a series of images, or frames, interwoven with temporal information. The essence of video comprehension lies in adeptly managing the temporal aspects along with the spatial details of each frame. Initially, we transform a video into a single composite image by arranging multiple frames in a grid layout. The resulting single image is termed as an image grid. This format, while maintaining the appearance of a solitary image, effectively retains temporal information within the grid structure. Therefore, the image grid approach enables direct application of a single high-performance VLM without necessitating any video-data training. Our extensive experimental analysis across ten zero-shot video question answering benchmarks, including five open-ended and five multiple-choice benchmarks, reveals that the proposed Image Grid Vision Language Model (IG-VLM) surpasses the existing methods in nine out of ten benchmarks.
Updated: 2024-03-27 09:48:23
Categories: cs.CV,cs.AI,cs.CL,cs.LG
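The frame-to-grid arrangement can be sketched with plain nested lists standing in for image arrays; the grid shape and frame sizes below are illustrative, not IG-VLM's actual layout:

```python
import math

def make_image_grid(frames, cols):
    """Arrange sampled video frames row-major into one composite image.

    Each frame is a 2-D list of pixels. Frames in the same grid row are
    concatenated horizontally; rows of frames are stacked vertically.
    Missing slots are padded with blank frames.
    """
    rows = math.ceil(len(frames) / cols)
    h, w = len(frames[0]), len(frames[0][0])
    blank = [[0] * w for _ in range(h)]
    padded = frames + [blank] * (rows * cols - len(frames))
    grid = []
    for r in range(rows):
        band = padded[r * cols:(r + 1) * cols]
        for y in range(h):
            grid.append([px for f in band for px in f[y]])
    return grid

# four 2x2 "frames" laid out in a 2x2 grid -> one 4x4 image
frames = [[[i, i], [i, i]] for i in range(4)]
grid = make_image_grid(frames, cols=2)
print(len(grid), len(grid[0]))  # 4 4
```

Temporal order survives as spatial order in the grid, which is why a single image-only VLM can reason over it without any video training.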
Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval
Collecting relevance judgments for legal case retrieval is a challenging and time-consuming task. Accurately judging the relevance between two legal cases requires a considerable effort to read the lengthy text and a high level of domain expertise to extract Legal Facts and make juridical judgments. With the advent of advanced large language models, some recent studies have suggested that it is promising to use LLMs for relevance judgment. Nonetheless, the method of employing a general large language model for reliable relevance judgments in legal case retrieval is yet to be thoroughly explored. To fill this research gap, we devise a novel few-shot workflow tailored to the relevance judgment of legal cases. The proposed workflow breaks down the annotation process into a series of stages, imitating the process employed by human annotators and enabling a flexible integration of expert reasoning to enhance the accuracy of relevance judgments. By comparing the relevance judgments of LLMs and human experts, we empirically show that we can obtain reliable relevance judgments with the proposed workflow. Furthermore, we demonstrate the capacity to augment existing legal case retrieval models through the synthesis of data generated by the large language model.
Updated: 2024-03-27 09:46:56
Categories: cs.AI
FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs
Analyzing the behavior of cryptographic functions in stripped binaries is a challenging but essential task. Cryptographic algorithms exhibit greater logical complexity compared to typical code, yet their analysis is unavoidable in areas such as virus analysis and legacy code inspection. Existing methods often rely on data or structural pattern matching, leading to suboptimal generalizability and requiring substantial manual work. In this paper, we propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries. In FoC, we first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language. The prediction of FoC-BinLLM is insensitive to minor changes, such as vulnerability patches. To mitigate this, we further build a binary code similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive representations and use it to retrieve similar implementations of unknown cryptographic functions in a database. In addition, we construct a cryptographic binary dataset for evaluation and to facilitate further research in this domain, and devise an automated method to create semantic labels for extensive binary functions. Evaluation results demonstrate that FoC-BinLLM outperforms ChatGPT by 14.61% on the ROUGE-L score. FoC-Sim outperforms the previous best methods with a 52% higher Recall@1. Furthermore, our method also shows practical ability in virus analysis and 1-day vulnerability detection.
Updated: 2024-03-27 09:45:33
Categories: cs.CR
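The retrieval step behind the Recall@1 numbers can be sketched as nearest-neighbour search over function representations. The function names and vectors below are made-up placeholders, not FoC-Sim's real embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_vec, database):
    """Return the (name, vector) database entry whose representation is
    most similar to the query -- the Recall@1 retrieval step."""
    return max(database, key=lambda item: cosine(query_vec, item[1]))

db = [("aes_encrypt", [0.9, 0.1, 0.0]),
      ("md5_update",  [0.1, 0.8, 0.1]),
      ("rc4_stream",  [0.0, 0.2, 0.9])]
name, _ = retrieve([0.85, 0.15, 0.05], db)
print(name)  # aes_encrypt
```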
On Spectrogram Analysis in a Multiple Classifier Fusion Framework for Power Grid Classification Using Electric Network Frequency
The Electric Network Frequency (ENF) serves as a unique signature inherent to power distribution systems. Here, a novel approach for power grid classification is developed, leveraging ENF. Spectrograms are generated from audio and power recordings across different grids, revealing distinctive ENF patterns that aid in grid classification through a fusion of classifiers. Four traditional machine learning classifiers plus a Convolutional Neural Network (CNN), optimized using Neural Architecture Search, are developed for One-vs-All classification. This process generates numerous predictions per sample, which are then compiled and used to train a shallow multi-label neural network specifically designed to model the fusion process, ultimately leading to the conclusive class prediction for each sample. Experimental findings reveal that both validation and testing accuracy outperform those of current state-of-the-art classifiers, underlining the effectiveness and robustness of the proposed methodology.
Updated: 2024-03-27 09:44:50
Categories: cs.LG
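The fusion step can be sketched as simple score stacking: each base classifier emits One-vs-All scores, which are concatenated into the input of the shallow fusion network. The toy classifiers below are placeholders for the four traditional models plus the CNN:

```python
def ova_score_vector(classifiers, sample):
    """Compile One-vs-All scores from every base classifier into a
    single feature vector for the fusion (meta) network to consume."""
    return [score for clf in classifiers for score in clf(sample)]

# two toy base classifiers, each emitting per-grid OvA scores
clf_a = lambda s: [0.8, 0.1, 0.1]
clf_b = lambda s: [0.6, 0.3, 0.1]
features = ova_score_vector([clf_a, clf_b], sample=None)
print(features)  # [0.8, 0.1, 0.1, 0.6, 0.3, 0.1]
```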
Colour and Brush Stroke Pattern Recognition in Abstract Art using Modified Deep Convolutional Generative Adversarial Networks
Abstract art is an immensely popular, widely discussed form of art that often has the ability to depict the emotions of an artist. Many researchers have made attempts to study abstract art in the form of edge detection, brush stroke and emotion recognition algorithms using machine and deep learning. This paper describes the study of a wide distribution of abstract paintings using Generative Adversarial Networks (GANs). GANs have the ability to learn and reproduce a distribution, enabling researchers and scientists to effectively explore and study the generated image space. However, the challenge lies in developing an efficient GAN architecture that overcomes common training pitfalls. This paper addresses this challenge by introducing a modified DCGAN (mDCGAN) specifically designed for high-quality artwork generation. The approach involves a thorough exploration of the modifications made, delving into the intricate workings of DCGANs, optimisation techniques, and regularisation methods aimed at improving stability and realism in art generation, enabling effective study of generated patterns. The proposed mDCGAN incorporates meticulous adjustments in layer configurations and architectural choices, offering tailored solutions to the unique demands of art generation while effectively combating issues like mode collapse and gradient vanishing. Further, this paper explores the generated latent space by performing random walks to understand vector relationships between brush strokes and colours in the abstract art space, and presents a statistical analysis of unstable outputs after a certain period of GAN training, comparing their significant differences. These findings validate the effectiveness of the proposed approach, emphasising its potential to revolutionise the field of digital art generation and the digital art ecosystem.
Updated: 2024-03-27 09:35:56
Categories: cs.CV,cs.AI,cs.LG
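The latent-space random walk can be sketched as small Gaussian steps on a latent code; the dimension and step size are illustrative, and each visited point would be decoded by the trained generator to inspect how brush strokes and colours vary:

```python
import random

def latent_random_walk(z, steps, step_size, rng):
    """Take small Gaussian steps in latent space and record the path.

    Nearby latent codes should decode to smoothly varying
    brush-stroke/colour patterns, which is what makes the walk useful
    for probing vector relationships in the generated space.
    """
    path = [list(z)]
    for _ in range(steps):
        z = [zi + rng.gauss(0.0, step_size) for zi in z]
        path.append(list(z))
    return path

rng = random.Random(42)
path = latent_random_walk([0.0] * 8, steps=5, step_size=0.1, rng=rng)
print(len(path), len(path[0]))  # 6 8
```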
Tensor-based Graph Learning with Consistency and Specificity for Multi-view Clustering
Graph learning is widely recognized as a crucial technique in multi-view clustering. Existing graph learning methods typically involve constructing an adaptive neighbor graph based on probabilistic neighbors and then learning a consensus graph for clustering; however, they are confronted with two limitations. Firstly, they often rely on Euclidean distance to measure similarity when constructing the adaptive neighbor graph, which proves inadequate in capturing the intrinsic structure among data points in many real-world scenarios. Secondly, most of these methods focus solely on the consensus graph, ignoring view-specific graph information. In response to the aforementioned drawbacks, we in this paper propose a novel tensor-based graph learning framework that simultaneously considers consistency and specificity for multi-view clustering. Specifically, we calculate the similarity distance on the Stiefel manifold to preserve the intrinsic structure among data points. By making an assumption that the learned neighbor graph of each view comprises both a consistent graph and a view-specific graph, we formulate a new tensor-based target graph learning paradigm. Owing to the benefits of tensor singular value decomposition (t-SVD) in uncovering high-order correlations, this model is capable of achieving a complete understanding of the target graph. Furthermore, we develop an iterative algorithm to solve the proposed objective optimization problem. Experiments conducted on real-world datasets have demonstrated the superior performance of the proposed method over some state-of-the-art multi-view clustering methods. The source code has been released on https://github.com/lshi91/CSTGL-Code.
Updated: 2024-03-27 09:30:50
Categories: cs.LG
FTBC: Forward Temporal Bias Correction for Optimizing ANN-SNN Conversion
Spiking Neural Networks (SNNs) offer a promising avenue for energy-efficient computing compared with Artificial Neural Networks (ANNs), closely mirroring biological neural processes. However, this potential comes with inherent challenges in directly training SNNs through spatio-temporal backpropagation -- stemming from the temporal dynamics of spiking neurons and their discrete signal processing -- which necessitates alternative ways of training, most notably through ANN-SNN conversion. In this work, we introduce a lightweight Forward Temporal Bias Correction (FTBC) technique, aimed at enhancing conversion accuracy without the computational overhead. We ground our method in theoretical findings that, through proper temporal bias calibration, the expected error of ANN-SNN conversion can be reduced to zero after each time step. We further propose a heuristic algorithm for finding the temporal bias only in the forward pass, thus eliminating the computational burden of backpropagation, and we evaluate our method on CIFAR-10/100 and ImageNet datasets, achieving a notable increase in accuracy on all datasets. Codes are released at a GitHub repository.
Updated: 2024-03-27 09:25:20
Categories: cs.AI,cs.CV
Learning Quadruped Locomotion Using Differentiable Simulation
While most recent advancements in legged robot control have been driven by model-free reinforcement learning, we explore the potential of differentiable simulation. Differentiable simulation promises faster convergence and more stable training by computing low-variant first-order gradients using the robot model, but so far, its use for legged robot control has remained limited to simulation. The main challenge with differentiable simulation lies in the complex optimization landscape of robotic tasks due to discontinuities in contact-rich environments, e.g., quadruped locomotion. This work proposes a new, differentiable simulation framework to overcome these challenges. The key idea involves decoupling the complex whole-body simulation, which may exhibit discontinuities due to contact, into two separate continuous domains. Subsequently, we align the robot state resulting from the simplified model with a more precise, non-differentiable simulator to maintain sufficient simulation accuracy. Our framework enables learning quadruped walking in minutes using a single simulated robot without any parallelization. When augmented with GPU parallelization, our approach allows the quadruped robot to master diverse locomotion skills, including trot, pace, bound, and gallop, on challenging terrains in minutes. Additionally, our policy achieves robust locomotion performance in the real world zero-shot. To the best of our knowledge, this work represents the first demonstration of using differentiable simulation for controlling a real quadruped robot. This work provides several important insights into using differentiable simulations for legged locomotion in the real world.
Updated: 2024-03-27 09:24:55
Categories: cs.RO,cs.AI
Generative Multi-modal Models are Good Class-Incremental Learners
In class-incremental learning (CIL) scenarios, the phenomenon of catastrophic forgetting caused by the classifier's bias towards the current task has long posed a significant challenge. It is mainly caused by the characteristics of discriminative models. With the growing popularity of generative multi-modal models, we explore replacing discriminative models with generative ones for CIL. However, transitioning from discriminative to generative models requires addressing two key challenges. The primary challenge lies in transferring the generated textual information into the classification of distinct categories. Additionally, it requires formulating the task of CIL within a generative framework. To this end, we propose a novel generative multi-modal model (GMM) framework for class-incremental learning. Our approach directly generates labels for images using an adapted generative model. After obtaining the detailed text, we use a text encoder to extract text features and employ feature matching to determine the most similar label as the classification prediction. In the conventional CIL settings, we achieve significantly better results in long-sequence task scenarios. Under the few-shot CIL setting, we improve by at least 14% in accuracy over all the current state-of-the-art methods, with significantly less forgetting. Our code is available at https://github.com/DoubleClass/GMM.
Updated: 2024-03-27 09:21:07
Categories: cs.CV,cs.AI,cs.LG
Improving Attributed Text Generation of Large Language Models via Preference Learning
Large language models have been widely adopted in natural language processing, yet they face the challenge of generating unreliable content. Recent works aim to reduce misinformation and hallucinations by resorting to attribution as a means to provide evidence (i.e., citations). However, current attribution methods usually focus on the retrieval stage and automatic evaluation that neglect mirroring the citation mechanisms in human scholarly writing to bolster credibility. In this paper, we address these challenges by modelling the attribution task as preference learning and introducing an Automatic Preference Optimization (APO) framework. First, we create a curated collection for post-training with 6,330 examples by collecting and filtering from existing datasets. Second, considering the high cost of labelling preference data, we further propose an automatic method to synthesize attribution preference data resulting in 95,263 pairs. Moreover, inspired by the human citation process, we further propose a progressive preference optimization method by leveraging fine-grained information. Extensive experiments on three datasets (i.e., ASQA, StrategyQA, and ELI5) demonstrate that APO achieves state-of-the-art citation F1 with higher answer quality.
Updated: 2024-03-27 09:19:13
Subjects: cs.CL,cs.AI
InferDPT: Privacy-Preserving Inference for Black-box Large Language Model
Large language models (LLMs), like ChatGPT, have greatly simplified text generation tasks. However, they have also raised concerns about privacy risks such as data leakage and unauthorized data collection. Existing solutions for privacy-preserving inference face practical challenges related to computation time and communication costs. In this paper, we propose InferDPT, the first practical framework for the privacy-preserving Inference of black-box LLMs, implementing Differential Privacy in Text generation. InferDPT comprises two key modules: the "perturbation module" utilizes the exponential mechanism to generate a perturbed prompt, facilitating privacy-preserving inference with black-box LLMs, and the "extraction module", inspired by knowledge distillation and retrieval-augmented generation, extracts coherent and consistent text from the perturbed generation result, ensuring successful text generation completion. To address privacy concerns related to previous exponential mechanisms' susceptibility to embedding revision attacks, we introduce RANTEXT, a novel differential privacy mechanism integrated into the perturbation module of InferDPT, which introduces the concept of "RANdom adjacency" for TEXT perturbation within the prompt. Experimental results across three datasets demonstrate that the text generation quality of InferDPT is comparable to that of non-private GPT-4, and RANTEXT surpasses existing state-of-the-art mechanisms, namely, SANTEXT+ and CUSTEXT+, in the trade-off between privacy and utility. Even with a privacy parameter epsilon value of 6.0, RANTEXT achieves an average privacy protection rate exceeding 90% against embedding revision attacks, which is 0.58 times higher than that of SANTEXT+ and 3.35 times higher than that of CUSTEXT+.
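The perturbation module's core idea, sampling a replacement token via the exponential mechanism, can be sketched as below. This is a toy version under loud assumptions: one-dimensional embeddings, utility equal to negative embedding distance, and sensitivity normalised to 1; the real mechanism (and RANTEXT's random adjacency) operates over high-dimensional embeddings with a carefully bounded sensitivity.

```python
import math
import random

def exponential_mechanism(token, vocab_embeddings, epsilon, rng=random):
    # Sample a replacement for `token` with probability proportional to
    # exp(epsilon * utility / 2), where utility is the negative distance
    # between toy 1-D embeddings (sensitivity assumed to be 1).
    x = vocab_embeddings[token]
    cands = list(vocab_embeddings)
    utilities = [-abs(vocab_embeddings[c] - x) for c in cands]
    weights = [math.exp(epsilon * u / 2.0) for u in utilities]
    # Weighted sampling without external dependencies.
    r = rng.random() * sum(weights)
    for cand, w in zip(cands, weights):
        r -= w
        if r <= 0:
            return cand
    return cands[-1]

vocab = {"good": 0.0, "great": 0.3, "terrible": 4.0}
print(exponential_mechanism("good", vocab, epsilon=8.0))
```

With a large epsilon the original token is returned almost surely (little privacy, high utility); as epsilon shrinks, semantically distant replacements become likelier, which is exactly the privacy-utility trade-off the paper measures.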
Updated: 2024-03-27 09:19:01
Subjects: cs.CR
IIP-Mixer: Intra-Inter Patch Mixing Architecture for Battery Remaining Useful Life Prediction
Accurately estimating the Remaining Useful Life (RUL) of lithium-ion batteries is crucial for maintaining the safe and stable operation of rechargeable battery management systems. However, this task is often challenging due to the complex temporal dynamics involved. Recently, attention-based networks, such as Transformers and Informer, have been the popular architecture in time series forecasting. Despite their effectiveness, these models with abundant parameters necessitate substantial training time to unravel temporal patterns. To tackle these challenges, we propose a simple MLP-Mixer-based architecture named 'Intra-Inter Patch Mixer' (IIP-Mixer), which is an architecture based exclusively on multi-layer perceptrons (MLPs), extracting information by mixing operations along both intra-patch and inter-patch dimensions for battery RUL prediction. The proposed IIP-Mixer comprises parallel dual-head mixer layers: the intra-patch mixing MLP, capturing local temporal patterns in the short-term period, and the inter-patch mixing MLP, capturing global temporal patterns in the long-term period. Notably, to address the varying importance of features in RUL prediction, we introduce a weighted loss function in the MLP-Mixer-based architecture, marking the first time such an approach has been employed. Our experiments demonstrate that IIP-Mixer achieves competitive performance in battery RUL prediction, outperforming other popular time-series frameworks.
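The dual-head mixing idea can be sketched with plain NumPy: given a window of readings split into patches, one MLP mixes along the time steps inside each patch (intra-patch, short-term) and a parallel MLP mixes across patches (inter-patch, long-term). The residual sum of the two heads, the random weights, and the layer sizes below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixing_mlp(x, w1, w2):
    # Two-layer MLP with ReLU, applied along the last axis.
    return np.maximum(x @ w1, 0.0) @ w2

def iip_mixer_layer(x, w_intra, w_inter):
    # x: (num_patches, patch_len) window of battery-capacity readings.
    # Intra-patch head: mix the time steps inside each patch (local patterns).
    intra = mixing_mlp(x, *w_intra)
    # Inter-patch head: mix across patches by working on the transposed
    # axis (global patterns). Heads run in parallel; outputs are combined
    # here with a residual sum (an illustrative choice).
    inter = mixing_mlp(x.T, *w_inter).T
    return x + intra + inter

num_patches, patch_len, hidden = 4, 8, 16
w_intra = (rng.normal(size=(patch_len, hidden)), rng.normal(size=(hidden, patch_len)))
w_inter = (rng.normal(size=(num_patches, hidden)), rng.normal(size=(hidden, num_patches)))
out = iip_mixer_layer(rng.normal(size=(num_patches, patch_len)), w_intra, w_inter)
print(out.shape)  # (4, 8)
```

The paper's weighted loss would then scale each feature's error term by a learned or assigned importance weight before averaging, rather than treating all input features equally.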
Updated: 2024-03-27 09:17:50
Subjects: cs.LG,cs.AI
Retrieval-Augmented Generation for Large Language Models: A Survey
Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the generation, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases. This comprehensive review paper offers a detailed examination of the progression of RAG paradigms, encompassing the Naive RAG, the Advanced RAG, and the Modular RAG. It meticulously scrutinizes the tripartite foundation of RAG frameworks, which includes the retrieval, the generation, and the augmentation techniques. The paper highlights the state-of-the-art technologies embedded in each of these critical components, providing a profound understanding of the advancements in RAG systems. Furthermore, this paper introduces an up-to-date evaluation framework and benchmarks. Finally, this article delineates the challenges currently faced and points out prospective avenues for research and development.
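The Naive RAG paradigm the survey starts from is a simple retrieve-then-generate loop: rank documents against the query, stuff the top hits into the prompt, and let the LLM answer. The sketch below uses term overlap as a stand-in for a real dense or BM25 retriever, and omits the actual LLM call; the prompt template and document set are illustrative assumptions.

```python
def retrieve(query, documents, k=2):
    # Rank documents by simple term overlap with the query (a stand-in
    # for a real dense-embedding or BM25 retriever).
    q = set(query.lower().split())
    scored = sorted(documents, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def build_prompt(query, documents):
    # Naive RAG: prepend retrieved context to the question and hand the
    # combined prompt to the LLM (the generation call is omitted here).
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Paris is the capital of France.",
    "Mitochondria are the powerhouse of the cell.",
]
print(build_prompt("What is the capital of France?", docs))
```

Advanced and Modular RAG, as the survey describes them, refine each stage of this loop (query rewriting, re-ranking, iterative retrieval) rather than changing its basic shape.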
Updated: 2024-03-27 09:16:57
Subjects: cs.CL,cs.AI
Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning. It typically involves a set of heterogeneous devices locally training neural network (NN) models in parallel with periodic centralized aggregations. As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers. Conventional approaches discard incomplete intra-model updates done by stragglers, alter the amount of local workload and architecture, or resort to asynchronous settings; which all affect the trained model performance under tight training latency constraints. In this work, we propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion. SALF allows stragglers to synchronously convey partial gradients, having each layer of the global model be updated independently with a different contributing set of users. We provide a theoretical analysis, establishing convergence guarantees for the global model under mild assumptions on the distribution of the participating devices, revealing that SALF converges at the same asymptotic rate as FL with no timing limitations. This insight is matched with empirical observations, demonstrating the performance gains of SALF compared to alternative mechanisms mitigating the device heterogeneity gap in FL.
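The layer-wise aggregation idea can be sketched directly: since backpropagation computes gradients from the output layer backwards, a straggler that runs out of time still has gradients for the later layers, and SALF lets each layer of the global model be averaged over whichever devices reached it. The learning rate, plain averaging, and data layout below are illustrative assumptions, not the paper's exact update rule.

```python
import numpy as np

def salf_aggregate(global_model, partial_grads, lr=0.1):
    # global_model: list of per-layer weight arrays.
    # partial_grads: one dict per device mapping layer index -> gradient;
    # a straggler simply omits the (earlier) layers it did not reach.
    new_model = []
    for i, layer in enumerate(global_model):
        contribs = [g[i] for g in partial_grads if i in g]
        if contribs:  # each layer has its own contributing set of devices
            layer = layer - lr * np.mean(contribs, axis=0)
        new_model.append(layer)
    return new_model

model = [np.zeros(2), np.zeros(2)]
# Device 0 finished both layers; device 1 (a straggler) only reached the
# last layer, since backprop produces later-layer gradients first.
grads = [{0: np.ones(2), 1: np.ones(2)}, {1: np.ones(2)}]
new = salf_aggregate(model, grads)
print(new[0], new[1])
```

The synchronous round structure is preserved: no device is discarded, and no layer waits for the slowest participant.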
Updated: 2024-03-27 09:14:36
Subjects: cs.LG,eess.SP
World Models via Policy-Guided Trajectory Diffusion
World models are a powerful tool for developing intelligent agents. By predicting the outcome of a sequence of actions, world models enable policies to be optimised via on-policy reinforcement learning (RL) using synthetic data, i.e., "in imagination". Existing world models are autoregressive in that they interleave predicting the next state with sampling the next action from the policy. Prediction error inevitably compounds as the trajectory length grows. In this work, we propose a novel world modelling approach that is not autoregressive and generates entire on-policy trajectories in a single pass through a diffusion model. Our approach, Policy-Guided Trajectory Diffusion (PolyGRAD), leverages a denoising model in addition to the gradient of the action distribution of the policy to diffuse a trajectory of initially random states and actions into an on-policy synthetic trajectory. We analyse the connections between PolyGRAD, score-based generative models, and classifier-guided diffusion models. Our results demonstrate that PolyGRAD outperforms state-of-the-art baselines in terms of trajectory prediction error for short trajectories, with the exception of autoregressive diffusion. For short trajectories, PolyGRAD obtains similar errors to autoregressive diffusion, but with lower computational requirements. For long trajectories, PolyGRAD obtains comparable performance to baselines. Our experiments demonstrate that PolyGRAD enables performant policies to be trained via on-policy RL in imagination for MuJoCo continuous control domains. Thus, PolyGRAD introduces a new paradigm for accurate on-policy world modelling without autoregressive sampling.
Updated: 2024-03-27 09:11:48
Subjects: cs.LG,cs.AI
Emerging Trends in Federated Learning: From Model Fusion to Federated X Learning
Federated learning is a new learning paradigm that decouples data collection and model training via multi-party computation and model aggregation. As a flexible learning setting, federated learning has the potential to integrate with other learning frameworks. We conduct a focused survey of federated learning in conjunction with other learning algorithms. Specifically, we explore various learning algorithms to improve the vanilla federated averaging algorithm and review model fusion methods such as adaptive aggregation, regularization, clustered methods, and Bayesian methods. Following the emerging trends, we also discuss federated learning in the intersection with other learning paradigms, termed federated X learning, where X includes multitask learning, meta-learning, transfer learning, unsupervised learning, and reinforcement learning. In addition to reviewing state-of-the-art studies, this paper also identifies key challenges and applications in this field, while also highlighting promising future directions.
Updated: 2024-03-27 09:07:29
Subjects: cs.LG,cs.DC
Scalable Non-Cartesian Magnetic Resonance Imaging with R2D2
We propose a new approach for non-Cartesian magnetic resonance image reconstruction. While unrolled architectures provide robustness via data-consistency layers, embedding measurement operators in Deep Neural Network (DNN) can become impractical at large scale. Alternative Plug-and-Play (PnP) approaches, where the denoising DNNs are blind to the measurement setting, are not affected by this limitation and have also proven effective, but their highly iterative nature also affects scalability. To address this scalability challenge, we leverage the "Residual-to-Residual DNN series for high-Dynamic range imaging (R2D2)" approach recently introduced in astronomical imaging. R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of DNNs taking the previous iteration's image estimate and associated data residual as inputs. The method can be interpreted as a learned version of the Matching Pursuit algorithm. We demonstrate R2D2 in simulation, considering radial k-space sampling acquisition sequences. Our preliminary results suggest that R2D2 achieves: (i) suboptimal performance compared to its unrolled incarnation R2D2-Net, which is however non-scalable due to the necessary embedding of NUFFT-based data-consistency layers; (ii) superior reconstruction quality to a scalable version of R2D2-Net embedding an FFT-based approximation for data consistency; (iii) superior reconstruction quality to PnP, while only requiring few iterations.
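The residual series at the heart of R2D2 can be sketched abstractly: each network in the series receives the current image estimate and the back-projected data residual, and outputs a residual image that is added to the estimate. In the toy check below the trained CNNs are replaced by a damped residual step and the measurement operator by the identity, so the series simply converges to the ground truth; these stand-ins are assumptions for illustration only.

```python
import numpy as np

def r2d2_reconstruct(y, forward, adjoint, dnns, x0):
    # y: measured data; forward/adjoint: measurement operator and its
    # adjoint; dnns: the series of residual networks (stand-in callables
    # here); x0: initial image estimate.
    x = x0
    for dnn in dnns:
        data_residual = adjoint(y - forward(x))
        x = x + dnn(x, data_residual)  # residual image update
    return x

# Toy check: identity operator, networks replaced by a damped residual
# step; the series converges geometrically towards y.
y = np.array([1.0, 2.0])
dnns = [lambda x, r: 0.5 * r] * 20
x_hat = r2d2_reconstruct(y, lambda v: v, lambda v: v, dnns, np.zeros(2))
print(x_hat)  # ~[1. 2.]
```

This structure is why the method is described as a learned Matching Pursuit: each stage greedily removes part of the remaining data residual, and scalability comes from the small, fixed number of network passes.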
Updated: 2024-03-27 09:07:02
Subjects: eess.IV,cs.CV,cs.LG,eess.SP
Ship in Sight: Diffusion Models for Ship-Image Super Resolution
In recent years, remarkable advancements have been achieved in the field of image generation, primarily driven by the escalating demand for high-quality outcomes across various image generation subtasks, such as inpainting, denoising, and super resolution. A major effort is devoted to exploring the application of super-resolution techniques to enhance the quality of low-resolution images. In this context, our method explores in depth the problem of ship image super resolution, which is crucial for coastal and port surveillance. We investigate the opportunity given by the growing interest in text-to-image diffusion models, taking advantage of the prior knowledge that such foundation models have already learned. In particular, we present a diffusion-model-based architecture that leverages text conditioning during training while being class-aware, to best preserve the crucial details of the ships during the generation of the super-resoluted image. Given the specificity of this task and the scarce availability of off-the-shelf data, we also introduce a large labeled ship dataset scraped from online ship images, mostly from the ShipSpotting\footnote{\url{www.shipspotting.com}} website. Our method achieves more robust results than other deep learning models previously employed for super resolution, as proven by the multiple experiments performed. Moreover, we investigate how this model can benefit downstream tasks, such as classification and object detection, thus emphasizing practical implementation in a real-world scenario. Experimental results show flexibility, reliability, and impressive performance of the proposed framework over state-of-the-art methods for different tasks. The code is available at: https://github.com/LuigiSigillo/ShipinSight .
Updated: 2024-03-27 09:06:36
Subjects: cs.CV,cs.LG
High Dimensional Distributed Gradient Descent with Arbitrary Number of Byzantine Attackers
Robust distributed learning with Byzantine failures has attracted extensive research interests in recent years. However, most of existing methods suffer from curse of dimensionality, which is increasingly serious with the growing complexity of modern machine learning models. In this paper, we design a new method that is suitable for high dimensional problems, under arbitrary number of Byzantine attackers. The core of our design is a direct high dimensional semi-verified mean estimation method. Our idea is to identify a subspace first. The components of mean value perpendicular to this subspace can be estimated via gradient vectors uploaded from worker machines, while the components within this subspace are estimated using auxiliary dataset. We then use our new method as the aggregator of distributed learning problems. Our theoretical analysis shows that the new method has minimax optimal statistical rates. In particular, the dependence on dimensionality is significantly improved compared with previous works.
Updated: 2024-03-27 09:04:04
Subjects: cs.LG,cs.CR,cs.DC
Functional Graph Convolutional Networks: A unified multi-task and multi-modal learning framework to facilitate health and social-care insights
This paper introduces a novel Functional Graph Convolutional Network (funGCN) framework that combines Functional Data Analysis and Graph Convolutional Networks to address the complexities of multi-task and multi-modal learning in digital health and longitudinal studies. With the growing importance of health solutions to improve health care and social support, ensure healthy lives, and promote well-being at all ages, funGCN offers a unified approach to handle multivariate longitudinal data for multiple entities and ensures interpretability even with small sample sizes. Key innovations include task-specific embedding components that manage different data types, the ability to perform classification, regression, and forecasting, and the creation of a knowledge graph for insightful data interpretation. The efficacy of funGCN is validated through simulation experiments and a real-data application.
Updated: 2024-03-27 08:57:20
Subjects: cs.LG,cs.AI
Intent-Aware DRL-Based Uplink Dynamic Scheduler for 5G-NR
We investigate the problem of supporting Industrial Internet of Things user equipment (IIoT UEs) with intent (i.e., requested quality of service (QoS)) and random traffic arrival. A deep reinforcement learning (DRL) based centralized dynamic scheduler for time-frequency resources is proposed to learn how to schedule the available communication resources among the IIoT UEs. The proposed scheduler leverages an RL framework to adapt to the dynamic changes in the wireless communication system and traffic arrivals. Moreover, a graph-based reduction scheme is proposed to reduce the state and action space of the RL framework to allow fast convergence and a better learning strategy. Simulation results demonstrate the effectiveness of the proposed intelligent scheduler in guaranteeing the expressed intent of IIoT UEs compared to several traditional scheduling schemes, such as round-robin, semi-static, and heuristic approaches. The proposed scheduler also outperforms the contention-free and contention-based schemes in maximizing the number of successfully computed tasks.
Updated: 2024-03-27 08:57:15
Subjects: cs.IT,cs.AI,cs.LG,math.IT
Learning Concept-Based Causal Transition and Symbolic Reasoning for Visual Planning
Visual planning simulates how humans make decisions to achieve desired goals in the form of searching for visual causal transitions between an initial visual state and a final visual goal state. It has become increasingly important in egocentric vision with its advantages in guiding agents to perform daily tasks in complex environments. In this paper, we propose an interpretable and generalizable visual planning framework consisting of i) a novel Substitution-based Concept Learner (SCL) that abstracts visual inputs into disentangled concept representations, ii) symbol abstraction and reasoning that performs task planning via the self-learned symbols, and iii) a Visual Causal Transition model (ViCT) that grounds visual causal transitions to semantically similar real-world actions. Given an initial state, we perform goal-conditioned visual planning with a symbolic reasoning method fueled by the learned representations and causal transitions to reach the goal state. To verify the effectiveness of the proposed model, we collect a large-scale visual planning dataset based on AI2-THOR, dubbed as CCTP. Extensive experiments on this challenging dataset demonstrate the superior performance of our method in visual task planning. Empirically, we show that our framework can generalize to unseen task trajectories, unseen object categories, and real-world data. Further details of this work are provided at https://fqyqc.github.io/ConTranPlan/.
Updated: 2024-03-27 08:54:35
Subjects: cs.AI,cs.CV,cs.LG
Centered Masking for Language-Image Pre-Training
We introduce Gaussian masking for Language-Image Pre-Training (GLIP), a novel, straightforward, and effective technique for masking image patches during pre-training of a vision-language model. GLIP builds on Fast Language-Image Pre-Training (FLIP), which randomly masks image patches while training a CLIP model. GLIP replaces random masking with centered masking, which uses a Gaussian distribution and is inspired by the importance of image patches at the center of the image. GLIP retains the same computational savings as FLIP, while improving performance across a range of downstream datasets and tasks, as demonstrated by our experimental results. We show the benefits of GLIP to be easy to obtain, requiring no delicate tuning of the Gaussian, and also applicable to data sets containing images without an obvious center focus.
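The centered-masking idea can be sketched as follows: assign each patch a weight from an isotropic Gaussian centered on the image, then sample the patches to *keep* with probability proportional to that weight, so central patches survive masking more often than border ones. The grid size, sigma, and sampling scheme below are illustrative assumptions rather than GLIP's exact recipe.

```python
import numpy as np

def gaussian_keep_probs(grid, sigma=0.35):
    # Per-patch weight from an isotropic Gaussian centered on the image.
    ys, xs = np.mgrid[0:grid, 0:grid]
    cy = cx = (grid - 1) / 2.0
    d2 = ((ys - cy) ** 2 + (xs - cx) ** 2) / (grid ** 2)
    return np.exp(-d2 / (2 * sigma ** 2))

def centered_mask(grid, keep_ratio, rng):
    # Keep `keep_ratio` of the patches, sampling the kept ones with
    # Gaussian-weighted probability: central patches are kept more often.
    probs = gaussian_keep_probs(grid).ravel()
    probs /= probs.sum()
    n_keep = int(round(keep_ratio * grid * grid))
    kept = rng.choice(grid * grid, size=n_keep, replace=False, p=probs)
    mask = np.zeros(grid * grid, dtype=bool)
    mask[kept] = True
    return mask.reshape(grid, grid)

rng = np.random.default_rng(0)
m = centered_mask(grid=14, keep_ratio=0.5, rng=rng)
print(m.sum())  # 98 of 196 patches kept
```

The compute saving is identical to FLIP's, since the same fraction of patches is dropped; only the spatial distribution of the surviving patches changes.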
Updated: 2024-03-27 08:54:06
Subjects: cs.CV,cs.CL,cs.LG
Asymptotic Bayes risk of semi-supervised learning with uncertain labeling
This article considers a semi-supervised classification setting on a Gaussian mixture model, where the data is not labeled strictly as usual, but instead with uncertain labels. Our main aim is to compute the Bayes risk for this model. We compare the behavior of the Bayes risk and the best known algorithm for this model. This comparison eventually gives new insights over the algorithm.
Updated: 2024-03-27 08:49:19
Subjects: stat.ML,cs.LG
Supervised Multiple Kernel Learning approaches for multi-omics data integration
Advances in high-throughput technologies have originated an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is currently an issue for biology and bioinformatics. Multiple kernel learning (MKL) has been shown to be a flexible and valid approach to consider the diverse nature of multi-omics inputs, despite being an underused tool in genomic data mining. We provide novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel of input kernels, we adapted unsupervised integration algorithms for supervised tasks with support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can compete with more complex, state-of-the-art, supervised multi-omics integrative approaches. Multiple kernel learning offers a natural framework for predictive models in multi-omics genomic data. Our results offer a direction for bio-data mining research and further development of methods for heterogeneous data integration.
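The basic MKL fusion step, building a meta-kernel as a combination of per-omic Gram matrices, can be sketched as follows. A convex combination with fixed weights is used here purely for illustration; in the paper's approaches the fusion strategy is learned, and the fused kernel would then be handed to an SVM as a precomputed kernel.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix of an RBF kernel for one omics layer.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def meta_kernel(kernels, weights):
    # MKL fusion: convex combination of per-omic Gram matrices. The
    # weights would normally be learned; they are fixed here.
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))

rng = np.random.default_rng(0)
expr = rng.normal(size=(10, 50))    # toy gene-expression layer
methyl = rng.normal(size=(10, 30))  # toy methylation layer
K = meta_kernel([rbf_kernel(expr), rbf_kernel(methyl)], weights=[0.7, 0.3])
print(K.shape)  # (10, 10)
```

Because each omics layer contributes through its own kernel, layers with different dimensionalities and data types combine naturally, which is the flexibility the abstract refers to.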
Updated: 2024-03-27 08:48:16
Subjects: stat.ML,cs.LG,stat.AP
Identifying the Correlation Between Language Distance and Cross-Lingual Transfer in a Multilingual Representation Space
Prior research has investigated the impact of various linguistic features on cross-lingual transfer performance. In this study, we investigate the manner in which this effect can be mapped onto the representation space. While past studies have focused on the impact on cross-lingual alignment in multilingual language models during fine-tuning, this study examines the absolute evolution of the respective language representation spaces produced by MLLMs. We place a specific emphasis on the role of linguistic characteristics and investigate their inter-correlation with the impact on representation spaces and cross-lingual transfer performance. Additionally, this paper provides preliminary evidence of how these findings can be leveraged to enhance transfer to linguistically distant languages.
Updated: 2024-03-27 08:43:28
Subjects: cs.CL,cs.AI,cs.LG
Generating Diverse Agricultural Data for Vision-Based Farming Applications
We present a specialized procedural model for generating synthetic agricultural scenes, focusing on soybean crops, along with various weeds. This model is capable of simulating distinct growth stages of these plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions. The integration of real-world textures and environmental factors into the procedural generation process enhances the photorealism and applicability of the synthetic data. Our dataset includes 12,000 images with semantic labels, offering a comprehensive resource for computer vision tasks in precision agriculture, such as semantic segmentation for autonomous weed control. We validate our model's effectiveness by comparing the synthetic data against real agricultural images, demonstrating its potential to significantly augment training data for machine learning models in agriculture. This approach not only provides a cost-effective solution for generating high-quality, diverse data but also addresses specific needs in agricultural vision tasks that are not fully covered by general-purpose models.
Updated: 2024-03-27 08:42:47
Subjects: cs.CV,cs.AI,cs.GR,cs.LG,68T07, 68T45,I.2.10; I.4.6
A Quantum Fuzzy-based Approach for Real-Time Detection of Solar Coronal Holes
The detection and analysis of solar coronal holes (CHs) is an important field of study in solar physics, mainly because it is required for the proper prediction of geomagnetic storms, which directly or indirectly affect various space- and ground-based systems. To date, solar scientists have depended on manual, hand-drawn approaches for CH detection. With the advancement of image processing technologies, some automated image segmentation methods have been used for detecting CHs; despite this, fast and accurate detection of CHs remains a major issue. In this work, a novel quantum computing-based fast fuzzy c-means technique has been developed for fast detection of the CH region. The task is carried out in two stages: in the first stage, the solar image is segmented using quantum computing-based fast fuzzy c-means (QCFFCM), and in the second stage, the CHs are extracted from the segmented image using image morphological operations. Quantum computing is used to optimize the cost function of the fast fuzzy c-means (FFCM) algorithm, with the quantum approximate optimization algorithm (QAOA) optimizing the quadratic part of the cost function. The proposed method was tested on 193 \AA{} SDO/AIA full-disk solar image datasets and compared with existing techniques. The results show that the proposed method performs comparably to existing ones in considerably less time.
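The classical fuzzy c-means membership update underlying FFCM can be sketched as below: each pixel receives a soft membership in every cluster, inversely proportional to its distance to the cluster center. This shows only the standard classical step; the quantum part (handing the quadratic cost term to QAOA) is not sketched, and the toy 2-D points stand in for solar-image pixel intensities.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    # Standard fuzzy c-means update: u_ik proportional to
    # 1 / ||x_k - c_i||^(2/(m-1)), normalised over clusters.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (n, c)
    d = np.maximum(d, 1e-12)  # avoid division by zero at exact centers
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# Toy "pixels": two near the first center, one at the second.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
U = fcm_memberships(X, centers)
print(np.round(U, 3))  # rows sum to 1; each point sits in its nearest cluster
```

Iterating this membership update with the corresponding center update minimizes the fuzzy clustering cost; the paper's contribution is accelerating that optimization by solving its quadratic component with QAOA.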
Updated: 2024-03-27 08:38:56
Categories: astro-ph.SR,cs.AI,cs.CV,cs.LG
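The classical fuzzy c-means objective that QCFFCM accelerates can be sketched as follows. This is a plain NumPy illustration of the standard FFCM updates (membership and centroid alternation), not the authors' quantum-optimized variant; the cluster count and fuzzifier `m` are illustrative assumptions:

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=50, seed=0):
    """Standard fuzzy c-means: alternate centroid and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                       # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-9
        # u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        U = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2 / (m - 1))).sum(axis=1)
    cost = ((U ** m) * d ** 2).sum()         # J_m = sum_i sum_k u_ik^m d_ik^2
    return centers, U, cost
```

QAOA would replace the classical minimization of the quadratic part of this cost; here both updates are purely classical.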
LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models
To ensure safe driving in dynamic environments, autonomous vehicles should possess the capability to accurately predict the lane change intentions of surrounding vehicles in advance and forecast their future trajectories. Existing motion prediction approaches have ample room for improvement, particularly in terms of long-term prediction accuracy and interpretability. In this paper, we address these challenges by proposing LC-LLM, an explainable lane change prediction model that leverages the strong reasoning capabilities and self-explanation abilities of Large Language Models (LLMs). Essentially, we reformulate the lane change prediction task as a language modeling problem, processing heterogeneous driving scenario information in natural language as prompts for input into the LLM and employing a supervised fine-tuning technique to tailor the LLM specifically for our lane change prediction task. This allows us to utilize the LLM's powerful common sense reasoning abilities to understand complex interactive information, thereby improving the accuracy of long-term predictions. Furthermore, we incorporate explanatory requirements into the prompts in the inference stage. Therefore, our LC-LLM model not only can predict lane change intentions and trajectories but also provides explanations for its predictions, enhancing the interpretability. Extensive experiments on the large-scale highD dataset demonstrate the superior performance and interpretability of our LC-LLM in the lane change prediction task. To the best of our knowledge, this is the first attempt to utilize LLMs for predicting lane change behavior. Our study shows that LLMs can encode comprehensive interaction information for driving behavior understanding.
Updated: 2024-03-27 08:34:55
Categories: cs.AI
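Reformulating lane-change prediction as language modeling amounts to serializing scene state into a natural-language prompt. A minimal sketch of such prompt construction, including the inference-time explanatory requirement; the field names and phrasing are illustrative assumptions, not the paper's actual template:

```python
def build_prompt(ego, target, gap_front_m, gap_rear_m, explain=True):
    """Serialize a heterogeneous driving scenario into a natural-language prompt."""
    lines = [
        "You are a driving-behavior analyst.",
        f"Target vehicle: speed {target['speed']:.1f} m/s, "
        f"lateral offset {target['lat_offset']:.2f} m from lane center.",
        f"Ego vehicle: speed {ego['speed']:.1f} m/s, same lane: {target['same_lane']}.",
        f"Front gap in adjacent lane: {gap_front_m:.0f} m, rear gap: {gap_rear_m:.0f} m.",
        "Predict the target's lane-change intention (left/right/keep) "
        "and its trajectory over the next 4 s.",
    ]
    if explain:  # explanatory requirement added at inference time
        lines.append("Explain the reasoning behind your prediction.")
    return "\n".join(lines)
```

The supervised fine-tuning step would then pair such prompts with ground-truth intentions and trajectories rendered as text.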
The Artificial Neural Twin -- Process Optimization and Continual Learning in Distributed Process Chains
Industrial process optimization and control is crucial for increasing economic and ecological efficiency. However, data sovereignty, differing goals, and the expert knowledge required for implementation impede holistic deployment. Further, the increasing use of data-driven AI-methods in process models and industrial sensors often requires regular fine-tuning to accommodate distribution drifts. We propose the Artificial Neural Twin, which combines concepts from model predictive control, deep learning, and sensor networks to address these issues. Our approach introduces differentiable data fusion to estimate the state of distributed process steps and their dependence on input data. By treating the interconnected process steps as a quasi neural network, we can backpropagate loss gradients for process optimization or model fine-tuning to process parameters or AI models, respectively. The concept is demonstrated on a virtual machine park simulated in Unity, consisting of bulk material processes in plastic recycling.
Updated: 2024-03-27 08:34:39
Categories: cs.LG,I.2.11; J.2; F.2.2
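Treating interconnected process steps as a quasi neural network means a downstream loss gradient can be backpropagated through the chain to each step's parameters. A toy sketch with two differentiable process steps and a hand-derived chain rule; the step functions are illustrative assumptions, not the paper's process models:

```python
def step1(x, theta1):           # e.g. a throughput-scaling process step
    return theta1 * x

def step2(y, theta2):           # e.g. an additive offset step downstream
    return y + theta2

def loss_and_grads(x, theta1, theta2, target):
    """Forward through the process chain, then backpropagate d(loss)/d(theta_i)."""
    y = step1(x, theta1)
    out = step2(y, theta2)
    loss = (out - target) ** 2
    dloss_dout = 2 * (out - target)
    g2 = dloss_dout * 1.0       # d out / d theta2 = 1
    g1 = dloss_dout * x         # d out / d theta1 = x (chain rule through step2)
    return loss, g1, g2
```

In the paper's setting the same gradients would drive either process-parameter optimization or fine-tuning of AI sub-models at each step.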
mALBERT: Is a Compact Multilingual BERT Model Still Worth It?
Within the current trend of Pretrained Language Models (PLMs), more and more criticisms emerge about the ethical and ecological impact of such models. In this article, considering these critical remarks, we propose to focus on smaller models, such as compact models like ALBERT, which are more ecologically virtuous than these PLMs. However, PLMs enable huge breakthroughs in Natural Language Processing tasks, such as Spoken and Natural Language Understanding, classification, and Question-Answering tasks. PLMs also have the advantage of being multilingual, and, as far as we know, a multilingual version of compact ALBERT models does not exist. Considering these facts, we propose the free release of the first version of a multilingual compact ALBERT model, pre-trained using Wikipedia data, which complies with the ethical aspect of such a language model. We also evaluate the model against classical multilingual PLMs on classical NLP tasks. Finally, this paper proposes a rare study of the impact of subword tokenization on language performance.
Updated: 2024-03-27 08:25:28
Categories: cs.AI
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving
Deep learning-based monocular depth estimation (MDE), extensively applied in autonomous driving, is known to be vulnerable to adversarial attacks. Previous physical attacks against MDE models rely on 2D adversarial patches, so they only affect a small, localized region in the MDE map but fail under various viewpoints. To address these limitations, we propose 3D Depth Fool (3D$^2$Fool), the first 3D texture-based adversarial attack against MDE models. 3D$^2$Fool is specifically optimized to generate 3D adversarial textures agnostic to model types of vehicles and to have improved robustness in bad weather conditions, such as rain and fog. Experimental results validate the superior performance of our 3D$^2$Fool across various scenarios, including vehicles, MDE models, weather conditions, and viewpoints. Real-world experiments with printed 3D textures on physical vehicle models further demonstrate that our 3D$^2$Fool can cause an MDE error of over 10 meters.
Updated: 2024-03-27 08:23:09
Categories: cs.CV,cs.CR
Macroscale fracture surface segmentation via semi-supervised learning considering the structural similarity
To date, the safety assessment of materials, used for example in the nuclear power sector, commonly relies on a fracture mechanical analysis utilizing macroscopic concepts, where a global load quantity K or J is compared to the material's fracture toughness curve. Part of the experimental effort involved in these concepts is dedicated to the quantitative analysis of fracture surfaces. Within the scope of this study, a methodology for the semi-supervised training of deep learning models for fracture surface segmentation on a macroscopic level was established. To this end, three distinct datasets were created to analyze the influence of structural similarity on the segmentation capability. The structural similarity differs due to the assessed materials and specimens, as well as imaging-induced variance caused by fluctuations in image acquisition across different laboratories. The datasets correspond to typical isolated laboratory conditions, complex real-world circumstances, and a curated subset of the two. We implemented a weak-to-strong consistency regularization for semi-supervised learning. On the heterogeneous dataset we were able to train robust and well-generalizing models that learned feature representations from images across different domains without a significant drop in prediction quality. Furthermore, our approach reduced the number of labeled images required for training by a factor of 6. To demonstrate the success of our method and the benefit of our approach for the fracture mechanics assessment, we utilized the models for initial crack size measurements with the area average method. For the laboratory setting, the deep learning assisted measurements proved to have the same quality as manual measurements. For models trained on the heterogeneous dataset, very good measurement accuracies with mean deviations smaller than 1 % could be achieved...
Updated: 2024-03-27 08:21:41
Categories: cs.LG,I.m
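Weak-to-strong consistency regularization, in the spirit of FixMatch-style methods, keeps a pseudo-label from the weakly augmented view only when the model is confident, and pushes predictions on the strongly augmented view toward it. A NumPy sketch of that loss; the confidence threshold and input shapes are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def consistency_loss(p_weak, p_strong, threshold=0.95):
    """p_weak, p_strong: (N, C) softmax outputs for weak/strong views.
    Cross-entropy of the strong view against confident weak pseudo-labels."""
    conf = p_weak.max(axis=1)
    pseudo = p_weak.argmax(axis=1)
    mask = conf >= threshold                  # keep only confident samples
    if not mask.any():
        return 0.0
    ce = -np.log(p_strong[mask, pseudo[mask]] + 1e-12)
    return float(ce.mean())
```

Unlabeled images thus contribute to training only through their own confident predictions, which is what cuts the labeling requirement.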
A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages
User-generated data sources have gained significance in uncovering Adverse Drug Reactions (ADRs), with an increasing number of discussions occurring in the digital world. However, the existing clinical corpora predominantly revolve around scientific articles in English. This work presents a multilingual corpus of texts concerning ADRs gathered from diverse sources, including patient fora, social media, and clinical reports in German, French, and Japanese. Our corpus contains annotations covering 12 entity types, four attribute types, and 13 relation types. It contributes to the development of real-world multilingual language models for healthcare. We provide statistics to highlight certain challenges associated with the corpus and conduct preliminary experiments resulting in strong baselines for extracting entities and relations between these entities, both within and across languages.
Updated: 2024-03-27 08:21:01
Categories: cs.CL,cs.LG
Tracking-Assisted Object Detection with Event Cameras
Event-based object detection has recently garnered attention in the computer vision community due to the exceptional properties of event cameras, such as high dynamic range and no motion blur. However, feature asynchronism and sparsity cause invisible objects due to no relative motion to the camera, posing a significant challenge in the task. Prior works have studied various memory mechanisms to preserve as many features as possible at the current time, guided by temporal clues. While these implicit-learned memories retain some short-term information, they still struggle to preserve long-term features effectively. In this paper, we consider those invisible objects as pseudo-occluded objects and aim to reveal their features. Firstly, we introduce visibility attribute of objects and contribute an auto-labeling algorithm to append additional visibility labels on an existing event camera dataset. Secondly, we exploit tracking strategies for pseudo-occluded objects to maintain their permanence and retain their bounding boxes, even when features have not been available for a very long time. These strategies can be treated as an explicit-learned memory guided by the tracking objective to record the displacements of objects across frames. Lastly, we propose a spatio-temporal feature aggregation module to enrich the latent features and a consistency loss to increase the robustness of the overall pipeline. We conduct comprehensive experiments to verify our method's effectiveness where still objects are retained but real occluded objects are discarded. The results demonstrate that (1) the additional visibility labels can assist in supervised training, and (2) our method outperforms state-of-the-art approaches with a significant improvement of 7.9% absolute mAP.
Updated: 2024-03-27 08:11:25
Categories: cs.CV,cs.LG
Can LLMs Converse Formally? Automatically Assessing LLMs in Translating and Interpreting Formal Specifications
Stakeholders often describe system requirements using natural language which are then converted to formal syntax by a domain-expert leading to increased design costs. This paper assesses the capabilities of Large Language Models (LLMs) in converting between natural language descriptions and formal specifications. Existing work has evaluated the capabilities of LLMs in generating formal syntax such as source code but such experiments are typically hand-crafted and use problems that are likely to be in the training set of LLMs, and often require human-annotated datasets. We propose an approach that can use two copies of an LLM in conjunction with an off-the-shelf verifier to automatically evaluate its translation abilities without any additional human input. Our approach generates formal syntax using language grammars to automatically generate a dataset. We conduct an empirical evaluation to measure the accuracy of this translation task and show that SOTA LLMs cannot adequately solve this task, limiting their current utility in the design of complex systems.
Updated: 2024-03-27 08:08:00
Categories: cs.CL,cs.AI
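Generating formal syntax from a language grammar, as the evaluation pipeline requires, can be sketched as recursive random expansion of a small temporal-logic grammar; the grammar below is an illustrative assumption, not the one used in the paper:

```python
import random

GRAMMAR = {
    "<formula>": [["<atom>"], ["( <formula> & <formula> )"],
                  ["( <formula> | <formula> )"], ["G <formula>"], ["F <formula>"]],
    "<atom>": [["p"], ["q"], ["r"]],
}

def generate(symbol="<formula>", depth=0, max_depth=4, rng=None):
    """Recursively expand grammar rules; force atoms once max depth is reached."""
    rng = rng or random.Random(0)
    if symbol == "<formula>" and depth >= max_depth:
        return generate("<atom>", depth + 1, max_depth, rng)
    production = rng.choice(GRAMMAR[symbol])[0]
    out = []
    for tok in production.split():
        out.append(generate(tok, depth + 1, max_depth, rng) if tok in GRAMMAR else tok)
    return " ".join(out)
```

Two LLM copies would then translate such formulas to natural language and back, with an off-the-shelf verifier checking round-trip equivalence.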
Privacy-Preserving Distributed Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) is an effective data representation tool with numerous applications in signal processing and machine learning. However, deploying NMF in a decentralized manner over ad-hoc networks introduces privacy concerns due to the conventional approach of sharing raw data among network agents. To address this, we propose a privacy-preserving algorithm for fully-distributed NMF that decomposes a distributed large data matrix into left and right matrix factors while safeguarding each agent's local data privacy. It facilitates collaborative estimation of the left matrix factor among agents and enables them to estimate their respective right factors without exposing raw data. To ensure data privacy, we secure information exchanges between neighboring agents utilizing the Paillier cryptosystem, a probabilistic asymmetric algorithm for public-key cryptography that allows computations on encrypted data without decryption. Simulation results conducted on synthetic and real-world datasets demonstrate the effectiveness of the proposed algorithm in achieving privacy-preserving distributed NMF over ad-hoc networks.
Updated: 2024-03-27 08:07:07
Categories: cs.CR,cs.DC,cs.LG,eess.SP
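The additive homomorphism of the Paillier cryptosystem, which lets neighboring agents combine encrypted quantities without decryption, can be sketched with textbook-sized parameters. The tiny primes below are for illustration only and are nowhere near secure:

```python
import math
import random

# Toy Paillier with g = n + 1; illustrative key sizes only.
p, q = 17, 19
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)          # Carmichael function of n
mu = pow(lam, -1, n)                  # works because L(g^lam mod n^2) = lam mod n

def encrypt(m, rng=random.Random(0)):
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Homomorphic addition: the product of ciphertexts decrypts to the sum.
c_sum = (encrypt(5) * encrypt(7)) % n2
```

This is exactly the property that lets the algorithm aggregate neighbors' contributions to the left factor while raw data stays encrypted.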
Quantum Algorithms: A New Frontier in Financial Crime Prevention
The fast proliferation and increasing sophistication of financial crimes require novel approaches that provide robust and effective solutions. This paper explores the potential of quantum algorithms in combating financial crimes. It highlights the advantages of quantum computing by examining traditional and Machine Learning (ML) techniques alongside quantum approaches. The study showcases advanced methodologies such as Quantum Machine Learning (QML) and Quantum Artificial Intelligence (QAI) as powerful solutions for detecting and preventing financial crimes, including money laundering, financial crime detection, cryptocurrency attacks, and market manipulation. These quantum approaches leverage the inherent computational capabilities of quantum computers to overcome limitations faced by classical methods. Furthermore, the paper illustrates how quantum computing can support enhanced financial risk management analysis. Financial institutions can improve their ability to identify and mitigate risks, leading to more robust risk management strategies by exploiting the quantum advantage. This research underscores the transformative impact of quantum algorithms on financial risk management. By embracing quantum technologies, organisations can enhance their capabilities to combat evolving threats and ensure the integrity and stability of financial systems.
Updated: 2024-03-27 07:52:10
Categories: cs.LG,cs.ET
Implementation of the Principal Component Analysis onto High-Performance Computer Facilities for Hyperspectral Dimensionality Reduction: Results and Comparisons
Dimensionality reduction represents a critical preprocessing step in order to increase the efficiency and the performance of many hyperspectral imaging algorithms. However, dimensionality reduction algorithms, such as the Principal Component Analysis (PCA), are computationally demanding, which makes their implementation on high-performance computer architectures advisable for applications under strict latency constraints. This work presents the implementation of the PCA algorithm onto two different high-performance devices, namely, an NVIDIA Graphics Processing Unit (GPU) and a Kalray manycore, uncovering a highly valuable set of tips and tricks in order to take full advantage of the inherent parallelism of these high-performance computing platforms, and hence, reducing the time that is required to process a given hyperspectral image. Moreover, the results obtained with different hyperspectral images have been compared with those obtained with a field programmable gate array (FPGA)-based implementation of the PCA algorithm that has been recently published, providing, for the first time in the literature, a comprehensive analysis in order to highlight the pros and cons of each option.
Updated: 2024-03-27 07:50:45
Categories: cs.LG,cs.CV
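The PCA computation being parallelized here — covariance of the mean-centered pixel spectra, eigendecomposition, and projection onto the leading components — can be sketched in NumPy; the band count and data in the example are illustrative:

```python
import numpy as np

def pca_reduce(X, k):
    """X: (pixels, bands) hyperspectral matrix -> (pixels, k) projection."""
    Xc = X - X.mean(axis=0)                      # center each spectral band
    cov = (Xc.T @ Xc) / (len(X) - 1)             # bands x bands covariance
    vals, vecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]           # keep the k largest components
    return Xc @ vecs[:, order], vals[order]
```

The GPU and manycore implementations parallelize exactly these dense linear-algebra stages, whose cost grows with pixel and band counts.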
Multi-Modal Contrastive Learning for Online Clinical Time-Series Applications
Electronic Health Record (EHR) datasets from Intensive Care Units (ICU) contain a diverse set of data modalities. While prior works have successfully leveraged multiple modalities in supervised settings, we apply advanced self-supervised multi-modal contrastive learning techniques to ICU data, specifically focusing on clinical notes and time-series for clinically relevant online prediction tasks. We introduce a loss function Multi-Modal Neighborhood Contrastive Loss (MM-NCL), a soft neighborhood function, and showcase the excellent linear probe and zero-shot performance of our approach.
Updated: 2024-03-27 07:38:36
Categories: cs.LG
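A soft neighborhood contrastive loss generalizes InfoNCE by weighting, for each anchor, temporally nearby samples of the other modality as soft positives. The NumPy sketch below is a generic illustration of that idea, not the exact MM-NCL formulation; the Gaussian neighborhood kernel and its width are assumptions:

```python
import numpy as np

def soft_neighborhood_nce(z_a, z_b, times, sigma=1.0, tau=0.1):
    """z_a, z_b: (N, D) L2-normalized embeddings of two modalities,
    times: (N,) timestamps; nearby-in-time pairs count as soft positives."""
    sim = z_a @ z_b.T / tau                               # (N, N) similarities
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    dt = times[:, None] - times[None, :]
    w = np.exp(-dt ** 2 / (2 * sigma ** 2))               # soft neighborhood
    w /= w.sum(axis=1, keepdims=True)
    return float(-(w * logp).sum(axis=1).mean())
```

With a delta-function neighborhood (weight only on the matching index) this reduces to standard cross-modal InfoNCE.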
A Physics-embedded Deep Learning Framework for Cloth Simulation
Delicate cloth simulations have long been desired in computer graphics. Various methods have been proposed to improve engaged force interactions, collision handling, and numerical integration. Deep learning has the potential to achieve fast and real-time simulation, but common neural network structures often demand many parameters to capture cloth dynamics. This paper proposes a physics-embedded learning framework that directly encodes physical features of cloth simulation. A convolutional neural network is used to represent spatial correlations of the mass-spring system, after which three branches are designed to learn the linear, nonlinear, and time-derivative features of cloth physics. The framework can also integrate with other external forces and collision handling through either traditional simulators or sub neural networks. The model is tested across different cloth animation cases, without training with new data. Agreement with baselines and predictive realism successfully validate its generalization ability. The inference efficiency of the proposed model also surpasses traditional physics simulation. This framework is also designed to integrate easily with other visual refinement techniques, such as wrinkle carving, which leaves significant opportunities to incorporate prevailing machine learning techniques in 3D cloth animation.
Updated: 2024-03-27 07:35:47
Categories: cs.GR,cs.LG,I.2.0; I.3.7
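The mass-spring system whose spatial correlations the CNN encodes follows Hooke's law per spring. A minimal NumPy sketch of the spring force; the stiffness and rest length are illustrative assumptions:

```python
import numpy as np

def spring_force(x_i, x_j, rest_len, k=10.0):
    """Hooke's-law force on particle i from the spring connecting it to j."""
    d = x_j - x_i
    dist = np.linalg.norm(d)
    if dist < 1e-12:
        return np.zeros_like(d)
    # force magnitude proportional to extension, directed along the spring
    return k * (dist - rest_len) * (d / dist)
```

Summing such forces over structural, shear, and bend springs yields the linear part of the dynamics that one of the three branches would learn.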
A thermodynamically consistent physics-informed deep learning material model for short fiber/polymer nanocomposites
This work proposes a physics-informed deep learning (PIDL)-based constitutive model for investigating the viscoelastic-viscoplastic behavior of short fiber-reinforced nanoparticle-filled epoxies under various ambient conditions. The deep-learning model is trained to enforce thermodynamic principles, leading to a thermodynamically consistent constitutive model. To accomplish this, a long short-term memory network is combined with a feed-forward neural network to predict internal variables required for characterizing the internal dissipation of the nanocomposite materials. In addition, another feed-forward neural network is used to indicate the free-energy function, which enables defining the thermodynamic state of the entire system. The PIDL model is initially developed for the three-dimensional case by generating synthetic data from a classical constitutive model. The model is then trained by extracting the data directly from cyclic loading-unloading experimental tests. Numerical examples show that the PIDL model can accurately predict the mechanical behavior of epoxy-based nanocomposites for different volume fractions of fibers and nanoparticles under various hygrothermal conditions.
Updated: 2024-03-27 07:22:32
Categories: cs.LG,cs.AI,cs.CE,cs.NA,math.NA
Temporal Graph Networks for Graph Anomaly Detection in Financial Networks
This paper explores the utilization of Temporal Graph Networks (TGN) for financial anomaly detection, a pressing need in the era of fintech and digitized financial transactions. We present a comprehensive framework that leverages TGN, capable of capturing dynamic changes in edges within financial networks, for fraud detection. Our study compares TGN's performance against static Graph Neural Network (GNN) baselines, as well as cutting-edge hypergraph neural network baselines using DGraph dataset for a realistic financial context. Our results demonstrate that TGN significantly outperforms other models in terms of AUC metrics. This superior performance underlines TGN's potential as an effective tool for detecting financial fraud, showcasing its ability to adapt to the dynamic and complex nature of modern financial systems. We also experimented with various graph embedding modules within the TGN framework and compared the effectiveness of each module. In conclusion, we demonstrated that, even with variations within TGN, it is possible to achieve good performance in the anomaly detection task.
Updated: 2024-03-27 07:17:16
Categories: q-fin.ST,cs.AI,cs.LG
Bayesian Learned Models Can Detect Adversarial Malware For Free
The vulnerability of machine learning-based malware detectors to adversarial attacks has prompted the need for robust solutions. Adversarial training is an effective method but is computationally expensive to scale up to large datasets and comes at the cost of sacrificing model performance for robustness. We hypothesize that adversarial malware exploits the low-confidence regions of models and can be identified using epistemic uncertainty of ML approaches -- epistemic uncertainty in a machine learning-based malware detector is a result of a lack of similar training samples in regions of the problem space. In particular, a Bayesian formulation can capture the model parameters' distribution and quantify epistemic uncertainty without sacrificing model performance. To verify our hypothesis, we consider Bayesian learning approaches with a mutual information-based formulation to quantify uncertainty and detect adversarial malware in Android, Windows domains and PDF malware. We found, quantifying uncertainty through Bayesian learning methods can defend against adversarial malware. In particular, Bayesian models: (1) are generally capable of identifying adversarial malware in both feature and problem space, (2) can detect concept drift by measuring uncertainty, and (3) with a diversity-promoting approach (or better posterior approximations) lead to parameter instances from the posterior to significantly enhance a detectors' ability.
Updated: 2024-03-27 07:16:48
Categories: cs.CR
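The mutual-information-based uncertainty used to flag adversarial samples is commonly computed as the BALD decomposition: predictive entropy of the averaged prediction minus the average per-sample entropy over posterior draws. A NumPy sketch, with Monte Carlo predictions as illustrative inputs:

```python
import numpy as np

def mutual_information(probs):
    """probs: (S, N, C) class probabilities from S posterior samples.
    Returns per-input epistemic uncertainty I = H[E[p]] - E[H[p]]."""
    eps = 1e-12
    mean_p = probs.mean(axis=0)                             # (N, C)
    h_mean = -(mean_p * np.log(mean_p + eps)).sum(axis=1)   # total uncertainty
    mean_h = -(probs * np.log(probs + eps)).sum(axis=2).mean(axis=0)
    return h_mean - mean_h                                  # epistemic part
```

When posterior samples agree, the two entropy terms cancel (low epistemic uncertainty); adversarial inputs in low-confidence regions make the samples disagree and the difference grow.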
The Effects of Mixed Sample Data Augmentation are Class Dependent
Mixed Sample Data Augmentation (MSDA) techniques, such as Mixup, CutMix, and PuzzleMix, have been widely acknowledged for enhancing performance in a variety of tasks. A previous study reported the class dependency of traditional data augmentation (DA), where certain classes benefit disproportionately compared to others. This paper reveals a class dependent effect of MSDA, where some classes experience improved performance while others experience degraded performance. This research addresses the issue of class dependency in MSDA and proposes an algorithm to mitigate it. The approach involves training on a mixture of MSDA and non-MSDA data, which not only mitigates the negative impact on the affected classes, but also improves overall accuracy. Furthermore, we provide in-depth analysis and discussion of why MSDA introduced class dependencies and which classes are most likely to have them.
Updated: 2024-03-27 07:16:28
Categories: cs.CV,cs.AI
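Mixup, the simplest of the MSDA techniques studied, blends pairs of inputs and their one-hot labels with a Beta-sampled coefficient. A NumPy sketch, with an illustrative alpha value:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Convexly combine two samples and their one-hot labels."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2      # soft label keeps the class mixture ratio
    return x, y, lam
```

The soft labels produced here are the mechanism through which class-dependent effects arise: classes that blend ambiguously can be hurt while others benefit, motivating the paper's mixed MSDA/non-MSDA training.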
Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation
Although 3D shape matching and interpolation are highly interrelated, they are often studied separately and applied sequentially to relate different 3D shapes, thus resulting in sub-optimal performance. In this work we present a unified framework to predict both point-wise correspondences and shape interpolation between 3D shapes. To this end, we combine the deep functional map framework with classical surface deformation models to map shapes in both spectral and spatial domains. On the one hand, by incorporating spatial maps, our method obtains more accurate and smooth point-wise correspondences compared to previous functional map methods for shape matching. On the other hand, by introducing spectral maps, our method gets rid of commonly used but computationally expensive geodesic distance constraints that are only valid for near-isometric shape deformations. Furthermore, we propose a novel test-time adaptation scheme to capture both pose-dominant and shape-dominant deformations. Using different challenging datasets, we demonstrate that our method outperforms previous state-of-the-art methods for both shape matching and interpolation, even compared to supervised approaches.
Updated: 2024-03-27 07:16:21
Categories: cs.CV,cs.AI,cs.CG
Towards Regulatable AI Systems: Technical Gaps and Policy Opportunities
There is increasing attention being given to how to regulate AI systems. As governing bodies grapple with what values to encapsulate into regulation, we consider the technical half of the question: To what extent can AI experts vet an AI system for adherence to regulatory requirements? We investigate this question through the lens of two public sector procurement checklists, identifying what we can do now, what should be possible with technical innovation, and what requirements need a more interdisciplinary approach.
Updated: 2024-03-27 07:11:30
Categories: cs.AI,cs.CY
MMP++: Motion Manifold Primitives with Parametric Curve Models
Motion Manifold Primitives (MMP), a manifold-based approach for encoding basic motion skills, can produce diverse trajectories, enabling the system to adapt to unseen constraints. Nonetheless, we argue that current MMP models lack crucial functionalities of movement primitives, such as temporal and via-points modulation, found in traditional approaches. This shortfall primarily stems from MMP's reliance on discrete-time trajectories. To overcome these limitations, we introduce Motion Manifold Primitives++ (MMP++), a new model that integrates the strengths of both MMP and traditional methods by incorporating parametric curve representations into the MMP framework. Furthermore, we identify a significant challenge with MMP++: performance degradation due to geometric distortions in the latent space, meaning that similar motions are not closely positioned. To address this, Isometric Motion Manifold Primitives++ (IMMP++) is proposed to ensure the latent space accurately preserves the manifold's geometry. Our experimental results across various applications, including 2-DoF planar motions, 7-DoF robot arm motions, and SE(3) trajectory planning, show that MMP++ and IMMP++ outperform existing methods in trajectory generation tasks, achieving substantial improvements in some cases. Moreover, they enable the modulation of latent coordinates and via-points, thereby allowing efficient online adaptation to dynamic environments.
Updated: 2024-03-27 07:04:58
Categories: cs.AI,cs.LG,cs.RO
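The via-point modulation that parametric curves enable can be sketched on a single cubic Bezier segment. This least-norm control-point update is an illustrative stand-in, not the MMP++ model itself (which decodes curve parameters from a learned latent manifold).

```python
import numpy as np

def bezier(ctrl, t):
    """Evaluate a cubic Bezier curve (4 control points, any dimension) at t in [0, 1]."""
    b = np.array([(1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3])
    return b @ ctrl

def modulate_via_point(ctrl, t_star, via):
    """Minimally shift the control points so the curve passes through `via` at t_star."""
    b = np.array([(1 - t_star) ** 3, 3 * t_star * (1 - t_star) ** 2,
                  3 * t_star ** 2 * (1 - t_star), t_star ** 3])
    delta = via - b @ ctrl                  # residual at the desired via-point
    # Least-norm update: each control point moves in proportion to its basis weight,
    # so b @ new_ctrl = b @ ctrl + delta exactly.
    return ctrl + np.outer(b / (b @ b), delta)

ctrl = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 2.0], [3.0, 0.0]])
new_ctrl = modulate_via_point(ctrl, 0.5, np.array([1.5, 0.5]))
```

Because the curve is continuous in its parameters, this kind of modulation is cheap at run time, which is the functionality the abstract says discrete-time trajectory representations lack.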
A Recommender System for NFT Collectibles with Item Feature
Recommender systems have been actively studied and applied in various domains to deal with information overload. Although there are numerous studies on recommender systems for movies, music, and e-commerce, comparatively less attention has been paid to recommender systems for NFTs despite the continuous growth of the NFT market. This paper presents a recommender system for NFTs that utilizes a variety of data sources, from NFT transaction records to external item features, to generate precise recommendations that cater to individual preferences. We develop a data-efficient graph-based recommender system to efficiently capture the complex relationship between each item and users and generate node (item) embeddings which incorporate both node feature information and graph structure. Furthermore, we exploit inputs beyond user-item interactions, such as image, text, and price features. Numerical experiments verify that the performance of the graph-based recommender system improves significantly when all types of item features are utilized as side information, thereby outperforming all other baselines.
Updated: 2024-03-27 06:59:39
Categories: cs.IR,cs.AI
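A minimal sketch of the graph side of such a recommender, assuming a LightGCN-style symmetric-normalised propagation over a folded user-item adjacency matrix; the paper's actual architecture and feature pipeline may differ, and the feature matrix here is just an identity stand-in for concatenated image/text/price features.

```python
import numpy as np

def propagate(adj, feats, hops=2):
    """Symmetric-normalised feature propagation: each hop mixes a node's
    embedding with its neighbours', then layer outputs are averaged."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    norm_adj = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    h = feats
    out = feats.copy()
    for _ in range(hops):
        h = norm_adj @ h
        out += h
    return out / (hops + 1)

# Bipartite user-item toy graph folded into one symmetric adjacency matrix:
# nodes 0-1 are users, nodes 2-3 are items.
adj = np.array([[0, 0, 1, 1],
                [0, 0, 0, 1],
                [1, 0, 0, 0],
                [1, 1, 0, 0]], dtype=float)
feats = np.eye(4)                 # stand-in for item/user side features
emb = propagate(adj, feats)
scores = emb[:2] @ emb[2:].T      # user-item affinity scores
```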
Super-Resolution of SOHO/MDI Magnetograms of Solar Active Regions Using SDO/HMI Data and an Attention-Aided Convolutional Neural Network
Image super-resolution has been an important subject in image processing and recognition. Here, we present an attention-aided convolutional neural network (CNN) for solar image super-resolution. Our method, named SolarCNN, aims to enhance the quality of line-of-sight (LOS) magnetograms of solar active regions (ARs) collected by the Michelson Doppler Imager (MDI) on board the Solar and Heliospheric Observatory (SOHO). The ground-truth labels used for training SolarCNN are the LOS magnetograms collected by the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO). Solar ARs consist of strong magnetic fields in which magnetic energy can suddenly be released to produce extreme space weather events, such as solar flares, coronal mass ejections, and solar energetic particles. SOHO/MDI covers Solar Cycle 23, which is stronger with more eruptive events than Cycle 24. Enhanced SOHO/MDI magnetograms allow for better understanding and forecasting of violent events of space weather. Experimental results show that SolarCNN improves the quality of SOHO/MDI magnetograms in terms of the structural similarity index measure (SSIM), Pearson's correlation coefficient (PCC), and the peak signal-to-noise ratio (PSNR).
Updated: 2024-03-27 06:58:01
Categories: astro-ph.SR,cs.LG
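Two of the three reported quality metrics are simple to state precisely. The sketch below computes PSNR and the Pearson correlation coefficient on toy arrays (SSIM is more involved and omitted); the data here are random stand-ins, not real magnetograms.

```python
import numpy as np

def psnr(ref, img, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a reference and a reconstruction."""
    mse = np.mean((ref - img) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def pcc(ref, img):
    """Pearson correlation coefficient between two images, flattened."""
    return np.corrcoef(ref.ravel(), img.ravel())[0, 1]

rng = np.random.default_rng(1)
truth = rng.random((16, 16))                                   # stand-in for an HMI patch
noisy = np.clip(truth + 0.05 * rng.normal(size=truth.shape), 0, 1)  # stand-in for MDI
```

Higher PSNR and PCC (toward 1.0) against the HMI ground truth are the direction of improvement the abstract reports.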
Regret-Based Defense in Adversarial Reinforcement Learning
Deep Reinforcement Learning (DRL) policies have been shown to be vulnerable to small adversarial noise in observations. Such adversarial noise can have disastrous consequences in safety-critical environments. For instance, a self-driving car receiving adversarially perturbed sensory observations about nearby signs (e.g., a stop sign physically altered to be perceived as a speed limit sign) or objects (e.g., cars altered to be recognized as trees) can be fatal. Existing approaches for making RL algorithms robust to an observation-perturbing adversary have focused on reactive approaches that iteratively improve against adversarial examples generated at each iteration. While such approaches have been shown to provide improvements over regular RL methods, they are reactive and can fare significantly worse if certain categories of adversarial examples are not generated during training. To that end, we pursue a more proactive approach that relies on directly optimizing a well-studied robustness measure, regret instead of expected value. We provide a principled approach that minimizes maximum regret over a "neighborhood" of observations to the received "observation". Our regret criterion can be used to modify existing value- and policy-based Deep RL methods. We demonstrate that our approaches provide a significant improvement in performance across a wide variety of benchmarks against leading approaches for robust Deep RL.
Updated: 2024-03-27 06:57:30
Categories: cs.LG,cs.AI
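The regret criterion can be made concrete in a tabular toy. The sketch below evaluates the worst-case regret when the adversary may swap the true observation for any observation in its neighborhood; it is only an illustration of the quantity being minimised, not the paper's Deep RL training procedure.

```python
def worst_case_regret(q_values, obs, neighborhood):
    """Maximum regret when the true observation `obs` is adversarially
    replaced by any observation in `neighborhood`.

    q_values : dict observation -> list of action values
    Regret of acting on perturbed o' while truly in obs:
        V*(obs) - Q(obs, argmax_a Q(o', a)).
    """
    best = max(q_values[obs])
    regret = 0.0
    for o in neighborhood:
        a = max(range(len(q_values[o])), key=lambda i: q_values[o][i])
        regret = max(regret, best - q_values[obs][a])
    return regret

# A perturbed view that flips which action looks best induces 0.8 regret.
q = {"s": [1.0, 0.2], "s_noisy": [0.0, 0.9]}
r = worst_case_regret(q, "s", ["s", "s_noisy"])
```

A proactive defense in this spirit trains the policy to keep this worst-case quantity small, rather than reacting to sampled adversarial examples.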
Selective Mixup Fine-Tuning for Optimizing Non-Decomposable Objectives
The rise in internet usage has led to the generation of massive amounts of data, resulting in the adoption of various supervised and semi-supervised machine learning algorithms, which can effectively utilize the colossal amount of data to train models. However, before deploying these models in the real world, these must be strictly evaluated on performance measures like worst-case recall and satisfy constraints such as fairness. We find that current state-of-the-art empirical techniques offer sub-optimal performance on these practical, non-decomposable performance objectives. On the other hand, the theoretical techniques necessitate training a new model from scratch for each performance objective. To bridge the gap, we propose SelMix, a selective mixup-based inexpensive fine-tuning technique for pre-trained models, to optimize for the desired objective. The core idea of our framework is to determine a sampling distribution to perform a mixup of features between samples from particular classes such that it optimizes the given objective. We comprehensively evaluate our technique against the existing empirical and theoretically principled methods on standard benchmark datasets for imbalanced classification. We find that proposed SelMix fine-tuning significantly improves the performance for various practical non-decomposable objectives across benchmarks.
Updated: 2024-03-27 06:55:23
Categories: cs.LG,cs.AI,cs.CV,stat.ML
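The core mechanic, mixing features between samples of a sampled class pair, can be sketched as follows. In SelMix the pair-sampling distribution is derived from the target objective; here it is a fixed toy matrix, and all names are illustrative.

```python
import numpy as np

def selective_mixup(feats, labels, pair_probs, lam=0.6, rng=None):
    """Mix features between one sample of class i and one of class j,
    where the pair (i, j) is drawn from `pair_probs`, a (C, C) sampling
    distribution (in SelMix this distribution is tuned to the objective)."""
    rng = rng or np.random.default_rng(0)
    c = pair_probs.shape[0]
    i, j = np.unravel_index(rng.choice(c * c, p=pair_probs.ravel()), (c, c))
    a = rng.choice(np.where(labels == i)[0])
    b = rng.choice(np.where(labels == j)[0])
    mixed = lam * feats[a] + (1 - lam) * feats[b]
    return mixed, (i, j)

feats = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
labels = np.array([0, 0, 1, 1])
pair_probs = np.array([[0.0, 1.0], [0.0, 0.0]])   # toy: always mix class 0 with class 1
mixed, pair = selective_mixup(feats, labels, pair_probs)
```

Fine-tuning on such mixed feature batches is what lets a pre-trained model be steered toward a non-decomposable metric without retraining from scratch.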
HotStuff-2 vs. HotStuff: The Difference and Advantage
Byzantine consensus protocols are essential in blockchain technology. The widely recognized HotStuff protocol uses cryptographic measures for efficient view changes and reduced communication complexity. Recently, the main authors of HotStuff introduced an advanced iteration named HotStuff-2. This paper aims to compare the principles and analyze the effectiveness of both protocols, hoping to depict their key differences and assess the potential enhancements offered by HotStuff-2.
Updated: 2024-03-27 06:54:56
Categories: cs.CR,cs.DC
GeNet: A Graph Neural Network-based Anti-noise Task-Oriented Semantic Communication Paradigm
Traditional approaches to semantic communication tasks rely on the knowledge of the signal-to-noise ratio (SNR) to mitigate channel noise. However, these methods necessitate training under specific SNR conditions, entailing considerable time and computational resources. In this paper, we propose GeNet, a Graph Neural Network (GNN)-based paradigm for semantic communication aimed at combating noise, thereby facilitating Task-Oriented Communication (TOC). We propose a novel approach where we first transform the input data image into graph structures. Then we leverage a GNN-based encoder to extract semantic information from the source data. This extracted semantic information is then transmitted through the channel. At the receiver's end, a GNN-based decoder is utilized to reconstruct the relevant semantic information from the source data for TOC. Through experimental evaluation, we show GeNet's effectiveness in anti-noise TOC while decoupling the SNR dependency. We further evaluate GeNet's performance by varying the number of nodes, revealing its versatility as a new paradigm for semantic communication. Additionally, we show GeNet's robustness to geometric transformations by testing it with different rotation angles, without resorting to data augmentation.
Updated: 2024-03-27 06:46:59
Categories: cs.LG,cs.AI,eess.SP
MA4DIV: Multi-Agent Reinforcement Learning for Search Result Diversification
The objective of search result diversification (SRD) is to ensure that selected documents cover as many different subtopics as possible. Existing methods primarily utilize a paradigm of "greedy selection", i.e., selecting one document with the highest diversity score at a time. These approaches tend to be inefficient and are easily trapped in a suboptimal state. In addition, some other methods aim to approximately optimize the diversity metric, such as $\alpha$-NDCG, but the results still remain suboptimal. To address these challenges, we introduce Multi-Agent reinforcement learning (MARL) for search result DIVersity, called MA4DIV. In this approach, each document is an agent and search result diversification is modeled as a cooperative task among multiple agents. This approach allows for directly optimizing the diversity metrics, such as $\alpha$-NDCG, while achieving high training efficiency. We conducted preliminary experiments on public TREC datasets to demonstrate the effectiveness and potential of MA4DIV. Considering the limited number of queries in public TREC datasets, we construct a large-scale dataset from industry sources and show that MA4DIV achieves substantial improvements in both effectiveness and efficiency over existing baselines on an industrial-scale dataset.
Updated: 2024-03-27 06:28:53
Categories: cs.IR,cs.AI
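The diversity metric being optimized rewards a document for covering subtopics not yet seen higher in the ranking. The sketch below computes $\alpha$-DCG (the normalization by an ideal ranking that turns it into $\alpha$-NDCG is omitted for brevity); the toy subtopic judgments are invented.

```python
import math

def alpha_dcg(ranking, subtopics, alpha=0.5):
    """alpha-DCG: gain for covering a subtopic decays by (1 - alpha) each
    time that subtopic has already been covered higher in the ranking.

    ranking   : list of doc ids in ranked order
    subtopics : dict doc id -> set of subtopics the document covers
    """
    seen = {}                                   # subtopic -> times already covered
    score = 0.0
    for rank, doc in enumerate(ranking):
        gain = sum((1 - alpha) ** seen.get(t, 0) for t in subtopics[doc])
        score += gain / math.log2(rank + 2)     # positional discount
        for t in subtopics[doc]:
            seen[t] = seen.get(t, 0) + 1
    return score

subs = {"d1": {"a"}, "d2": {"a"}, "d3": {"b"}}
diverse = alpha_dcg(["d1", "d3", "d2"], subs)   # covers subtopic b early
redundant = alpha_dcg(["d1", "d2", "d3"], subs) # repeats subtopic a first
```

Greedy selection optimizes this one rank at a time; MA4DIV's multi-agent formulation instead scores the whole ranking jointly.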
Few-Shot Recalibration of Language Models
Recent work has uncovered promising ways to extract well-calibrated confidence estimates from language models (LMs), where the model's confidence score reflects how likely it is to be correct. However, while LMs may appear well-calibrated over broad distributions, this often hides significant miscalibration within narrower slices (e.g., systemic over-confidence in math can balance out systemic under-confidence in history, yielding perfect calibration in aggregate). To attain well-calibrated confidence estimates for any slice of a distribution, we propose a new framework for few-shot slice-specific recalibration. Specifically, we train a recalibration model that takes in a few unlabeled examples from any given slice and predicts a curve that remaps confidence scores to be more accurate for that slice. Our trained model can recalibrate for arbitrary new slices, without using any labeled data from that slice. This enables us to identify domain-specific confidence thresholds above which the LM's predictions can be trusted, and below which it should abstain. Experiments show that our few-shot recalibrator consistently outperforms existing calibration methods, for instance improving calibration error for PaLM2-Large on MMLU by 16%, as compared to temperature scaling.
Updated: 2024-03-27 06:25:40
Categories: cs.CL,cs.AI,cs.LG
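For context, the baseline the paper compares against, temperature scaling, fits a single global remapping of confidence scores. The sketch below fits a temperature on a handful of (confidence, correct) pairs from one slice by grid-searching negative log-likelihood; the paper's recalibrator instead *predicts* a slice-specific remapping curve from unlabeled examples, which this toy does not implement.

```python
import math

def fit_temperature(confidences, corrects, grid=None):
    """Pick the temperature T minimising NLL of remapped confidences on a
    small labelled slice. Remap: sigmoid(logit(p) / T); T > 1 softens
    over-confident predictions."""
    grid = grid or [0.5 + 0.1 * k for k in range(26)]   # T in [0.5, 3.0]
    def remap(p, T):
        z = math.log(p / (1 - p)) / T
        return 1 / (1 + math.exp(-z))
    def nll(T):
        eps = 1e-6
        total = 0.0
        for p, y in zip(confidences, corrects):
            q = min(max(remap(p, T), eps), 1 - eps)
            total -= math.log(q) if y else math.log(1 - q)
        return total
    return min(grid, key=nll)

# Over-confident slice: the model says 0.9 but is right only 60% of the time.
conf = [0.9] * 10
correct = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
T = fit_temperature(conf, correct)
```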
LLMs Are Few-Shot In-Context Low-Resource Language Learners
In-context learning (ICL) empowers large language models (LLMs) to perform diverse tasks in underrepresented languages using only short in-context information, offering a crucial avenue for narrowing the gap between high-resource and low-resource languages. Nonetheless, only a handful of works have explored ICL for low-resource languages, with most of them focusing on relatively high-resource languages, such as French and Spanish. In this work, we extensively study ICL and its cross-lingual variation (X-ICL) on 25 low-resource and 7 relatively higher-resource languages. Our study not only assesses the effectiveness of ICL with LLMs in low-resource languages but also identifies the shortcomings of in-context label alignment, and introduces a more effective alternative: query alignment. Moreover, we provide valuable insights into various facets of ICL for low-resource languages. Our study concludes the significance of few-shot in-context information on enhancing the low-resource understanding quality of LLMs through semantically relevant information by closing the language gap in the target language and aligning the semantics between the targeted low-resource and the high-resource language that the model is proficient in. Our work highlights the importance of advancing ICL research, particularly for low-resource languages.
Updated: 2024-03-27 06:25:10
Categories: cs.CL,cs.AI
Enhancing Programming Education with ChatGPT: A Case Study on Student Perceptions and Interactions in a Python Course
The integration of ChatGPT as a supportive tool in education, notably in programming courses, addresses the unique challenges of programming education by providing assistance with debugging, code generation, and explanations. Despite existing research validating ChatGPT's effectiveness, its application in university-level programming education and a detailed understanding of student interactions and perspectives remain limited. This paper explores ChatGPT's impact on learning in a Python programming course tailored for first-year students over eight weeks. By analyzing responses from surveys, open-ended questions, and student-ChatGPT dialog data, we aim to provide a comprehensive view of ChatGPT's utility and identify both its advantages and limitations as perceived by students. Our study uncovers a generally positive reception toward ChatGPT and offers insights into its role in enhancing the programming education experience. These findings contribute to the broader discourse on AI's potential in education, suggesting paths for future research and application.
Updated: 2024-03-27 06:22:41
Categories: cs.CY,cs.AI,cs.PL
Identification and Uses of Deep Learning Backbones via Pattern Mining
Deep learning is extensively used in many areas of data mining as a black-box method with impressive results. However, understanding the core mechanism of how deep learning makes predictions is a relatively understudied problem. Here we explore the notion of identifying a backbone of deep learning for a given group of instances. A group here can be instances of the same class or even misclassified instances of the same class. We view each instance for a given group as activating a subset of neurons and attempt to find a subgraph of neurons associated with a given concept/group. We formulate this problem as a set-cover-style problem, show it is intractable, and present a highly constrained integer linear programming (ILP) formulation. As an alternative, we explore a coverage-based heuristic approach related to pattern mining, and show it converges to a Pareto equilibrium point of the ILP formulation. Experimentally we explore these backbones to identify mistakes and improve performance, explanation, and visualization. We demonstrate application-based results using several challenging data sets, including Bird Audio Detection (BAD) Challenge and Labeled Faces in the Wild (LFW), as well as the classic MNIST data.
Updated: 2024-03-27 06:13:39
Categories: cs.AI
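A greedy coverage heuristic in the spirit of the paper's approach (not their exact algorithm) can be sketched directly: repeatedly pick the neuron that covers the most still-uncovered instances of the group, until a target fraction is covered.

```python
def greedy_backbone(activations, coverage=0.9):
    """Greedy coverage heuristic: pick neurons until `coverage` of the
    group's instances contain at least one picked neuron.

    activations : list of sets, one per instance, of neuron ids it activates
    """
    universe = set(range(len(activations)))
    covered, picked = set(), []
    # Invert the relation: neuron -> instances it appears in.
    by_neuron = {}
    for idx, acts in enumerate(activations):
        for n in acts:
            by_neuron.setdefault(n, set()).add(idx)
    while len(covered) < coverage * len(universe):
        # Classic greedy set-cover step: maximise marginal coverage.
        best = max(by_neuron, key=lambda n: len(by_neuron[n] - covered))
        picked.append(best)
        covered |= by_neuron.pop(best)
    return picked

# Neuron 2 covers three instances, neuron 4 covers the remaining one.
acts = [{1, 2}, {2, 3}, {2}, {4}]
backbone = greedy_backbone(acts, coverage=1.0)
```

Greedy set cover carries the standard (1 - 1/e) approximation guarantee, which is why coverage heuristics are a natural fallback when the exact ILP is intractable.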
SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces
Given the remarkable achievements in image generation through diffusion models, the research community has shown increasing interest in extending these models to video generation. Recent diffusion models for video generation have predominantly utilized attention layers to extract temporal features. However, attention layers are limited by their memory consumption, which increases quadratically with the length of the sequence. This limitation presents significant challenges when attempting to generate longer video sequences using diffusion models. To overcome this challenge, we propose leveraging state-space models (SSMs). SSMs have recently gained attention as viable alternatives due to their linear memory consumption relative to sequence length. In the experiments, we first evaluate our SSM-based model with UCF101, a standard benchmark of video generation. In addition, to investigate the potential of SSMs for longer video generation, we perform an experiment using the MineRL Navigate dataset, varying the number of frames to 64, 200, and 400. In these settings, our SSM-based model can considerably save memory consumption for longer sequences, while maintaining competitive FVD scores to the attention-based models. Our codes are available at https://github.com/shim0114/SSM-Meets-Video-Diffusion-Models.
Updated: 2024-03-27 06:02:38
Categories: cs.CV,cs.AI
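The memory argument can be seen in a few lines. A naive linear state-space recurrence keeps only an O(state_dim) hidden state regardless of sequence length, whereas attention materialises an O(T^2) score matrix. Real SSM layers (S4/Mamba-style) use structured state matrices and parallel scans; this is just the plain recurrence, with made-up dimensions.

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Linear state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.

    Memory stays O(state_dim) however long the sequence is."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                             # x: (T, input_dim)
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

T, d_in, d_state, d_out = 400, 4, 8, 4        # 400 frames, as in the MineRL setting
rng = np.random.default_rng(0)
A = 0.9 * np.eye(d_state)                     # stable, decaying state
B = rng.normal(size=(d_state, d_in)) * 0.1
C = rng.normal(size=(d_out, d_state)) * 0.1
y = ssm_scan(A, B, C, rng.normal(size=(T, d_in)))
```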
Clustering Change Sign Detection by Fusing Mixture Complexity
This paper proposes an early detection method for cluster structural changes. Cluster structure refers to discrete structural characteristics, such as the number of clusters, when data are represented using finite mixture models, such as Gaussian mixture models. We focused on scenarios in which the cluster structure gradually changed over time. For finite mixture models, the concept of mixture complexity (MC) measures the continuous cluster size by considering the cluster proportion bias and overlap between clusters. In this paper, we propose MC fusion as an extension of MC to handle situations in which multiple mixture numbers are possible in a finite mixture model. By incorporating the fusion of multiple models, our approach accurately captured the cluster structure during transitional periods of gradual change. Moreover, we introduce a method for detecting changes in the cluster structure by examining the transition of MC fusion. We demonstrate the effectiveness of our method through empirical analysis using both artificial and real-world datasets.
Updated: 2024-03-27 05:50:23
Categories: stat.ML,cs.IT,cs.LG,math.IT
Sample Efficient Reinforcement Learning with Partial Dynamics Knowledge
The problem of sample complexity of online reinforcement learning is often studied in the literature without taking into account any partial knowledge about the system dynamics that could potentially accelerate the learning process. In this paper, we study the sample complexity of online Q-learning methods when some prior knowledge about the dynamics is available or can be learned efficiently. We focus on systems that evolve according to an additive disturbance model of the form $S_{h+1} = f(S_h, A_h) + W_h$, where $f$ represents the underlying system dynamics, and $W_h$ are unknown disturbances independent of states and actions. In the setting of finite episodic Markov decision processes with $S$ states, $A$ actions, and episode length $H$, we present an optimistic Q-learning algorithm that achieves $\tilde{\mathcal{O}}(\text{Poly}(H)\sqrt{T})$ regret under perfect knowledge of $f$, where $T$ is the total number of interactions with the system. This is in contrast to the typical $\tilde{\mathcal{O}}(\text{Poly}(H)\sqrt{SAT})$ regret for existing Q-learning methods. Further, if only a noisy estimate $\hat{f}$ of $f$ is available, our method can learn an approximately optimal policy in a number of samples that is independent of the cardinalities of state and action spaces. The sub-optimality gap depends on the approximation error $\hat{f}-f$, as well as the Lipschitz constant of the corresponding optimal value function. Our approach does not require modeling of the transition probabilities and enjoys the same memory complexity as model-free methods.
Updated: 2024-03-27 05:48:21
Categories: cs.LG,math.OC,stat.ML
Weakly Supervised AUC Optimization: A Unified Partial AUC Approach
Since acquiring perfect supervision is usually difficult, real-world machine learning tasks often confront inaccurate, incomplete, or inexact supervision, collectively referred to as weak supervision. In this work, we present WSAUC, a unified framework for weakly supervised AUC optimization problems, which covers noisy label learning, positive-unlabeled learning, multi-instance learning, and semi-supervised learning scenarios. Within the WSAUC framework, we first frame the AUC optimization problems in various weakly supervised scenarios as a common formulation of minimizing the AUC risk on contaminated sets, and demonstrate that the empirical risk minimization problems are consistent with the true AUC. Then, we introduce a new type of partial AUC, specifically, the reversed partial AUC (rpAUC), which serves as a robust training objective for AUC maximization in the presence of contaminated labels. WSAUC offers a universal solution for AUC optimization in various weakly supervised scenarios by maximizing the empirical rpAUC. Theoretical and experimental results under multiple settings support the effectiveness of WSAUC on a range of weakly supervised AUC optimization tasks.
Updated: 2024-03-27 05:45:37
Categories: cs.LG,cs.AI
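For orientation, a one-sided partial AUC, the ingredient that rpAUC builds on, restricts the pairwise ranking comparison to the hardest fraction of negatives. The sketch below is the standard partial AUC over FPR in [0, fpr_max]; the paper's reversed variant (rpAUC) is defined differently to be robust to contaminated labels, and is not reproduced here.

```python
def partial_auc(scores_pos, scores_neg, fpr_max=0.3):
    """One-sided partial AUC: pairwise win rate of positives against only the
    top-scored (hardest) negatives, i.e. FPR restricted to [0, fpr_max]."""
    neg_sorted = sorted(scores_neg, reverse=True)
    k = max(1, int(round(fpr_max * len(neg_sorted))))
    top_neg = neg_sorted[:k]                  # hardest negatives only
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in top_neg)
    return wins / (len(scores_pos) * k)

pos = [0.9, 0.8, 0.7]
neg = [0.85, 0.3, 0.2, 0.1]
full = partial_auc(pos, neg, fpr_max=1.0)     # ordinary AUC
hard = partial_auc(pos, neg, fpr_max=0.25)    # only the hardest negative counts
```

Restricting attention to a score region is what lets a partial-AUC objective down-weight pairs that are likely label noise, which is the intuition behind using an rpAUC-style objective on contaminated sets.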
DSF-GAN: DownStream Feedback Generative Adversarial Network
Utility and privacy are two crucial measurements of the quality of synthetic tabular data. While significant advancements have been made in privacy measures, generating synthetic samples with high utility remains challenging. To enhance the utility of synthetic samples, we propose a novel architecture called the DownStream Feedback Generative Adversarial Network (DSF-GAN). This approach incorporates feedback from a downstream prediction model during training to augment the generator's loss function with valuable information. Thus, DSF-GAN utilizes a downstream prediction task to enhance the utility of synthetic samples. To evaluate our method, we tested it using two popular datasets. Our experiments demonstrate improved model performance when training on synthetic samples generated by DSF-GAN, compared to those generated by the same GAN architecture without feedback. The evaluation was conducted on the same validation set comprising real samples. All code and datasets used in this research will be made openly available for ease of reproduction.
Updated: 2024-03-27 05:41:50
Categories: cs.LG,cs.AI,I.2
Branch-Tuning: Balancing Stability and Plasticity for Continual Self-Supervised Learning
Self-supervised learning (SSL) has emerged as an effective paradigm for deriving general representations from vast amounts of unlabeled data. However, as real-world applications continually integrate new content, the high computational and resource demands of SSL necessitate continual learning rather than complete retraining. This poses a challenge in striking a balance between stability and plasticity when adapting to new information. In this paper, we employ Centered Kernel Alignment for quantitatively analyzing model stability and plasticity, revealing the critical roles of batch normalization layers for stability and convolutional layers for plasticity. Motivated by this, we propose Branch-tuning, an efficient and straightforward method that achieves a balance between stability and plasticity in continual SSL. Branch-tuning consists of branch expansion and compression, and can be easily applied to various SSL methods without the need of modifying the original methods, retaining old data or models. We validate our method through incremental experiments on various benchmark datasets, demonstrating its effectiveness and practical value in real-world scenarios. We hope our work offers new insights for future continual self-supervised learning research. The code will be made publicly available.
Updated: 2024-03-27 05:38:48
Categories: cs.LG,cs.CV
Foundation Model Makes Clustering A Better Initialization For Cold-Start Active Learning
Active learning selects the most informative samples from the unlabelled dataset to annotate in the context of a limited annotation budget. While numerous methods have been proposed for subsequent sample selection based on an initialized model, scant attention has been paid to an indispensable phase of active learning: selecting samples for model cold-start initialization. Most previous studies resort to random sampling or naive clustering. However, random sampling is prone to fluctuation, and naive clustering suffers from slow convergence, particularly when dealing with high-dimensional data such as imaging data. In this work, we propose to integrate foundation models with clustering methods to select samples for cold-start active learning initialization. Foundation models refer to those trained on massive datasets by the self-supervised paradigm and capable of generating informative and compact embeddings for various downstream tasks. Leveraging these embeddings to replace raw features such as pixel values, clustering quickly converges and identifies better initial samples. For a comprehensive comparison, we also included a classic ImageNet-supervised model to acquire embeddings. Experiments on two clinical tasks of image classification and segmentation demonstrated that foundation model-based clustering efficiently pinpointed informative initial samples, leading to models with better performance than the baseline methods. We envisage that this study provides an effective paradigm for future cold-start active learning.
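The initialization strategy reduces to: cluster the foundation-model embeddings, then label the sample nearest each centroid. A toy numpy version (the tiny k-means loop and the one-per-centroid selection rule are illustrative, not the paper's exact pipeline):

```python
import numpy as np

def cold_start_select(embeddings, k, iters=20, seed=0):
    """Cluster foundation-model embeddings with k-means and return the index
    of the sample closest to each centroid as the cold-start labeled set."""
    rng = np.random.default_rng(seed)
    centers = embeddings[rng.choice(len(embeddings), k, replace=False)]
    for _ in range(iters):
        # Assign each sample to its nearest centroid, then update centroids.
        d = np.linalg.norm(embeddings[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = embeddings[assign == j].mean(axis=0)
    d = np.linalg.norm(embeddings[:, None] - centers[None], axis=2)
    return sorted(set(d.argmin(axis=0)))  # one representative per cluster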
Updated: 2024-03-27 05:23:40
标题: 基于基础模型的聚类使得冷启动主动学习的初始化更好
摘要: 主动学习从未标记的数据集中选择最具信息量的样本进行标注,以在有限的标注预算下进行。虽然已经提出了许多基于已初始化模型的后续样本选择方法,但对于主动学习的不可或缺阶段——为模型冷启动初始化选择样本,却鲜有关注。大多数先前的研究都采用随机抽样或天真聚类。然而,随机抽样容易出现波动,而天真聚类在处理高维数据(如图像数据)时尤其受到收敛速度的影响。在这项工作中,我们提出将基础模型与聚类方法相结合,以选择用于冷启动主动学习初始化的样本。基础模型是指通过自监督范式在大规模数据集上训练的模型,能够为各种下游任务生成信息丰富且紧凑的嵌入。利用这些嵌入替代原始特征(如像素值),聚类快速收敛并识别出更好的初始样本。为了进行全面比较,我们还包括了一个经典的ImageNet监督模型来获取嵌入。在图像分类和分割的两个临床任务上的实验表明,基础模型基于聚类有效地确定了信息丰富的初始样本,导致模型展示出比基准方法更好的性能。我们预见这项研究为未来的冷启动主动学习提供了一种有效的范式。
更新时间: 2024-03-27 05:23:40
领域: cs.LG,cs.CV
Enhancing Generative Class Incremental Learning Performance with Model Forgetting Approach
This study presents a novel approach to Generative Class Incremental Learning (GCIL) by introducing the forgetting mechanism, aimed at dynamically managing class information for better adaptation to streaming data. GCIL is one of the hot topics in the field of computer vision, and this is considered one of the crucial tasks in society, specifically the continual learning of generative models. The ability to forget is a crucial brain function that facilitates continual learning by selectively discarding less relevant information for humans. However, in the field of machine learning models, the concept of intentionally forgetting has not been extensively investigated. In this study we aim to bridge this gap by incorporating the forgetting mechanisms into GCIL, thereby examining their impact on the models' ability to learn in continual learning. Through our experiments, we have found that integrating the forgetting mechanisms significantly enhances the models' performance in acquiring new knowledge, underscoring the positive role that strategic forgetting plays in the process of continual learning.
Updated: 2024-03-27 05:10:38
标题: 利用遗忘模型方法提高生成式增量学习性能
摘要: 这项研究提出了一种新的生成类增量学习(GCIL)方法,引入了遗忘机制,旨在动态管理类信息,以更好地适应流数据。GCIL是计算机视觉领域的热门话题之一,被认为是社会中至关重要的任务之一,特别是生成模型的持续学习。遗忘能力是一种关键的大脑功能,通过有选择地丢弃对人类不太相关的信息,促进持续学习。然而,在机器学习模型领域,有意遗忘的概念并未得到广泛研究。本研究旨在通过将遗忘机制纳入GCIL中,来填补这一空白,从而考察其对模型在持续学习中学习能力的影响。通过我们的实验,我们发现整合遗忘机制显著提升了模型在获取新知识方面的表现,强调了战略性遗忘在持续学习过程中发挥的积极作用。
更新时间: 2024-03-27 05:10:38
领域: cs.CV,cs.AI
ProSwitch: Knowledge-Guided Language Model Fine-Tuning to Generate Professional and Non-Professional Styled Text
Large Language Models (LLMs) have demonstrated efficacy in various linguistic applications, including text summarization and controlled text generation. However, studies into their capacity of switching between styles via fine-tuning remain underexplored. This study concentrates on textual professionalism and introduces a novel methodology, named ProSwitch, which equips a language model with the ability to produce both professional and non-professional responses through knowledge-guided instruction tuning. ProSwitch unfolds across three phases: data preparation for gathering domain knowledge and training corpus; instruction tuning for optimizing language models with multiple levels of instruction formats; and comprehensive evaluation for assessing the professionalism discrimination and reference-based quality of generated text. Comparative analysis of ProSwitch against both general and specialized language models reveals that our approach outperforms baselines in switching between professional and non-professional text generation.
Updated: 2024-03-27 05:02:55
标题: ProSwitch:知识引导的语言模型微调,生成专业和非专业风格的文本
摘要: 大型语言模型(LLMs)已经在各种语言应用中展示出有效性,包括文本摘要和受控文本生成。然而,关于它们通过微调在不同风格之间切换能力的研究仍未得到充分探索。本研究集中在文本专业性上,并引入了一种名为ProSwitch的新方法,通过知识引导的指导调整,使语言模型具备产生专业和非专业响应的能力。ProSwitch包括三个阶段:数据准备用于收集领域知识和训练语料库;指导调整用于优化具有多种指导格式的语言模型;和全面评估用于评估生成文本的专业性区分和基于参考的质量。ProSwitch与一般和专门语言模型的比较分析显示,我们的方法在专业和非专业文本生成之间的切换方面优于基准。
更新时间: 2024-03-27 05:02:55
领域: cs.CL,cs.AI,68T50,I.2.7
Manipulating Neural Path Planners via Slight Perturbations
Data-driven neural path planners are attracting increasing interest in the robotics community. However, their neural network components typically come as black boxes, obscuring their underlying decision-making processes. Their black-box nature exposes them to the risk of being compromised via the insertion of hidden malicious behaviors. For example, an attacker may hide behaviors that, when triggered, hijack a delivery robot by guiding it to a specific (albeit wrong) destination, trapping it in a predefined region, or inducing unnecessary energy expenditure by causing the robot to repeatedly circle a region. In this paper, we propose a novel approach to specify and inject a range of hidden malicious behaviors, known as backdoors, into neural path planners. Our approach provides a concise but flexible way to define these behaviors, and we show that hidden behaviors can be triggered by slight perturbations (e.g., inserting a tiny unnoticeable object), that can nonetheless significantly compromise their integrity. We also discuss potential techniques to identify these backdoors aimed at alleviating such risks. We demonstrate our approach on both sampling-based and search-based neural path planners.
Updated: 2024-03-27 04:56:48
标题: 通过轻微扰动操纵神经路径规划者
摘要: 基于数据的神经路径规划器正在引起机器人学界越来越多的关注。然而,它们的神经网络组件通常是黑盒,模糊了它们的基础决策过程。它们的黑盒特性使它们面临着通过插入隐藏的恶意行为来被篡改的风险。例如,攻击者可能隐藏行为,当触发时,会通过引导传送机器人到特定(虽然错误的)目的地、将其困在预定义区域内、或通过导致机器人反复绕圈而诱发不必要的能量消耗来劫持传送机器人。在本文中,我们提出了一种新颖的方法,用于指定和注入一系列隐藏的恶意行为,称为后门,到神经路径规划器中。我们的方法提供了一种简洁而灵活的方式来定义这些行为,并且我们展示了隐藏行为可以通过轻微扰动(例如,插入微不可见的物体)来触发,从而显著地损害其完整性。我们还讨论了识别这些后门的潜在技术,旨在减轻这些风险。我们在基于采样的和基于搜索的神经路径规划器上展示了我们的方法。
更新时间: 2024-03-27 04:56:48
领域: cs.RO,cs.AI
CBQ: Cross-Block Quantization for Large Language Models
Post-training quantization (PTQ) has played a key role in compressing large language models (LLMs) with ultra-low costs. However, existing PTQ methods only focus on handling the outliers within one layer or one block, which ignores the dependency of blocks and leads to severe performance degradation in low-bit settings. In this paper, we propose CBQ, a cross-block reconstruction-based PTQ method for LLMs. CBQ employs a cross-block dependency using a homologous reconstruction scheme, establishing long-range dependencies across multiple blocks to minimize error accumulation. Furthermore, CBQ incorporates a coarse-to-fine preprocessing (CFP) strategy for suppressing weight and activation outliers, coupled with an adaptive LoRA-Rounding technique for precise weight quantization. These innovations enable CBQ to not only handle extreme outliers effectively but also improve overall quantization accuracy. Extensive experiments show that CBQ achieves superior low-bit quantization (W4A4, W4A8, W2A16) and outperforms existing state-of-the-art methods across various LLMs and datasets. Notably, CBQ quantizes the 4-bit LLAMA1-65B model within only 4.3 hours on a single GPU, achieving a commendable tradeoff between performance and quantization efficiency.
Updated: 2024-03-27 04:51:51
标题: CBQ:大型语言模型的交叉块量化
摘要: 训练后量化(PTQ)在以极低成本压缩大型语言模型(LLMs)方面发挥了关键作用。然而,现有的PTQ方法仅关注处理单个层或单个块中的异常值,忽略了块之间的依赖关系,导致在低比特设置中性能严重下降。在本文中,我们提出了CBQ,一种基于跨块重建的LLMs的PTQ方法。CBQ采用同源重建方案的跨块依赖关系,建立跨多个块的长程依赖关系,以最小化误差积累。此外,CBQ结合了一种粗到细的预处理(CFP)策略,用于抑制权重和激活异常值,并配合自适应的LoRA-Rounding技术进行精确的权重量化。这些创新使CBQ不仅能够有效处理极端异常值,还能提高整体量化精度。大量实验表明,CBQ在各种LLMs和数据集上实现了卓越的低比特量化(W4A4、W4A8、W2A16),并超越了现有的最先进方法。值得注意的是,CBQ仅在单个GPU上用4.3小时将4位LLAMA1-65B模型量化,实现了性能和量化效率之间的可观权衡。
更新时间: 2024-03-27 04:51:51
领域: cs.LG,cs.CL
Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models
Visual representation learning has been a cornerstone in computer vision, evolving from supervised learning with human-annotated labels to aligning image-text pairs from the Internet. Despite recent advancements in multi-modal large language models (MLLMs), the visual representations they rely on, such as CLIP embeddings, often lack access to external world knowledge critical for real-world visual reasoning. In this work, we propose Visual Table, a novel visual representation tailored for MLLMs. It provides hierarchical text descriptions of holistic visual scenes, consisting of a scene description and multiple object-centric descriptions that encompass categories, attributes, and knowledge at instance level. We further develop a scalable generator for visual table generation and train it on small-scale annotations from GPT4V. Extensive evaluations demonstrate that, with generated visual tables as additional visual representations, our model can consistently outperform the state-of-the-art (SOTA) MLLMs across diverse benchmarks. When visual tables serve as standalone visual representations, our model can closely match or even beat the SOTA MLLMs that are built on CLIP visual embeddings. Our code is available at https://github.com/LaVi-Lab/Visual-Table.
Updated: 2024-03-27 04:49:23
标题: 超越嵌入:多模态模型中视觉表的潜力
摘要: 视觉表征学习一直是计算机视觉中的基石,从人工标注标签的监督学习发展到对齐来自互联网的图像-文本对。尽管最近在多模态大型语言模型(MLLMs)方面取得了进展,但它们依赖的视觉表征,如CLIP嵌入,通常缺乏关键的外部世界知识,这对实际视觉推理至关重要。在这项工作中,我们提出了Visual Table,这是一种为MLLMs量身定制的新型视觉表征。它提供了整体视觉场景的层次化文本描述,包括场景描述和涵盖类别、属性和实例级别知识的多个以对象为中心的描述。我们进一步开发了一个可伸缩的生成器,用于生成视觉表,并在来自GPT4V的小规模注释上进行训练。广泛的评估表明,通过生成的视觉表作为额外的视觉表征,我们的模型可以在多样化的基准测试中 consistently 超越最先进的MLLMs。当视觉表作为独立的视觉表征时,我们的模型可以接近甚至超越建立在CLIP视觉嵌入上的最先进的MLLMs。我们的代码可在https://github.com/LaVi-Lab/Visual-Table 上找到。
更新时间: 2024-03-27 04:49:23
领域: cs.CV,cs.AI,cs.CL,cs.LG,cs.MM
An Experimentally Validated Feasible Quantum Protocol for Identity-Based Signature with Application to Secure Email Communication
Digital signatures are one of the simplest cryptographic building blocks that provide appealing security characteristics such as authenticity, unforgeability, and undeniability. In 1984, Shamir developed the first Identity-based signature (IBS) to simplify public key infrastructure and circumvent the need for certificates. It makes the process uncomplicated by enabling users to verify digital signatures using only the identifiers of signers, such as email, phone number, etc. Nearly all existing IBS protocols rely on several theoretical assumption-based hard problems. Unfortunately, these hard problems are unsafe and pose a hazard in the quantum realm. Thus, designing IBS algorithms that can withstand quantum attacks and ensure long-term security is an important direction for future research. Quantum cryptography (QC) is one such approach. In this paper, we propose an IBS based on QC. Our scheme's security is based on the laws of quantum mechanics. It thereby achieves long-term security and provides resistance against quantum attacks. We verify the proposed design's correctness and feasibility by simulating it in a prototype quantum device and the IBM Qiskit quantum simulator. The implementation code in qiskit with Jupyternotebook is provided in the Annexure. Moreover, we discuss the application of our design in secure email communication.
Updated: 2024-03-27 04:32:41
标题: 一个经过实验证的可行的基于身份签名的量子协议,应用于安全电子邮件通信
摘要: 数字签名是最简单的密码学基本构件之一,具有诸如真实性、防伪性和不可否认性等吸引人的安全特性。1984年,Shamir开发了第一个基于身份的签名(IBS),以简化公钥基础设施并规避证书的需求。它通过仅使用签名者的标识符(如电子邮件、电话号码等)来使验证数字签名的过程变得简单。几乎所有现有的IBS协议都依赖于几个基于理论假设的难题。不幸的是,这些难题是不安全的,并在量子领域中构成危险。因此,设计能够抵御量子攻击并确保长期安全性的IBS算法是未来研究的重要方向。量子密码学(QC)是一种这样的方法。在本文中,我们提出了一种基于QC的IBS。我们方案的安全性基于量子力学定律,因此实现了长期安全性并提供了对抗量子攻击的能力。我们通过在原型量子设备和IBM Qiskit量子模拟器中模拟来验证所提出设计的正确性和可行性。在附件中提供了在Qiskit中的实现代码和Jupyternotebook。此外,我们讨论了我们设计在安全电子邮件通信中的应用。
更新时间: 2024-03-27 04:32:41
领域: cs.CR,cs.IT,math.IT
Boosting Conversational Question Answering with Fine-Grained Retrieval-Augmentation and Self-Check
Retrieval-Augmented Generation (RAG) aims to generate more reliable and accurate responses, by augmenting large language models (LLMs) with the external vast and dynamic knowledge. Most previous work focuses on using RAG for single-round question answering, while how to adapt RAG to the complex conversational setting wherein the question is interdependent on the preceding context is not well studied. In this paper, we propose a conversation-level RAG approach, which incorporates fine-grained retrieval augmentation and self-check for conversational question answering (CQA). In particular, our approach consists of three components, namely conversational question refiner, fine-grained retriever and self-check based response generator, which work collaboratively for question understanding and relevant information acquisition in conversational settings. Extensive experiments demonstrate the great advantages of our approach over the state-of-the-art baselines. Moreover, we also release a Chinese CQA dataset with new features including reformulated question, extracted keyword, retrieved paragraphs and their helpfulness, which facilitates further researches in RAG enhanced CQA.
Updated: 2024-03-27 04:20:18
标题: 通过细粒度检索增强和自检来提升对话式问答
摘要: 检索增强生成(RAG)旨在通过将大型语言模型(LLMs)与外部庞大且动态的知识相结合,生成更可靠和准确的响应。大多数先前的工作集中在将RAG用于单轮问答,而如何将RAG调整到复杂的会话环境中,其中问题与前文内容相互依赖,尚未得到充分研究。本文提出了一种会话级RAG方法,该方法将细粒度检索增强和自检融入到会话问答(CQA)中。具体而言,我们的方法包括三个组件,即会话问题精化器、细粒度检索器和基于自检的响应生成器,它们共同为在会话环境中进行问题理解和相关信息获取。广泛的实验表明,我们的方法在最先进的基线上具有巨大优势。此外,我们还发布了一个包含改写问题、提取关键词、检索段落及其有帮助程度等新特性的中文CQA数据集,这有助于进一步研究RAG增强的CQA。
更新时间: 2024-03-27 04:20:18
领域: cs.AI
NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints. Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation without considering spatial consistency. As a result, these approaches exhibit limited versatility in 3D data representation and shape generation, hindering their ability to generate highly diverse 3D shapes that comply with the specified constraints. In this paper, we introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling. To ensure spatial coherence and reduce memory usage, we incorporate a hybrid shape representation technique that directly learns a continuous signed distance field representation of the 3D shape using orthogonal 2D planes. Additionally, we meticulously enforce spatial correspondences across distinct planes using a transformer-based autoencoder structure, promoting the preservation of spatial relationships in the generated 3D shapes. This yields an algorithm that consistently outperforms state-of-the-art 3D shape generation methods on various tasks, including unconditional shape generation, multi-modal shape completion, single-view reconstruction, and text-to-shape synthesis.
Updated: 2024-03-27 04:09:34
标题: NeuSDFusion: 一种用于3D形状补全、重建和生成的空间感知生成模型
摘要: 3D形状生成旨在产生符合特定条件和约束的创新性3D内容。现有方法通常将3D形状分解为一系列局部组件,将每个元素单独处理,而不考虑空间一致性。因此,这些方法在3D数据表示和形状生成方面表现出有限的灵活性,阻碍了它们生成符合指定约束的高度多样化的3D形状的能力。在本文中,我们引入了一种新颖的空间感知3D形状生成框架,利用2D平面表示增强3D形状建模。为了确保空间一致性和减少内存使用,我们采用了一种混合形状表示技术,直接学习使用正交2D平面的连续有符号距离场表示3D形状。此外,我们通过基于变压器的自动编码器结构精心强化不同平面之间的空间对应关系,促进生成的3D形状中空间关系的保持。这产生了一种算法,在各种任务中始终优于最先进的3D形状生成方法,包括无条件形状生成、多模态形状完成、单视图重建和文本到形状合成。
更新时间: 2024-03-27 04:09:34
领域: cs.CV,cs.AI,cs.GR,cs.LG
BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
Vision-Language (VL) models with the Two-Tower architecture have dominated visual-language representation learning in recent years. Current VL models either use lightweight uni-modal encoders and learn to extract, align and fuse both modalities simultaneously in a deep cross-modal encoder, or feed the last-layer uni-modal representations from the deep pre-trained uni-modal encoders into the top cross-modal encoder. Both approaches potentially restrict vision-language representation learning and limit model performance. In this paper, we propose BridgeTower, which introduces multiple bridge layers that build a connection between the top layers of uni-modal encoders and each layer of the cross-modal encoder. This enables effective bottom-up cross-modal alignment and fusion between visual and textual representations of different semantic levels of pre-trained uni-modal encoders in the cross-modal encoder. Pre-trained with only 4M images, BridgeTower achieves state-of-the-art performance on various downstream vision-language tasks. In particular, on the VQAv2 test-std set, BridgeTower achieves an accuracy of 78.73%, outperforming the previous state-of-the-art model METER by 1.09% with the same pre-training data and almost negligible additional parameters and computational costs. Notably, when further scaling the model, BridgeTower achieves an accuracy of 81.15%, surpassing models that are pre-trained on orders-of-magnitude larger datasets. Code and checkpoints are available at https://github.com/microsoft/BridgeTower.
Updated: 2024-03-27 03:53:23
标题: BridgeTower:在视觉-语言表示学习中构建编码器之间的桥梁
摘要: 视觉语言(VL)模型采用双塔架构近年来主导了视觉语言表示学习。当前的VL模型要么使用轻量级单模式编码器,并学习同时在深度跨模态编码器中提取、对齐和融合两种模态,要么将深度预训练的单模式编码器的最后一层单模式表示输入顶部跨模态编码器。这两种方法都可能限制视觉语言表示学习并限制模型性能。在本文中,我们提出了BridgeTower,引入了多个桥接层,建立了单模式编码器的顶层与跨模态编码器的每一层之间的连接。这使得预训练的单模式编码器的不同语义级别的视觉和文本表示在跨模态编码器中进行了有效的自下而上的跨模态对齐和融合。仅使用400万张图像进行预训练,BridgeTower在各种下游视觉语言任务上实现了最先进的性能。特别是在VQAv2测试集上,BridgeTower实现了78.73%的准确率,比之前的最新模型METER高出1.09%,使用相同的预训练数据和几乎可以忽略不计的额外参数和计算成本。值得注意的是,当进一步扩展模型时,BridgeTower实现了81.15%的准确率,超过了在数量级更大的数据集上预训练的模型。代码和检查点可在https://github.com/microsoft/BridgeTower找到。
更新时间: 2024-03-27 03:53:23
领域: cs.CV,cs.CL,cs.LG
Discovering and Mitigating Visual Biases through Keyword Explanation
Addressing biases in computer vision models is crucial for real-world AI deployments. However, mitigating visual biases is challenging due to their unexplainable nature, often identified indirectly through visualization or sample statistics, which necessitates additional human supervision for interpretation. To tackle this issue, we propose the Bias-to-Text (B2T) framework, which interprets visual biases as keywords. Specifically, we extract common keywords from the captions of mispredicted images to identify potential biases in the model. We then validate these keywords by measuring their similarity to the mispredicted images using a vision-language scoring model. The keyword explanation form of visual bias offers several advantages, such as a clear group naming for bias discovery and a natural extension for debiasing using these group names. Our experiments demonstrate that B2T can identify known biases, such as gender bias in CelebA, background bias in Waterbirds, and distribution shifts in ImageNet-R/C. Additionally, B2T uncovers novel biases in larger datasets, such as Dollar Street and ImageNet. For example, we discovered a contextual bias between "bee" and "flower" in ImageNet. We also highlight various applications of B2T keywords, including debiased training, CLIP prompting, and model comparison.
Updated: 2024-03-27 03:47:20
标题: 发现和减轻视觉偏见通过关键词解释
摘要: 解决计算机视觉模型中的偏见对于实际应用是至关重要的人工智能部署。然而,减轻视觉偏见是具有挑战性的,因为它们的不可解释性,通常通过可视化或样本统计间接识别,这需要额外的人类监督来进行解释。为了解决这个问题,我们提出了Bias-to-Text(B2T)框架,将视觉偏见解释为关键字。具体来说,我们从误判图像的标题中提取常见关键字,以识别模型中的潜在偏见。然后,我们通过使用视觉语言评分模型来测量这些关键字与误判图像的相似性来验证这些关键字。视觉偏见的关键字解释形式提供了几个优势,例如用于偏见发现的清晰群组命名以及使用这些群组名称进行去偏见的自然延伸。我们的实验表明,B2T可以识别已知的偏见,例如CelebA中的性别偏见,Waterbirds中的背景偏见以及ImageNet-R/C中的分布偏移。此外,B2T还揭示了更大数据集中的新偏见,例如Dollar Street和ImageNet。例如,我们在ImageNet中发现了"蜜蜂"和"花"之间的背景偏见。我们还强调了B2T关键字的各种应用,包括去偏见训练、CLIP提示和模型比较。
更新时间: 2024-03-27 03:47:20
领域: cs.LG,cs.CV
Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound Data
PURPOSE: Deep learning methods for classifying prostate cancer (PCa) in ultrasound images typically employ convolutional networks (CNNs) to detect cancer in small regions of interest (ROI) along a needle trace region. However, this approach suffers from weak labelling, since the ground-truth histopathology labels do not describe the properties of individual ROIs. Recently, multi-scale approaches have sought to mitigate this issue by combining the context awareness of transformers with a CNN feature extractor to detect cancer from multiple ROIs using multiple-instance learning (MIL). In this work, we present a detailed study of several image transformer architectures for both ROI-scale and multi-scale classification, and a comparison of the performance of CNNs and transformers for ultrasound-based prostate cancer classification. We also design a novel multi-objective learning strategy that combines both ROI and core predictions to further mitigate label noise. METHODS: We evaluate 3 image transformers on ROI-scale cancer classification, then use the strongest model to tune a multi-scale classifier with MIL. We train our MIL models using our novel multi-objective learning strategy and compare our results to existing baselines. RESULTS: We find that for both ROI-scale and multi-scale PCa detection, image transformer backbones lag behind their CNN counterparts. This deficit in performance is even more noticeable for larger models. When using multi-objective learning, we can improve performance of MIL, with a 77.9% AUROC, a sensitivity of 75.9%, and a specificity of 66.3%. CONCLUSION: Convolutional networks are better suited for modelling sparse datasets of prostate ultrasounds, producing more robust features than transformers in PCa detection. Multi-scale methods remain the best architecture for this task, with multi-objective learning presenting an effective way to improve performance.
Updated: 2024-03-27 03:39:57
标题: 对前列腺癌超声数据进行图像转换器的基准测试
摘要: 目的:深度学习方法通常使用卷积网络(CNNs)来检测针迹区域沿着感兴趣区域(ROI)中的癌症,以分类前列腺癌(PCa)的超声图像。然而,这种方法存在标注不足的问题,因为地面真实组织学标签无法描述单个ROI的特性。最近,多尺度方法试图通过将变换器的上下文感知与CNN特征提取器相结合,使用多实例学习(MIL)从多个ROI中检测癌症来缓解这一问题。在这项工作中,我们对几种图像变换器架构进行了详细研究,用于ROI尺度和多尺度分类,并比较了基于超声的前列腺癌分类中CNN和变换器的性能。我们还设计了一种新颖的多目标学习策略,将ROI和核心预测结合起来,以进一步减轻标签噪声。方法:我们评估了3种图像变换器在ROI尺度癌症分类上的性能,然后使用最强模型来调整一个多尺度分类器,并使用MIL进行训练。我们使用我们的新颖多目标学习策略来训练我们的MIL模型,并将结果与现有基线进行比较。结果:我们发现,无论是在ROI尺度还是多尺度PCa检测中,图像变换器的性能都落后于其CNN对应物。这种性能不足在较大模型中尤为明显。当使用多目标学习时,我们可以提高MIL的性能,获得77.9%的AUROC,75.9%的敏感性和66.3%的特异性。结论:卷积网络更适合建模稀疏的前列腺超声数据集,比变换器在PCa检测中生成更强大的特征。多尺度方法仍然是这项任务的最佳架构,多目标学习是提高性能的有效方法。
更新时间: 2024-03-27 03:39:57
领域: eess.IV,cs.CV,cs.LG,q-bio.TO
Large Language Models Need Consultants for Reasoning: Becoming an Expert in a Complex Human System Through Behavior Simulation
Large language models (LLMs), in conjunction with various reasoning reinforcement methodologies, have demonstrated remarkable capabilities comparable to humans in fields such as mathematics, law, coding, common sense, and world knowledge. In this paper, we delve into the reasoning abilities of LLMs within complex human systems. We propose a novel reasoning framework, termed ``Mosaic Expert Observation Wall'' (MEOW) exploiting generative-agents-based simulation technique. In the MEOW framework, simulated data are utilized to train an expert model concentrating ``experience'' about a specific task in each independent time of simulation. It is the accumulated ``experience'' through the simulation that makes for an expert on a task in a complex human system. We conduct the experiments within a communication game that mirrors real-world security scenarios. The results indicate that our proposed methodology can cooperate with existing methodologies to enhance the reasoning abilities of LLMs in complex human systems.
Updated: 2024-03-27 03:33:32
标题: 大型语言模型需要咨询顾问进行推理:通过行为模拟成为复杂人类系统的专家
摘要: 大型语言模型(LLMs)结合各种推理强化方法,已经展示出在数学、法律、编码、常识和世界知识等领域与人类相媲美的卓越能力。本文深入探讨LLMs在复杂人类系统中的推理能力。我们提出了一个新颖的推理框架,称为“Mosaic Expert Observation Wall”(MEOW),利用基于生成代理的模拟技术。在MEOW框架中,模拟数据被用来训练一个专家模型,集中于每个独立模拟时间内关于特定任务的“经验”。正是通过模拟积累的“经验”使得在复杂人类系统中成为专家。我们在一个反映真实世界安全场景的通讯游戏中进行实验。结果表明,我们提出的方法可以与现有方法合作,增强LLMs在复杂人类系统中的推理能力。
更新时间: 2024-03-27 03:33:32
领域: cs.AI
Fourier or Wavelet bases as counterpart self-attention in spikformer for efficient visual classification
Energy-efficient spikformer has been proposed by integrating the biologically plausible spiking neural network (SNN) and artificial Transformer, whereby the Spiking Self-Attention (SSA) is used to achieve both higher accuracy and lower computational cost. However, it seems that self-attention is not always necessary, especially in sparse spike-form calculation manners. In this paper, we innovatively replace vanilla SSA (using dynamic bases calculating from Query and Key) with spike-form Fourier Transform, Wavelet Transform, and their combinations (using fixed triangular or wavelets bases), based on a key hypothesis that both of them use a set of basis functions for information transformation. Hence, the Fourier-or-Wavelet-based spikformer (FWformer) is proposed and verified in visual classification tasks, including both static image and event-based video datasets. The FWformer can achieve comparable or even higher accuracies ($0.4\%$-$1.5\%$), higher running speed ($9\%$-$51\%$ for training and $19\%$-$70\%$ for inference), reduced theoretical energy consumption ($20\%$-$25\%$), and reduced GPU memory usage ($4\%$-$26\%$), compared to the standard spikformer. Our result indicates the continuous refinement of new Transformers, that are inspired either by biological discovery (spike-form), or information theory (Fourier or Wavelet Transform), is promising.
Updated: 2024-03-27 03:31:16
标题: Fourier或小波基作为Spikformer中高效视觉分类的对应自注意力
摘要: 提出了一种能效高的Spikformer,通过将生物学合理的脉冲神经网络(SNN)和人工Transformer集成在一起,利用脉冲自注意力(SSA)实现更高的准确性和更低的计算成本。然而,似乎自注意力并不总是必要的,特别是在稀疏的脉冲形式计算方式中。在本文中,我们创新地将普通的SSA(使用动态基于Query和Key计算的基)替换为脉冲形式的傅里叶变换、小波变换及它们的组合(使用固定的三角形或小波基),基于一个关键假设,即它们都使用一组基函数进行信息转换。因此,提出了基于傅里叶或小波的spikformer(FWformer),并在视觉分类任务中进行了验证,包括静态图像和基于事件的视频数据集。与标准spikformer相比,FWformer在实现可比或甚至更高的准确率(0.4%-1.5%)、更高的运行速度(训练时9%-51%,推断时19%-70%)、更低的理论能耗(20%-25%)和更低的GPU内存使用率(4%-26%)方面表现出色。我们的结果表明,受生物发现(脉冲形式)或信息理论(傅里叶或小波变换)启发的新Transformer的持续改进是有希望的。
更新时间: 2024-03-27 03:31:16
领域: cs.CV,cs.LG,cs.NE
Re2LLM: Reflective Reinforcement Large Language Model for Session-based Recommendation
Large Language Models (LLMs) are emerging as promising approaches to enhance session-based recommendation (SBR), where both prompt-based and fine-tuning-based methods have been widely investigated to align LLMs with SBR. However, the former methods struggle with optimal prompts to elicit the correct reasoning of LLMs due to the lack of task-specific feedback, leading to unsatisfactory recommendations. Although the latter methods attempt to fine-tune LLMs with domain-specific knowledge, they face limitations such as high computational costs and reliance on open-source backbones. To address such issues, we propose a Reflective Reinforcement Large Language Model (Re2LLM) for SBR, guiding LLMs to focus on specialized knowledge essential for more accurate recommendations effectively and efficiently. In particular, we first design the Reflective Exploration Module to effectively extract knowledge that is readily understandable and digestible by LLMs. To be specific, we direct LLMs to examine recommendation errors through self-reflection and construct a knowledge base (KB) comprising hints capable of rectifying these errors. To efficiently elicit the correct reasoning of LLMs, we further devise the Reinforcement Utilization Module to train a lightweight retrieval agent. It learns to select hints from the constructed KB based on the task-specific feedback, where the hints can serve as guidance to help correct LLMs reasoning for better recommendations. Extensive experiments on multiple real-world datasets demonstrate that our method consistently outperforms state-of-the-art methods.
Updated: 2024-03-27 03:27:24
标题: Re2LLM:基于反思的强化大型语言模型用于基于会话的推荐
摘要: 大型语言模型(LLMs)正逐渐成为增强基于会话的推荐(SBR)的有希望的方法,其中已经广泛研究了基于提示和微调的方法来使LLMs与SBR对齐。然而,前者的方法由于缺乏任务特定的反馈而难以找到引起LLMs正确推理的最佳提示,导致推荐不尽人意。虽然后者的方法试图用领域特定知识微调LLMs,但它们面临诸如计算成本高和依赖开源骨干的限制。为了解决这些问题,我们提出了一种反思性强化大型语言模型(Re2LLM)用于SBR,引导LLMs专注于更准确推荐所必需的专业知识,而且效果高效。具体来说,我们首先设计反思探索模块来有效提取LLMs容易理解和消化的知识。具体地,我们指导LLMs通过自我反思来检查推荐错误,并构建一个包含能够纠正这些错误的提示的知识库(KB)。为了有效引起LLMs正确推理,我们进一步设计了强化利用模块来训练一个轻量级检索代理。它学会根据任务特定的反馈从构建的KB中选择提示,其中这些提示可以作为指导帮助纠正LLMs的推理以获得更好的推荐。在多个真实世界数据集上的大量实验表明,我们的方法始终优于最先进的方法。
更新时间: 2024-03-27 03:27:24
领域: cs.AI
A Transformer-Based Framework for Payload Malware Detection and Classification
As malicious cyber threats become more sophisticated in breaching computer networks, the need for effective intrusion detection systems (IDSs) becomes crucial. Techniques such as Deep Packet Inspection (DPI) have been introduced to allow IDSs analyze the content of network packets, providing more context for identifying potential threats. IDSs traditionally rely on using anomaly-based and signature-based detection techniques to detect unrecognized and suspicious activity. Deep learning techniques have shown great potential in DPI for IDSs due to their efficiency in learning intricate patterns from the packet content being transmitted through the network. In this paper, we propose a revolutionary DPI algorithm based on transformers adapted for the purpose of detecting malicious traffic with a classifier head. Transformers learn the complex content of sequence data and generalize them well to similar scenarios thanks to their self-attention mechanism. Our proposed method uses the raw payload bytes that represent the packet contents and is deployed as man-in-the-middle. The payload bytes are used to detect malicious packets and classify their types. Experimental results on the UNSW-NB15 and CIC-IOT23 datasets demonstrate that our transformer-based model is effective in distinguishing malicious from benign traffic in the test dataset, attaining an average accuracy of 79\% using binary classification and 72\% on the multi-classification experiment, both using solely payload bytes.
Updated: 2024-03-27 03:25:45
标题: 一个基于Transformer的框架用于负载恶意软件的检测和分类
摘要: 随着恶意网络威胁在入侵计算机网络方面变得更加复杂,有效的入侵检测系统(IDSs)的需求变得至关重要。诸如深度数据包检测(DPI)等技术已被引入,允许IDSs分析网络数据包的内容,为识别潜在威胁提供更多背景信息。IDSs传统上依赖异常检测和签名检测技术来检测未识别和可疑活动。深度学习技术在DPI中显示出巨大潜力,因为它们能够有效地从通过网络传输的数据包内容中学习复杂模式。本文提出了一种基于转换器的革命性DPI算法,专门用于检测恶意流量,并带有分类器头部。转换器学习序列数据的复杂内容,并通过其自注意机制很好地泛化到类似情景中。我们提出的方法使用表示数据包内容的原始有效载荷字节,并部署为中间人。有效载荷字节用于检测恶意数据包并对其进行分类。在UNSW-NB15和CIC-IOT23数据集上的实验结果表明,我们基于转换器的模型在测试数据集中有效区分恶意和良性流量,使用仅有效载荷字节的二元分类实验达到了79%的平均准确率,并在多类别实验中达到了72%。
更新时间: 2024-03-27 03:25:45
领域: cs.CR,cs.AI,cs.LG
Uncertainty-Aware Deployment of Pre-trained Language-Conditioned Imitation Learning Policies
Large-scale robotic policies trained on data from diverse tasks and robotic platforms hold great promise for enabling general-purpose robots; however, reliable generalization to new environment conditions remains a major challenge. Toward addressing this challenge, we propose a novel approach for uncertainty-aware deployment of pre-trained language-conditioned imitation learning agents. Specifically, we use temperature scaling to calibrate these models and exploit the calibrated model to make uncertainty-aware decisions by aggregating the local information of candidate actions. We implement our approach in simulation using three such pre-trained models, and showcase its potential to significantly enhance task completion rates. The accompanying code is accessible at the link: https://github.com/BobWu1998/uncertainty_quant_all.git
Updated: 2024-03-27 03:19:36
Subjects: cs.RO,cs.LG
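Temperature scaling, the calibration step this abstract builds on, can be sketched in a few lines; the helper names are ours, and in practice the temperature would be fit on a held-out set rather than chosen by hand:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def calibrated_probs(logits, temperature):
    # Temperature scaling: divide logits by a scalar T before the softmax.
    # T > 1 softens overconfident predictions without changing the argmax,
    # so accuracy is unaffected while confidence becomes better calibrated.
    return softmax([z / temperature for z in logits])
```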
CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models
Recently, the growing memory demands of embedding tables in Deep Learning Recommendation Models (DLRMs) pose great challenges for model training and deployment. Existing embedding compression solutions cannot simultaneously meet three key design requirements: memory efficiency, low latency, and adaptability to dynamic data distribution. This paper presents CAFE, a Compact, Adaptive, and Fast Embedding compression framework that addresses the above requirements. The design philosophy of CAFE is to dynamically allocate more memory resources to important features (called hot features), and allocate less memory to unimportant ones. In CAFE, we propose a fast and lightweight sketch data structure, named HotSketch, to capture feature importance and report hot features in real time. For each reported hot feature, we assign it a unique embedding. For the non-hot features, we allow multiple features to share one embedding by using the hash embedding technique. Guided by our design philosophy, we further propose a multi-level hash embedding framework to optimize the embedding tables of non-hot features. We theoretically analyze the accuracy of HotSketch, and analyze the model convergence against deviation. Extensive experiments show that CAFE significantly outperforms existing embedding compression methods, yielding 3.92% and 3.68% superior testing AUC on the Criteo Kaggle and CriteoTB datasets at a compression ratio of 10000x. The source code of CAFE is available on GitHub.
Updated: 2024-03-27 03:14:14
Subjects: cs.LG
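A toy sketch of CAFE's design philosophy: dedicated embedding rows for hot features, hash-shared rows for the rest. The class and parameter names are invented for illustration, and the real system identifies hot features online with HotSketch rather than taking them as input:

```python
import random

class CafeStyleEmbedding:
    """Hot features get a unique embedding row; non-hot features share rows via hashing."""

    def __init__(self, hot_features, shared_rows=8, dim=4, seed=0):
        rng = random.Random(seed)
        # One dedicated row per hot feature.
        self.hot = {f: [rng.uniform(-1, 1) for _ in range(dim)] for f in hot_features}
        # A small pool of rows shared by all non-hot features.
        self.shared = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(shared_rows)]

    def lookup(self, feature):
        if feature in self.hot:
            return self.hot[feature]                              # unique embedding
        return self.shared[hash(feature) % len(self.shared)]      # hash-shared embedding
```

Memory scales with the number of hot features plus a fixed shared pool, rather than with the full vocabulary.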
From Two-Dimensional to Three-Dimensional Environment with Q-Learning: Modeling Autonomous Navigation with Reinforcement Learning and no Libraries
Reinforcement learning (RL) algorithms have become indispensable tools in artificial intelligence, empowering agents to acquire optimal decision-making policies through interactions with their environment and feedback mechanisms. This study explores the performance of RL agents in both two-dimensional (2D) and three-dimensional (3D) environments, aiming to research the dynamics of learning across different spatial dimensions. A key aspect of this investigation is the absence of pre-made libraries for learning, with the algorithm developed exclusively through computational mathematics. The methodological framework centers on RL principles, employing a Q-learning agent class and distinct environment classes tailored to each spatial dimension. The research aims to address the question: How do reinforcement learning agents adapt and perform in environments of varying spatial dimensions, particularly in 2D and 3D settings? Through empirical analysis, the study evaluates agents' learning trajectories and adaptation processes, revealing insights into the efficacy of RL algorithms in navigating complex, multi-dimensional spaces. Reflections on the findings prompt considerations for future research, particularly in understanding the dynamics of learning in higher-dimensional environments.
Updated: 2024-03-27 03:07:18
Subjects: cs.LG,cs.AI,stat.CO
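A library-free tabular Q-learning loop in the spirit of this study can be sketched as follows, here reduced to a small 2D grid; all hyperparameters are illustrative and the 3D case would simply add a third coordinate:

```python
import random

def q_learning_grid(size=4, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a size x size grid: start (0,0), goal in the opposite corner."""
    rng = random.Random(seed)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    goal = (size - 1, size - 1)
    Q = {}  # (state, action) -> value, defaulting to 0.0

    def q(s, a):
        return Q.get((s, a), 0.0)

    for _ in range(episodes):
        s = (0, 0)
        for _ in range(4 * size * size):
            # Epsilon-greedy action selection.
            a = rng.randrange(4) if rng.random() < eps else max(range(4), key=lambda i: q(s, i))
            dx, dy = moves[a]
            ns = (min(max(s[0] + dx, 0), size - 1), min(max(s[1] + dy, 0), size - 1))
            r = 1.0 if ns == goal else -0.01  # small step cost, reward at the goal
            # Standard Q-learning temporal-difference update.
            Q[(s, a)] = q(s, a) + alpha * (r + gamma * max(q(ns, i) for i in range(4)) - q(s, a))
            s = ns
            if s == goal:
                break
    return Q
```

After training, following the greedy policy (argmax over Q) from the start state reaches the goal.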
Dial-MAE: ConTextual Masked Auto-Encoder for Retrieval-based Dialogue Systems
Dialogue response selection aims to select an appropriate response from several candidates based on a given user and system utterance history. Most existing works primarily focus on post-training and fine-tuning tailored for cross-encoders. However, there are no post-training methods tailored for dense encoders in dialogue response selection. We argue that when the current language model, based on dense dialogue systems (such as BERT), is employed as a dense encoder, it separately encodes dialogue context and response, making it difficult to align the two representations. Thus, we propose Dial-MAE (Dialogue Contextual Masking Auto-Encoder), a straightforward yet effective post-training technique tailored for dense encoders in dialogue response selection. Dial-MAE uses an asymmetric encoder-decoder architecture to compress the dialogue semantics into dense vectors, which achieves better alignment between the features of the dialogue context and response. Our experiments demonstrate that Dial-MAE is highly effective, achieving state-of-the-art performance on two commonly evaluated benchmarks.
Updated: 2024-03-27 03:06:13
Subjects: cs.CL,cs.AI
Leveraging Large Language Models for Fuzzy String Matching in Political Science
Fuzzy string matching remains a key issue when political scientists combine data from different sources. Existing matching methods invariably rely on string distances, such as Levenshtein distance and cosine similarity. As such, they are inherently incapable of matching strings that refer to the same entity with different names such as "JP Morgan" and "Chase Bank", "DPRK" and "North Korea", "Chuck Fleischmann (R)" and "Charles Fleischmann (R)". In this letter, we propose to use large language models to entirely sidestep this problem in an easy and intuitive manner. Extensive experiments show that our proposed methods can improve the state of the art by as much as 39% in terms of average precision while being substantially easier and more intuitive to use by political scientists. Moreover, our results are robust against various temperatures. We further note that enhanced prompting can lead to additional performance improvements.
Updated: 2024-03-27 03:04:21
Subjects: cs.AI
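The letter's idea of replacing string distances with an LLM judgment might look roughly like this; the prompt wording and the parsing rule are our assumptions, and the actual LLM API call is deliberately omitted:

```python
def build_match_prompt(name_a: str, name_b: str) -> str:
    """Illustrative prompt asking an LLM whether two strings name the same entity."""
    return (
        f'Do "{name_a}" and "{name_b}" refer to the same real-world entity? '
        "Answer with a single word: Yes or No."
    )

def parse_match(llm_answer: str) -> bool:
    # Tolerates trailing punctuation or explanations after the verdict.
    return llm_answer.strip().lower().startswith("yes")
```

A string-distance baseline would score "DPRK" vs. "North Korea" as entirely dissimilar, whereas the prompt delegates the entity-identity question to the model's world knowledge.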
SoftTiger: A Clinical Foundation Model for Healthcare Workflows
We introduce SoftTiger, a clinical large language model (CLaM) designed as a foundation model for healthcare workflows. The narrative and unstructured nature of clinical notes is a major obstacle for healthcare intelligentization. We address a critical problem of structuring clinical notes into clinical data, according to international interoperability standards. We collect and annotate data for three subtasks, namely, international patient summary, clinical impression and medical encounter. We then performed supervised fine-tuning of a state-of-the-art LLM using public and credentialed clinical data. The training is orchestrated so that the target model first supports basic clinical tasks such as abbreviation expansion and temporal information extraction, and then learns to perform more complex downstream clinical tasks. Moreover, we address several modeling challenges in the healthcare context, e.g., an extra-long context window. Our blind pairwise evaluation shows that SoftTiger outperforms other popular open-source models and GPT-3.5, is comparable to Gemini-pro, and has a mild gap to GPT-4. We believe that LLMs may become a stepping stone towards healthcare digitalization and democratization. Therefore, we publicly release SoftTiger models at scales of 13 billion and 70 billion parameters, as well as datasets and code for our innovative scalable evaluation, hopefully, making a significant contribution to the healthcare industry.
Updated: 2024-03-27 03:03:00
Subjects: cs.CL,cs.AI
Probing Multimodal Large Language Models for Global and Local Semantic Representations
The advancement of Multimodal Large Language Models (MLLMs) has greatly accelerated the development of applications in understanding integrated texts and images. Recent works leverage image-caption datasets to train MLLMs, achieving state-of-the-art performance on image-to-text tasks. However, few studies have explored which layers of MLLMs contribute most to encoding global image information, which plays a vital role in multimodal comprehension and generation. In this study, we find that the intermediate layers of models, rather than the topmost layers, encode more global semantic information, and their representation vectors perform better on visual-language entailment tasks. We further probe models regarding local semantic representations through object recognition tasks. We find that the topmost layers may excessively focus on local information, leading to a diminished ability to encode global information. Our code and data are released at https://github.com/kobayashikanna01/probing_MLLM_rep.
Updated: 2024-03-27 02:59:57
Subjects: cs.CL,cs.AI
Minimax Optimal Fair Classification with Bounded Demographic Disparity
Mitigating the disparate impact of statistical machine learning methods is crucial for ensuring fairness. While extensive research aims to reduce disparity, the effect of using a \emph{finite dataset} -- as opposed to the entire population -- remains unclear. This paper explores the statistical foundations of fair binary classification with two protected groups, focusing on controlling demographic disparity, defined as the difference in acceptance rates between the groups. Although fairness may come at the cost of accuracy even with infinite data, we show that using a finite sample incurs additional costs due to the need to estimate group-specific acceptance thresholds. We study the minimax optimal classification error while constraining demographic disparity to a user-specified threshold. To quantify the impact of fairness constraints, we introduce a novel measure called \emph{fairness-aware excess risk} and derive a minimax lower bound on this measure that all classifiers must satisfy. Furthermore, we propose FairBayes-DDP+, a group-wise thresholding method with an offset that we show attains the minimax lower bound. Our lower bound proofs involve several innovations. Experiments support that FairBayes-DDP+ controls disparity at the user-specified level, while being faster and having a more favorable fairness-accuracy tradeoff than several baselines.
Updated: 2024-03-27 02:59:04
Subjects: stat.ML,cs.CY,cs.LG,math.ST,stat.TH
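The demographic disparity the paper controls, the gap in acceptance rates between groups under group-specific thresholds, can be computed directly. This is a generic sketch of the quantity being constrained, not FairBayes-DDP+ itself:

```python
def acceptance_rates(scores, groups, thresholds):
    """Per-group acceptance rate when each group g uses its own threshold thresholds[g]."""
    rates = {}
    for g in set(groups):
        members = [i for i, gi in enumerate(groups) if gi == g]
        accepted = sum(scores[i] >= thresholds[g] for i in members)
        rates[g] = accepted / len(members)
    return rates

def demographic_disparity(rates):
    # DDP: gap between the highest and lowest group acceptance rates.
    vals = list(rates.values())
    return max(vals) - min(vals)
```

Group-wise thresholding methods search for thresholds (plus an offset, in FairBayes-DDP+) that keep this gap below a user-specified level while losing as little accuracy as possible.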
In-Distribution and Out-of-Distribution Self-supervised ECG Representation Learning for Arrhythmia Detection
This paper presents a systematic investigation into the effectiveness of Self-Supervised Learning (SSL) methods for Electrocardiogram (ECG) arrhythmia detection. We begin by conducting a novel analysis of the data distributions on three popular ECG-based arrhythmia datasets: PTB-XL, Chapman, and Ribeiro. To the best of our knowledge, our study is the first to quantitatively explore and characterize these distributions in the area. We then perform a comprehensive set of experiments using different augmentations and parameters to evaluate the effectiveness of various SSL methods, namely SimCLR, BYOL, and SwAV, for ECG representation learning, where we observe the best performance achieved by SwAV. Furthermore, our analysis shows that SSL methods achieve highly competitive results to those achieved by supervised state-of-the-art methods. To further assess the performance of these methods on both In-Distribution (ID) and Out-of-Distribution (OOD) ECG data, we conduct cross-dataset training and testing experiments. Our comprehensive experiments show almost identical results when comparing ID and OOD schemes, indicating that SSL techniques can learn highly effective representations that generalize well across different OOD datasets. This finding can have major implications for ECG-based arrhythmia detection. Lastly, to further analyze our results, we perform detailed per-disease studies on the performance of the SSL methods on the three datasets.
Updated: 2024-03-27 02:58:26
Subjects: cs.LG,cs.AI,eess.SP
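Of the SSL methods compared, SimCLR's NT-Xent contrastive loss is the most compact to sketch. This pure-Python version is our simplification, operating on plain lists of embedding vectors rather than batched tensors, but it shows the positive-pair-versus-negatives structure:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent (SimCLR) loss: z1[i] and z2[i] are embeddings of two augmented views
    of the same sample (e.g., two augmentations of one ECG segment)."""
    z = z1 + z2
    n = len(z)
    half = len(z1)
    loss = 0.0
    for i in range(n):
        pos = (i + half) % n  # index of the other view of the same sample
        num = math.exp(cosine(z[i], z[pos]) / temperature)
        # All other embeddings in the batch serve as negatives.
        den = sum(math.exp(cosine(z[i], z[j]) / temperature) for j in range(n) if j != i)
        loss += -math.log(num / den)
    return loss / n
```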
Preference-Based Planning in Stochastic Environments: From Partially-Ordered Temporal Goals to Most Preferred Policies
Human preferences are not always represented via complete linear orders: It is natural to employ partially-ordered preferences for expressing incomparable outcomes. In this work, we consider decision-making and probabilistic planning in stochastic systems modeled as Markov decision processes (MDPs), given a partially ordered preference over a set of temporally extended goals. Specifically, each temporally extended goal is expressed using a formula in Linear Temporal Logic on Finite Traces (LTL$_f$). To plan with the partially ordered preference, we introduce order theory to map a preference over temporal goals to a preference over policies for the MDP. Accordingly, a most preferred policy under a stochastic ordering induces a stochastic nondominated probability distribution over the finite paths in the MDP. To synthesize a most preferred policy, our technical approach includes two key steps. In the first step, we develop a procedure to transform a partially ordered preference over temporal goals into a computational model, called preference automaton, which is a semi-automaton with a partial order over acceptance conditions. In the second step, we prove that finding a most preferred policy is equivalent to computing a Pareto-optimal policy in a multi-objective MDP that is constructed from the original MDP, the preference automaton, and the chosen stochastic ordering relation. Throughout the paper, we employ running examples to illustrate the proposed preference specification and solution approaches. We demonstrate the efficacy of our algorithm using these examples, providing detailed analysis, and then discuss several potential future directions.
Updated: 2024-03-27 02:46:09
Subjects: cs.RO,cs.AI,cs.FL,cs.LO
NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation
Recent fMRI-to-image approaches mainly focused on associating fMRI signals with specific conditions of pre-trained diffusion models. These approaches, while producing high-quality images, capture only a limited aspect of the complex information in fMRI signals and offer little detailed control over image creation. In contrast, this paper proposes to directly modulate the generation process of diffusion models using fMRI signals. Our approach, NeuroPictor, divides the fMRI-to-image process into three steps: i) fMRI calibrated-encoding, to tackle multi-individual pre-training for a shared latent space to minimize individual difference and enable the subsequent cross-subject training; ii) fMRI-to-image cross-subject pre-training, perceptually learning to guide the diffusion model with high- and low-level conditions across different individuals; iii) fMRI-to-image single-subject refining, similar to step ii but focused on adapting to a particular individual. NeuroPictor extracts high-level semantic features from fMRI signals that characterize the visual stimulus and incrementally fine-tunes the diffusion model with a low-level manipulation network to provide precise structural instructions. By training with over 60,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity, particularly in the within-subject setting, as evidenced in benchmark datasets. Project page: https://jingyanghuo.github.io/neuropictor/.
Updated: 2024-03-27 02:42:52
Subjects: cs.CV,cs.LG
Long and Short-Term Constraints Driven Safe Reinforcement Learning for Autonomous Driving
Reinforcement learning (RL) has been widely used in decision-making tasks, but it cannot guarantee the agent's safety during training because learning requires interaction with the environment, which seriously limits industrial applications such as autonomous driving. Safe RL methods have been developed to handle this issue by constraining the expected safety violation costs as a training objective, but they still permit unsafe state occurrence, which is unacceptable in autonomous driving tasks. Moreover, these methods struggle to balance cost and return expectations, which degrades learning performance. In this paper, we propose a novel algorithm based on long and short-term constraints (LSTC) for safe RL. The short-term constraint aims to guarantee the short-term state safety that the vehicle explores, while the long-term constraint ensures the overall safety of the vehicle throughout the decision-making process. In addition, we develop a safe RL method with dual-constraint optimization based on the Lagrange multiplier to optimize the training process for end-to-end autonomous driving. Comprehensive experiments were conducted on the MetaDrive simulator. Experimental results demonstrate that the proposed method achieves higher safety in continuous state and action tasks, and exhibits higher exploration performance in long-distance decision-making tasks compared with state-of-the-art methods.
Updated: 2024-03-27 02:41:52
Subjects: cs.LG,cs.AI,cs.RO
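The Lagrange-multiplier mechanism behind such dual-constraint optimization can be sketched as a dual-ascent step on each constraint's multiplier. This generic update is our illustration of the standard technique, not the paper's exact algorithm:

```python
def lagrange_step(lmbda, expected_cost, cost_limit, lr=0.1):
    """One dual-ascent step on the multiplier of a cost constraint.

    The multiplier grows while the constraint is violated (expected cost above
    the limit), shrinks when there is slack, and is clipped at zero.
    """
    return max(0.0, lmbda + lr * (expected_cost - cost_limit))

def penalized_return(expected_return, expected_cost, lmbda):
    # The policy maximizes return minus the weighted cost penalty; at a saddle
    # point of this Lagrangian, the cost constraint is satisfied.
    return expected_return - lmbda * expected_cost
```

Policy updates (ascent on the penalized return) and multiplier updates (dual ascent) alternate during training.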
Imitating Cost-Constrained Behaviors in Reinforcement Learning
Complex planning and scheduling problems have long been solved using various optimization or heuristic approaches. In recent years, imitation learning that aims to learn from expert demonstrations has been proposed as a viable alternative to solving these problems. Generally speaking, imitation learning is designed to learn either the reward (or preference) model or directly the behavioral policy by observing the behavior of an expert. Existing work in imitation learning and inverse reinforcement learning has focused on imitation primarily in unconstrained settings (e.g., no limit on fuel consumed by the vehicle). However, in many real-world domains, the behavior of an expert is governed not only by reward (or preference) but also by constraints. For instance, decisions on self-driving delivery vehicles are dependent not only on the route preferences/rewards (depending on past demand data) but also on the fuel in the vehicle and the time available. In such problems, imitation learning is challenging as decisions are not only dictated by the reward model but are also dependent on a cost-constrained model. In this paper, we provide multiple methods that match expert distributions in the presence of trajectory cost constraints through (a) Lagrangian-based method; (b) Meta-gradients to find a good trade-off between expected return and minimizing constraint violation; and (c) Cost-violation-based alternating gradient. We empirically show that leading imitation learning approaches imitate cost-constrained behaviors poorly and our meta-gradient-based approach achieves the best performance.
Updated: 2024-03-27 02:39:26
Subjects: cs.LG,cs.AI
An Evolutionary Network Architecture Search Framework with Adaptive Multimodal Fusion for Hand Gesture Recognition
Hand gesture recognition (HGR) based on multimodal data has attracted considerable attention owing to its great potential in applications. Various manually designed multimodal deep networks have performed well in multimodal HGR (MHGR), but most existing algorithms require a lot of expert experience and time-consuming manual trials. To address these issues, we propose an evolutionary network architecture search framework with adaptive multimodal fusion (AMF-ENAS). Specifically, we design an encoding space that simultaneously considers fusion positions and ratios of the multimodal data, allowing for the automatic construction of multimodal networks with different architectures through decoding. Additionally, we consider three input streams corresponding to intra-modal surface electromyography (sEMG), intra-modal accelerometer (ACC), and inter-modal sEMG-ACC. To automatically adapt to various datasets, the ENAS framework is designed to automatically search for a MHGR network with appropriate fusion positions and ratios. To the best of our knowledge, this is the first time that ENAS has been utilized in MHGR to tackle issues related to the fusion position and ratio of multimodal data. Experimental results demonstrate that AMF-ENAS achieves state-of-the-art performance on the Ninapro DB2, DB3, and DB7 datasets.
Updated: 2024-03-27 02:39:23
Subjects: cs.CV,cs.AI,cs.NE
Exploring the Privacy Protection Capabilities of Chinese Large Language Models
Large language models (LLMs), renowned for their impressive capabilities in various tasks, have significantly advanced artificial intelligence. Yet, these advancements have raised growing concerns about privacy and security implications. To address these issues and explain the risks inherent in these models, we have devised a three-tiered progressive framework tailored for evaluating privacy in language systems. This framework consists of progressively complex and in-depth privacy test tasks at each tier. Our primary objective is to comprehensively evaluate the sensitivity of large language models to private information, examining how effectively they discern, manage, and safeguard sensitive data in diverse scenarios. This systematic evaluation helps us understand the degree to which these models comply with privacy protection guidelines and the effectiveness of their inherent safeguards against privacy breaches. Our observations indicate that existing Chinese large language models universally show privacy protection shortcomings. It seems that at the moment this widespread issue is unavoidable and may pose corresponding privacy risks in applications based on these models.
Updated: 2024-03-27 02:31:54
Subjects: cs.AI
EndToEndML: An Open-Source End-to-End Pipeline for Machine Learning Applications
Artificial intelligence (AI) techniques are widely applied in the life sciences. However, applying innovative AI techniques to understand and deconvolute biological complexity is hindered by the learning curve for life science scientists to understand and use computing languages. An open-source, user-friendly interface for AI models that does not require programming skills to analyze complex biological data would be extremely valuable to the bioinformatics community. With easy access to different sequencing technologies and increased interest in different 'omics' studies, the number of biological datasets being generated has increased, and analyzing these high-throughput datasets is computationally demanding. The majority of AI libraries today require advanced programming skills as well as machine learning, data preprocessing, and visualization skills. In this research, we propose a web-based end-to-end pipeline that is capable of preprocessing, training, evaluating, and visualizing machine learning (ML) models without manual intervention or coding expertise. By integrating traditional machine learning and deep neural network models with visualizations, our library assists in recognizing, classifying, clustering, and predicting a wide range of multi-modal, multi-sensor datasets, including images, languages, and one-dimensional numerical data, for drug discovery, pathogen classification, and medical diagnostics.
Updated: 2024-03-27 02:24:38
Subjects: cs.AI
The Innovation Paradox: Concept Space Expansion with Diminishing Originality and the Promise of Creative AI
Innovation, typically spurred by reusing, recombining, and synthesizing existing concepts, is expected to result in an exponential growth of the concept space over time. However, our statistical analysis of TechNet, which is a comprehensive technology semantic network encompassing over four million concepts derived from patent texts, reveals a linear rather than exponential expansion of the overall technological concept space. Moreover, there is a notable decline in the originality of newly created concepts. These trends can be attributed to the constraints of human cognitive abilities to innovate beyond an ever-growing space of prior art, among other factors. Integrating creative artificial intelligence (CAI) into the innovation process holds the potential to overcome these limitations and alter the observed trends in the future.
Updated: 2024-03-27 02:23:29
Subjects: cs.SI,cs.AI
LLMs in Political Science: Heralding a New Era of Visual Analysis
Interest is increasing among political scientists in leveraging the extensive information available in images. However, the challenge of interpreting these images lies in the need for specialized knowledge in computer vision and access to specialized hardware. As a result, image analysis has been limited to a relatively small group within the political science community. This landscape could potentially change thanks to the rise of large language models (LLMs). This paper aims to raise awareness of the feasibility of using Gemini for image content analysis. A retrospective analysis was conducted on a corpus of 688 images. Content reports were elicited from Gemini for each image and then manually evaluated by the authors. We find that Gemini is highly accurate in performing object detection, which is arguably the most common and fundamental task in image analysis for political scientists. Equally important, we show that it is easy to implement as the entire command consists of a single prompt in natural language; it is fast to run and should meet the time budget of most researchers; and it is free to use and does not require any specialized hardware. In addition, we illustrate how political scientists can leverage Gemini for other image understanding tasks, including face identification, sentiment analysis, and caption generation. Our findings suggest that Gemini and other similar LLMs have the potential to drastically stimulate and accelerate image research in political science and social sciences more broadly.
Updated: 2024-03-27 02:21:03
Categories: cs.CV,cs.AI
Looking Beyond What You See: An Empirical Analysis on Subgroup Intersectional Fairness for Multi-label Chest X-ray Classification Using Social Determinants of Racial Health Inequities
There has been significant progress in implementing deep learning models in disease diagnosis using chest X-rays. Despite these advancements, inherent biases in these models can lead to disparities in prediction accuracy across protected groups. In this study, we propose a framework to achieve accurate diagnostic outcomes and ensure fairness across intersectional groups in high-dimensional chest X-ray multi-label classification. Transcending traditional protected attributes, we consider complex interactions within social determinants, enabling a more granular benchmark and evaluation of fairness. We present a simple and robust method that involves retraining the last classification layer of pre-trained models using a balanced dataset across groups. Additionally, we account for fairness constraints and integrate class-balanced fine-tuning for multi-label settings. The evaluation of our method on the MIMIC-CXR dataset demonstrates that our framework achieves an optimal tradeoff between accuracy and fairness compared to baseline methods.
Updated: 2024-03-27 02:13:20
Categories: cs.LG,cs.AI,cs.CV,cs.CY
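The group-balanced retraining step described in the abstract can be sketched as simple resampling: every (intersectional) group is upsampled by repetition until all groups contribute equally. This is a generic illustration of the idea, not the paper's exact recipe, and `group_of` is a hypothetical helper mapping a sample to its group key.

```python
from collections import defaultdict

def balance_groups(samples, group_of):
    """Upsample so every group contributes the same number of samples.

    `group_of` maps a sample to its (intersectional) group key. This is a
    generic sketch of group-balanced resampling, not the paper's recipe.
    """
    buckets = defaultdict(list)
    for s in samples:
        buckets[group_of(s)].append(s)
    target = max(len(b) for b in buckets.values())
    balanced = []
    for b in buckets.values():
        reps, rem = divmod(target, len(b))
        balanced.extend(b * reps + b[:rem])  # repeat minority samples
    return balanced
```

A balanced dataset built this way could then feed the retraining of the final classification layer while the backbone stays frozen.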
SCANet: Correcting LEGO Assembly Errors with Self-Correct Assembly Network
Autonomous assembly in robotics and 3D vision presents significant challenges, particularly in ensuring assembly correctness. Presently, predominant methods such as MEPNet focus on assembling components based on manually provided images. However, these approaches often fall short in achieving satisfactory results for tasks requiring long-term planning. Concurrently, we observe that integrating a self-correction module can partially alleviate such issues. Motivated by this concern, we introduce the single-step assembly error correction task, which involves identifying and rectifying misassembled components. To support research in this area, we present the LEGO Error Correction Assembly Dataset (LEGO-ECA), comprising manual images for assembly steps and instances of assembly failures. Additionally, we propose the Self-Correct Assembly Network (SCANet), a novel method to address this task. SCANet treats assembled components as queries, determining their correctness in manual images and providing corrections when necessary. Finally, we utilize SCANet to correct the assembly results of MEPNet. Experimental results demonstrate that SCANet can identify and correct MEPNet's misassembled results, significantly improving the correctness of assembly. Our code and dataset are available at https://github.com/Yaser-wyx/SCANet.
Updated: 2024-03-27 02:08:12
Categories: cs.RO,cs.AI
Multi-Label Adaptive Batch Selection by Highlighting Hard and Imbalanced Samples
Deep neural network models have demonstrated their effectiveness in classifying multi-label data from various domains. Typically, they employ a training mode that combines mini-batches with optimizers, where each sample is randomly selected with equal probability when constructing mini-batches. However, the intrinsic class imbalance in multi-label data may bias the model towards majority labels, since samples relevant to minority labels may be underrepresented in each mini-batch. Meanwhile, during the training process, we observe that instances associated with minority labels tend to induce greater losses. Existing heuristic batch selection methods, such as priority selection of samples with high contribution to the objective function, i.e., samples with high loss, have been proven to accelerate convergence while reducing the loss and test error in single-label data. However, batch selection methods have not yet been applied and validated in multi-label data. In this study, we introduce a simple yet effective adaptive batch selection algorithm tailored to multi-label deep learning models. It adaptively selects each batch by prioritizing hard samples related to minority labels. A variant of our method also takes informative label correlations into consideration. Comprehensive experiments combining five multi-label deep learning models on thirteen benchmark datasets show that our method converges faster and performs better than random batch selection.
Updated: 2024-03-27 02:00:18
Categories: cs.LG
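The loss-prioritized selection above can be sketched as weighted sampling: each sample's selection probability is proportional to its last-observed loss, with an extra boost for samples tied to minority labels. This is a simplified illustration of the idea, not the paper's exact algorithm; the `boost` factor is an assumed knob.

```python
import random

def select_batch(losses, minority_mask, batch_size, rng, boost=2.0):
    """Sample a mini-batch with probability proportional to per-sample loss,
    boosting samples associated with minority labels. A simplified sketch,
    not the paper's exact adaptive batch selection."""
    weights = [loss * (boost if minor else 1.0)
               for loss, minor in zip(losses, minority_mask)]
    return rng.choices(range(len(losses)), weights=weights, k=batch_size)

rng = random.Random(0)
losses = [5.0, 0.1, 0.1, 0.1]  # sample 0 is a hard minority-label sample
batch = select_batch(losses, [True, False, False, False], 3, rng)
```

In practice the loss values would be refreshed each epoch, so the sampling distribution adapts as the model learns.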
Coarse-Tuning for Ad-hoc Document Retrieval Using Pre-trained Language Models
Fine-tuning in information retrieval systems using pre-trained language models (PLM-based IR) requires learning query representations and query-document relations, in addition to downstream task-specific learning. This study introduces coarse-tuning as an intermediate learning stage that bridges pre-training and fine-tuning. By learning query representations and query-document relations in coarse-tuning, we aim to reduce the load of fine-tuning and improve the learning effect of downstream IR tasks. We propose Query-Document Pair Prediction (QDPP) for coarse-tuning, which predicts the appropriateness of query-document pairs. Evaluation experiments show that the proposed method significantly improves MRR and/or nDCG@5 in four ad-hoc document retrieval datasets. Furthermore, the results of the query prediction task suggested that coarse-tuning facilitated learning of query representation and query-document relations.
Updated: 2024-03-27 01:53:36
Categories: cs.IR,cs.AI,cs.CL,cs.LG
Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models
Large language models (LLMs) still grapple with complex tasks like mathematical reasoning. Despite significant efforts invested in improving prefix prompts or reasoning process, the crucial role of problem context might have been neglected. Accurate recognition of inputs is fundamental for solving mathematical tasks, as ill-formed problems could potentially mislead LLM's reasoning. In this study, we propose a new approach named Problem Elaboration Prompting (PEP) to enhance the mathematical capacities of LLMs. Specifically, PEP decomposes and elucidates the problem context before reasoning, therefore enhancing the context modeling and parsing efficiency. Experiments across datasets and models demonstrate promising performances: (1) PEP demonstrates an overall enhancement in various mathematical tasks. For instance, with the GPT-3.5 model, PEP exhibits improvements of 9.93% and 8.80% on GSM8k through greedy decoding and self-consistency, respectively. (2) PEP can be easily implemented and integrated with other prompting methods. (3) PEP shows particular strength in handling distraction problems.
Updated: 2024-03-27 01:23:58
Categories: cs.CL,cs.AI
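The decompose-then-reason idea behind PEP can be illustrated as a two-stage prompt template: the model is first asked to restate and clean up the problem context, then to solve the elaborated version. The wording below is an illustrative template, not the exact prompt used in the paper.

```python
def pep_prompt(problem):
    """Build a problem-elaboration prompt. Illustrative wording only,
    not the authors' exact prompt."""
    return (
        "Before solving, restate the problem in your own words: list every "
        "given quantity, state what is asked, and discard any information "
        "that is irrelevant to the question.\n\n"
        f"Problem: {problem}\n\n"
        "Now solve the elaborated problem step by step."
    )

prompt = pep_prompt("A pen costs $2 and a ruler costs $1. Tom, who is 12 "
                    "years old, buys 3 pens. How much does he pay?")
```

Note the distractor ("who is 12 years old") in the example problem: the elaboration stage is where such irrelevant context would be filtered out before reasoning begins.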
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence
A well-designed document communicates not only through its words but also through its visual eloquence. Authors utilize aesthetic elements such as colors, fonts, graphics, and layouts to shape the perception of information. Thoughtful document design, informed by psychological insights, enhances both the visual appeal and the comprehension of the content. While state-of-the-art document AI models demonstrate the benefits of incorporating layout and image data, it remains unclear whether the nuances of document aesthetics are effectively captured. To bridge the gap between human cognition and AI interpretation of aesthetic elements, we formulated hypotheses concerning AI behavior in document understanding tasks, specifically anchored in document design principles. With a focus on legibility and layout quality, we tested four aspects of aesthetic effects: noise, font-size contrast, alignment, and complexity, on model confidence using correlational analysis. The results and observations highlight the value of model analysis rooted in document design theories. Our work serves as a trailhead for further studies and we advocate for continued research in this topic to deepen our understanding of how AI interprets document aesthetics.
Updated: 2024-03-27 01:21:48
Categories: cs.AI,cs.IR
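The correlational analysis described above (e.g., between the strength of an aesthetic perturbation and model confidence) reduces to a plain Pearson coefficient. The data below is hypothetical, purely to show the shape of such an analysis.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: noise level applied to a document image vs. the
# model's prediction confidence on that image.
noise = [0.0, 0.1, 0.2, 0.3, 0.4]
confidence = [0.95, 0.90, 0.82, 0.75, 0.60]
r = pearson(noise, confidence)
```

A strongly negative `r` on such data would support the hypothesis that degraded legibility lowers model confidence.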
Compression of the Koopman matrix for nonlinear physical models via hierarchical clustering
Machine learning methods allow the prediction of nonlinear dynamical systems from data alone. The Koopman operator is one of them, which enables us to employ linear analysis for nonlinear dynamical systems. The linear characteristics of the Koopman operator hold promise for understanding nonlinear dynamics and enabling rapid predictions. The extended dynamic mode decomposition (EDMD) is one of the methods to approximate the Koopman operator as a finite-dimensional matrix. In this work, we propose a method to compress the Koopman matrix using hierarchical clustering. Numerical demonstrations for the cart-pole model and comparisons with the conventional singular value decomposition (SVD) are shown; the results indicate that the hierarchical clustering performs better than the naive SVD compressions.
Updated: 2024-03-27 01:18:00
Categories: cs.LG,math.DS
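Compression by clustering can be sketched as grouping near-identical columns of the Koopman matrix and keeping one representative per group. The greedy one-pass grouping below is a crude stand-in for true hierarchical clustering (a real implementation would use, e.g., a linkage-based routine), shown only to make the compress-and-remap structure concrete.

```python
def compress_columns(K, tol):
    """Greedy one-pass grouping of near-identical columns of K.

    K is a list of rows; returns (compressed matrix, column->cluster map).
    A crude stand-in for the paper's hierarchical clustering.
    """
    n_rows, n_cols = len(K), len(K[0])
    cols = [[K[r][c] for r in range(n_rows)] for c in range(n_cols)]
    centers, assign = [], []
    for col in cols:
        for j, ctr in enumerate(centers):
            if max(abs(a - b) for a, b in zip(col, ctr)) <= tol:
                assign.append(j)  # reuse an existing representative column
                break
        else:
            centers.append(col)   # start a new cluster
            assign.append(len(centers) - 1)
    compressed = [[ctr[r] for ctr in centers] for r in range(n_rows)]
    return compressed, assign
```

Applying the compressed matrix then requires first summing the observable coordinates that share a cluster, which is where the storage and compute savings come from.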
Mistake, Manipulation and Margin Guarantees in Online Strategic Classification
We consider an online strategic classification problem where each arriving agent can manipulate their true feature vector to obtain a positive predicted label, while incurring a cost that depends on the amount of manipulation. The learner seeks to predict the agent's true label given access to only the manipulated features. After the learner releases their prediction, the agent's true label is revealed. Previous algorithms such as the strategic perceptron guarantee finitely many mistakes under a margin assumption on agents' true feature vectors. However, these are not guaranteed to encourage agents to be truthful. Promoting truthfulness is intimately linked to obtaining adequate margin on the predictions, thus we provide two new algorithms aimed at recovering the maximum margin classifier in the presence of strategic agent behavior. We prove convergence, finite mistake and finite manipulation guarantees for a variety of agent cost structures. We also provide generalized versions of the strategic perceptron with mistake guarantees for different costs. Our numerical study on real and synthetic data demonstrates that the new algorithms outperform previous ones in terms of margin, number of manipulation and number of mistakes.
Updated: 2024-03-27 01:05:45
Categories: cs.LG,cs.GT,math.OC
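The agent behavior described above can be made concrete with a textbook best-response sketch under a linear classifier: the agent moves its feature vector just far enough along the classifier's normal to reach the decision boundary, but only when the manipulation cost stays below the benefit of a positive label. This is a generic illustration, not the paper's specific cost model; `cost_per_unit` and `benefit` are assumed parameters.

```python
def best_response(x, w, b, cost_per_unit, benefit=1.0):
    """Agent's manipulation under a linear classifier score(x) = w.x + b.

    Moves just far enough along w to reach the boundary, but only when
    the movement cost stays below the benefit of a positive label.
    A textbook sketch, not the paper's exact cost structure.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if score >= 0:
        return list(x)                  # already labeled positive
    norm = sum(wi * wi for wi in w) ** 0.5
    dist = -score / norm                # distance to the boundary
    if dist * cost_per_unit > benefit:
        return list(x)                  # manipulation not worth the cost
    return [xi + dist * wi / norm for xi, wi in zip(x, w)]
```

Under this behavior, a classifier with a larger margin forces a larger (costlier) manipulation, which is exactly why margin and truthfulness are linked in the abstract.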
Deep Learning-Driven Approach for Handwritten Chinese Character Classification
Handwritten character recognition (HCR) is a challenging problem for machine learning researchers. Unlike printed text data, handwritten character datasets have more variation due to human-introduced bias. With numerous unique character classes present, some data, such as Logographic Scripts or Sino-Korean character sequences, bring new complications to the HCR problem. The classification task on such datasets requires the model to learn high-complexity details of the images that share similar features. With recent advances in computational resource availability and further computer vision theory development, some research teams have effectively addressed the arising challenges. Although known for achieving high accuracy while keeping the number of parameters small, many common approaches are still not generalizable and use dataset-specific solutions to achieve better results. Due to their complex structure, such solutions often struggle to gain widespread adoption. This paper proposes a highly scalable approach for detailed character image classification by introducing the model architecture, data preprocessing steps, and testing design instructions. We also perform experiments to compare the performance of our method with that of existing ones to show the improvements achieved.
Updated: 2024-03-27 00:46:26
Categories: cs.CV,cs.LG
Follower Agnostic Methods for Stackelberg Games
In this paper, we present an efficient algorithm to solve online Stackelberg games, featuring multiple followers, in a follower-agnostic manner. Unlike previous works, our approach works even when the leader has no knowledge about the followers' utility functions or strategy space. Our algorithm introduces a unique gradient estimator, leveraging specially designed strategies to probe followers. In a departure from traditional assumptions of optimal play, we model followers' responses using a convergent adaptation rule, allowing for realistic and dynamic interactions. The leader constructs the gradient estimator solely based on observations of followers' actions. We provide both non-asymptotic convergence rates to stationary points of the leader's objective and demonstrate asymptotic convergence to a \emph{local Stackelberg equilibrium}. To validate the effectiveness of our algorithm, we use this algorithm to solve the problem of incentive design on a large-scale transportation network, showcasing its robustness even when the leader lacks access to followers' demands.
Updated: 2024-03-27 00:38:33
Categories: math.OC,cs.AI,cs.GT,math.DS,91A65
Mechanisms of non-factual hallucinations in language models
State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that misalign with world knowledge. Despite extensive efforts to detect and mitigate hallucinations, understanding their internal mechanisms remains elusive. Our study investigates the mechanistic causes of hallucination, specifically non-factual ones where the LM incorrectly predicts object attributes in response to subject-relation queries. With causal mediation analysis and embedding space projection, we identify two general mechanistic causes of hallucinations shared across LMs of various scales and designs: 1) insufficient subject attribute knowledge in lower layer MLPs, and 2) failing to select the correct object attribute in upper layer attention heads and MLPs. These two mechanisms exhibit varying degrees of subject-object association, predictive uncertainty and perturbation robustness. Additionally, we scrutinize LM pre-training checkpoints, revealing distinct learning dynamics for the two mechanistic causes of hallucinations. We also highlight how attribution features from our causal analysis can effectively construct hallucination detectors. Our work proposes a mechanistic understanding of LM factual errors.
Updated: 2024-03-27 00:23:03
Categories: cs.CL,cs.AI
Learning to Act without Actions
Pre-training large models on vast amounts of web data has proven to be an effective approach for obtaining powerful, general models in domains such as language and vision. However, this paradigm has not yet taken hold in reinforcement learning. This is because videos, the most abundant form of embodied behavioral data on the web, lack the action labels required by existing methods for imitating behavior from demonstrations. We introduce Latent Action Policies (LAPO), a method for recovering latent action information, and thereby latent-action policies, world models, and inverse dynamics models, purely from videos. LAPO is the first method able to recover the structure of the true action space just from observed dynamics, even in challenging procedurally-generated environments. LAPO enables training latent-action policies that can be rapidly fine-tuned into expert-level policies, either offline using a small action-labeled dataset, or online with rewards. LAPO takes a first step towards pre-training powerful, generalist policies and world models on the vast amounts of videos readily available on the web.
Updated: 2024-03-27 00:15:16
Categories: cs.LG,cs.AI
Optimizing Cyber Response Time on Temporal Active Directory Networks Using Decoys
Microsoft Active Directory (AD) is the default security management system for Windows domain networks. We study the problem of placing decoys in an AD network to detect potential attacks. We model the problem as a Stackelberg game between an attacker and a defender on AD attack graphs where the defender employs a set of decoys to detect the attacker on their way to Domain Admin (DA). Contrary to previous works, we consider time-varying (temporal) attack graphs. We propose a novel metric called response time, to measure the effectiveness of our decoy placement in temporal attack graphs. Response time is defined as the duration from the moment attackers trigger the first decoy to when they compromise the DA. Our goal is to maximize the defender's response time to the worst-case attack paths. We establish the NP-hard nature of the defender's optimization problem, leading us to develop Evolutionary Diversity Optimization (EDO) algorithms. EDO algorithms identify diverse sets of high-quality solutions for the optimization problem. Despite the polynomial nature of the fitness function, it proves experimentally slow for larger graphs. To enhance scalability, we propose an algorithm that exploits the static nature of AD infrastructure in the temporal setting. Then, we introduce tailored repair operations, ensuring convergence to better results while maintaining scalability for larger graphs.
Updated: 2024-03-27 00:05:48
Categories: cs.CR,cs.GT,cs.NE
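The response-time metric defined above (time from the first triggered decoy to DA compromise) is straightforward to compute on a temporal attack path. The sketch below assumes a path given as `(node, time)` pairs in visit order; it is illustrative only, not the paper's code.

```python
def response_time(path, decoys, da="DA"):
    """Time from the first triggered decoy to DA compromise on a temporal
    attack path, given as [(node, time), ...] in visit order. Returns None
    if the path never touches a decoy. Illustrative sketch only.
    """
    trigger = next((t for node, t in path if node in decoys), None)
    if trigger is None:
        return None  # attacker reaches DA undetected
    da_time = next(t for node, t in path if node == da)
    return da_time - trigger

# Attacker visits a -> decoy d1 -> b -> DA at times 0, 2, 5, 9.
rt = response_time([("a", 0), ("d1", 2), ("b", 5), ("DA", 9)], {"d1"})
```

The defender's optimization then amounts to choosing the decoy set that maximizes this value over the worst-case (minimizing) attack path.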