MatchSeg: Towards Better Segmentation via Reference Image Matching
Recently, automated medical image segmentation methods based on deep learning have achieved great success. However, they rely heavily on large annotated datasets, which are costly and time-consuming to acquire. Few-shot learning aims to reduce the need for annotated data by using a small labeled dataset, known as a support set, to guide label prediction for new, unlabeled images, known as the query set. Inspired by this paradigm, we introduce MatchSeg, a novel framework that enhances medical image segmentation through strategic reference image matching. We leverage contrastive language-image pre-training (CLIP) to select highly relevant samples when defining the support set. Additionally, we design a joint attention module to strengthen the interaction between support and query features, facilitating more effective knowledge transfer between the support and query sets. We validated our method on four public datasets. Experimental results demonstrate the superior segmentation performance and strong domain generalization of MatchSeg compared with existing methods on both domain-specific and cross-domain segmentation tasks. Our code is available at https://github.com/keeplearning-again/MatchSeg
Updated: 2024-08-17 23:49:15
Domains: cs.AI,cs.CV
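As a minimal sketch of the support-set selection idea from the MatchSeg abstract above: given embeddings already produced by a CLIP-style image encoder, the support set is simply the labeled candidates most similar to the query. The function name and toy embeddings below are illustrative, not taken from the MatchSeg code.

```python
import numpy as np

def select_support_set(query_emb, candidate_embs, k=3):
    """Rank labeled candidates by cosine similarity to the query and
    return the indices of the top-k matches for use as the support set."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity per candidate
    return np.argsort(-sims)[:k]      # indices of the top-k matches

# Toy example: 4 candidates in a 3-d embedding space.
query = np.array([1.0, 0.0, 0.0])
cands = np.array([[0.9, 0.1, 0.0],    # near-duplicate of the query
                  [0.0, 1.0, 0.0],    # orthogonal
                  [0.7, 0.3, 0.1],    # similar
                  [-1.0, 0.0, 0.0]])  # opposite direction
top = select_support_set(query, cands, k=2)
```

In practice the embeddings would come from a frozen CLIP image encoder over the annotated pool; the retrieval itself reduces to this nearest-neighbour search.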
Direction of Arrival Estimation with Sparse Subarrays
This paper proposes design techniques for partially-calibrated sparse linear subarrays, together with algorithms for direction-of-arrival (DOA) estimation. First, we introduce array architectures that incorporate two distinct array categories, namely type-I and type-II arrays. The former breaks a known sparse linear geometry down into as many subarrays as needed, while the latter arranges each subarray so that it fits a preplanned sparse linear geometry. Moreover, we devise two DOA estimation algorithms that are suitable for partially-calibrated array scenarios within the coarray domain. The algorithms can estimate a greater number of sources than the number of available physical sensors, while keeping the hardware and computational complexity within practical limits for real-time implementation. To this end, we exploit the intersection of projections onto affine spaces by devising the Generalized Coarray Multiple Signal Classification (GCA-MUSIC) algorithm, in conjunction with the estimation of a refined projection matrix related to the noise subspace, as proposed in the GCA root-MUSIC algorithm. We analyze the devised subarray configurations in terms of degrees of freedom and compute the Cramér-Rao Lower Bound for the adopted data model in order to demonstrate the good performance of the proposed methods. Simulations assess the proposed design methods and algorithms against existing approaches.
Updated: 2024-08-17 23:47:24
Domains: eess.SP,cs.IT,cs.LG,math.IT
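For background on the subspace idea that GCA-MUSIC generalizes to the coarray domain, here is a plain MUSIC pseudospectrum for a fully calibrated half-wavelength ULA. This is the classical building block only, not the paper's partially-calibrated coarray algorithm.

```python
import numpy as np

def music_spectrum(X, n_sources, grid_deg):
    """Classic MUSIC pseudospectrum for a half-wavelength ULA.

    X: (n_sensors, n_snapshots) complex snapshot matrix.
    """
    n = X.shape[0]
    R = X @ X.conj().T / X.shape[1]          # sample covariance
    w, V = np.linalg.eigh(R)                 # eigenvalues in ascending order
    En = V[:, : n - n_sources]               # noise-subspace eigenvectors
    spec = []
    for theta in grid_deg:
        a = np.exp(1j * np.pi * np.arange(n) * np.sin(np.deg2rad(theta)))
        spec.append(1.0 / (np.linalg.norm(En.conj().T @ a) ** 2))
    return np.array(spec)

def top_two_angles(grid, spec):
    """Locate the two dominant, well-separated spectral peaks."""
    i1 = int(np.argmax(spec))
    mask = np.abs(grid - grid[i1]) > 5.0     # exclude the first peak's lobe
    i2 = int(np.argmax(np.where(mask, spec, -np.inf)))
    return sorted([grid[i1], grid[i2]])

# Two sources at -20 and 30 degrees, 8-sensor ULA, light noise.
rng = np.random.default_rng(1)
n, T = 8, 400
angles = np.array([-20.0, 30.0])
A = np.exp(1j * np.pi * np.outer(np.arange(n), np.sin(np.deg2rad(angles))))
S = (rng.standard_normal((2, T)) + 1j * rng.standard_normal((2, T))) / np.sqrt(2)
N = 0.05 * (rng.standard_normal((n, T)) + 1j * rng.standard_normal((n, T)))
X = A @ S + N
grid = np.arange(-90.0, 90.5, 0.5)
est = top_two_angles(grid, music_spectrum(X, 2, grid))
```

The coarray variants in the paper apply the same noise-subspace projection to a virtual array built from sensor-position differences, which is how more sources than physical sensors become identifiable.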
A Benchmark Time Series Dataset for Semiconductor Fabrication Manufacturing Constructed using Component-based Discrete-Event Simulation Models
Advances in high-performance computing devices increase the need for new and improved understanding and development of smart manufacturing factories. Discrete-event models and their simulators have been shown to be critical for architecting, designing, building, and operating the manufacturing of semiconductor chips. Diffusion, implantation, and lithography machines have intricate processes due to their feedforward and feedback connectivity. Datasets collected from simulations of factory models hold the promise of generating valuable machine-learning models. As surrogate data-based models, they execute far more efficiently than their physics-based counterparts. For the development of surrogate models, it is beneficial to have publicly available benchmark simulation models that are grounded in factory models with concise structures and accurate behaviors. Hence, in this research, a dataset is devised and constructed based on a benchmark model of an Intel semiconductor fabrication factory. The model is formalized using the Parallel Discrete-Event System Specification and executed using the DEVS-Suite simulator. The time series dataset is constructed from discrete-event time trajectories. The dataset is further analyzed and used to develop baseline univariate and multivariate machine learning models. It can also be used by the machine learning community for behavioral analysis based on formalized and scalable component-based discrete-event models and simulations.
Updated: 2024-08-17 23:05:47
Domains: cs.LG,cs.AI
Sampling Foundational Transformer: A Theoretical Perspective
The versatility of the self-attention mechanism has earned transformers great success in almost all data modalities, despite the limitations of quadratic complexity and difficulty of training. To apply transformers across different data modalities, practitioners have to devise specific, clever, modality-dependent constructions. In this paper, we propose the Sampling Foundational Transformer (SFT), which can work on multiple data modalities (e.g., point cloud, graph, and sequence) and under multiple constraints (e.g., rotational invariance). The existence of such a model is important, as contemporary foundational modeling requires operability on multiple data sources. For efficiency on large numbers of tokens, our model relies on a context-aware sampling-without-replacement mechanism, which provides both linear asymptotic computational complexity and real gains in inference time. For faster convergence, we rely on our newly discovered pseudoconvex formulation of the transformer layer. As a model working across multiple data modalities, SFT achieves competitive results on many benchmarks while being faster in inference than other, highly specialized models.
Updated: 2024-08-17 22:33:06
Domains: cs.LG,cs.CV
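The sampling-without-replacement mechanism in the abstract can be illustrated, generically, with the Gumbel-top-k trick: adding independent Gumbel noise to scores and taking the top k draws k distinct items with probability proportional to softmax of the scores. SFT's context-aware mechanism is more elaborate; this is only the conceptual core.

```python
import numpy as np

def gumbel_topk_sample(scores, k, rng):
    """Draw k distinct items, with inclusion probability driven by
    softmax(scores), via the Gumbel-top-k trick (sampling without
    replacement)."""
    g = rng.gumbel(size=len(scores))
    return np.argsort(-(scores + g))[:k]   # top-k of perturbed scores

rng = np.random.default_rng(0)
scores = np.array([3.0, 1.0, 0.0, -1.0])
picked = gumbel_topk_sample(scores, 2, rng)

# High-scoring tokens are kept far more often than low-scoring ones.
counts = np.zeros(4)
for _ in range(2000):
    for i in gumbel_topk_sample(scores, 2, rng):
        counts[i] += 1
```

In a transformer this kind of token subsampling is what turns quadratic attention cost into cost on a reduced token set.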
Data Science Principles for Interpretable and Explainable AI
Society's capacity for algorithmic problem-solving has never been greater. Artificial Intelligence is now applied across more domains than ever, a consequence of powerful abstractions, abundant data, and accessible software. As capabilities have expanded, so have risks, with models often deployed without fully understanding their potential impacts. Interpretable and interactive machine learning aims to make complex models more transparent and controllable, enhancing user agency. This review synthesizes key principles from the growing literature in this field. We first introduce precise vocabulary for discussing interpretability, like the distinction between glass box and explainable models. We then explore connections to classical statistical and design principles, like parsimony and the gulfs of interaction. Basic explainability techniques -- including learned embeddings, integrated gradients, and concept bottlenecks -- are illustrated with a simple case study. We also review criteria for objectively evaluating interpretability approaches. Throughout, we underscore the importance of considering audience goals when designing interactive data-driven systems. Finally, we outline open challenges and discuss the potential role of data science in addressing them. Code to reproduce all examples can be found at https://go.wisc.edu/3k1ewe.
Updated: 2024-08-17 22:08:38
Domains: stat.ML,cs.LG
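Integrated gradients, one of the explainability techniques named in the abstract, attributes a prediction by averaging gradients along the straight path from a baseline to the input and scaling by the input difference. A small self-contained sketch with a toy differentiable model:

```python
import numpy as np

def integrated_gradients(f_grad, x, baseline, steps=100):
    """Integrated gradients attribution.

    f_grad: function returning the gradient of a scalar model output.
    Approximates the path integral with a midpoint Riemann sum.
    """
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += f_grad(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Toy model f(x) = x0^2 + 3*x1, so the gradient is [2*x0, 3].
grad = lambda z: np.array([2.0 * z[0], 3.0])
x = np.array([2.0, 1.0])
attr = integrated_gradients(grad, x, baseline=np.zeros(2))
```

The completeness axiom is visible here: the attributions sum to f(x) - f(baseline) = 7, which is a standard sanity check for any IG implementation.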
Malacopula: adversarial automatic speaker verification attacks using a neural-based generalised Hammerstein model
We present Malacopula, a neural-based generalised Hammerstein model designed to introduce adversarial perturbations to spoofed speech utterances so that they better deceive automatic speaker verification (ASV) systems. Using non-linear processes to modify speech utterances, Malacopula enhances the effectiveness of spoofing attacks. The model comprises parallel branches of polynomial functions followed by linear time-invariant filters. The adversarial optimisation procedure acts to minimise the cosine distance between speaker embeddings extracted from spoofed and bona fide utterances. Experiments, performed using three recent ASV systems and the ASVspoof 2019 dataset, show that Malacopula increases vulnerabilities by a substantial margin. However, speech quality is reduced and attacks can be detected effectively under controlled conditions. The findings emphasise the need to identify new vulnerabilities and design defences to protect ASV systems from adversarial attacks in the wild.
Updated: 2024-08-17 21:58:11
Domains: eess.AS,cs.CR,cs.LG,cs.SD
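The model structure described in the abstract, parallel polynomial branches each followed by a linear time-invariant filter, can be sketched directly. The FIR coefficients below are arbitrary illustrations; the paper learns them through its adversarial optimisation procedure.

```python
import numpy as np

def hammerstein(x, branch_filters):
    """Generalised Hammerstein model: branch k raises the signal to the
    k-th power (the static non-linearity) and filters it with an FIR
    filter (the LTI part); branch outputs are summed."""
    y = np.zeros(len(x))
    for k, h in enumerate(branch_filters, start=1):
        y += np.convolve(x ** k, h)[: len(x)]   # branch k: FIR(x^k)
    return y

x = np.array([1.0, 0.5, -0.25, 0.0])
# Two branches: the linear branch passes the signal through unchanged;
# the quadratic branch delays and scales its square.
filters = [np.array([1.0]), np.array([0.0, 0.5])]
y = hammerstein(x, filters)
```

In the attack setting, the filters are tuned so that the transformed spoofed utterance's speaker embedding moves closer to the target speaker's.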
Out-of-distribution materials property prediction using adversarial learning based fine-tuning
The accurate prediction of material properties is crucial in a wide range of scientific and engineering disciplines. Machine learning (ML) has advanced the state of the art in this field, enabling scientists to discover novel materials and design materials with specific desired properties. However, one major challenge that persists in material property prediction is the generalization of models to out-of-distribution (OOD) samples, i.e., samples that differ significantly from those encountered during training. In this paper, we explore how advances in OOD learning can enhance the robustness and reliability of material property prediction models. We propose and apply the Crystal Adversarial Learning (CAL) algorithm for OOD materials property prediction, which generates synthetic data during training to bias the training towards samples with high prediction uncertainty. We further propose an adversarial-learning-based targeted fine-tuning approach to adapt the model to a particular OOD dataset, as an alternative to traditional fine-tuning. Our experiments demonstrate the effectiveness of the CAL algorithm for ML with limited samples, a setting that commonly occurs in materials science. Our work represents a promising direction toward better OOD learning and materials property prediction.
Updated: 2024-08-17 21:22:21
Domains: cond-mat.mtrl-sci,cs.LG
Black-Box Anomaly Attribution
When the prediction of a black-box machine learning model deviates from the true observation, what can be said about the reason behind that deviation? This is a fundamental and ubiquitous question that the end user in a business or industrial AI application often asks. The deviation may be due to a sub-optimal black-box model, or it may be simply because the sample in question is an outlier. In either case, one would ideally wish to obtain some form of attribution score -- a value indicative of the extent to which an input variable is responsible for the anomaly. In the present paper we address this task of "anomaly attribution," particularly in the setting in which the model is black-box and the training data are not available. Specifically, we propose a novel likelihood-based attribution framework we call the "likelihood compensation (LC)," in which the responsibility score is equated with the correction on each input variable needed to attain the highest possible likelihood. We begin by showing formally why mainstream model-agnostic explanation methods, such as the local linear surrogate modeling and Shapley values, are not designed to explain anomalies. In particular, we show that they are "deviation-agnostic," namely, that their explanations are blind to the fact that there is a deviation in the model prediction for the sample of interest. We do this by positioning these existing methods under the unified umbrella of a function family we call the "integrated gradient family." We validate the effectiveness of the proposed LC approach using publicly available data sets. We also conduct a case study with a real-world building energy prediction task and confirm its usefulness in practice based on expert feedback.
Updated: 2024-08-17 20:34:16
Domains: cs.LG
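A rough numerical sketch of the likelihood-compensation idea: find the per-variable correction delta that moves the black-box prediction toward the observation, penalized so that delta stays small. Under a Gaussian likelihood this reduces to the penalized least-squares problem below; the paper's exact LC formulation differs, and the finite-difference optimizer here is purely illustrative (it only needs query access to the black box).

```python
import numpy as np

def likelihood_compensation(f, x, y, lam=0.1, lr=0.05, iters=300, eps=1e-4):
    """Find delta minimizing (f(x + delta) - y)^2 + lam * ||delta||^2
    by gradient descent, estimating the black box's gradient with
    central finite differences."""
    d = np.zeros_like(x, dtype=float)
    for _ in range(iters):
        grad = np.zeros_like(d)
        for i in range(len(d)):
            e = np.zeros_like(d)
            e[i] = eps
            grad[i] = (f(x + d + e) - f(x + d - e)) / (2 * eps)
        resid = f(x + d) - y
        d -= lr * (2 * resid * grad + 2 * lam * d)
    return d

# Black box depends only on x0, so an anomalous observation should be
# attributed (almost) entirely to the first variable.
f = lambda z: 2.0 * z[0]
x = np.array([1.0, 5.0])             # model predicts 2, but we observed 5
delta = likelihood_compensation(f, x, y=5.0)
```

The magnitude of each component of `delta` then serves as the responsibility score for the corresponding input variable.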
GCondNet: A Novel Method for Improving Neural Networks on Small High-Dimensional Tabular Data
Neural networks often struggle with high-dimensional but small sample-size tabular datasets. One reason is that current weight initialisation methods assume independence between weights, which can be problematic when there are insufficient samples to estimate the model's parameters accurately. In such small-data scenarios, leveraging additional structure can improve the model's performance and training stability. To address this, we propose GCondNet, a general approach to enhancing neural networks by leveraging implicit structures present in tabular data. We create a graph between samples for each data dimension and utilise Graph Neural Networks (GNNs) to extract this implicit structure and to condition the parameters of the first layer of an underlying predictor network. By creating many small graphs, GCondNet exploits the data's high dimensionality and thus improves the performance of the underlying predictor network. We demonstrate GCondNet's effectiveness on 12 real-world datasets, where it outperforms 14 standard and state-of-the-art methods. The results show that GCondNet is a versatile framework for injecting graph regularisation into various types of neural networks, including MLPs and tabular Transformers. Code is available at https://github.com/andreimargeloiu/GCondNet.
Updated: 2024-08-17 20:16:45
Domains: cs.LG,q-bio.QM
Evaluating Usability and Engagement of Large Language Models in Virtual Reality for Traditional Scottish Curling
This paper explores the innovative application of Large Language Models (LLMs) in Virtual Reality (VR) environments to promote heritage education, focusing on traditional Scottish curling presented in the game ``Scottish Bonspiel VR''. Our study compares the effectiveness of LLM-based chatbots with pre-defined scripted chatbots, evaluating key criteria such as usability, user engagement, and learning outcomes. The results show that LLM-based chatbots significantly improve interactivity and engagement, creating a more dynamic and immersive learning environment. This integration helps document and preserve cultural heritage and enhances dissemination processes, which are crucial for safeguarding intangible cultural heritage (ICH) amid environmental changes. Furthermore, the study highlights the potential of novel technologies in education to provide immersive experiences that foster a deeper appreciation of cultural heritage. These findings support the wider application of LLMs and VR in cultural education to address global challenges and promote sustainable practices to preserve and enhance cultural heritage.
Updated: 2024-08-17 20:13:34
Domains: cs.HC,cs.AI
Design Editing for Offline Model-based Optimization
Offline model-based optimization (MBO) aims to maximize a black-box objective function using only an offline dataset of designs and scores. These tasks span various domains, such as robotics, material design, and protein and molecular engineering. A common approach involves training a surrogate model using existing designs and their corresponding scores, and then generating new designs through gradient-based updates with respect to the surrogate model. This method suffers from the out-of-distribution issue, where the surrogate model may erroneously predict high scores for unseen designs. To address this challenge, we introduce a novel method, Design Editing for Offline Model-based Optimization (DEMO), which leverages a diffusion prior to calibrate overly optimized designs. DEMO first generates pseudo design candidates by performing gradient ascent with respect to a surrogate model. Then, an editing process refines these pseudo design candidates by introducing noise and subsequently denoising them with a diffusion prior trained on the offline dataset, ensuring they align with the distribution of valid designs. We provide a theoretical proof that the difference between the final optimized designs generated by DEMO and the prior distribution of the offline dataset is controlled by the noise injected during the editing process. Empirical evaluations on seven offline MBO tasks show that DEMO outperforms various baseline methods, achieving the highest mean rank of 2.1 and a median rank of 1.
Updated: 2024-08-17 19:51:14
Domains: cs.LG,cs.CE
SmartQuant: CXL-based AI Model Store in Support of Runtime Configurable Weight Quantization
Recent studies have revealed that, during inference on generative AI models such as the transformer, the importance of different weights exhibits substantial context-dependent variations. This naturally suggests a promising potential for adaptively configuring weight quantization to improve generative AI inference efficiency. Although configurable weight quantization can readily leverage the hardware support for variable-precision arithmetic in modern GPUs and AI accelerators, little prior research has studied how one could exploit variable weight quantization to proportionally improve AI model memory access speed and energy efficiency. Motivated by the rapidly maturing CXL ecosystem, this work develops a CXL-based design solution to fill this gap. The key is to allow CXL memory controllers to play an active role in supporting and exploiting runtime-configurable weight quantization. Using the transformer as a representative generative AI model, we carried out experiments that demonstrate the effectiveness of the proposed design solution.
Updated: 2024-08-17 19:44:41
Domains: cs.LG,cs.AI,cs.AR
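The premise, context-dependent weight importance driving variable precision, can be sketched as importance-ranked mixed-precision quantization. The importance scores, group sizes, and bit-widths below are illustrative; the paper's contribution is having the CXL memory controller support and exploit such configurations at runtime rather than the quantizer itself.

```python
import numpy as np

def quantize(w, bits):
    """Uniform symmetric quantization of a weight group to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.round(w / scale) * scale

def mixed_precision(weights, importance, hi_bits=8, lo_bits=4, frac=0.5):
    """Keep the most important weight groups at higher precision and
    store the rest at lower precision."""
    cut = max(1, int(frac * len(weights)))
    hi = set(np.argsort(-importance)[:cut].tolist())
    return [quantize(w, hi_bits if i in hi else lo_bits)
            for i, w in enumerate(weights)]

rng = np.random.default_rng(0)
groups = [rng.standard_normal(64) for _ in range(4)]
imp = np.array([0.9, 0.1, 0.8, 0.2])       # groups 0 and 2 matter most
q = mixed_precision(groups, imp)
err = [np.abs(g - qg).max() for g, qg in zip(groups, q)]
```

The important groups end up with markedly smaller quantization error, which is the trade the runtime controller would exploit per context.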
Enhancing Audio-Language Models through Self-Supervised Post-Training with Text-Audio Pairs
Research on multi-modal contrastive learning strategies for audio and text has rapidly gained interest. Contrastively trained Audio-Language Models (ALMs) such as CLAP, which establish a unified representation across the audio and language modalities, have improved performance in various subsequent tasks by providing good text-aligned audio encoders and vice versa. These improvements are evident in areas like zero-shot audio classification and audio retrieval, among others. However, the ability of these models to understand natural language and temporal relations is still a largely unexplored and open field for research. In this paper, we propose to equip multi-modal ALMs with temporal understanding without losing their inherent prior capabilities on audio-language tasks, using a temporal instillation method, TeminAL. We implement a two-stage training scheme, TeminAL A & B, in which the model first learns to differentiate between multiple sounds (TeminAL A) and is then instilled with a sense of time to enhance its temporal understanding (TeminAL B). This approach yields an average performance gain of 5.28% in temporal understanding on the ESC-50 dataset, while the model remains competitive in zero-shot retrieval and classification tasks on the AudioCap/Clotho datasets. We also note the lack of proper evaluation techniques for contrastive ALMs and propose ZSTE, a general-purpose strategy for evaluating ALMs in zero-shot settings, which we use to evaluate various prior models. The model trained with TeminAL outperforms current models on most downstream tasks.
Updated: 2024-08-17 18:53:17
Domains: cs.SD,cs.LG,eess.AS
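The contrastive training underlying CLAP-style ALMs pairs audio and text embeddings with a symmetric InfoNCE loss, where matching pairs sit on the diagonal of the similarity matrix. A numpy sketch of the loss only (the real models use learned encoders and a learned temperature):

```python
import numpy as np

def clip_style_loss(audio_emb, text_emb, temp=0.07):
    """Symmetric contrastive (InfoNCE) loss over a batch of paired
    audio/text embeddings: cross-entropy toward the diagonal, averaged
    over the audio-to-text and text-to-audio directions."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temp

    def xent(l):                       # cross-entropy, diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    return 0.5 * (xent(logits) + xent(logits.T))

# Aligned pairs give near-zero loss; shuffled pairs give a large one.
rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 16))
loss_aligned = clip_style_loss(emb, emb)
loss_shuffled = clip_style_loss(emb, np.roll(emb, 1, axis=0))
```

TeminAL's second stage adds temporally contrastive objectives on top of this basic alignment; that stage is not reproduced here.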
Can Machines Imitate Humans? Integrative Turing Tests for Vision and Language Demonstrate a Narrowing Gap
As AI algorithms increasingly participate in daily activities, it becomes critical to ascertain whether the agents we interact with are human or not. To address this question, we turn to the Turing test and systematically benchmark current AIs in their abilities to imitate humans in three language tasks (Image captioning, Word association, and Conversation) and three vision tasks (Object detection, Color estimation, and Attention prediction). The experiments involved 549 human agents plus 26 AI agents for dataset creation, and 1,126 human judges plus 10 AI judges, in 25,650 Turing-like tests. The results reveal that current AIs are not far from being able to impersonate humans in complex language and vision challenges. While human judges were often deceived, simple AI judges outperformed human judges in distinguishing human answers from AI answers. The results of imitation tests are only minimally correlated with standard performance metrics in AI. Thus, evaluating whether a machine can pass as a human constitutes an important independent test to evaluate AI algorithms. The curated, large-scale, Turing datasets introduced here and their evaluation metrics provide new benchmarks and insights to assess whether an agent is human or not and emphasize the relevance of rigorous, systematic, and quantitative imitation tests in these and other AI domains.
Updated: 2024-08-17 18:37:13
Domains: cs.CV,cs.AI
ByCAN: Reverse Engineering Controller Area Network (CAN) Messages from Bit to Byte Level
As the primary standard protocol for modern cars, the Controller Area Network (CAN) is a critical research target for automotive cybersecurity threats and autonomous applications. Because the decoding specification of CAN is a proprietary black box maintained by Original Equipment Manufacturers (OEMs), conducting related research and industry development can be challenging without a comprehensive understanding of the meaning of CAN messages. In this paper, we propose a fully automated reverse-engineering system, named ByCAN, to reverse engineer CAN messages. ByCAN outperforms existing research by introducing byte-level clusters and integrating multiple features at both the byte and bit levels. ByCAN employs clustering and template-matching algorithms to automatically decode the specification of CAN frames without the need for prior knowledge. Experimental results demonstrate that ByCAN achieves high accuracy in slicing and labeling performance, i.e., the identification of CAN signal boundaries and labels. In the experiments, ByCAN achieves a slicing accuracy of 80.21%, slicing coverage of 95.21%, and labeling accuracy of 68.72% for general labels when analyzing real-world CAN frames.
Updated: 2024-08-17 18:10:15
Domains: cs.CR,cs.LG,cs.NI,cs.SY,eess.SY
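One standard heuristic behind bit-level CAN reverse engineering is to examine per-bit flip rates across consecutive frames: counter and low-order sensor bits flip often, constant padding never, and signal boundaries show up as jumps in the rate profile. This sketch covers only that fragment of what ByCAN's clustering pipeline does; the toy payload layout is invented for illustration.

```python
import numpy as np

def bit_flip_rates(frames):
    """Fraction of frame-to-frame transitions in which each bit flips.

    frames: (n_frames, n_bits) array of 0/1 payload bits.
    """
    return np.abs(np.diff(frames, axis=0)).mean(axis=0)

# Toy 8-bit payload: bits 0-3 hold a 4-bit counter (low bits flip
# most often), bits 4-7 are constant padding.
counter = np.arange(64) % 16
bits = (counter[:, None] >> np.arange(4)) & 1
frames = np.concatenate([bits, np.zeros((64, 4), dtype=int)], axis=1)
rates = bit_flip_rates(frames)
```

A boundary detector would segment the payload where the rate profile changes sharply (here, between bit 3 and bit 4), before labeling each segment as counter, sensor, or constant.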
How Susceptible are LLMs to Influence in Prompts?
Large Language Models (LLMs) are highly sensitive to prompts, including additional context provided therein. As LLMs grow in capability, understanding their prompt-sensitivity becomes increasingly crucial for ensuring reliable and robust performance, particularly since evaluating these models becomes more challenging. In this work, we investigate how current models (Llama, Mixtral, Falcon) respond when presented with additional input from another model, mimicking a scenario where a more capable model -- or a system with access to more external information -- provides supplementary information to the target model. Across a diverse spectrum of question-answering tasks, we study how an LLM's response to multiple-choice questions changes when the prompt includes a prediction and explanation from another model. Specifically, we explore the influence of the presence of an explanation, the stated authoritativeness of the source, and the stated confidence of the supplementary input. Our findings reveal that models are strongly influenced, and when explanations are provided they are swayed irrespective of the quality of the explanation. The models are more likely to be swayed if the input is presented as being authoritative or confident, but the effect is small in size. This study underscores the significant prompt-sensitivity of LLMs and highlights the potential risks of incorporating outputs from external sources without thorough scrutiny and further validation. As LLMs continue to advance, understanding and mitigating such sensitivities will be crucial for their reliable and trustworthy deployment.
Updated: 2024-08-17 17:40:52
Domains: cs.CL,cs.AI,cs.LG
UniTS: A Universal Time Series Analysis Framework Powered by Self-Supervised Representation Learning
Machine learning has emerged as a powerful tool for time series analysis. Existing methods are usually customized for different analysis tasks and face challenges in tackling practical problems such as partial labeling and domain shift. To improve performance and address these practical problems in a unified way, we develop UniTS, a novel framework that incorporates self-supervised representation learning (or pre-training). The components of UniTS are designed with sklearn-like APIs to allow flexible extensions. We demonstrate how users can easily perform an analysis task using the user-friendly GUIs, and show the superior performance of UniTS over traditional task-specific methods without self-supervised pre-training on five mainstream tasks and two practical settings.
Updated: 2024-08-17 17:33:19
Domains: cs.LG,cs.DB
FedST: Secure Federated Shapelet Transformation for Time Series Classification
This paper explores how to build a shapelet-based time series classification (TSC) model in the federated learning (FL) scenario, that is, using more data from multiple owners without actually sharing the data. We propose FedST, a novel federated TSC framework extended from a centralized shapelet transformation method. We recognize the federated shapelet search step as the kernel of FedST. Thus, we design a basic protocol for the FedST kernel that we prove to be secure and accurate. However, we identify that the basic protocol suffers from efficiency bottlenecks, and the centralized acceleration techniques lose their efficacy due to security issues. To speed up the federated protocol with security guarantees, we propose several optimizations tailored for the FL setting. Our theoretical analysis shows that the proposed methods are secure and more efficient. We conduct extensive experiments using both synthetic and real-world datasets. Empirical results show that our FedST solution is effective in terms of TSC accuracy, and the proposed optimizations can achieve three orders of magnitude of speedup.
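For readers unfamiliar with the centralized method that FedST extends: a shapelet transform maps each series to a feature vector of minimum distances to candidate shapelets. Below is a minimal non-federated sketch (the function names and the plain Euclidean window distance are assumptions; the secure federated search protocol is the paper's contribution and is not shown):

```python
import numpy as np

# Core of a (non-federated) shapelet transform: each feature is the minimum
# Euclidean distance between a shapelet and all equal-length windows of a
# series.

def shapelet_distance(series, shapelet):
    L = len(shapelet)
    windows = np.lib.stride_tricks.sliding_window_view(series, L)
    return np.min(np.linalg.norm(windows - shapelet, axis=1))

def shapelet_transform(dataset, shapelets):
    # dataset: (n_series, series_len); returns (n_series, n_shapelets)
    return np.array([[shapelet_distance(x, s) for s in shapelets]
                     for x in dataset])
```

A classifier is then trained on the transformed features; in FedST this distance computation must happen securely across data owners.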
Updated: 2024-08-17 17:26:23
Domains: cs.LG,cs.DB
PREMAP: A Unifying PREiMage APproximation Framework for Neural Networks
Most methods for neural network verification focus on bounding the image, i.e., set of outputs for a given input set. This can be used to, for example, check the robustness of neural network predictions to bounded perturbations of an input. However, verifying properties concerning the preimage, i.e., the set of inputs satisfying an output property, requires abstractions in the input space. We present a general framework for preimage abstraction that produces under- and over-approximations of any polyhedral output set. Our framework employs cheap parameterised linear relaxations of the neural network, together with an anytime refinement procedure that iteratively partitions the input region by splitting on input features and neurons. The effectiveness of our approach relies on carefully designed heuristics and optimization objectives to achieve rapid improvements in the approximation volume. We evaluate our method on a range of tasks, demonstrating significant improvement in efficiency and scalability to high-input-dimensional image classification tasks compared to state-of-the-art techniques. Further, we showcase the application to quantitative verification and robustness analysis, presenting a sound and complete algorithm for the former and providing sound quantitative results for the latter.
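A hedged sketch of the refinement idea described above: an input box whose propagated output bounds certify the output property joins the preimage under-approximation; an undecided box is split on its widest dimension. Plain interval bound propagation stands in for the paper's parameterised linear relaxations, and the splitting heuristic is simplified, so this is an illustration only.

```python
import numpy as np

def ibp_bounds(layers, lo, hi):
    # Propagate an input box through affine (+ optional ReLU) layers via
    # interval arithmetic. layers: list of (W, b, apply_relu) triples.
    for W, b, relu in layers:
        c, r = (lo + hi) / 2.0, (hi - lo) / 2.0
        c, r = W @ c + b, np.abs(W) @ r
        lo, hi = c - r, c + r
        if relu:
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

def preimage_under_approx(layers, box, inside, outside, depth=10):
    # inside/outside: predicates on output bounds deciding whether the whole
    # box is certified inside, or entirely outside, the output set.
    certified, stack = [], [(box, depth)]
    while stack:
        (lo, hi), d = stack.pop()
        out_lo, out_hi = ibp_bounds(layers, lo, hi)
        if inside(out_lo, out_hi):            # whole box lies in the preimage
            certified.append((lo, hi))
        elif d > 0 and not outside(out_lo, out_hi):
            i = int(np.argmax(hi - lo))       # split the widest input dimension
            mid = (lo[i] + hi[i]) / 2.0
            hi_left, lo_right = hi.copy(), lo.copy()
            hi_left[i] = mid
            lo_right[i] = mid
            stack += [((lo, hi_left), d - 1), ((lo_right, hi), d - 1)]
    return certified
```

The union of the certified boxes is a polyhedral (here, box-union) under-approximation of the preimage that improves monotonically with the depth budget, mirroring the anytime character of the framework.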
Updated: 2024-08-17 17:24:47
Domains: cs.LG,cs.AI,cs.LO
Estimating large causal polytrees from small samples
We consider the problem of estimating a large causal polytree from a relatively small i.i.d. sample. This is motivated by the problem of determining causal structure when the number of variables is very large compared to the sample size, such as in gene regulatory networks. We give an algorithm that recovers the tree with high accuracy in such settings. The algorithm works under essentially no distributional or modeling assumptions other than some mild non-degeneracy conditions.
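The abstract does not spell out the algorithm; as a point of reference, a classical way to recover a tree skeleton from pairwise dependence is a maximum-weight spanning tree over absolute correlations, as in Chow-Liu. The correlation-based stand-in below only conveys the flavour of sample-based tree recovery; the paper's method is essentially distribution-free, which this sketch is not.

```python
import numpy as np

# Illustrative (not the paper's) tree-skeleton recovery: maximum spanning
# tree over |correlation|, built greedily with union-find (Kruskal).

def tree_skeleton(X):
    n, p = X.shape
    score = np.abs(np.corrcoef(X, rowvar=False))
    edges = sorted(((score[i, j], i, j)
                    for i in range(p) for j in range(i + 1, p)), reverse=True)
    parent = list(range(p))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    tree = []
    for s, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                       # keep edge only if it joins components
            parent[ri] = rj
            tree.append((i, j))
    return tree
```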
Updated: 2024-08-17 17:22:07
Domains: stat.ME,cs.LG,math.PR,math.ST,stat.ML,stat.TH,62D20
Unconditionally secure quantum commitments with preprocessing
We demonstrate how to build computationally secure commitment schemes with the aid of quantum auxiliary inputs without unproven complexity assumptions. Furthermore, the quantum auxiliary input can be either sampled in uniform exponential time or prepared in at most doubly exponential time, without relying on an external trusted third party. Classically, this remains impossible without first proving $\mathsf{P} \neq \mathsf{NP}$.
Updated: 2024-08-17 17:15:27
Domains: quant-ph,cs.CR
Improving EEG Classification Through Randomly Reassembling Original and Generated Data with Transformer-based Diffusion Models
Electroencephalogram (EEG) classification has been widely used in various medical and engineering applications, where it is important for understanding brain function, diagnosing diseases, and assessing mental health conditions. However, the scarcity of EEG data severely restricts the performance of EEG classification networks, and generative model-based data augmentation methods have emerged as potential solutions to overcome this challenge. There are two problems with existing methods: (1) The quality of the generated EEG signals is not high; (2) The enhancement of EEG classification networks is not effective. In this paper, we propose a Transformer-based denoising diffusion probabilistic model and a generated data-based augmentation method to address the above two problems. For the characteristics of EEG signals, we propose a constant-factor scaling method to preprocess the signals, which reduces the loss of information. We incorporated Multi-Scale Convolution and Dynamic Fourier Spectrum Information modules into the model, improving the stability of the training process and the quality of the generated data. The proposed augmentation method randomly reassembles the generated data with the original data in the time domain to obtain vicinal data, which improves the model performance by minimizing the empirical risk and the vicinal risk. We verify the proposed augmentation method on four EEG datasets for four tasks and observe significant accuracy improvements: 14.00% on the Bonn dataset; 6.38% on the SleepEDF-20 dataset; 9.42% on the FACED dataset; 2.5% on the Shu dataset. We will make the code of our method publicly accessible soon.
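The time-domain reassembly can be pictured as segment-wise mixing of an original signal with its generated counterpart. The segment count and the fair-coin mixing rule below are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

# Sketch of time-domain reassembly: split the original and the generated
# signal into equal segments and, per segment, randomly keep one source,
# producing a "vicinal" sample near both.

def reassemble(original, generated, n_segments=4, rng=None):
    rng = rng or np.random.default_rng()
    pieces = []
    for orig_seg, gen_seg in zip(np.array_split(original, n_segments),
                                 np.array_split(generated, n_segments)):
        pieces.append(orig_seg if rng.random() < 0.5 else gen_seg)
    return np.concatenate(pieces)
```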
Updated: 2024-08-17 17:13:25
Domains: eess.SP,cs.LG
A Shapelet-based Framework for Unsupervised Multivariate Time Series Representation Learning
Recent studies have shown great promise in unsupervised representation learning (URL) for multivariate time series, because URL has the capability in learning generalizable representation for many downstream tasks without using inaccessible labels. However, existing approaches usually adopt the models originally designed for other domains (e.g., computer vision) to encode the time series data and {rely on strong assumptions to design learning objectives, which limits their ability to perform well}. To deal with these problems, we propose a novel URL framework for multivariate time series by learning time-series-specific shapelet-based representation through a popular contrasting learning paradigm. To the best of our knowledge, this is the first work that explores the shapelet-based embedding in the unsupervised general-purpose representation learning. A unified shapelet-based encoder and a novel learning objective with multi-grained contrasting and multi-scale alignment are particularly designed to achieve our goal, and a data augmentation library is employed to improve the generalization. We conduct extensive experiments using tens of real-world datasets to assess the representation quality on many downstream tasks, including classification, clustering, and anomaly detection. The results demonstrate the superiority of our method against not only URL competitors, but also techniques specially designed for downstream tasks. Our code has been made publicly available at https://github.com/real2fish/CSL.
Updated: 2024-08-17 17:05:29
Domains: cs.LG
V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models
Advancements in autonomous driving have increasingly focused on end-to-end (E2E) systems that manage the full spectrum of driving tasks, from environmental perception to vehicle navigation and control. This paper introduces V2X-VLM, an innovative E2E vehicle-infrastructure cooperative autonomous driving (VICAD) framework with large vision-language models (VLMs). V2X-VLM is designed to enhance situational awareness, decision-making, and ultimate trajectory planning by integrating data from vehicle-mounted cameras, infrastructure sensors, and textual information. The strength of the comprehensive multimodal data fusion of the VLM enables precise and safe E2E trajectory planning in complex and dynamic driving scenarios. Validation on the DAIR-V2X dataset demonstrates that V2X-VLM outperforms existing state-of-the-art methods in cooperative autonomous driving.
Updated: 2024-08-17 16:42:13
Domains: cs.RO,cs.AI,cs.LG
Reduced-Space Iteratively Reweighted Second-Order Methods for Nonconvex Sparse Regularization
This paper explores a specific type of nonconvex sparsity-promoting regularization problems, namely those involving $\ell_p$-norm regularization, in conjunction with a twice continuously differentiable loss function. We propose a novel second-order algorithm designed to effectively address this class of challenging nonconvex and nonsmooth problems, showcasing several innovative features: (i) The use of an alternating strategy to solve a reweighted $\ell_1$ regularized subproblem and the subspace approximate Newton step. (ii) The reweighted $\ell_1$ regularized subproblem relies on a convex approximation to the nonconvex regularization term, enabling a closed-form solution characterized by the soft-thresholding operator. This feature allows our method to be applied to various nonconvex regularization problems. (iii) Our algorithm ensures that the iterates maintain their sign values and that nonzero components are kept away from 0 for a sufficient number of iterations, eventually transitioning to a perturbed Newton method. (iv) We provide theoretical guarantees of global convergence, local superlinear convergence in the presence of the Kurdyka-\L ojasiewicz (KL) property, and local quadratic convergence when employing the exact Newton step in our algorithm. We also showcase the effectiveness of our approach through experiments on a diverse set of model prediction problems.
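The soft-thresholding operator mentioned above, S_tau(z) = sign(z) · max(|z| − tau, 0), is what gives the reweighted l1 subproblem its closed form. Below is a minimal proximal-gradient sketch in which the per-coordinate weights are rederived from the current iterate via a common smooth majorizer of the l_p term; the step size, smoothing constant `eps`, and the specific majorizer are illustrative, and the paper's algorithm additionally alternates with subspace Newton steps, which are not shown.

```python
import numpy as np

def soft_threshold(z, tau):
    # Closed-form prox of the (weighted) l1 term.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def irl1_lasso(A, y, lam, p=0.5, eps=0.1, iters=200):
    # Iteratively reweighted l1 for min 0.5||Ax - y||^2 + lam * sum |x_i|^p.
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L for the quadratic loss
    for _ in range(iters):
        grad = A.T @ (A @ x - y)
        w = p * (np.abs(x) + eps) ** (p - 1)  # convex majorizer's weights
        x = soft_threshold(x - step * grad, step * lam * w)
    return x
```

Note how small coordinates receive large weights and are driven exactly to zero, while large coordinates are barely shrunk, which is the intended nonconvex behaviour.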
Updated: 2024-08-17 16:31:37
Domains: math.OC,cs.LG,90C26, 49M15, 90C53
Towards Effective Top-N Hamming Search via Bipartite Graph Contrastive Hashing
Searching on bipartite graphs serves as a fundamental task for various real-world applications, such as recommendation systems, database retrieval, and document querying. Conventional approaches rely on similarity matching in continuous Euclidean space of vectorized node embeddings. To handle intensive similarity computation efficiently, hashing techniques for graph-structured data have emerged as a prominent research direction. However, despite the retrieval efficiency in Hamming space, previous studies have encountered catastrophic performance decay. To address this challenge, we investigate the problem of hashing with Graph Convolutional Network for effective Top-N search. Our findings indicate the learning effectiveness of incorporating hashing techniques within the exploration of bipartite graph reception fields, as opposed to simply treating hashing as post-processing to output embeddings. To further enhance the model performance, we advance upon these findings and propose Bipartite Graph Contrastive Hashing (BGCH+). BGCH+ introduces a novel dual augmentation approach to both intermediate information and hash code outputs in the latent feature spaces, thereby producing more expressive and robust hash codes within a dual self-supervised learning paradigm. Comprehensive empirical analyses on six real-world benchmarks validate the effectiveness of our dual feature contrastive learning in boosting the performance of BGCH+ compared to existing approaches.
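Why Hamming-space retrieval is cheap: with hash codes packed into machine integers, each distance is a single XOR plus a popcount. A minimal Top-N search over packed codes (the learned hashing model itself is not shown):

```python
# Top-N retrieval in Hamming space over integer-packed binary hash codes.

def hamming_topn(query_code, item_codes, n):
    # codes are Python ints holding the packed binary hash codes
    scored = sorted(range(len(item_codes)),
                    key=lambda i: bin(query_code ^ item_codes[i]).count("1"))
    return scored[:n]
```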
Updated: 2024-08-17 16:21:32
Domains: cs.IR,cs.AI
A Novel Data-driven Numerical Method for Hydrological Modeling of Water Infiltration in Porous Media
Root-zone soil moisture monitoring is essential for sensor-based smart irrigation and agricultural drought prevention. Modeling the spatiotemporal water flow dynamics in porous media such as soil is typically achieved by solving an agro-hydrological model, the most important of which being the Richards equation. In this paper, we present a novel data-driven solution algorithm named the DRW (Data-driven global Random Walk) algorithm, which holistically integrates adaptive linearization scheme, neural networks, and global random walk in a finite volume discretization framework. We discuss the need and benefits of introducing these components to achieve synergistic improvements in solution accuracy and numerical stability. We show that the DRW algorithm can accurately solve $n$-dimensional Richards equation with guaranteed convergence under reasonable assumptions. Through examples, we also demonstrate that the DRW algorithm can better preserve the underlying physics and mass conservation of the Richards equation compared to state-of-the-art solution algorithms and commercial solver.
Updated: 2024-08-17 16:07:59
Domains: math.NA,cs.LG,cs.NA,35Q86,G.1.8
Increasing transformer token length with a Maximum Entropy Principle Method
Transformers suffer from the computational overhead of their quadratic dependence on the length of sequences processed. We present three methods, all adding an intermediate step between training and inference/generation, which extend the autoregressive length of transformers. All rely on a Maximum Entropy Principle (MEP) whereby entropy is maximized in the presence of suitable constraints, accounted for by use of Lagrange Multipliers. These constraint methods extend the autoregressive character from T to 2T tokens in a linear-with-T fashion. There is overhead associated with this added step, but they should still be faster than the standard methods.
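A worked toy of the MEP-with-Lagrange-multipliers machinery: over outcomes v_i with a prescribed mean, the entropy maximiser takes the Gibbs form p_i ∝ exp(λ v_i), with the multiplier λ chosen so the constraint holds (solved here by bisection, exploiting that the mean is increasing in λ). The token-level constraints used in the paper are different; this only illustrates the principle:

```python
import math

def maxent_dist(values, target_mean, lo=-10.0, hi=10.0, iters=100):
    # Maximum-entropy distribution over `values` with mean `target_mean`.
    def mean_for(lam):
        w = [math.exp(lam * v) for v in values]
        z = sum(w)
        return sum(wi * v for wi, v in zip(w, values)) / z

    # mean_for is monotonically increasing in lam, so bisect on it.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [math.exp(lam * v) for v in values]
    z = sum(w)
    return [wi / z for wi in w]
```

With the unconstrained mean as target, λ = 0 and the distribution is uniform, i.e., the plain entropy maximiser; any other target tilts the distribution exponentially.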
Updated: 2024-08-17 15:47:39
Domains: cs.LG
Addressing Distribution Shift in RTB Markets via Exponential Tilting
In machine learning applications, distribution shifts between training and target environments can lead to significant drops in model performance. This study investigates the impact of such shifts on binary classification models within the Real-Time Bidding (RTB) market context, where selection bias contributes to these shifts. To address this challenge, we apply the Exponential Tilt Reweighting Alignment (ExTRA) algorithm, proposed by Maity et al. (2023). This algorithm estimates importance weights for the empirical risk by considering both covariate and label distributions, without requiring target label information, by assuming a specific weight structure. The goal of this study is to estimate weights that correct for the distribution shifts in RTB model and to evaluate the efficiency of the proposed model using simulated real-world data.
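The exponential-tilt weight family has the form w(x) = exp(θᵀφ(x)) up to normalisation; ExTRA's contribution is estimating θ (and the label-distribution component) without target labels. A minimal sketch with θ given, to show how such weights reshape a source sample:

```python
import numpy as np

# Exponential tilting weights with a fixed tilt parameter theta; ExTRA
# estimates theta from data, which is not reproduced here.

def tilt_weights(features, theta):
    logits = features @ theta
    w = np.exp(logits - logits.max())  # numerically stabilised
    return w * len(w) / w.sum()        # mean-one normalisation
```

A weighted empirical risk sum(w_i * loss_i) / n then targets the tilted (shifted) distribution rather than the source one.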
Updated: 2024-08-17 15:43:36
Domains: stat.ML,cs.LG
Auto-weighted Bayesian Physics-Informed Neural Networks and robust estimations for multitask inverse problems in pore-scale imaging of dissolution
In this article, we present a novel data assimilation strategy in pore-scale imaging and demonstrate that this makes it possible to robustly address reactive inverse problems incorporating Uncertainty Quantification (UQ). Pore-scale modeling of reactive flow offers a valuable opportunity to investigate the evolution of macro-scale properties subject to dynamic processes. Yet, they suffer from imaging limitations arising from the associated X-ray microtomography (X-ray microCT) process, which induces discrepancies in the properties estimates. Assessment of the kinetic parameters also raises challenges, as reactive coefficients are critical parameters that can cover a wide range of values. We account for these two issues and ensure reliable calibration of pore-scale modeling, based on dynamical microCT images, by integrating uncertainty quantification in the workflow. The present method is based on a multitasking formulation of reactive inverse problems combining data-driven and physics-informed techniques in calcite dissolution. This allows quantifying morphological uncertainties on the porosity field and estimating reactive parameter ranges through prescribed PDE models with a latent concentration field and dynamical microCT. The data assimilation strategy relies on sequential reinforcement incorporating successively additional PDE constraints. We guarantee robust and unbiased uncertainty quantification by straightforward adaptive weighting of Bayesian Physics-Informed Neural Networks (BPINNs), ensuring reliable micro-porosity changes during geochemical transformations. We demonstrate successful Bayesian Inference in 1D+Time and 2D+Time calcite dissolution based on synthetic microCT images with meaningful posterior distribution on the reactive parameters and dimensionless numbers.
Updated: 2024-08-17 15:43:04
Domains: cs.LG,cs.NA,cs.NE,math.NA
FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models
Foundation models have demonstrated remarkable capabilities in handling diverse modalities and tasks, outperforming conventional artificial intelligence (AI) approaches that are highly task-specific and modality-reliant. In the medical domain, however, the development of comprehensive foundation models is constrained by limited access to diverse modalities and stringent privacy regulations. To address these constraints, this study introduces a novel knowledge injection approach, FedKIM, designed to scale the medical foundation model within a federated learning framework. FedKIM leverages lightweight local models to extract healthcare knowledge from private data and integrates this knowledge into a centralized foundation model using a designed adaptive Multitask Multimodal Mixture Of Experts (M3OE) module. This method not only preserves privacy but also enhances the model's ability to handle complex medical tasks involving multiple modalities. Our extensive experiments across twelve tasks in seven modalities demonstrate the effectiveness of FedKIM in various settings, highlighting its potential to scale medical foundation models without direct access to sensitive data.
Updated: 2024-08-17 15:42:29
Domains: cs.LG,cs.AI
Unraveling Text Generation in LLMs: A Stochastic Differential Equation Approach
This paper explores the application of Stochastic Differential Equations (SDE) to interpret the text generation process of Large Language Models (LLMs) such as GPT-4. Text generation in LLMs is modeled as a stochastic process where each step depends on previously generated content and model parameters, sampling the next word from a vocabulary distribution. We represent this generation process using SDE to capture both deterministic trends and stochastic perturbations. The drift term describes the deterministic trends in the generation process, while the diffusion term captures the stochastic variations. We fit these functions using neural networks and validate the model on real-world text corpora. Through numerical simulations and comprehensive analyses, including drift and diffusion analysis, stochastic process property evaluation, and phase space exploration, we provide deep insights into the dynamics of text generation. This approach not only enhances the understanding of the inner workings of LLMs but also offers a novel mathematical perspective on language generation, which is crucial for diagnosing, optimizing, and controlling the quality of generated text.
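Given fitted drift μ and diffusion σ, the modeled process dX = μ(X,t) dt + σ(X,t) dW can be simulated with the standard Euler-Maruyama scheme. The drift and diffusion callables below are placeholders for the paper's neural-network fits:

```python
import numpy as np

# Euler-Maruyama discretisation of dX = mu(X,t) dt + sigma(X,t) dW.

def euler_maruyama(mu, sigma, x0, t_end, n_steps, rng=None):
    rng = rng or np.random.default_rng()
    dt = t_end / n_steps
    xs = [x0]
    for k in range(n_steps):
        t = k * dt
        dw = rng.normal(scale=np.sqrt(dt))   # Brownian increment
        xs.append(xs[-1] + mu(xs[-1], t) * dt + sigma(xs[-1], t) * dw)
    return np.array(xs)
```

With σ ≡ 0 the scheme reduces to explicit Euler for the deterministic drift, which is the "deterministic trend" component of the decomposition above.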
Updated: 2024-08-17 15:30:27
Domains: cs.LG,cs.AI,cs.CL
iRAG: Advancing RAG for Videos with an Incremental Approach
Retrieval-augmented generation (RAG) systems combine the strengths of language generation and information retrieval to power many real-world applications like chatbots. Use of RAG for understanding of videos is appealing but there are two critical limitations. One-time, upfront conversion of all content in a large corpus of videos into text descriptions entails high processing times. Also, not all information in the rich video data is typically captured in the text descriptions. Since user queries are not known apriori, developing a system for video to text conversion and interactive querying of video data is challenging. To address these limitations, we propose an incremental RAG system called iRAG, which augments RAG with a novel incremental workflow to enable interactive querying of a large corpus of videos. Unlike traditional RAG, iRAG quickly indexes large repositories of videos, and in the incremental workflow, it uses the index to opportunistically extract more details from select portions of the videos to retrieve context relevant to an interactive user query. Such an incremental workflow avoids long video to text conversion times, and overcomes information loss issues due to conversion of video to text, by doing on-demand query-specific extraction of details in video data. This ensures high quality of responses to interactive user queries that are often not known apriori. To the best of our knowledge, iRAG is the first system to augment RAG with an incremental workflow to support efficient interactive querying of a large corpus of videos. Experimental results on real-world datasets demonstrate 23x to 25x faster video to text ingestion, while ensuring that latency and quality of responses to interactive user queries is comparable to responses from a traditional RAG where all video data is converted to text upfront before any user querying.
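The incremental workflow can be caricatured as two phases: a cheap one-pass coarse index over all videos, and expensive query-specific extraction run only on the top-ranked clips. Everything below is a toy stand-in: word-overlap scoring and the two extractor callables are placeholders for the real captioning and VLM stages.

```python
# Toy two-phase sketch of an incremental video-RAG workflow.

def build_coarse_index(videos, coarse_fn):
    # One fast pass: cheap coarse descriptions for every clip.
    return {vid: set(coarse_fn(clip).lower().split()) for vid, clip in videos.items()}

def answer_query(query, index, videos, detail_fn, top_k=2):
    terms = set(query.lower().split())
    ranked = sorted(index, key=lambda vid: len(index[vid] & terms), reverse=True)
    # Expensive, query-specific extraction only on the top-ranked clips.
    return {vid: detail_fn(videos[vid], query) for vid in ranked[:top_k]}
```

The upfront cost is one coarse pass; the detailed (slow) extraction is amortised across queries and touches only the clips each query actually needs.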
Updated: 2024-08-17 15:29:35
Domains: cs.CV,cs.IR,cs.LG
Siamese Multiple Attention Temporal Convolution Networks for Human Mobility Signature Identification
The Human Mobility Signature Identification (HuMID) problem stands as a fundamental task within the realm of driving style representation, dedicated to discerning latent driving behaviors and preferences from diverse driver trajectories for driver identification. Its solutions hold significant implications across various domains (e.g., ride-hailing, insurance), wherein their application serves to safeguard users and mitigate potential fraudulent activities. Present HuMID solutions often exhibit limitations in adaptability when confronted with lengthy trajectories, consequently incurring substantial computational overhead. Furthermore, their inability to effectively extract crucial local information further impedes their performance. To address this problem, we propose a Siamese Multiple Attention Temporal Convolutional Network (Siamese MA-TCN) to capitalize on the strengths of both TCN architecture and multi-head self-attention, enabling the proficient extraction of both local and long-term dependencies. Additionally, we devise a novel attention mechanism tailored for the efficient aggregation of multi-scale representations derived from our model. Experimental evaluations conducted on two real-world taxi trajectory datasets reveal that our proposed model effectively extracts both local key information and long-term dependencies. These findings highlight the model's outstanding generalization capabilities, demonstrating its robustness and adaptability across datasets of varying sizes.
Updated: 2024-08-17 15:27:38
Domains: cs.AI
KAN or MLP: A Fairer Comparison
This paper does not introduce a novel method. Instead, it offers a fairer and more comprehensive comparison of KAN and MLP models across various tasks, including machine learning, computer vision, audio processing, natural language processing, and symbolic formula representation. Specifically, we control the number of parameters and FLOPs to compare the performance of KAN and MLP. Our main observation is that, except for symbolic formula representation tasks, MLP generally outperforms KAN. We also conduct ablation studies on KAN and find that its advantage in symbolic formula representation mainly stems from its B-spline activation function. When B-spline is applied to MLP, performance in symbolic formula representation significantly improves, surpassing or matching that of KAN. However, in other tasks where MLP already excels over KAN, B-spline does not substantially enhance MLP's performance. Furthermore, we find that KAN's forgetting issue is more severe than that of MLP in a standard class-incremental continual learning setting, which differs from the findings reported in the KAN paper. We hope these results provide insights for future research on KAN and other MLP alternatives. Project link: https://github.com/yu-rp/KANbeFair
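The B-spline activation credited above for KAN's edge on symbolic tasks is built from basis functions defined by the standard Cox-de Boor recursion; in a KAN, each learnable edge activation is essentially a linear combination of such bases. The recursion below is the textbook definition, not the authors' implementation:

```python
def bspline_basis(i, k, t, knots):
    # Cox-de Boor recursion: value at t of the i-th B-spline basis of degree k.
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k] != knots[i]:
        left = ((t - knots[i]) / (knots[i + k] - knots[i])
                * bspline_basis(i, k - 1, t, knots))
    if knots[i + k + 1] != knots[i + 1]:
        right = ((knots[i + k + 1] - t) / (knots[i + k + 1] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, t, knots))
    return left + right
```

Local support and smoothness of these bases are what make the resulting activations well suited to fitting symbolic formulas.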
Updated: 2024-08-17 15:20:31
Categories: cs.LG,cs.AI
FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection
This study introduces the Federated Medical Knowledge Injection (FEDMEKI) platform, a new benchmark designed to address the unique challenges of integrating medical knowledge into foundation models under privacy constraints. By leveraging a cross-silo federated learning approach, FEDMEKI circumvents the issues associated with centralized data collection, which is often prohibited under health regulations like the Health Insurance Portability and Accountability Act (HIPAA) in the USA. The platform is meticulously designed to handle multi-site, multi-modal, and multi-task medical data, covering 7 medical modalities: images, signals, texts, laboratory test results, vital signs, input variables, and output variables. The curated dataset used to validate FEDMEKI covers 8 medical tasks, including 6 classification tasks (lung opacity detection, COVID-19 detection, electrocardiogram (ECG) abnormality detection, mortality prediction, sepsis prediction, and enlarged cardiomediastinum detection) and 2 generation tasks (medical visual question answering (MedVQA) and ECG noise clarification). This comprehensive dataset is partitioned across several clients to facilitate decentralized training under 16 benchmark approaches. FEDMEKI not only preserves data privacy but also enhances the capability of medical foundation models by allowing them to learn from a broader spectrum of medical knowledge without direct data exposure, thereby setting a new benchmark for the application of foundation models within the healthcare sector.
Updated: 2024-08-17 15:18:56
Categories: cs.AI
Backpropagation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration
In our increasingly interconnected world, where intelligent devices continually amass copious personalized multi-modal data, a pressing need arises to deliver high-quality, personalized device-aware services. However, this endeavor presents a multifaceted challenge to prevailing artificial intelligence (AI) systems primarily rooted in the cloud. As these systems grapple with shifting data distributions between the cloud and devices, the traditional approach of fine-tuning-based adaptation (FTA) suffers from the following issues: the costly and time-consuming data annotation required by FTA and the looming risk of model overfitting. To surmount these challenges, we introduce a Universal On-Device Multi-modal Model Adaptation Framework, revolutionizing on-device model adaptation by striking a balance between efficiency and effectiveness. The framework features the Fast Domain Adaptor (FDA) hosted in the cloud, providing tailored parameters for the Lightweight Multi-modal Model on devices. To enhance adaptability across multi-modal tasks, the AnchorFrame Distribution Reasoner (ADR) minimizes communication costs. Our contributions, encapsulated in the Cloud-Device Collaboration Multi-modal Parameter Generation (CDC-MMPG) framework, represent a pioneering solution for on-Device Multi-modal Model Adaptation (DMMA). Extensive experiments validate the efficiency and effectiveness of our method, particularly in video question answering and retrieval tasks, driving forward the integration of intelligent devices into our daily lives.
Updated: 2024-08-17 15:07:59
Categories: cs.DC,cs.AI,cs.LG
Mitigating Pooling Bias in E-commerce Search via False Negative Estimation
Efficient and accurate product relevance assessment is critical for user experiences and business success. Training a proficient relevance assessment model requires high-quality query-product pairs, often obtained through negative sampling strategies. Unfortunately, current methods introduce pooling bias by mistakenly sampling false negatives, diminishing performance and business impact. To address this, we present Bias-mitigating Hard Negative Sampling (BHNS), a novel negative sampling strategy tailored to identify and adjust for false negatives, building upon our original False Negative Estimation algorithm. Our experiments in the Instacart search setting confirm BHNS as effective for practical e-commerce use. Furthermore, comparative analyses on public dataset showcase its domain-agnostic potential for diverse applications.
Updated: 2024-08-17 15:04:58
Categories: cs.IR,cs.LG
Flatten: Video Action Recognition is an Image Classification task
In recent years, video action recognition, as a fundamental task in the field of video understanding, has been deeply explored by numerous researchers. Most traditional video action recognition methods typically involve converting videos into three-dimensional data that encapsulates both spatial and temporal information, subsequently leveraging prevalent image understanding models to model and analyze these data. However, these methods have significant drawbacks. Firstly, when delving into video action recognition tasks, image understanding models often need to be adapted accordingly in terms of model architecture and preprocessing for these spatiotemporal tasks; secondly, dealing with high-dimensional data often poses greater challenges and incurs higher time costs compared to its lower-dimensional counterparts. To bridge the gap between image-understanding and video-understanding tasks while simplifying the complexity of video comprehension, we introduce a novel video representation architecture, Flatten, which serves as a plug-and-play module that can be seamlessly integrated into any image-understanding network for efficient and effective 3D temporal data modeling. Specifically, by applying specific flattening operations (e.g., row-major transform), 3D spatiotemporal data is transformed into 2D spatial information, and then ordinary image understanding models are used to capture temporal dynamics and spatial semantic information, which in turn accomplishes effective and efficient video action recognition. Extensive experiments on commonly used datasets (Kinetics-400, Something-Something v2, and HMDB-51) and three classical image classification models (Uniformer, SwinV2, and ResNet) have demonstrated that embedding Flatten provides significant performance improvements over the original model.
Updated: 2024-08-17 14:59:58
Categories: cs.CV,cs.AI
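To make the flattening idea above concrete, here is a minimal sketch of a row-major flatten that turns a (T, H, W, C) clip into a single 2D image by stacking frames top-to-bottom. The shapes and the `flatten_row_major` helper are illustrative assumptions, not code from the paper.

```python
import numpy as np

def flatten_row_major(video: np.ndarray) -> np.ndarray:
    """Flatten a (T, H, W, C) video clip into one 2D image of shape
    (T * H, W, C) by stacking frames top-to-bottom in row-major order.
    After flattening, an ordinary 2D image model can process the clip,
    with temporal structure encoded spatially."""
    t, h, w, c = video.shape
    return video.reshape(t * h, w, c)

clip = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3)  # 2 frames of 4x4 RGB
image = flatten_row_major(clip)
print(image.shape)  # (8, 4, 3)
# Frame 0 occupies rows 0-3 of the flattened image, frame 1 rows 4-7.
assert np.array_equal(image[:4], clip[0])
assert np.array_equal(image[4:], clip[1])
```

Other flattening orders (e.g., column-major or grid layouts) would follow the same pattern with a different reshape/transpose.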
RGMIM: Region-Guided Masked Image Modeling for Learning Meaningful Representations from X-Ray Images
In this study, we propose a novel method called region-guided masked image modeling (RGMIM) for learning meaningful representations from X-ray images. Our method adopts a new masking strategy that utilizes organ mask information to identify valid regions for learning more meaningful representations. We conduct quantitative evaluations on an open lung X-ray image dataset as well as masking ratio hyperparameter studies. When using the entire training set, RGMIM outperformed other comparable methods, achieving a 0.962 lung disease detection accuracy. Specifically, RGMIM significantly improved performance in small data volumes, such as 5% and 10% of the training set compared to other methods. RGMIM can mask more valid regions, facilitating the learning of discriminative representations and the subsequent high-accuracy lung disease detection. RGMIM outperforms other state-of-the-art self-supervised learning methods in experiments, particularly when limited training data is used.
Updated: 2024-08-17 14:59:56
Categories: cs.CV,cs.LG,eess.IV
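The region-guided masking strategy above can be sketched as follows: candidate patches are restricted to those overlapping the organ mask, and a fraction of them is masked. The patch size, masking ratio, and helper names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def region_guided_mask(organ_mask, patch=4, ratio=0.5, rng=None):
    """Sketch of region-guided masking: only patches that overlap the
    organ mask are candidates; a fraction `ratio` of them is masked."""
    rng = rng or np.random.default_rng(0)
    h, w = organ_mask.shape
    gh, gw = h // patch, w // patch
    # A patch is a valid candidate if any organ pixel falls inside it.
    grid = organ_mask[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    valid = grid.any(axis=(1, 3))
    candidates = np.flatnonzero(valid)
    n_mask = int(len(candidates) * ratio)
    chosen = rng.choice(candidates, size=n_mask, replace=False)
    mask = np.zeros(gh * gw, dtype=bool)
    mask[chosen] = True
    return mask.reshape(gh, gw)

organ = np.zeros((16, 16), dtype=bool)
organ[4:12, 4:12] = True            # a synthetic "lung" region
m = region_guided_mask(organ)
print(m.sum())                      # 2: half of the 4 organ-overlapping patches
```

Random masking, by contrast, would sample from all `gh * gw` patches, often wasting the budget on background regions.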
FedKBP: Federated dose prediction framework for knowledge-based planning in radiation therapy
Dose prediction plays a key role in knowledge-based planning (KBP) by automatically generating patient-specific dose distribution. Recent advances in deep learning-based dose prediction methods necessitate collaboration among data contributors for improved performance. Federated learning (FL) has emerged as a solution, enabling medical centers to jointly train deep-learning models without compromising patient data privacy. We developed the FedKBP framework to evaluate the performances of centralized, federated, and individual (i.e. separated) training of dose prediction model on the 340 plans from OpenKBP dataset. To simulate FL and individual training, we divided the data into 8 training sites. To evaluate the effect of inter-site data variation on model training, we implemented two types of case distributions: 1) Independent and identically distributed (IID), where the training and validating cases were evenly divided among the 8 sites, and 2) non-IID, where some sites have more cases than others. The results show FL consistently outperforms individual training on both model optimization speed and out-of-sample testing scores, highlighting the advantage of FL over individual training. Under IID data division, FL shows comparable performance to centralized training, underscoring FL as a promising alternative to traditional pooled-data training. Under non-IID division, larger sites outperformed smaller sites by up to 19% on testing scores, confirming the need of collaboration among data owners to achieve better prediction accuracy. Meanwhile, non-IID FL showed reduced performance as compared to IID FL, posing the need for more sophisticated FL methods beyond mere model averaging to handle data variation among participating sites.
Updated: 2024-08-17 14:57:14
Categories: cs.LG,cs.AI
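The federated setup described above (8 sites, FL vs. individual training) rests on aggregating site models; below is a minimal sketch of one FedAvg-style aggregation round, weighted by local case counts. The flat parameter vectors and the site sizes are synthetic assumptions, not the FedKBP code.

```python
import numpy as np

def fedavg(site_params, site_sizes):
    """One FedAvg aggregation round: average the sites' model parameters,
    weighted by the number of local training cases. This is the 'mere model
    averaging' baseline the abstract refers to, not FedKBP itself."""
    sizes = np.asarray(site_sizes, dtype=float)
    stacked = np.stack(site_params)                 # (n_sites, n_params)
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)

# 8 sites with a non-IID split: some sites hold more cases than others.
rng = np.random.default_rng(0)
site_sizes = [60, 60, 50, 40, 40, 30, 30, 30]       # 340 plans in total
site_params = [rng.normal(size=5) for _ in site_sizes]
global_params = fedavg(site_params, site_sizes)
print(global_params.shape)  # (5,)
```

In a full FL loop this aggregation alternates with local gradient steps at each site; under non-IID splits the weighting keeps large sites from being diluted, which is exactly where plain averaging starts to underperform.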
Scalable and Certifiable Graph Unlearning via Lazy Local Propagation
With the recent adoption of laws supporting the "right to be forgotten" and the widespread use of Graph Neural Networks for modeling graph-structured data, graph unlearning has emerged as a crucial research area. Current studies focus on the efficient update of model parameters. However, they often overlook the time-consuming re-computation of graph propagation required for each removal, significantly limiting their scalability on large graphs. In this paper, we present ScaleGUN, the first certifiable graph unlearning mechanism that scales to billion-edge graphs. ScaleGUN employs a lazy local propagation method to facilitate efficient updates of the embedding matrix during data removal. Such lazy local propagation can be proven to ensure certified unlearning under all three graph unlearning scenarios, including node feature, edge, and node unlearning. Extensive experiments on real-world datasets demonstrate the efficiency and efficacy of ScaleGUN. Remarkably, ScaleGUN accomplishes $(\epsilon,\delta)=(1,10^{-4})$ certified unlearning on the billion-edge graph ogbn-papers100M in 20 seconds for a $5K$-random-edge removal request -- of which only 5 seconds are required for updating the embedding matrix -- compared to 1.91 hours for retraining and 1.89 hours for re-propagation. Our code is available online.
Updated: 2024-08-17 14:41:02
Categories: cs.LG
On the Improvement of Generalization and Stability of Forward-Only Learning via Neural Polarization
Forward-only learning algorithms have recently gained attention as alternatives to gradient backpropagation, replacing the backward step of this latter solver with an additional contrastive forward pass. Among these approaches, the so-called Forward-Forward Algorithm (FFA) has been shown to achieve competitive levels of performance in terms of generalization and complexity. Networks trained using FFA learn to contrastively maximize a layer-wise defined goodness score when presented with real data (denoted as positive samples) and to minimize it when processing synthetic data (correspondingly, negative samples). However, this algorithm still faces weaknesses that negatively affect the model accuracy and training stability, primarily due to a gradient imbalance between positive and negative samples. To overcome this issue, in this work we propose a novel implementation of the FFA algorithm, denoted as Polar-FFA, which extends the original formulation by introducing a neural division (polarization) between positive and negative instances. Neurons in each of these groups aim to maximize their goodness when presented with their respective data type, thereby creating a symmetric gradient behavior. To empirically gauge the improved learning capabilities of our proposed Polar-FFA, we perform several systematic experiments using different activation and goodness functions over image classification datasets. Our results demonstrate that Polar-FFA outperforms FFA in terms of accuracy and convergence speed. Furthermore, its lower reliance on hyperparameters reduces the need for hyperparameter tuning to guarantee optimal generalization capabilities, thereby allowing for a broader range of neural network configurations.
Updated: 2024-08-17 14:32:18
Categories: cs.LG,cs.AI,cs.NE
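A rough sketch of the goodness score and the polarization idea described above, assuming the standard FFA goodness (sum of squared activations) and a simple logistic loss with threshold `theta`; these choices are illustrative assumptions, and the exact Polar-FFA objective differs in its details.

```python
import numpy as np

def goodness(h):
    """FFA layer-wise goodness score: the sum of squared activations."""
    return (h ** 2).sum(axis=1)

def polar_ffa_loss(h, is_positive, n_pos, theta=2.0):
    """Sketch of neural polarization: the layer's units are split into a
    'positive' group (first n_pos units) and a 'negative' group. Each group's
    goodness is raised on its own data type and lowered on the other, giving
    positive and negative samples symmetric gradient behavior."""
    g_pos = goodness(h[:, :n_pos])
    g_neg = goodness(h[:, n_pos:])
    sign = np.where(is_positive, 1.0, -1.0)
    margin = sign * (g_pos - g_neg)
    return np.logaddexp(0.0, theta - margin).mean()  # stable softplus loss

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 10))                 # activations of one layer
is_pos = np.array([True, False] * 4)         # real vs. synthetic samples
loss = polar_ffa_loss(h, is_pos, n_pos=5)
print(loss > 0)  # True: the softplus loss is strictly positive
```

In plain FFA a single goodness score is pushed up for positives and down for negatives, which is the source of the gradient imbalance this split is meant to remove.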
H2PIPE: High throughput CNN Inference on FPGAs with High-Bandwidth Memory
Convolutional Neural Networks (CNNs) combine large amounts of parallelizable computation with frequent memory access. Field Programmable Gate Arrays (FPGAs) can achieve low latency and high throughput CNN inference by implementing dataflow accelerators that pipeline layer-specific hardware to implement an entire network. By implementing a different processing element for each CNN layer, these layer-pipelined accelerators can achieve high compute density, but having all layers processing in parallel requires high memory bandwidth. Traditionally this has been satisfied by storing all weights on chip, but this is infeasible for the largest CNNs, which are often those most in need of acceleration. In this work we augment a state-of-the-art dataflow accelerator (HPIPE) to leverage both High-Bandwidth Memory (HBM) and on-chip storage, enabling high performance layer-pipelined dataflow acceleration of large CNNs. Based on profiling results of HBM's latency and throughput against expected address patterns, we develop an algorithm to choose which weight buffers should be moved off chip and how deep the on-chip FIFOs to HBM should be to minimize compute unit stalling. We integrate the new hardware generation within the HPIPE domain-specific CNN compiler and demonstrate good bandwidth efficiency against theoretical limits. Compared to the best prior work we obtain speed-ups of at least 19.4x, 5.1x and 10.5x on ResNet-18, ResNet-50 and VGG-16 respectively.
Updated: 2024-08-17 14:25:32
Categories: cs.AR,cs.LG
ADformer: A Multi-Granularity Transformer for EEG-Based Alzheimer's Disease Assessment
Electroencephalogram (EEG) has emerged as a cost-effective and efficient method for supporting neurologists in assessing Alzheimer's disease (AD). Existing approaches predominantly utilize handcrafted features or Convolutional Neural Network (CNN)-based methods. However, the potential of the transformer architecture, which has shown promising results in various time series analysis tasks, remains underexplored in interpreting EEG for AD assessment. Furthermore, most studies are evaluated on the subject-dependent setup but often overlook the significance of the subject-independent setup. To address these gaps, we present ADformer, a novel multi-granularity transformer designed to capture temporal and spatial features to learn effective EEG representations. We employ multi-granularity data embedding across both dimensions and utilize self-attention to learn local features within each granularity and global features among different granularities. We conduct experiments across 5 datasets with a total of 525 subjects in setups including subject-dependent, subject-independent, and leave-subjects-out. Our results show that ADformer outperforms existing methods in most evaluations, achieving F1 scores of 75.19% and 93.58% on two large datasets with 65 subjects and 126 subjects, respectively, in distinguishing AD and healthy control (HC) subjects under the challenging subject-independent setup.
Updated: 2024-08-17 14:10:41
Categories: eess.SP,cs.CE,cs.LG
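The multi-granularity embedding described above can be illustrated by patching an EEG segment at several temporal granularities, each yielding its own token sequence for attention. The patch lengths and tensor shapes below are assumptions for illustration, not the ADformer configuration.

```python
import numpy as np

def multi_granularity_patches(x, patch_lens=(2, 4, 8)):
    """Split a (channels, time) EEG segment into token sequences at several
    temporal granularities. Each granularity p yields (channels, time // p, p)
    patches; self-attention can then model local features within a granularity
    and global features across granularities."""
    c, t = x.shape
    out = {}
    for p in patch_lens:
        n = t // p
        out[p] = x[:, :n * p].reshape(c, n, p)
    return out

eeg = np.random.default_rng(0).normal(size=(4, 16))  # 4 channels, 16 samples
tokens = multi_granularity_patches(eeg)
print({p: v.shape for p, v in tokens.items()})
# {2: (4, 8, 2), 4: (4, 4, 4), 8: (4, 2, 8)}
```

A spatial (cross-channel) embedding would apply the same idea along the channel axis instead of the time axis.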
DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild
Blind image quality assessment (IQA) in the wild, which assesses the quality of images with complex authentic distortions and no reference images, presents significant challenges. Given the difficulty in collecting large-scale training data, leveraging limited data to develop a model with strong generalization remains an open problem. Motivated by the robust image perception capabilities of pre-trained text-to-image (T2I) diffusion models, we propose a novel IQA method, diffusion priors-based IQA (DP-IQA), to utilize the T2I model's prior for improved performance and generalization ability. Specifically, we utilize pre-trained Stable Diffusion as the backbone, extracting multi-level features from the denoising U-Net guided by prompt embeddings through a tunable text adapter. Simultaneously, an image adapter compensates for information loss introduced by the lossy pre-trained encoder. Unlike T2I models that require full image distribution modeling, our approach targets image quality assessment, which inherently requires fewer parameters. To improve applicability, we distill the knowledge into a lightweight CNN-based student model, significantly reducing parameters while maintaining or even enhancing generalization performance. Experimental results demonstrate that DP-IQA achieves state-of-the-art performance on various in-the-wild datasets, highlighting the superior generalization capability of T2I priors in blind IQA tasks. To our knowledge, DP-IQA is the first method to apply pre-trained diffusion priors in blind IQA. Codes and checkpoints are available at https://github.com/RomGai/DP-IQA.
Updated: 2024-08-17 13:53:17
Categories: cs.CV,cs.AI
PhaGO: Protein function annotation for bacteriophages by integrating the genomic context
Bacteriophages are viruses that target bacteria, playing a crucial role in microbial ecology. Phage proteins are important in understanding phage biology, such as virus infection, replication, and evolution. Although a large number of new phages have been identified via metagenomic sequencing, many of them have limited protein function annotation. Accurate function annotation of phage proteins presents several challenges, including their inherent diversity and the scarcity of annotated ones. Existing tools have yet to fully leverage the unique properties of phages in annotating protein functions. In this work, we propose a new protein function annotation tool for phages by leveraging the modular genomic structure of phage genomes. By employing embeddings from the latest protein foundation models and Transformer to capture contextual information between proteins in phage genomes, PhaGO surpasses state-of-the-art methods in annotating diverged proteins and proteins with uncommon functions by 6.78% and 13.05% improvement, respectively. PhaGO can annotate proteins lacking homology search results, which is critical for characterizing the rapidly accumulating phage genomes. We demonstrate the utility of PhaGO by identifying 688 potential holins in phages, which exhibit high structural conservation with known holins. The results show the potential of PhaGO to extend our understanding of newly discovered phages.
Updated: 2024-08-17 13:46:36
Categories: q-bio.QM,cs.AI,cs.LG
MagLive: Robust Voice Liveness Detection on Smartphones Using Magnetic Pattern Changes
Voice authentication has been widely used on smartphones. However, it remains vulnerable to spoofing attacks, where the attacker replays recorded voice samples from authentic humans using loudspeakers to bypass the voice authentication system. In this paper, we present MagLive, a robust voice liveness detection scheme designed for smartphones to mitigate such spoofing attacks. MagLive leverages the differences in magnetic pattern changes generated by different speakers (i.e., humans or loudspeakers) when speaking for liveness detection, which are captured by the built-in magnetometer on smartphones. To extract effective and robust magnetic features, MagLive utilizes a TF-CNN-SAF model as the feature extractor, which includes a time-frequency convolutional neural network (TF-CNN) combined with a self-attention-based fusion (SAF) model. Supervised contrastive learning is then employed to achieve user-irrelevance, device-irrelevance, and content-irrelevance. MagLive imposes no additional burden on users and does not rely on active sensing or specialized hardware. We conducted comprehensive experiments with various settings to evaluate the security and robustness of MagLive. Our results demonstrate that MagLive effectively distinguishes between humans and attackers (i.e., loudspeakers), achieving an average balanced accuracy (BAC) of 99.01% and an equal error rate (EER) of 0.77%.
Updated: 2024-08-17 13:41:58
Categories: cs.CR
Solving Partial Differential Equations with Equivariant Extreme Learning Machines
We utilize extreme-learning machines for the prediction of partial differential equations (PDEs). Our method splits the state space into multiple windows that are predicted individually using a single model. Despite requiring only few data points (in some cases, our method can learn from a single full-state snapshot), it still achieves high accuracy and can predict the flow of PDEs over long time horizons. Moreover, we show how additional symmetries can be exploited to increase sample efficiency and to enforce equivariance.
Updated: 2024-08-17 13:40:48
Categories: cs.LG
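As a sketch of the window-wise extreme-learning-machine approach above: a fixed random hidden layer plus a closed-form ridge-regression readout (no gradient descent), trained on spatial windows of a toy one-step advection "PDE". The window size, toy dynamics, and hyperparameters are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_elm(X, Y, hidden=64, reg=1e-6):
    """Extreme learning machine: random fixed input weights, tanh hidden
    layer, and a linear readout solved in closed form by ridge regression."""
    W = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.solve(H.T @ H + reg * np.eye(hidden), H.T @ Y)
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy "PDE": one-step advection on a periodic grid. Each 5-point window
# predicts its center's next value (the neighbor one cell away), so a single
# model serves every window of the state, as in the windowed scheme above.
state = np.sin(np.linspace(0, 2 * np.pi, 64, endpoint=False))
target = np.roll(state, 1)                  # advect by one grid cell
X = np.stack([np.take(state, np.arange(i - 2, i + 3), mode="wrap")
              for i in range(64)])
W, b, beta = fit_elm(X, target)
err = np.abs(predict_elm(X, W, b, beta) - target).max()
print(err)
```

Because the same readout is shared by all windows, translation equivariance of the dynamics is built in; the abstract's point is that further symmetries can be exploited the same way.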
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
A single language model, even when aligned with labelers through reinforcement learning from human feedback (RLHF), may not suit all human preferences. Recent approaches therefore prefer customization, gathering multi-dimensional feedback, and creating distinct reward models for each dimension. Different language models are then optimized for various preferences using multi-objective RLHF (MORLHF) with varying reward weights. However, RL fine-tuning is unstable and resource-heavy, especially with diverse and usually conflicting objectives. In this paper, we present Multi-Objective Direct Preference Optimization (MODPO), an RL-free extension of Direct Preference Optimization (DPO) for multiple alignment objectives. Essentially, MODPO folds language modeling directly into reward modeling, training language models as implicit collective reward models that combine all objectives with specific weights. MODPO theoretically yields the same optimal solutions as MORLHF but is practically more stable and efficient. Empirical results in safety alignment and long-form question answering show that MODPO matches or outperforms existing methods, producing a Pareto front of language models catering to diverse preferences with three times less computational resources compared to MORLHF. Code is available at https://github.com/ZHZisZZ/modpo.
Updated: 2024-08-17 13:39:13
Categories: cs.LG,cs.AI
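The idea of folding several objectives into one RL-free preference loss can be sketched schematically: the usual DPO implicit-reward margin between the chosen and rejected responses is mixed with margins from additional reward models using per-objective weights. This is only an illustrative approximation; the exact MODPO formulation in the paper differs in its details.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def modpo_style_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                     aux_margins, weights, beta=0.1):
    """Schematic multi-objective DPO-style loss for one preference pair.
    `logp_*` are policy log-probs of the chosen (w) / rejected (l) responses,
    `ref_logp_*` the reference model's, and `aux_margins` the reward-margin
    r_k(x, y_w) - r_k(x, y_l) for each extra objective k. `weights` mixes the
    DPO margin (weights[0]) with the auxiliary margins (weights[1:])."""
    dpo_margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    mixed = weights[0] * dpo_margin + float(np.dot(weights[1:], aux_margins))
    return -np.log(sigmoid(mixed))

# Preferring the chosen response on both objectives gives a small loss.
loss = modpo_style_loss(-1.0, -3.0, -2.0, -2.0,
                        aux_margins=np.array([0.5]),
                        weights=np.array([0.6, 0.4]))
print(loss)
```

Sweeping `weights` over the simplex is what traces out the Pareto front of models the abstract describes, without any RL fine-tuning.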
Maintainability Challenges in ML: A Systematic Literature Review
Background: As Machine Learning (ML) advances rapidly in many fields, it is being adopted by academics and businesses alike. However, ML has a number of different challenges in terms of maintenance not found in traditional software projects. Identifying what causes these maintainability challenges can help mitigate them early and continue delivering value in the long run without degrading ML performance. Aim: This study aims to identify and synthesise the maintainability challenges in different stages of the ML workflow and understand how these stages are interdependent and impact each other's maintainability. Method: Using a systematic literature review, we screened more than 13000 papers, then selected and qualitatively analysed 56 of them. Results: (i) a catalogue of maintainability challenges in different stages of Data Engineering, Model Engineering workflows and the current challenges when building ML systems are discussed; (ii) a map of 13 maintainability challenges to different interdependent stages of ML that impact the overall workflow; (iii) Provided insights to developers of ML tools and researchers. Conclusions: In this study, practitioners and organisations will learn about maintainability challenges and their impact at different stages of ML workflow. This will enable them to avoid pitfalls and help to build a maintainable ML system. The implications and challenges will also serve as a basis for future research to strengthen our understanding of the ML system's maintainability.
Updated: 2024-08-17 13:24:15
Domains: cs.AI,cs.SE
Quality Assessment in the Era of Large Models: A Survey
Quality assessment, which evaluates the visual quality level of multimedia experiences, has garnered significant attention from researchers and has evolved substantially through dedicated efforts. Before the advent of large models, quality assessment typically relied on small expert models tailored for specific tasks. While these smaller models are effective at handling their designated tasks and predicting quality levels, they often lack explainability and robustness. With the advancement of large models, which align more closely with human cognitive and perceptual processes, many researchers are now leveraging the prior knowledge embedded in these large models for quality assessment tasks. This emergence of quality assessment within the context of large models motivates us to provide a comprehensive review focusing on two key aspects: 1) the assessment of large models, and 2) the role of large models in assessment tasks. We begin by reflecting on the historical development of quality assessment. Subsequently, we move to detailed discussions of related works concerning quality assessment in the era of large models. Finally, we offer insights into the future progression and potential pathways for quality assessment in this new era. We hope this survey will enable a rapid understanding of the development of quality assessment in the era of large models and inspire further advancements in the field.
Updated: 2024-08-17 13:20:55
Domains: cs.HC,cs.AI
TimeSense: Multi-Person Device-free Indoor Localization via RTT
Locating persons moving through an environment without requiring them to be equipped with special devices has become vital for many applications, including security, IoT, and healthcare. Existing device-free indoor localization systems commonly rely on Received Signal Strength Indicator (RSSI) and WiFi Channel State Information (CSI) techniques. However, the accuracy of RSSI is adversely affected by environmental factors like multi-path interference and fading. Additionally, the lack of standardization in CSI necessitates specialized hardware and software. In this paper, we present TimeSense, a deep learning-based multi-person device-free indoor localization system that addresses these challenges. TimeSense leverages Time of Flight information acquired by the fine-time measurement protocol of the IEEE 802.11-2016 standard. Specifically, the measured round trip time between the transmitter and receiver is influenced by the dynamic changes in the environment induced by human presence. TimeSense effectively detects this anomalous behavior using a stacked denoising auto-encoder model, thereby estimating the user's location. The system incorporates a probabilistic approach on top of the deep learning model to ensure seamless tracking of the users. The evaluation of TimeSense in two realistic environments demonstrates its efficacy, achieving median localization accuracies of 1.57 and 2.65 meters, surpassing state-of-the-art techniques by 49% and 103% in the two testbeds.
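The core mechanism, flagging links whose measured round-trip time deviates from an empty-room baseline, can be sketched with a simple z-score detector standing in for the paper's stacked denoising auto-encoder; the function names and per-link profile below are illustrative assumptions, not TimeSense's actual interface:

```python
import statistics

def calibrate(baseline_rtts):
    """Per-link mean/stdev of round-trip times measured with no one present."""
    return {link: (statistics.mean(v), statistics.stdev(v))
            for link, v in baseline_rtts.items()}

def anomalous_links(profile, sample, z_thresh=3.0):
    """Links whose current RTT deviates from the empty-room baseline;
    human presence on a link's path perturbs its round-trip time."""
    flagged = []
    for link, rtt in sample.items():
        mean, std = profile[link]
        if std > 0 and abs(rtt - mean) / std > z_thresh:
            flagged.append(link)
    return flagged
```

In TimeSense the anomaly score instead comes from the auto-encoder's reconstruction error, and a probabilistic layer on top turns flagged links into a location estimate.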
Updated: 2024-08-17 13:12:33
Domains: eess.SP,cs.AI,cs.LG
DRL-Based Resource Allocation for Motion Blur Resistant Federated Self-Supervised Learning in IoV
In the Internet of Vehicles (IoV), Federated Learning (FL) provides a privacy-preserving solution by aggregating local models without sharing data. Traditional supervised learning requires image data with labels, but data labeling involves significant manual effort. Federated Self-Supervised Learning (FSSL) utilizes Self-Supervised Learning (SSL) for local training in FL, eliminating the need for labels while protecting privacy. Compared to other SSL methods, Momentum Contrast (MoCo) reduces the demand for computing resources and storage space by creating a dictionary. However, using MoCo in FSSL requires uploading the local dictionary from vehicles to Base Station (BS), which poses a risk of privacy leakage. Simplified Contrast (SimCo) addresses the privacy leakage issue in MoCo-based FSSL by using dual temperature instead of a dictionary to control sample distribution. Additionally, considering the negative impact of motion blur on model aggregation, and based on SimCo, we propose a motion blur-resistant FSSL method, referred to as BFSSL. Furthermore, we address energy consumption and delay in the BFSSL process by proposing a Deep Reinforcement Learning (DRL)-based resource allocation scheme, called DRL-BFSSL. In this scheme, BS allocates the Central Processing Unit (CPU) frequency and transmission power of vehicles to minimize energy consumption and latency, while aggregating received models based on the motion blur level. Simulation results validate the effectiveness of our proposed aggregation and resource allocation methods.
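SimCo's key ingredient as described above is a dictionary-free contrastive loss that controls sample distribution with two temperatures. A minimal sketch of a dual-temperature InfoNCE-style loss follows; the exact formulation and the roles of the two temperatures in SimCo may differ, so treat this as an illustration of the idea rather than the paper's loss:

```python
import math

def dual_temperature_loss(sim_pos, sim_negs, tau_intra=0.1, tau_inter=1.0):
    """InfoNCE over one positive and a list of negatives, with a sharp
    intra-anchor temperature inside the softmax and a second temperature
    re-scaling the loss, so no MoCo-style dictionary of negatives is needed."""
    logits = [sim_pos / tau_intra] + [s / tau_intra for s in sim_negs]
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    intra = -(sim_pos / tau_intra - log_denom)  # -log softmax(positive)
    return (tau_intra / tau_inter) * intra
```

A more similar positive yields a smaller loss, which is the property the FSSL client optimizes locally without uploading any dictionary.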
Updated: 2024-08-17 13:12:04
Domains: cs.CV,cs.LG,cs.NI
AI Managed Emergency Documentation with a Pretrained Model
This study investigates the use of a large language model system to improve the efficiency and quality of emergency department (ED) discharge letter writing. Time constraints and infrastructural deficits make compliance with current discharge letter targets difficult. We explored the potential efficiencies of artificial intelligence software in generating ED discharge letters and the attitudes of doctors toward this technology. The evaluated system leverages advanced techniques to fine-tune a model to generate discharge summaries from short-hand inputs, including voice, text, and electronic health record data. Nineteen physicians with emergency medicine experience evaluated the system's text and voice-to-text interfaces against manual typing. The results showed significant time savings with the MedWrite LLM interfaces compared to manual methods.
Updated: 2024-08-17 13:11:46
Domains: cs.AI,cs.CL
SA-GDA: Spectral Augmentation for Graph Domain Adaptation
Graph neural networks (GNNs) have achieved impressive results on graph-related tasks. However, most GNNs are studied in a single-domain, supervised setting, which requires abundant task-specific labels and transfers poorly to other domains. Few works have focused on domain adaptation for graph node classification, and those that do mainly align the feature spaces of the source and target domains without considering feature alignment between categories, which may confuse classification in the target domain. Moreover, because labels in the target domain are scarce, categories from different domains cannot be aligned directly, which makes the problem more challenging. In this paper, we present \textit{Spectral Augmentation for Graph Domain Adaptation (\method{})} for graph node classification. First, we observe that nodes of the same category in different domains exhibit similar characteristics in the spectral domain, whereas different classes differ considerably. Following this observation, we align the category feature spaces of different domains in the spectral domain instead of aligning the whole feature space, and we theoretically prove the stability of the proposed \method{}. Then, we develop a dual graph convolutional network that jointly exploits local and global consistency for feature aggregation. Last, we utilize a domain classifier with an adversarial learning submodule to facilitate knowledge transfer between different domain graphs. Experimental results on a variety of publicly available datasets demonstrate the effectiveness of our \method{}.
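The observation that same-category nodes look alike in the spectral domain can be made concrete through the graph Fourier transform: project a class indicator through the eigenvectors of the normalized Laplacian and compare the resulting signatures across domains. The sketch below is an illustrative construction, not the paper's exact operator:

```python
import numpy as np

def laplacian_spectrum(adj):
    """Eigen-decomposition of the symmetric normalized Laplacian I - D^-1/2 A D^-1/2."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.linalg.eigh(lap)  # eigenvalues in [0, 2], orthonormal eigenvectors U

def class_spectral_signature(adj, labels, cls):
    """Graph Fourier transform U^T x of a class-indicator vector: a spectral
    profile of where the class lives on the graph, the kind of per-class
    quantity one could compare or align across source and target domains."""
    _, U = laplacian_spectrum(adj)
    indicator = (labels == cls).astype(float)
    return U.T @ indicator
```

Because U is orthonormal, the signature preserves the indicator's norm, so differences between two domains' signatures reflect structure, not scale.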
Updated: 2024-08-17 13:01:45
Domains: cs.LG,cs.AI
Attack Anything: Blind DNNs via Universal Background Adversarial Attack
It has been widely substantiated that deep neural networks (DNNs) are vulnerable to adversarial perturbations. Existing studies mainly focus on performing attacks by corrupting the targeted objects (physical attacks) or images (digital attacks), which is intuitively acceptable and understandable in terms of the attack's effectiveness. In contrast, our focus lies in conducting background adversarial attacks in both the digital and physical domains, without causing any disruption to the targeted objects themselves. Specifically, an effective background adversarial attack framework is proposed to attack anything, by which the attack efficacy generalizes well across diverse objects, models, and tasks. Technically, we approach the background adversarial attack as an iterative optimization problem, analogous to the process of DNN learning. Besides, we offer a theoretical demonstration of its convergence under a set of mild but sufficient conditions. To strengthen the attack efficacy and transferability, we propose a new ensemble strategy tailored for adversarial perturbations and introduce an improved smooth constraint for the seamless connection of integrated perturbations. We conduct comprehensive and rigorous experiments in both the digital and physical domains across various objects, models, and tasks, demonstrating the effectiveness of the proposed method in attacking anything. The findings of this research substantiate the significant discrepancy between human and machine vision regarding the value of background variations, which play a far more critical role than previously recognized, necessitating a reevaluation of the robustness and reliability of DNNs. The code will be publicly available at https://github.com/JiaweiLian/Attack_Anything
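Framing the background attack as iterative optimization can be sketched as projected gradient ascent restricted to a background mask. The toy logistic "detector" below stands in for a real DNN and all names are illustrative; the point is that only background pixels (mask == 1) are ever perturbed, while the object (mask == 0) is left untouched:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def background_attack(x, mask, w, steps=50, alpha=0.05, eps=0.5):
    """PGD-style loss ascent on a toy detector p = sigmoid(w.x) with true
    label 1: the update alpha*sign(grad) is applied only where mask == 1,
    and the perturbation is projected back into an eps-ball after each step."""
    x_adv = x.copy()
    for _ in range(steps):
        p = sigmoid(w @ x_adv)
        grad = (p - 1.0) * w                       # d(-log p)/dx
        x_adv += alpha * np.sign(grad) * mask      # background-only update
        x_adv = x + np.clip(x_adv - x, -eps, eps)  # epsilon-ball projection
    return x_adv
```

The paper's framework replaces the toy model with real detectors, adds an ensemble over models and a smoothness constraint; this sketch only shows the masked iterative-optimization core.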
Updated: 2024-08-17 12:46:53
Domains: cs.CV,cs.CR,cs.LG
EEG-SCMM: Soft Contrastive Masked Modeling for Cross-Corpus EEG-Based Emotion Recognition
Emotion recognition using electroencephalography (EEG) signals has garnered widespread attention in recent years. However, existing studies have struggled to develop a sufficiently generalized model suitable for different datasets without re-training (cross-corpus). This difficulty arises because distribution differences across datasets far exceed the intra-dataset variability. To solve this problem, we propose a novel Soft Contrastive Masked Modeling (SCMM) framework. Inspired by emotional continuity, SCMM integrates soft contrastive learning with a new hybrid masking strategy to effectively mine the "short-term continuity" characteristics inherent in human emotions. During the self-supervised learning process, soft weights are assigned to sample pairs, enabling adaptive learning of similarity relationships across samples. Furthermore, we introduce an aggregator that weightedly aggregates complementary information from multiple close samples based on pairwise similarities among samples to enhance fine-grained feature representation, which is then used for original sample reconstruction. Extensive experiments on the SEED, SEED-IV and DEAP datasets show that SCMM achieves state-of-the-art (SOTA) performance, outperforming the second-best method by an average accuracy of 4.26% under two types of cross-corpus conditions (same-class and different-class) for EEG-based emotion recognition.
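The aggregator described above, which weights complementary information from close samples by pairwise similarity, amounts to a softmax-weighted neighbor average. A minimal sketch, with negative squared distance as an assumed similarity measure (the paper computes similarities over learned representations):

```python
import math

def soft_aggregate(anchor, candidates, tau=0.5):
    """Softmax(similarity / tau) over candidate samples, then a weighted
    average: closer samples contribute more to the aggregated representation
    used for reconstructing the anchor."""
    sims = [-sum((a - c) ** 2 for a, c in zip(anchor, cand)) for cand in candidates]
    m = max(s / tau for s in sims)
    ws = [math.exp(s / tau - m) for s in sims]
    z = sum(ws)
    ws = [w / z for w in ws]
    dim = len(anchor)
    agg = [sum(w * cand[i] for w, cand in zip(ws, candidates)) for i in range(dim)]
    return agg, ws
```

The same soft weights illustrate SCMM's soft contrastive idea: sample pairs are not treated as binary positives/negatives but weighted by how similar they are.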
Updated: 2024-08-17 12:35:13
Domains: cs.HC,cs.AI
PADetBench: Towards Benchmarking Physical Attacks against Object Detection
Physical attacks against object detection have gained increasing attention due to their significant practical implications. However, conducting physical experiments is extremely time-consuming and labor-intensive. Moreover, physical dynamics and cross-domain transformation are challenging to strictly regulate in the real world, leading to unaligned evaluation and comparison, severely hindering the development of physically robust models. To accommodate these challenges, we explore utilizing realistic simulation to thoroughly and rigorously benchmark physical attacks with fairness under controlled physical dynamics and cross-domain transformation. This resolves the problem of capturing identical adversarial images that cannot be achieved in the real world. Our benchmark includes 20 physical attack methods, 48 object detectors, comprehensive physical dynamics, and evaluation metrics. We also provide end-to-end pipelines for dataset generation, detection, evaluation, and further analysis. In addition, we perform 8064 groups of evaluation based on our benchmark, which includes both overall evaluation and further detailed ablation studies for controlled physical dynamics. Through these experiments, we provide in-depth analyses of physical attack performance and physical adversarial robustness, draw valuable observations, and discuss potential directions for future research. Codebase: https://github.com/JiaweiLian/Benchmarking_Physical_Attack
Updated: 2024-08-17 12:11:22
Domains: cs.CV,cs.CR,cs.LG
On the Reliability of Radio Frequency Fingerprinting
Radio Frequency Fingerprinting (RFF) offers a unique method for identifying devices at the physical (PHY) layer based on their RF emissions due to intrinsic hardware differences. Nevertheless, RFF techniques depend on the ability to extract information from the PHY layer of the radio spectrum by resorting to Software Defined Radios (SDR). Previous works have highlighted the so-called "Day-After-Tomorrow" effect, i.e., an intrinsic issue of SDRs leading to a fingerprint mutation following a radio power cycle. In this work, we extend such a study by demonstrating that fingerprint mutations appear every time a new FPGA image is reloaded, i.e., when the SDR initiates a new communication. In this context, we provide an in-depth analysis of the reliability of RFF over multiple FPGA image reloading operations, highlighting its ephemeral and mutational nature. We introduce a methodology for abstracting fingerprint mutations into a graph and provide a theoretical framework for assessing fingerprint reliability. Our results show that the common assumption of considering the RF fingerprint as unique and always persistent is incorrect. By combining real-world measurements, high-performance SDRs, and state-of-the-art deep learning techniques, we experimentally demonstrate that radio devices feature multiple fingerprints that can be clustered according to shared features. Moreover, we show that the RF fingerprint is a time-independent probabilistic phenomenon, which requires the collection of multiple samples to achieve the necessary reliability.
Updated: 2024-08-17 12:06:21
Domains: cs.CR
Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties
We introduce the elEmBERT model for chemical classification tasks. It is based on deep learning techniques, such as a multilayer encoder architecture. We demonstrate the opportunities offered by our approach on sets of organic, inorganic and crystalline compounds. In particular, we developed and tested the model using the Matbench and Moleculenet benchmarks, which include crystal properties and drug design-related benchmarks. We also conduct an analysis of vector representations of chemical compounds, shedding light on the underlying patterns in structural data. Our model exhibits exceptional predictive capabilities and proves universally applicable to molecular and material datasets. For instance, on the Tox21 dataset, we achieved an average precision of 96%, surpassing the previously best result by 10%.
Updated: 2024-08-17 12:06:04
Domains: physics.chem-ph,cond-mat.mtrl-sci,cs.LG,physics.atm-clus,q-bio.QM
Chinese Metaphor Recognition Using a Multi-stage Prompting Large Language Model
Metaphors are common in everyday language, and models that identify and understand them enable a better understanding of text. In existing research, metaphors are mainly identified and generated by pre-trained models, which cannot handle cases where the tenor or vehicle is not explicitly included in the metaphor. Large Language Models (LLMs) can effectively address this problem, but significant room for exploration remains in this early-stage research area. This study proposes a multi-stage generative heuristic-enhanced prompt framework to improve the ability of LLMs to recognize tenors, vehicles, and grounds in Chinese metaphors. In the first stage, a small model is trained to produce the confidence scores required for answer-candidate generation. In the second stage, questions are clustered and sampled according to specific rules. Finally, the required heuristic-enhanced prompt is formed by combining the generated answer candidates with demonstrations. The proposed model achieved 3rd place in Track 1 of Subtask 1, 1st place in Track 2 of Subtask 1, and 1st place in both tracks of Subtask 2 at the NLPCC-2024 Shared Task 9.
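The final stage, combining generated answer candidates with demonstrations into a heuristic-enhanced prompt, can be sketched as plain string assembly; every field name and the prompt wording below are hypothetical, since the paper's exact template is not given:

```python
def build_prompt(question, candidates, demonstrations, top_k=3):
    """Keep the top-k answer candidates by confidence (from the stage-1 small
    model) and prepend sampled demonstrations (from the stage-2 clustering)."""
    picked = sorted(candidates, key=lambda c: c["confidence"], reverse=True)[:top_k]
    demo_text = "\n".join(
        f"Sentence: {d['sentence']}\nTenor/Vehicle/Ground: {d['answer']}"
        for d in demonstrations)
    cand_text = "; ".join(c["answer"] for c in picked)
    return (f"{demo_text}\n"
            f"Sentence: {question}\n"
            f"Candidate answers: {cand_text}\n"
            f"Identify the tenor, vehicle, and ground.")
```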
Updated: 2024-08-17 11:56:38
Domains: cs.CL,cs.AI
Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making
Resolving the dichotomy between the human-like yet constrained reasoning processes of Cognitive Architectures and the broad but often noisy inference behavior of Large Language Models (LLMs) remains a challenging but exciting pursuit, for enabling reliable machine reasoning capabilities in production systems. Because Cognitive Architectures are famously developed for the purpose of modeling the internal mechanisms of human cognitive decision-making at a computational level, new investigations consider the goal of informing LLMs with the knowledge necessary for replicating such processes, e.g., guided perception, memory, goal-setting, and action. Previous approaches that use LLMs for grounded decision-making struggle with complex reasoning tasks that require slower, deliberate cognition over fast and intuitive inference -- reporting issues related to the lack of sufficient grounding, as in hallucination. To resolve these challenges, we introduce LLM-ACTR, a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making by integrating the ACT-R Cognitive Architecture with LLMs. Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations, injects this information into trainable LLM adapter layers, and fine-tunes the LLMs for downstream prediction. Our experiments on novel Design for Manufacturing tasks show both improved task performance as well as improved grounded decision-making capability of our approach, compared to LLM-only baselines that leverage chain-of-thought reasoning strategies.
Updated: 2024-08-17 11:49:53
Domains: cs.AI,cs.CL,cs.SC
Ranking Across Different Content Types: The Robust Beauty of Multinomial Blending
An increasing number of media streaming services have expanded their offerings to include entities of multiple content types. For instance, audio streaming services that started by offering music only, now also offer podcasts, merchandise items, and videos. Ranking items across different content types into a single slate poses a significant challenge for traditional learning-to-rank (LTR) algorithms due to differing user engagement patterns for different content types. We explore a simple method for cross-content-type ranking, called multinomial blending (MB), which can be used in conjunction with most existing LTR algorithms. We compare MB to existing baselines not only in terms of ranking quality but also from other industry-relevant perspectives such as interpretability, ease-of-use, and stability in dynamic environments with changing user behavior and ranking model retraining. Finally, we report the results of an A/B test from an Amazon Music ranking use-case.
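Multinomial blending itself is simple enough to sketch: at each slate position, draw a content type from a multinomial over types, then emit that type's best remaining item, so each type's internal LTR ordering is preserved by construction. A sketch under assumed inputs (per-type ranked lists and fixed type probabilities; the production system's interface will differ):

```python
import random

def multinomial_blend(ranked_by_type, type_probs, slate_size, seed=0):
    """Fill a slate by repeatedly sampling a content type from a multinomial
    and popping that type's highest-ranked unused item."""
    rng = random.Random(seed)
    queues = {t: list(items) for t, items in ranked_by_type.items()}
    slate = []
    while len(slate) < slate_size and any(queues.values()):
        types = [t for t in queues if queues[t]]
        weights = [type_probs[t] for t in types]
        t = rng.choices(types, weights=weights, k=1)[0]
        slate.append(queues[t].pop(0))
    return slate
```

The interpretability the abstract mentions falls out directly: the type probabilities are the only blending knobs, and tuning them never reorders items within a content type.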
Updated: 2024-08-17 11:11:31
Domains: cs.IR,cs.AI,cs.LG
Benchmarking quantum machine learning kernel training for classification tasks
Quantum-enhanced machine learning is a rapidly evolving field that aims to leverage the unique properties of quantum mechanics to enhance classical machine learning. However, the practical applicability of these methods remains an open question, particularly in the context of real-world datasets and the limitations of current quantum hardware. This work performs a benchmark study of Quantum Kernel Estimation (QKE) and Quantum Kernel Training (QKT) with a focus on classification tasks. Through a series of experiments, the versatility and generalization capabilities of two quantum feature mappings, namely ZZFeatureMap and CovariantFeatureMap, are analyzed in this context. Remarkably, these feature maps have been proposed in the literature under the conjecture of possible near-term quantum advantage and have shown promising performance in ad-hoc datasets. This study explores both artificial and established reference datasets and incorporates classical machine learning methods, specifically Support Vector Machines (SVMs) and logistic regression, as baseline comparisons. Experimental results indicate that quantum methods exhibit varying performance across different datasets. While they outperform classical methods in ad-hoc datasets, they frequently encounter difficulties in generalizing to unseen test data when dealing with reference classical datasets, even if achieving high classification accuracy on the training data. It is suggested that the choice of the feature mapping and the optimization of kernel parameters through QKT are critical for maximizing the effectiveness of quantum methods.
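The division of labor in QKE is that the quantum circuit only estimates the kernel's Gram matrix, which a classical kernel method then consumes. The sketch below substitutes a classical linear kernel for the quantum-estimated one and a kernel perceptron for the SVM, purely to illustrate the precomputed-kernel interface; it is not the benchmark's actual pipeline:

```python
import numpy as np

def kernel_perceptron(gram, y, epochs=10):
    """Train a kernel perceptron from a precomputed Gram matrix; in QKE this
    matrix would come from overlap estimates on quantum hardware."""
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for i in range(n):
            if np.sign(np.sum(alpha * y * gram[:, i])) != y[i]:
                alpha[i] += 1.0
    return alpha

def predict(gram_test_train, alpha, y_train):
    """Classify test points from their kernel values against training points."""
    return np.sign(gram_test_train @ (alpha * y_train))
```

QKT then corresponds to optimizing the parameters of the feature map that produces the Gram matrix, which this classical stand-in has no analogue for.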
Updated: 2024-08-17 10:53:06
Domains: quant-ph,cs.LG
Zero-Shot Object-Centric Representation Learning
The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities. Recent successes have shown that object-centric representation learning can be scaled to real-world scenes by utilizing pre-trained self-supervised features. However, so far, object-centric methods have mostly been applied in-distribution, with models trained and evaluated on the same dataset. This is in contrast to the wider trend in machine learning towards general-purpose models directly applicable to unseen data and tasks. Thus, in this work, we study current object-centric methods through the lens of zero-shot generalization by introducing a benchmark comprising eight different synthetic and real-world datasets. We analyze the factors influencing zero-shot performance and find that training on diverse real-world images improves transferability to unseen scenarios. Furthermore, inspired by the success of task-specific fine-tuning in foundation models, we introduce a novel fine-tuning strategy to adapt pre-trained vision encoders for the task of object discovery. We find that the proposed approach results in state-of-the-art performance for unsupervised object discovery, exhibiting strong zero-shot transfer to unseen datasets.
Updated: 2024-08-17 10:37:07
Domains: cs.CV,cs.LG
Linear Attention is Enough in Spatial-Temporal Forecasting
As the most representative spatial-temporal forecasting task, traffic forecasting has attracted considerable attention from the machine learning community due to its intricate correlations in both the spatial and temporal dimensions. Existing methods often treat road networks over time as spatial-temporal graphs, addressing spatial and temporal representations independently. However, these approaches struggle to capture the dynamic topology of road networks, encounter issues with message-passing mechanisms and over-smoothing, and face challenges in learning spatial and temporal relationships separately. To address these limitations, we propose treating nodes of the road network at different time steps as independent spatial-temporal tokens and feeding them into a vanilla Transformer to learn complex spatial-temporal patterns, designing STformer, which achieves state-of-the-art (SOTA) performance. Given its quadratic complexity, we introduce a variant, NSTformer, based on the Nyström method to approximate self-attention with linear complexity; surprisingly, it even performs slightly better than the former in a few cases. Extensive experimental results on traffic datasets demonstrate that the proposed method achieves state-of-the-art performance at an affordable computational cost. Our code will be made available.
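A Nyström-style approximation replaces the full n x n attention matrix with products through m landmark queries and keys, making the cost linear in the number of tokens. A minimal single-head sketch with segment-mean landmarks and an exact pseudoinverse (one simple landmark choice; NSTformer's exact construction may differ, and Nyströmformer-style implementations use an iterative pseudoinverse):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(Q, K, V):
    """Standard softmax attention: O(n^2) in sequence length."""
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def nystrom_attention(Q, K, V, m):
    """Nystrom approximation via m landmark queries/keys (segment means;
    n must be divisible by m). With m == n it reduces to exact attention,
    since S @ pinv(S) @ S == S for any matrix S."""
    n, d = Q.shape
    Ql = Q.reshape(m, n // m, d).mean(axis=1)   # landmark queries
    Kl = K.reshape(m, n // m, d).mean(axis=1)   # landmark keys
    s = np.sqrt(d)
    F = softmax(Q @ Kl.T / s)                   # n x m
    A = softmax(Ql @ Kl.T / s)                  # m x m
    B = softmax(Ql @ K.T / s)                   # m x n
    return F @ np.linalg.pinv(A) @ (B @ V)      # O(n*m) products
</n>```

For fixed m, every matrix product above involves at most an n x m factor, which is the source of the linear complexity claim.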
Updated: 2024-08-17 10:06:50
Domains: cs.LG,cs.AI
On the KL-Divergence-based Robust Satisficing Model
Empirical risk minimization, a cornerstone of machine learning, is often hindered by the Optimizer's Curse stemming from discrepancies between the empirical and true data-generating distributions. To address this challenge, the robust satisficing framework has recently emerged to mitigate ambiguity in the true distribution. Distinguished by its interpretable hyperparameter and enhanced performance guarantees, this approach has attracted increasing attention from academia. However, its applicability to general machine learning problems, notably deep neural networks, remains largely unexplored due to the computational challenges of solving this model efficiently under general loss functions. In this study, we delve into the Kullback-Leibler divergence based robust satisficing model under a general loss function, presenting analytical interpretations, diverse performance guarantees, efficient and stable numerical methods, convergence analysis, and an extension tailored to hierarchical data structures. Through extensive numerical experiments across three distinct machine learning tasks, we demonstrate the superior performance of our model compared to state-of-the-art benchmarks.
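For the KL divergence, the worst-case expected loss over an ambiguity ball admits a well-known one-dimensional dual involving a log-moment-generating function, which is what makes KL-based models numerically tractable. A sketch evaluating that dual on a grid of dual variables (the robust satisficing model instead searches for the smallest slope that meets a loss target, but the same dual object appears inside it):

```python
import math

def kl_worst_case(losses, radius, alphas=None):
    """sup over {Q : KL(Q || P_emp) <= radius} of E_Q[loss], via the dual
        inf_{a > 0}  a * log E[exp(loss / a)] + a * radius,
    with the log-sum-exp stabilized around the max loss."""
    n = len(losses)
    if alphas is None:
        alphas = [10 ** (k / 10) for k in range(-30, 31)]  # 1e-3 .. 1e3
    mx = max(losses)
    best = float("inf")
    for a in alphas:
        lse = mx + a * math.log(sum(math.exp((l - mx) / a) for l in losses) / n)
        best = min(best, lse + a * radius)
    return best
```

The result always lies between the empirical mean (radius 0, no ambiguity) and the maximum loss (unbounded ambiguity), and grows with the radius.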
Updated: 2024-08-17 10:05:05
Domain: cs.LG
Proving membership in LLM pretraining data via data watermarks
Detecting whether copyright holders' works were used in LLM pretraining is poised to be an important problem. This work proposes using data watermarks to enable principled detection with only black-box model access, provided that the rightholder contributed multiple training documents and watermarked them before public release. By applying a randomly sampled data watermark, detection can be framed as hypothesis testing, which provides guarantees on the false detection rate. We study two watermarks: one that inserts random sequences, and another that randomly substitutes characters with Unicode lookalikes. We first show how three aspects of watermark design -- watermark length, number of duplications, and interference -- affect the power of the hypothesis test. Next, we study how a watermark's detection strength changes under model and dataset scaling: while increasing the dataset size decreases the strength of the watermark, watermarks remain strong if the model size also increases. Finally, we view SHA hashes as natural watermarks and show that we can robustly detect hashes from BLOOM-176B's training data, as long as they occurred at least 90 times. Together, our results point towards a promising future for data watermarks in real world use.
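The Unicode-lookalike watermark described above can be sketched as follows. The lookalike table, seed handling, and match-counting detector are illustrative assumptions, not the paper's exact protocol; the key property is that replaying the rightholder's secret randomness frames detection as a hypothesis test.

```python
import random

# Hypothetical lookalike table: Latin letters mapped to visually similar
# Cyrillic codepoints (the real substitution table is an assumption here).
LOOKALIKES = {"a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e", "p": "\u0440"}
REVERSE = {v: k for k, v in LOOKALIKES.items()}

def watermark(text, p=0.5, seed=42):
    """Randomly substitute characters with Unicode lookalikes; the seed is
    the rightholder's secret, fixed before public release."""
    rng = random.Random(seed)
    return "".join(
        LOOKALIKES[ch] if ch in LOOKALIKES and rng.random() < p else ch
        for ch in text
    )

def detect(text, seed=42, p=0.5):
    """Replay the secret randomness and count positions whose substitution
    state matches it; under the null hypothesis (text never watermarked with
    this seed, p=0.5) each match is a fair coin, enabling a binomial test."""
    rng = random.Random(seed)
    hits = trials = 0
    for ch in text:
        base = REVERSE.get(ch, ch)
        if base in LOOKALIKES:
            trials += 1
            expected = rng.random() < p   # would this position have been substituted?
            observed = ch in REVERSE      # was it actually substituted?
            hits += (observed == expected)
    return hits, trials
```

Watermarked text matches the replayed randomness at every substitutable position, while unrelated text matches only about half the time, so a large hit count bounds the false detection rate.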
Updated: 2024-08-17 10:04:53
Domain: cs.CR,cs.CL,cs.LG
CogLM: Tracking Cognitive Development of Large Language Models
Piaget's Theory of Cognitive Development (PTC) posits that the development of cognitive levels forms the foundation for human learning across various abilities. As Large Language Models (LLMs) have recently shown remarkable abilities across a wide variety of tasks, we are curious about the cognitive levels of current LLMs: to what extent they have developed and how this development has been achieved. To this end, we construct a benchmark CogLM (Cognitive Ability Evaluation for Language Model) based on PTC to assess the cognitive levels of LLMs. CogLM comprises 1,220 questions spanning 10 cognitive abilities crafted by more than 20 human experts, providing a comprehensive testbed for the cognitive levels of LLMs. Through extensive experiments across multiple mainstream LLMs with CogLM, we find that: (1) Human-like cognitive abilities have emerged in advanced LLMs (GPT-4), comparable to those of a 20-year-old human. (2) The parameter size and optimization objective are two key factors affecting the cognitive levels of LLMs. (3) The performance on downstream tasks is positively correlated with the level of cognitive abilities. These findings fill the gap in research on the cognitive abilities of LLMs, tracing the development of LLMs from a cognitive perspective and guiding the future direction of their evolution.
Updated: 2024-08-17 09:49:40
Domain: cs.CL,cs.AI
Indoor Air Quality Dataset with Activities of Daily Living in Low to Middle-income Communities
In recent years, indoor air pollution has posed a significant threat to our society, claiming over 3.2 million lives annually. Developing nations such as India are most affected, since a lack of awareness, inadequate regulation, and outdoor air pollution lead to severe daily exposure to pollutants. However, only a limited number of studies have attempted to understand how indoor air pollution affects developing countries like India. To address this gap, we present spatiotemporal measurements of air quality from 30 indoor sites over six months during the summer and winter seasons. The sites are geographically distributed across four regions covering rural, suburban, and urban settings, representative of the typical low- to middle-income population in India. The dataset contains various types of indoor environments (e.g., studio apartments, classrooms, research laboratories, food canteens, and residential households), and can provide the basis for data-driven learning model research aimed at coping with unique pollution patterns in developing countries. This unique dataset demands advanced data cleaning and imputation techniques to handle data missing due to power failures or network outages during collection. Furthermore, through a simple speech-to-text application, we provide real-time indoor activity labels annotated by occupants. Therefore, environmentalists and ML enthusiasts can utilize this dataset to understand the complex patterns of pollutants under different indoor activities, identify recurring sources of pollution, forecast exposure, improve floor plans and room structures of modern indoor designs, develop pollution-aware recommender systems, and more.
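As a minimal sketch of the kind of imputation such power-failure gaps call for (the sensor values and gap pattern below are invented), time-aware interpolation with a fill cap repairs short dropouts while leaving the tail of long outages unimputed for more careful handling:

```python
import numpy as np
import pandas as pd

# Hypothetical 5-minute PM2.5 trace with a 15-minute power-failure gap and a
# single dropped sample (values invented for illustration).
idx = pd.date_range("2024-06-01 00:00", periods=12, freq="5min")
pm25 = pd.Series([41.0, 43.0, 44.0, np.nan, np.nan, np.nan,
                  50.0, 49.0, np.nan, 47.0, 46.0, 45.0], index=idx)

# Time-aware linear interpolation; capping the fill length (limit=2) leaves
# the rest of long outages as NaN rather than inventing a trend through them.
filled = pm25.interpolate(method="time", limit=2)
```

Here the single dropped sample and the first two points of the outage are filled, while the third point of the outage remains missing, flagging it for a model-based imputer.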
Updated: 2024-08-17 09:41:03
Domain: cs.LG
Direct Multi-Turn Preference Optimization for Language Agents
Adapting Large Language Models (LLMs) for agent tasks is critical in developing language agents. Direct Preference Optimization (DPO) is a promising technique for this adaptation, alleviating compounding errors and offering a means to directly optimize Reinforcement Learning (RL) objectives. However, applying DPO to multi-turn tasks presents challenges due to the inability to cancel the partition function. Overcoming this obstacle involves making the partition function independent of the current state and addressing length disparities between preferred and dispreferred trajectories. In this light, we replace the policy constraint with a state-action occupancy measure constraint in the RL objective and add length normalization to the Bradley-Terry model, yielding a novel loss function named DMPO for multi-turn agent tasks, along with theoretical explanations. Extensive experiments on three multi-turn agent task datasets confirm the effectiveness and superiority of the DMPO loss.
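The length-normalization idea added to the Bradley-Terry model can be sketched in isolation. This is a minimal illustration of per-step normalization only, not the full DMPO objective (which also involves the occupancy-measure constraint), and the scoring convention is an assumption:

```python
import math

def length_normalized_bt_loss(logps_preferred, logps_dispreferred):
    """Bradley-Terry preference loss where each multi-turn trajectory's total
    log-probability is divided by its number of steps, so a trajectory is not
    penalized merely for having more turns (a sketch of the idea only)."""
    s_w = sum(logps_preferred) / len(logps_preferred)      # per-step score, winner
    s_l = sum(logps_dispreferred) / len(logps_dispreferred)  # per-step score, loser
    return math.log1p(math.exp(-(s_w - s_l)))              # -log sigmoid(s_w - s_l)
```

Without the division by length, a dispreferred trajectory with many turns would accumulate a very negative total log-probability and trivially lose the comparison, masking the true preference signal.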
Updated: 2024-08-17 09:33:12
Domain: cs.CL,cs.LG
Order of Compression: A Systematic and Optimal Sequence to Combinationally Compress CNN
Model compression has gained significant popularity as a means to alleviate the computational and memory demands of machine learning models. Each compression technique leverages unique features to reduce the size of neural networks. Although intuitively combining different techniques may enhance compression effectiveness, we find that the order in which they are combined significantly influences performance. To identify the optimal sequence for compressing neural networks, we propose the Order of Compression, a systematic and optimal sequence that applies multiple compression techniques in the most effective order. We start by establishing the pairwise orderings between any two compression approaches and then demonstrate that inserting additional compression between any two compressions does not break their order. Based on these foundations, an optimal order is obtained with topological sorting. Validated on image-based regression and classification networks across different datasets, our proposed Order of Compression reduces computational costs by up to 859 times on ResNet34, with negligible accuracy loss (-0.09% for CIFAR10) compared to the baseline model. We believe our simple yet effective exploration of the order of compression will shed light on the practice of model compression.
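The pairwise-orderings-plus-topological-sort procedure can be sketched with Python's graphlib. The constraints below are hypothetical placeholders, not the paper's derived orderings:

```python
from graphlib import TopologicalSorter

# Hypothetical pairwise "apply X before Y" findings, written as a mapping
# from each technique to the set of techniques that must precede it.
must_follow = {
    "distillation": set(),
    "pruning": {"distillation"},      # e.g. distill before pruning
    "quantization": {"pruning"},      # e.g. prune before quantizing
}

# Topological sorting turns the pairwise constraints into one global order.
order = list(TopologicalSorter(must_follow).static_order())
print(order)
```

Any set of consistent pairwise constraints yields a valid global application order this way; a cycle in the constraints would raise `graphlib.CycleError`, signaling contradictory pairwise findings.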
Updated: 2024-08-17 09:20:42
Domain: cs.LG,cs.CV,cs.NE
Point Source Identification Using Singularity Enriched Neural Networks
The inverse problem of recovering point sources represents an important class of applied inverse problems. However, there is still a lack of neural network-based methods for point source identification, mainly due to the inherent solution singularity. In this work, we develop a novel algorithm to identify point sources, utilizing a neural network combined with a singularity enrichment technique. We employ the fundamental solution and neural networks to represent the singular and regular parts, respectively, and then minimize an empirical loss involving the intensities and locations of the unknown point sources, as well as the parameters of the neural network. Moreover, by combining the conditional stability argument of the inverse problem with the generalization error of the empirical loss, we conduct a rigorous error analysis of the algorithm. We demonstrate the effectiveness of the method with several challenging experiments.
Updated: 2024-08-17 08:51:18
Domain: math.NA,cs.LG,cs.NA
Gaussian Process Kolmogorov-Arnold Networks
In this paper, we introduce a probabilistic extension to Kolmogorov-Arnold Networks (KANs) by incorporating Gaussian Processes (GPs) as non-linear neurons, which we refer to as GP-KAN. A fully analytical approach to handling the output distribution of one GP as the input to another GP is achieved by considering the function inner product of a GP function sample with the input distribution. These GP neurons exhibit robust non-linear modelling capabilities while using few parameters and can be easily and fully integrated into a feed-forward network structure. They provide inherent uncertainty estimates for the model prediction and can be trained directly on the log-likelihood objective function, without needing variational lower bounds or approximations. On MNIST classification, a GP-KAN model with 80 thousand parameters achieved 98.5% prediction accuracy, compared to current state-of-the-art models with 1.5 million parameters.
Updated: 2024-08-17 08:45:38
Domain: cs.LG,stat.ML
Learning to Explore for Stochastic Gradient MCMC
Bayesian Neural Networks (BNNs) with high-dimensional parameters pose a challenge for posterior inference due to the multi-modality of the posterior distributions. Stochastic Gradient MCMC (SGMCMC) with cyclical learning rate scheduling is a promising solution, but it requires a large number of sampling steps to explore high-dimensional multi-modal posteriors, making it computationally expensive. In this paper, we propose a meta-learning strategy to build SGMCMC samplers which can efficiently explore multi-modal target distributions. Our algorithm allows the learned SGMCMC to quickly explore the high-density region of the posterior landscape. Also, we show that this exploration property is transferable to various tasks, even ones unseen during the meta-training stage. Using popular image classification benchmarks and a variety of downstream tasks, we demonstrate that our method significantly improves sampling efficiency, achieving better performance than vanilla SGMCMC without incurring significant computational overhead.
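The cyclical learning rate schedule mentioned above is easy to sketch; the cosine form below is one common choice in cyclical SGMCMC and an assumption here, not this paper's learned schedule:

```python
import math

def cyclical_step_size(step, total_steps, n_cycles, eps0):
    """Cosine cyclical step-size schedule: each cycle restarts at eps0 (large
    steps help jump between posterior modes) and decays toward 0 (small steps
    sample within a mode)."""
    steps_per_cycle = total_steps // n_cycles
    t = (step % steps_per_cycle) / steps_per_cycle   # position in cycle, [0, 1)
    return (eps0 / 2.0) * (math.cos(math.pi * t) + 1.0)
```

The many warm restarts this schedule needs to cover a multi-modal posterior are exactly the sampling cost the meta-learned sampler aims to reduce.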
Updated: 2024-08-17 08:36:42
Domain: cs.LG,cs.AI,cs.CV
Better Python Programming for all: With the focus on Maintainability
This study aims to enhance the maintainability of code generated by Large Language Models (LLMs), with a focus on the Python programming language. As the use of LLMs for coding assistance grows, so do concerns about the maintainability of the code they produce. Previous research has mainly concentrated on the functional accuracy and testing success of generated code, overlooking aspects of maintainability. Our approach involves the use of a specifically designed dataset for training and evaluating the model, ensuring a thorough assessment of code maintainability. At the heart of our work is the fine-tuning of an LLM for code refactoring, aimed at enhancing code readability, reducing complexity, and improving overall maintainability. After fine-tuning an LLM to prioritize code maintainability, our evaluations indicate that this model significantly improves code maintainability standards, suggesting a promising direction for the future of AI-assisted software development.
Updated: 2024-08-17 08:14:22
Domain: cs.SE,cs.AI
Machine Learning Potentials: A Roadmap Toward Next-Generation Biomolecular Simulations
Machine learning potentials offer a revolutionary, unifying framework for molecular simulations across scales, from quantum chemistry to coarse-grained models. Here, I explore their potential to dramatically improve accuracy and scalability in simulating complex molecular systems. I discuss key challenges that must be addressed to fully realize their transformative potential in chemical biology and related fields.
Updated: 2024-08-17 07:53:33
Domain: physics.chem-ph,cs.LG
PRIME: Scaffolding Manipulation Tasks with Behavior Primitives for Data-Efficient Imitation Learning
Imitation learning has shown great potential for enabling robots to acquire complex manipulation behaviors. However, these algorithms suffer from high sample complexity in long-horizon tasks, where compounding errors accumulate over the task horizons. We present PRIME (PRimitive-based IMitation with data Efficiency), a behavior primitive-based framework designed for improving the data efficiency of imitation learning. PRIME scaffolds robot tasks by decomposing task demonstrations into primitive sequences, followed by learning a high-level control policy to sequence primitives through imitation learning. Our experiments demonstrate that PRIME achieves a significant performance improvement in multi-stage manipulation tasks, with 10-34% higher success rates in simulation over state-of-the-art baselines and 20-48% on physical hardware.
Updated: 2024-08-17 07:50:34
Domain: cs.RO,cs.AI,cs.LG
Identifying Technical Debt and Its Types Across Diverse Software Projects Issues
Technical Debt (TD) identification in software projects issues is crucial for maintaining code quality, reducing long-term maintenance costs, and improving overall project health. This study advances TD classification using transformer-based models, addressing the critical need for accurate and efficient TD identification in large-scale software development. Our methodology employs multiple binary classifiers for TD and its type, combined through ensemble learning, to enhance accuracy and robustness in detecting various forms of TD. We train and evaluate these models on a comprehensive dataset from GitHub Archive Issues (2015-2024), supplemented with industrial data validation. We demonstrate that in-project fine-tuned transformer models significantly outperform task-specific fine-tuned models in TD classification, highlighting the importance of project-specific context in accurate TD identification. Our research also reveals the superiority of specialized binary classifiers over multi-class models for TD and its type identification, enabling more targeted debt resolution strategies. A comparative analysis shows that the smaller DistilRoBERTa model is more effective than larger language models like GPTs for TD classification tasks, especially after fine-tuning, offering insights into efficient model selection for specific TD detection tasks. The study also assesses generalization capabilities using metrics such as MCC, AUC ROC, Recall, and F1 score, focusing on model effectiveness, fine-tuning impact, and relative performance. By validating our approach on out-of-distribution and real-world industrial datasets, we ensure practical applicability, addressing the diverse nature of software projects.
Updated: 2024-08-17 07:46:54
Domain: cs.SE,cs.AI
Markov Balance Satisfaction Improves Performance in Strictly Batch Offline Imitation Learning
Imitation learning (IL) is notably effective for robotic tasks where directly programming behaviors or defining optimal control costs is challenging. In this work, we address a scenario where the imitator relies solely on observed behavior and cannot make environmental interactions during learning. It does not have additional supplementary datasets beyond the expert's dataset nor any information about the transition dynamics. Unlike state-of-the-art (SOTA) IL methods, this approach tackles the limitations of conventional IL by operating in a more constrained and realistic setting. Our method uses the Markov balance equation and introduces a novel conditional density estimation-based imitation learning framework. It employs conditional normalizing flows for transition dynamics estimation and aims at satisfying a balance equation for the environment. Through a series of numerical experiments on Classic Control and MuJoCo environments, we demonstrate consistently superior empirical performance compared to many SOTA IL algorithms.
Updated: 2024-08-17 07:17:19
Domain: cs.LG,cs.AI
Dynamic Neural Dowker Network: Approximating Persistent Homology in Dynamic Directed Graphs
Persistent homology, a fundamental technique within Topological Data Analysis (TDA), captures structural and shape characteristics of graphs, yet encounters computational difficulties when applied to dynamic directed graphs. This paper introduces the Dynamic Neural Dowker Network (DNDN), a novel framework specifically designed to approximate the results of dynamic Dowker filtration, aiming to capture the high-order topological features of dynamic directed graphs. Our approach creatively uses line graph transformations to produce both source and sink line graphs, highlighting the shared neighbor structures that Dowker complexes focus on. The DNDN incorporates a Source-Sink Line Graph Neural Network (SSLGNN) layer to effectively capture the neighborhood relationships among dynamic edges. Additionally, we introduce an innovative duality edge fusion mechanism, ensuring that the results for both the sink and source line graphs adhere to the duality principle intrinsic to Dowker complexes. Our approach is validated through comprehensive experiments on real-world datasets, demonstrating DNDN's capability not only to effectively approximate dynamic Dowker filtration results but also to perform exceptionally in dynamic graph classification tasks.
Updated: 2024-08-17 07:13:12
Domain: cs.LG,math.AT
Time Series Analysis by State Space Learning
Time series analysis with state-space models is widely used for forecasting and for extracting unobservable components such as level, slope, and seasonality, along with explanatory variables. However, their reliance on traditional Kalman filtering frequently hampers their effectiveness, primarily due to Gaussian assumptions and the absence of efficient subset selection methods to accommodate the multitude of potential explanatory variables in today's big-data applications. Our research introduces State Space Learning (SSL), a novel framework and paradigm that leverages the capabilities of statistical learning to construct a comprehensive framework for time series modeling and forecasting. By utilizing a regularized high-dimensional regression framework, our approach jointly extracts typical time series unobservable components, detects and addresses outliers, and selects the influence of exogenous variables within a high-dimensional space, in polynomial time and with global optimality guarantees. Through a controlled numerical experiment, we demonstrate the superiority of our approach in the accuracy of explanatory-variable subset selection compared to relevant benchmarks. We also present an intuitive forecasting scheme and showcase superior performance relative to traditional time series models on a dataset of 48,000 monthly time series from the M4 competition. We extend the applicability of our approach by reformulating any linear state space formulation featuring time-varying coefficients into a high-dimensional regularized regression, expanding the impact of our research to engineering applications beyond time series analysis. Finally, our proposed methodology is implemented in the Julia open-source package "StateSpaceLearning.jl".
Updated: 2024-08-17 07:04:26
Domain: stat.ML,cs.LG,62M20 (Primary) 68T05 (Secondary)
Stock Recommendations for Individual Investors: A Temporal Graph Network Approach with Mean-Variance Efficient Sampling
Recommender systems can be helpful for individuals to make well-informed decisions in complex financial markets. While many studies have focused on predicting stock prices, even advanced models fall short of accurately forecasting them. Additionally, previous studies indicate that individual investors often disregard established investment theories, favoring their personal preferences instead. This presents a challenge for stock recommendation systems, which must not only provide strong investment performance but also respect these individual preferences. To create effective stock recommender systems, three critical elements must be incorporated: 1) individual preferences, 2) portfolio diversification, and 3) the temporal dynamics of the first two. In response, we propose a new model, Portfolio Temporal Graph Network Recommender, PfoTGNRec, which can handle time-varying collaborative signals and incorporates diversification-enhancing sampling. On real-world individual trading data, our approach demonstrates superior performance compared to state-of-the-art baselines, including cutting-edge dynamic embedding models and existing stock recommendation models. Indeed, we show that PfoTGNRec is an effective solution that can balance customer preferences with the need to suggest portfolios with high Return-on-Investment. The source code and data are available at https://anonymous.4open.science/r/ICAIF2024-E23E.
Updated: 2024-08-17 06:45:17
Domain: q-fin.ST,cs.AI,cs.LG
Toward End-to-End Bearing Fault Diagnosis for Industrial Scenarios with Spiking Neural Networks
Spiking neural networks (SNNs) transmit information via low-power binary spikes and have received widespread attention in areas such as computer vision and reinforcement learning. However, there have been very few explorations of SNNs in more practical industrial scenarios. In this paper, we focus on the application of SNNs in bearing fault diagnosis to facilitate the integration of high-performance AI algorithms and real-world industries. In particular, we identify two key limitations of existing SNN fault diagnosis methods: inadequate encoding capacity that necessitates cumbersome data preprocessing, and non-spike-oriented architectures that constrain the performance of SNNs. To alleviate these problems, we propose a Multi-scale Residual Attention SNN (MRA-SNN) to simultaneously improve the efficiency, performance, and robustness of SNN methods. By incorporating a lightweight attention mechanism, we have designed a multi-scale attention encoding module to extract multiscale fault features from vibration signals and encode them as spatio-temporal spikes, eliminating the need for complicated preprocessing. Then, the spike residual attention block extracts high-dimensional fault features and enhances the expressiveness of sparse spikes with the attention mechanism for end-to-end diagnosis. In addition, the performance and robustness of MRA-SNN is further enhanced by introducing the lightweight attention mechanism within the spiking neurons to simulate the biological dendritic filtering effect. Extensive experiments on MFPT and JNU benchmark datasets demonstrate that MRA-SNN significantly outperforms existing methods in terms of accuracy, energy consumption and noise robustness, and is more feasible for deployment in real-world industrial scenarios.
Updated: 2024-08-17 06:41:58
Domain: cs.NE,cs.AI,cs.LG
Training Verifiably Robust Agents Using Set-Based Reinforcement Learning
Reinforcement learning often uses neural networks to solve complex control tasks. However, neural networks are sensitive to input perturbations, which makes their deployment in safety-critical environments challenging. This work lifts recent results from formally verifying neural networks against such disturbances to reinforcement learning in continuous state and action spaces using reachability analysis. While previous work mainly focuses on adversarial attacks for robust reinforcement learning, we train neural networks utilizing entire sets of perturbed inputs and maximize the worst-case reward. The obtained agents are verifiably more robust than agents obtained by related work, making them more applicable in safety-critical environments. This is demonstrated with an extensive empirical evaluation of four different benchmarks.
Updated: 2024-08-17 06:26:17
Domain: cs.LG,cs.RO,cs.SY,eess.SY
Measuring Visual Sycophancy in Multimodal Models
This paper introduces and examines the phenomenon of "visual sycophancy" in multimodal language models, a term we propose to describe these models' tendency to disproportionately favor visually presented information, even when it contradicts their prior knowledge or responses. Our study employs a systematic methodology to investigate this phenomenon: we present models with images of multiple-choice questions, which they initially answer correctly, then expose the same model to versions with visually pre-marked options. Our findings reveal a significant shift in the models' responses towards the pre-marked option despite their previous correct answers. Comprehensive evaluations demonstrate that visual sycophancy is a consistent and quantifiable behavior across various model architectures. Our findings highlight potential limitations in the reliability of these models when processing potentially misleading visual information, raising important questions about their application in critical decision-making contexts.
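The shift the study measures can be quantified with a simple metric; the function below is an illustrative sketch of such a shift rate (the paper's exact metric may differ):

```python
def sycophancy_shift(before, after, marked):
    """Among questions where the visual pre-mark disagrees with the model's
    original answer, return the fraction of answers that flipped to the
    pre-marked option on re-presentation."""
    flips = eligible = 0
    for b, a, m in zip(before, after, marked):
        if b != m:               # the mark contradicts the original answer
            eligible += 1
            flips += (a == m)    # did the model follow the mark anyway?
    return flips / eligible if eligible else 0.0
```

A non-sycophantic model would score near 0.0 on this metric; the systematic shift the paper reports corresponds to values well above that.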
Updated: 2024-08-17 06:25:36
Categories: cs.AI,cs.CL,cs.CV,cs.HC
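The shift described above can be operationalized as a simple rate: among questions whose pre-marked option differs from the model's original answer, how often does the answer flip to the marked option? The metric below is an illustrative operationalization of that idea, not necessarily the exact measure used in the paper.

```python
def sycophancy_rate(baseline_answers, marked_answers, marked_option):
    """Fraction of eligible items (baseline answer != marked option) whose
    answer flips to the visually pre-marked option."""
    flips = sum(1 for b, m in zip(baseline_answers, marked_answers)
                if b != marked_option and m == marked_option)
    eligible = sum(1 for b in baseline_answers if b != marked_option)
    return flips / eligible if eligible else 0.0
```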
Improved Q-learning based Multi-hop Routing for UAV-Assisted Communication
Designing effective Unmanned Aerial Vehicle (UAV)-assisted routing protocols is challenging due to changing topology, limited battery capacity, and the dynamic nature of communication environments. Current protocols prioritize optimizing individual network parameters, overlooking the necessity for a nuanced approach in scenarios with intermittent connectivity, fluctuating signal strength, and varying network densities, ultimately failing to address aerial network requirements comprehensively. This paper proposes a novel Improved Q-learning-based Multi-hop Routing (IQMR) algorithm for optimal UAV-assisted communication systems. Using Q(\lambda) learning for routing decisions, IQMR substantially enhances energy efficiency and network data throughput. IQMR improves system resilience by prioritizing reliable connectivity and inter-UAV collision avoidance while integrating real-time network status information, all in the absence of predefined UAV path planning, thus ensuring dynamic adaptability to evolving network conditions. The results validate IQMR's adaptability to changing system conditions and superiority over the current techniques. IQMR showcases 36.35\% and 32.05\% improvements in energy efficiency and data throughput over the existing methods.
Updated: 2024-08-17 06:24:31
Categories: cs.NI,cs.LG
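The Q(\lambda) update at the core of IQMR combines one-step temporal-difference errors with eligibility traces. The sketch below shows a generic Watkins-style Q(\lambda) backup; mapping states to current hops, actions to neighbor UAVs, and all hyper-parameter values are assumptions, and the paper's reward shaping (energy, throughput, collision avoidance) is not reproduced.

```python
import numpy as np

def q_lambda_step(Q, E, s, a, r, s_next, alpha=0.1, gamma=0.95, lam=0.8):
    """One Watkins-style Q(lambda) backup with accumulating eligibility traces.
    Q and E are [num_states, num_actions] tables."""
    a_star = int(np.argmax(Q[s_next]))
    delta = r + gamma * Q[s_next, a_star] - Q[s, a]   # TD error
    E[s, a] += 1.0                                    # mark the visited pair
    Q += alpha * delta * E                            # credit all traced pairs
    E *= gamma * lam                                  # decay traces (Watkins: reset on exploratory actions)
    return Q, E
```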
Temporal Reversed Training for Spiking Neural Networks with Generalized Spatio-Temporal Representation
Spiking neural networks (SNNs) have received widespread attention as an ultra-low energy computing paradigm. Recent studies have focused on improving the feature extraction capability of SNNs, but they suffer from inefficient inference and suboptimal performance. In this paper, we propose a simple yet effective temporal reversed training (TRT) method to optimize the spatio-temporal performance of SNNs and circumvent these problems. We perturb the input temporal data by temporal reversal, prompting the SNN to produce original-reversed consistent output logits and to learn perturbation-invariant representations. For static data without temporal dimension, we generalize this strategy by exploiting the inherent temporal property of spiking neurons for spike feature temporal reversal. In addition, we utilize the lightweight ``star operation" (element-wise multiplication) to hybridize the original and temporally reversed spike firing rates and expand the implicit dimensions, which serves as spatio-temporal regularization to further enhance the generalization of the SNN. Our method involves only an additional temporal reversal operation and element-wise multiplication during training, thus incurring negligible training overhead and not affecting the inference efficiency at all. Extensive experiments on static/neuromorphic object/action recognition, and 3D point cloud classification tasks demonstrate the effectiveness and generalizability of our method. In particular, with only two timesteps, our method achieves 74.77\% and 90.57\% accuracy on ImageNet and ModelNet40, respectively.
Updated: 2024-08-17 06:23:38
Categories: cs.AI,cs.CV
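A minimal sketch of the three mechanics named in the abstract: temporal reversal of the input, an original-reversed output consistency term, and the element-wise "star operation" on firing rates. The MSE consistency surrogate and the array shapes are assumptions; only the element-wise multiplication is stated explicitly in the abstract.

```python
import numpy as np

def temporal_reverse(x):
    """Reverse the time axis of a [T, ...] spike/input tensor."""
    return x[::-1]

def consistency_loss(logits_orig, logits_rev):
    """Original-reversed output consistency (MSE surrogate for the logit term)."""
    return float(np.mean((logits_orig - logits_rev) ** 2))

def star_hybrid(rate_orig, rate_rev):
    """Lightweight 'star operation': element-wise product of the original
    and temporally reversed spike firing rates."""
    return rate_orig * rate_rev
```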
Fragment-Masked Molecular Optimization
Molecular optimization is a crucial aspect of drug discovery, aimed at refining molecular structures to enhance drug efficacy and minimize side effects, ultimately accelerating the overall drug development process. Many target-based molecular optimization methods have been proposed, significantly advancing drug discovery. These methods primarily focus on understanding specific drug target structures or their hypothesized roles in combating diseases. However, challenges such as the limited number of available targets and the difficulty of capturing clear structures hinder innovative drug development. In contrast, phenotypic drug discovery (PDD) does not depend on clear target structures and can identify hits with novel and unbiased polypharmacology signatures. As a result, PDD-based molecular optimization can reduce potential safety risks while optimizing phenotypic activity, thereby increasing the likelihood of clinical success. Therefore, we propose a fragment-masked molecular optimization method based on PDD (FMOP). FMOP employs a regression-free diffusion model to conditionally optimize the masked regions of a molecule without training, effectively generating new molecules with similar scaffolds. On the large-scale drug response dataset GDSCv2, we optimize the potential molecules across all 945 cell lines. The overall experiments demonstrate that the in-silico optimization success rate reaches 94.4%, with an average efficacy increase of 5.3%. Additionally, we conduct extensive ablation and visualization experiments, confirming that FMOP is an effective and robust molecular optimization method. The code is available at: https://anonymous.4open.science/r/FMOP-98C2.
Updated: 2024-08-17 06:00:58
Categories: q-bio.BM,cs.AI
WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition
Weakly supervised visual recognition using inexact supervision is a critical yet challenging learning problem. It significantly reduces human labeling costs and traditionally relies on multi-instance learning and pseudo-labeling. This paper introduces WeakSAM and solves the weakly-supervised object detection (WSOD) and segmentation by utilizing the pre-learned world knowledge contained in a vision foundation model, i.e., the Segment Anything Model (SAM). WeakSAM addresses two critical limitations in traditional WSOD retraining, i.e., pseudo ground truth (PGT) incompleteness and noisy PGT instances, through adaptive PGT generation and Region of Interest (RoI) drop regularization. It also addresses the SAM's problems of requiring prompts and category unawareness for automatic object detection and segmentation. Our results indicate that WeakSAM significantly surpasses previous state-of-the-art methods in WSOD and WSIS benchmarks by large margins, i.e., average improvements of 7.4% and 8.5%, respectively. The code is available at \url{https://github.com/hustvl/WeakSAM}.
Updated: 2024-08-17 04:55:22
Categories: cs.CV,cs.AI
Depth-guided Texture Diffusion for Image Semantic Segmentation
Depth information provides valuable insights into the 3D structure of a scene, especially the outlines of objects, which can be utilized to improve semantic segmentation tasks. However, a naive fusion of depth information can disrupt features and compromise accuracy due to the modality gap between depth and vision. In this work, we introduce a Depth-guided Texture Diffusion approach that effectively tackles this challenge. Our method extracts low-level features from edges and textures to create a texture image. This image is then selectively diffused across the depth map, enhancing structural information vital for precisely extracting object outlines. By integrating this enriched depth map with the original RGB image into a joint feature embedding, our method effectively bridges the disparity between the depth map and the image, enabling more accurate semantic segmentation. We conduct comprehensive experiments across diverse, commonly used datasets spanning a wide range of semantic segmentation tasks, including Camouflaged Object Detection (COD), Salient Object Detection (SOD), and indoor semantic segmentation. With source-free estimated depth or depth captured by depth cameras, our method consistently outperforms existing baselines and achieves new state-of-the-art results, demonstrating the effectiveness of our Depth-guided Texture Diffusion for image semantic segmentation.
Updated: 2024-08-17 04:55:03
Categories: cs.CV,cs.AI
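The pipeline described, extracting low-level texture and then diffusing it across the image under depth guidance, can be caricatured with gradient magnitudes and a diffusion step whose flow is attenuated wherever neighboring depths disagree. This is an interpretation of the abstract, not the authors' method: the 4-neighbor scheme (with wrap-around borders via np.roll) and the exponential depth weight are assumptions.

```python
import numpy as np

def texture_image(img):
    """Low-level texture map from horizontal/vertical gradient magnitudes
    (a simple edge proxy)."""
    gx = np.abs(np.diff(img, axis=1, prepend=img[:, :1]))
    gy = np.abs(np.diff(img, axis=0, prepend=img[:1, :]))
    return gx + gy

def depth_guided_diffuse(tex, depth, steps=10, k=0.1):
    """Diffuse the texture map, attenuating flow across large depth differences.
    np.roll wraps at the borders, a simplification for brevity."""
    t = tex.astype(float).copy()
    for _ in range(steps):
        for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
            n_t = np.roll(t, shift, axis=axis)
            n_d = np.roll(depth, shift, axis=axis)
            w = np.exp(-np.abs(depth - n_d))   # small weight across depth edges
            t += k * w * (n_t - t)
    return t
```

With a flat depth map this reduces to plain smoothing; where depth jumps, the weight w suppresses diffusion, so object outlines encoded in depth survive.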
Research on color recipe recommendation based on unstructured data using TENN
Recently, services and business models based on large language models, such as OpenAI ChatGPT, Google Bard, and Microsoft Copilot, have been introduced, and applications utilizing deep-learning-based natural language processing are increasing; preprocessing steps such as tokenization, which converts text into machine-readable form, and the handling of unstructured data are increasingly common. Although algorithms that can understand and apply human language are becoming increasingly sophisticated, they are difficult to apply to processes that rely on human emotions and senses in industries that still mainly deal with standardized data. In particular, in processes where brightness, saturation, and color information are essential, such as painting and injection molding, most small and medium-sized companies, excluding large corporations, rely on the tacit knowledge and sensibility of color mixers, and even customer companies often present non-standardized requirements. In this paper, we propose TENN to infer color recipes from unstructured data containing emotional natural language, and we demonstrate it.
Updated: 2024-08-17 04:45:48
Categories: cs.AI
BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger
Multimodal Large Language Models (MLLMs) have showcased impressive performance in a variety of multimodal tasks. On the other hand, the integration of additional image modality may allow the malicious users to inject harmful content inside the images for jailbreaking. Unlike text-based LLMs, where adversaries need to select discrete tokens to conceal their malicious intent using specific algorithms, the continuous nature of image signals provides a direct opportunity for adversaries to inject harmful intentions. In this work, we propose $\textbf{BaThe}$ ($\textbf{Ba}$ckdoor $\textbf{T}$rigger S$\textbf{h}$i$\textbf{e}$ld), a simple yet effective jailbreak defense mechanism. Our work is motivated by recent research on jailbreak backdoor attack and virtual prompt backdoor attack in generative language models. Jailbreak backdoor attack uses harmful instructions combined with manually crafted strings as triggers to make the backdoored model generate prohibited responses. We assume that harmful instructions can function as triggers, and if we alternatively set rejection responses as the triggered response, the backdoored model then can defend against jailbreak attacks. We achieve this by utilizing virtual rejection prompt, similar to the virtual prompt backdoor attack. We embed the virtual rejection prompt into the soft text embeddings, which we call ``wedge''. Our comprehensive experiments demonstrate that BaThe effectively mitigates various types of jailbreak attacks and is adaptable to defend against unseen attacks, with minimal impact on MLLMs' performance.
Updated: 2024-08-17 04:43:26
Categories: cs.CR
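Mechanically, the "wedge" is a block of trainable soft embeddings prepended to the input, analogous to prefix/prompt tuning; in BaThe it is optimized so that harmful instructions act as a backdoor trigger for a rejection response. The sketch below shows only the embedding-level plumbing, with assumed shapes; the training objective is not reproduced.

```python
import numpy as np

def apply_wedge(token_embeddings, wedge):
    """Prepend the trainable 'wedge' (virtual rejection prompt) to the soft
    token embeddings before they enter the language model.
    token_embeddings: [seq_len, d]; wedge: [wedge_len, d]."""
    return np.concatenate([wedge, token_embeddings], axis=0)
```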
Dynamic Graph Representation Learning for Passenger Behavior Prediction
Passenger behavior prediction aims to track passenger travel patterns through historical boarding and alighting data, enabling the analysis of urban station passenger flow and timely risk management. This is crucial for smart city development and public transportation planning. Existing research primarily relies on statistical methods and sequential models to learn from individual historical interactions, which ignores the correlations between passengers and stations. To address these issues, this paper proposes DyGPP, which leverages dynamic graphs to capture the intricate evolution of passenger behavior. First, we formalize passengers and stations as heterogeneous vertices in a dynamic graph, with connections between vertices representing interactions between passengers and stations. Then, we sample the historical interaction sequences for passengers and stations separately. We capture the temporal patterns from individual sequences and correlate the temporal behavior between the two sequences. Finally, we use an MLP-based encoder to learn the temporal patterns in the interactions and generate real-time representations of passengers and stations. Experiments on real-world datasets confirmed that DyGPP outperformed current models in the behavior prediction task, demonstrating the superiority of our model.
Updated: 2024-08-17 04:35:17
Categories: cs.LG
Pre-processing matters: A segment search method for WSI classification
Pre-processing whole slide images (WSIs) can impact classification performance. Our study shows that using fixed hyper-parameters for pre-processing out-of-domain WSIs can significantly degrade performance. Therefore, it is critical to search domain-specific hyper-parameters during inference. However, searching for an optimal parameter set is time-consuming. To overcome this, we propose SSAPT, a novel Similarity-based Simulated Annealing approach for fast Parameter Tuning to enhance inference performance on out-of-domain data. The proposed SSAPT achieves 5\% to 50\% improvement in accuracy while searching parameters five times faster on average.
Updated: 2024-08-17 04:27:03
Categories: cs.CV,cs.LG
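Simulated annealing is the search backbone named in the acronym: accept worse pre-processing parameter sets with a probability that shrinks as the temperature cools. Below is a generic higher-is-better annealing skeleton; the similarity-based component of SSAPT is not modeled, and the schedule constants are arbitrary assumptions.

```python
import math
import random

def simulated_annealing(score_fn, init, neighbor_fn, t0=1.0, cooling=0.95,
                        steps=200, seed=0):
    """Generic simulated-annealing maximization.
    score_fn: candidate -> score (higher is better);
    neighbor_fn: (candidate, rng) -> perturbed candidate."""
    rng = random.Random(seed)
    cur, cur_s = init, score_fn(init)
    best, best_s = cur, cur_s
    t = t0
    for _ in range(steps):
        cand = neighbor_fn(cur, rng)
        s = score_fn(cand)
        # always accept improvements; accept regressions with Boltzmann probability
        if s >= cur_s or rng.random() < math.exp((s - cur_s) / max(t, 1e-9)):
            cur, cur_s = cand, s
        if cur_s > best_s:
            best, best_s = cur, cur_s
        t *= cooling
    return best, best_s
```

In the WSI setting, a candidate would be a hyper-parameter set (e.g., thresholds, patch filters) and score_fn a validation-style measure on the target slides.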
Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark
This paper introduces the Open Ko-LLM Leaderboard and the Ko-H5 Benchmark as vital tools for evaluating Large Language Models (LLMs) in Korean. Incorporating private test sets while mirroring the English Open LLM Leaderboard, we establish a robust evaluation framework that has been well integrated in the Korean LLM community. We perform data leakage analysis that shows the benefit of private test sets along with a correlation study within the Ko-H5 benchmark and temporal analyses of the Ko-H5 score. Moreover, we present empirical support for the need to expand beyond set benchmarks. We hope the Open Ko-LLM Leaderboard sets precedent for expanding LLM evaluation to foster more linguistic diversity.
Updated: 2024-08-17 03:45:25
Categories: cs.CL,cs.AI
A simple uniformly optimal method without line search for convex optimization
Line search (or backtracking) procedures have been widely employed into first-order methods for solving convex optimization problems, especially those with unknown problem parameters (e.g., Lipschitz constant). In this paper, we show that line search is superfluous in attaining the optimal rate of convergence for solving a convex optimization problem whose parameters are not given a priori. In particular, we present a novel accelerated gradient descent type algorithm called auto-conditioned fast gradient method (AC-FGM) that can achieve an optimal $\mathcal{O}(1/k^2)$ rate of convergence for smooth convex optimization without requiring the estimate of a global Lipschitz constant or the employment of line search procedures. We then extend AC-FGM to solve convex optimization problems with H\"{o}lder continuous gradients and show that it automatically achieves the optimal rates of convergence uniformly for all problem classes with the desired accuracy of the solution as the only input. Finally, we report some encouraging numerical results that demonstrate the advantages of AC-FGM over the previously developed parameter-free methods for convex optimization.
Updated: 2024-08-17 03:06:14
Categories: math.OC,cs.LG
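The details of AC-FGM are beyond this abstract, but the claim that line search is superfluous can be made concrete: a stepsize driven by the local Lipschitz estimate \|g_k - g_{k-1}\| / \|x_k - x_{k-1}\| needs neither a global constant nor backtracking. The sketch below is a Barzilai-Borwein-style illustration of that idea, explicitly not the AC-FGM algorithm and without its acceleration or rate guarantees.

```python
import numpy as np

def adaptive_gd(grad, x0, steps=100):
    """Gradient descent whose stepsize tracks a local Lipschitz estimate
    L_k ~ ||g_k - g_{k-1}|| / ||x_k - x_{k-1}||: no line search, no global L."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    eta = 1e-3                        # tiny probe step to bootstrap the estimate
    for _ in range(steps):
        x_new = x - eta * g
        g_new = grad(x_new)
        dx, dg = x_new - x, g_new - g
        denom = np.linalg.norm(dg)
        if denom > 1e-12:
            eta = np.linalg.norm(dx) / denom   # 1/L_k estimate (Barzilai-Borwein-like)
        x, g = x_new, g_new
    return x
```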
Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification
Major Depressive Disorder (MDD) is a pervasive mental health condition that affects 300 million people worldwide. This work presents a novel, BiLSTM-based tri-modal model-level fusion architecture for the binary classification of depression from clinical interview recordings. The proposed architecture incorporates Mel Frequency Cepstral Coefficients, Facial Action Units, and uses a two-shot learning based GPT-4 model to process text data. This is the first work to incorporate large language models into a multi-modal architecture for this task. It achieves impressive results on the DAIC-WOZ AVEC 2016 Challenge cross-validation split and Leave-One-Subject-Out cross-validation split, surpassing all baseline models and multiple state-of-the-art models. In Leave-One-Subject-Out testing, it achieves an accuracy of 91.01%, an F1-Score of 85.95%, a precision of 80%, and a recall of 92.86%.
Updated: 2024-08-17 03:01:59
Categories: cs.CV,cs.AI,cs.LG,cs.MM
Twin Sorting Dynamic Programming Assisted User Association and Wireless Bandwidth Allocation for Hierarchical Federated Learning
In this paper, we study user association and wireless bandwidth allocation for a hierarchical federated learning system that consists of mobile users, edge servers, and a cloud server. To minimize the length of a global round in hierarchical federated learning with equal bandwidth allocation, we formulate a combinatorial optimization problem. We design the twin sorting dynamic programming (TSDP) algorithm that obtains a globally optimal solution in polynomial time when there are two edge servers. In addition, we put forward the TSDP-assisted algorithm for user association when there are three or more edge servers. Furthermore, given a user association matrix, we formulate and solve a convex optimization problem for optimal wireless bandwidth allocation. Simulation results show that the proposed approach outperforms a number of alternative schemes.
Updated: 2024-08-17 02:29:32
Categories: cs.LG,cs.NI
Gradient-Variation Online Learning under Generalized Smoothness
Gradient-variation online learning aims to achieve regret guarantees that scale with the variations in the gradients of online functions, which has been shown to be crucial for attaining fast convergence in games and robustness in stochastic optimization, hence receiving increased attention. Existing results often require the smoothness condition by imposing a fixed bound on the gradient Lipschitzness, but this may not hold in practice. Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms. In this paper, we systematically study gradient-variation online learning under generalized smoothness. To this end, we extend the classic optimistic mirror descent algorithm to derive gradient-variation bounds by conducting stability analysis over the optimization trajectory and exploiting smoothness locally. Furthermore, we explore universal online learning, designing a single algorithm enjoying optimal gradient-variation regrets for convex and strongly convex functions simultaneously without knowing curvature information. The algorithm adopts a two-layer structure with a meta-algorithm running over a group of base-learners. To ensure favorable guarantees, we have designed a new meta-algorithm that is Lipschitz-adaptive to handle potentially unbounded gradients and meanwhile ensures second-order regret to cooperate with base-learners. Finally, we provide implications of our findings and obtain new results in fast-rate games and stochastic extended adversarial optimization.
Updated: 2024-08-17 02:22:08
Categories: cs.LG,math.OC
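In the Euclidean case, optimistic mirror descent reduces to optimistic gradient descent: step along 2 g_t - g_{t-1}, using the last gradient as a hint for the next one. On the bilinear game min_x max_y x*y, where plain simultaneous gradient descent-ascent spirals outward, this hint term is exactly what yields convergence, illustrating why small gradient variation helps in games. The stepsize and iteration count below are arbitrary choices for illustration.

```python
def optimistic_gda(steps=500, eta=0.2):
    """Optimistic gradient descent-ascent on min_x max_y x*y, starting at (1, 1).
    Update: z_{t+1} = z_t - eta * (2 g_t - g_{t-1})."""
    x, y = 1.0, 1.0
    gx_prev, gy_prev = y, -x          # descent gradient for x is y; for y it is -x
    for _ in range(steps):
        gx, gy = y, -x                # gradients at the current joint iterate
        x -= eta * (2 * gx - gx_prev)
        y -= eta * (2 * gy - gy_prev)
        gx_prev, gy_prev = gx, gy
    return x, y
```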
HookChain: A new perspective for Bypassing EDR Solutions
In the current digital security ecosystem, where threats evolve rapidly and with complexity, companies developing Endpoint Detection and Response (EDR) solutions are in constant search for innovations that not only keep up but also anticipate emerging attack vectors. In this context, this article introduces the HookChain, a look from another perspective at widely known techniques which, when combined, provide an additional layer of sophisticated evasion against traditional EDR systems. Through a precise combination of IAT Hooking techniques, dynamic SSN resolution, and indirect system calls, HookChain redirects the execution flow of Windows subsystems in a way that remains invisible to the vigilant eyes of EDRs that only act on Ntdll.dll, without requiring changes to the source code of the applications and malwares involved. This work not only challenges current conventions in cybersecurity but also sheds light on a promising path for future protection strategies, leveraging the understanding that continuous evolution is key to the effectiveness of digital security. By developing and exploring the HookChain technique, this study significantly contributes to the body of knowledge in endpoint security, stimulating the development of more robust and adaptive solutions that can effectively address the ever-changing dynamics of digital threats. This work aspires to inspire deep reflection and advancement in the research and development of security technologies that are always several steps ahead of adversaries.
Updated: 2024-08-17 02:12:39
Categories: cs.CR,cs.NI,cs.OS
Sentiment analysis of preservice teachers' reflections using a large language model
In this study, the emotion and tone of preservice teachers' reflections were analyzed using sentiment analysis with LLMs: GPT-4, Gemini, and BERT. We compared the results to understand how each tool categorizes and describes individual reflections and multiple reflections as a whole. This study aims to explore ways to bridge the gaps between qualitative, quantitative, and computational analyses of reflective practices in teacher education. This study finds that to effectively integrate LLM analysis into teacher education, developing an analysis method and result format that are both comprehensive and relevant for preservice teachers and teacher educators is crucial.
Updated: 2024-08-17 01:56:15
Categories: cs.CL,cs.AI
Honor Among Bandits: No-Regret Learning for Online Fair Division
We consider the problem of online fair division of indivisible goods to players when there are a finite number of types of goods and player values are drawn from distributions with unknown means. Our goal is to maximize social welfare subject to allocating the goods fairly in expectation. When a player's value for an item is unknown at the time of allocation, we show that this problem reduces to a variant of (stochastic) multi-armed bandits, where there exists an arm for each player's value for each type of good. At each time step, we choose a distribution over arms which determines how the next item is allocated. We consider two sets of fairness constraints for this problem: envy-freeness in expectation and proportionality in expectation. Our main result is the design of an explore-then-commit algorithm that achieves $\tilde{O}(T^{2/3})$ regret while maintaining either fairness constraint. This result relies on unique properties fundamental to fair-division constraints that allow faster rates of learning, despite the restricted action space. We also prove a lower bound of $\tilde{\Omega}(T^{2/3})$ regret for our setting, showing that our results are tight.
Updated: 2024-08-17 01:53:00
Categories: cs.GT,cs.LG
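The explore-then-commit backbone behind the \tilde{O}(T^{2/3}) rate is standard: spend roughly T^{2/3} rounds estimating each arm's mean, then commit to the empirical best. The sketch below is only this unconstrained skeleton; the paper's contribution, committing to a distribution over allocations that satisfies envy-freeness or proportionality in expectation, is not modeled here.

```python
import random

def explore_then_commit(means, horizon, seed=0):
    """Explore-then-commit on Bernoulli arms: pull each of the k arms roughly
    T^(2/3)/k times, then play the empirical best for the remaining rounds."""
    rng = random.Random(seed)
    k = len(means)
    m = max(1, int(horizon ** (2 / 3) / k))          # exploration budget per arm
    est = [sum(rng.random() < mu for _ in range(m)) / m for mu in means]
    best = max(range(k), key=lambda i: est[i])
    committed_reward = sum(rng.random() < means[best]
                           for _ in range(horizon - m * k))
    return best, est, committed_reward
```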
Black-Box Optimization with Implicit Constraints for Public Policy
Black-box optimization (BBO) has become increasingly relevant for tackling complex decision-making problems, especially in public policy domains such as police redistricting. However, its broader application in public policymaking is hindered by the complexity of defining feasible regions and the high-dimensionality of decisions. This paper introduces a novel BBO framework, termed as the Conditional And Generative Black-box Optimization (CageBO). This approach leverages a conditional variational autoencoder to learn the distribution of feasible decisions, enabling a two-way mapping between the original decision space and a simplified, constraint-free latent space. The CageBO efficiently handles the implicit constraints often found in public policy applications, allowing for optimization in the latent space while evaluating objectives in the original space. We validate our method through a case study on large-scale police redistricting problems in Atlanta, Georgia. Our results reveal that our CageBO offers notable improvements in performance and efficiency compared to the baselines.
Updated: 2024-08-17 01:45:06
Categories: stat.ML,cs.CE,cs.LG
Linking Robustness and Generalization: A k* Distribution Analysis of Concept Clustering in Latent Space for Vision Models
Most evaluations of vision models use indirect methods to assess latent space quality. These methods often involve adding extra layers to project the latent space into a new one. This projection makes it difficult to analyze and compare the original latent space. This article uses the k* Distribution, a local neighborhood analysis method, to examine the learned latent space at the level of individual concepts, which can be extended to examine the entire latent space. We introduce skewness-based true and approximate metrics for interpreting individual concepts to assess the overall quality of vision models' latent space. Our findings indicate that current vision models frequently fracture the distributions of individual concepts within the latent space. Nevertheless, as these models improve in generalization across multiple datasets, the degree of fracturing diminishes. A similar trend is observed in robust vision models, where increased robustness correlates with reduced fracturing. Ultimately, this approach enables a direct interpretation and comparison of the latent spaces of different vision models and reveals a relationship between a model's generalizability and robustness. Results show that as a model becomes more general and robust, it tends to learn features that result in better clustering of concepts. Project Website is available online at https://shashankkotyan.github.io/k-Distribution/
Updated: 2024-08-17 01:43:51
Subjects: cs.CV,cs.AI,cs.LG
MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality
Multi-modal pre-trained models efficiently extract and fuse features from different modalities with low memory requirements for fine-tuning. Despite this efficiency, their application in disease diagnosis is under-explored. A significant challenge is the frequent occurrence of missing modalities, which impairs performance. Additionally, fine-tuning the entire pre-trained model demands substantial computational resources. To address these issues, we introduce Modality-aware Low-Rank Adaptation (MoRA), a computationally efficient method. MoRA projects each input to a low intrinsic dimension but uses different modality-aware up-projections for modality-specific adaptation in cases of missing modalities. Practically, MoRA integrates into the first block of the model, significantly improving performance when a modality is missing. It requires minimal computational resources, with less than 1.6% of the trainable parameters needed compared to training the entire model. Experimental results show that MoRA outperforms existing techniques in disease diagnosis, demonstrating superior performance, robustness, and training efficiency.
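The adapter structure the abstract describes — a shared low-rank down-projection with a separate, modality-aware up-projection per modality — can be sketched as follows. This is a minimal assumed structure, not the published code; the class name and zero-initialization of the up-projections (so the adapter starts as a no-op, as in standard LoRA) are illustrative choices.

```python
import numpy as np

class MoRALayer:
    """Sketch of modality-aware low-rank adaptation: one shared
    down-projection to rank r, one up-projection per modality, so a
    missing modality still gets a modality-specific adaptation path."""
    def __init__(self, d, r, modalities, seed=0):
        rng = np.random.default_rng(seed)
        self.down = rng.normal(0, 0.02, (d, r))              # shared A
        self.up = {m: np.zeros((r, d)) for m in modalities}  # per-modality B, zero-init
        # Trainable adapter params: d*r + len(modalities)*r*d,
        # far fewer than the d*d of a full dense layer when r << d.

    def __call__(self, x, modality):
        # Frozen backbone path plus the low-rank, modality-routed delta.
        return x + x @ self.down @ self.up[modality]

layer = MoRALayer(d=8, r=2, modalities=["image", "text"])
x = np.ones((1, 8))
```

Because the up-projections start at zero, the adapted layer initially reproduces the frozen model exactly, and fine-tuning only moves the small per-modality matrices.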
Updated: 2024-08-17 01:40:00
Subjects: cs.CV,cs.LG
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts
The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs from scratch in a large-scale regime is prohibitively expensive. Existing methods mitigate this by pre-training multiple dense expert models independently and using them to initialize an MoE. This is done by using experts' feed-forward network (FFN) to initialize the MoE's experts while merging other parameters. However, this method limits the reuse of dense model parameters to only the FFN layers, thereby constraining the advantages when "upcycling" these models into MoEs. We propose BAM (Branch-Attend-Mix), a simple yet effective method that addresses this shortcoming. BAM makes full use of specialized dense models by not only using their FFN to initialize the MoE layers but also leveraging experts' attention parameters fully by initializing them into a soft variant of Mixture of Attention (MoA) layers. We explore two methods for upcycling attention parameters: 1) initializing separate attention experts from dense models, including all attention parameters, for the best model performance; and 2) sharing key and value parameters across all experts to facilitate better inference efficiency. To further improve efficiency, we adapt a parallel attention transformer architecture to MoEs, which allows the attention experts and FFN experts to be computed concurrently. Our experiments on seed models ranging from 590 million to 2 billion parameters demonstrate that BAM surpasses baselines in both perplexity and downstream task performance, within the same computational and data constraints.
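The trade-off between the two attention-upcycling options can be made concrete by counting parameters. The numbers below are illustrative (a hypothetical width and expert count, not from the paper), assuming each attention expert carries Q, K, V, and O projections of size d x d.

```python
# Toy parameter count for the two attention-upcycling options.
d = 1024  # model width (illustrative)
E = 4     # number of attention experts (illustrative)

# Option 1: every expert keeps its own Q, K, V, O projections.
separate = E * 4 * d * d

# Option 2: K and V are shared across experts; each keeps its own Q and O.
# Sharing K/V also lets all experts reuse one KV cache at inference time.
shared_kv = 2 * d * d + E * 2 * d * d
```

With these numbers, sharing K/V drops attention parameters from 16 d^2 to 10 d^2, and the saving grows with the number of experts.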
Updated: 2024-08-17 01:33:30
Subjects: cs.LG
Large Language Models Struggle in Token-Level Clinical Named Entity Recognition
Large Language Models (LLMs) have revolutionized various sectors, including healthcare where they are employed in diverse applications. Their utility is particularly significant in the context of rare diseases, where data scarcity, complexity, and specificity pose considerable challenges. In the clinical domain, Named Entity Recognition (NER) stands out as an essential task and it plays a crucial role in extracting relevant information from clinical texts. Despite the promise of LLMs, current research mostly concentrates on document-level NER, identifying entities in a more general context across entire documents, without extracting their precise location. Additionally, efforts have been directed towards adapting ChatGPT for token-level NER. However, there is a significant research gap when it comes to employing token-level NER for clinical texts, especially with the use of local open-source LLMs. This study aims to bridge this gap by investigating the effectiveness of both proprietary and local LLMs in token-level clinical NER. Essentially, we delve into the capabilities of these models through a series of experiments involving zero-shot prompting, few-shot prompting, retrieval-augmented generation (RAG), and instruction-fine-tuning. Our exploration reveals the inherent challenges LLMs face in token-level NER, particularly in the context of rare diseases, and suggests possible improvements for their application in healthcare. This research contributes to narrowing a significant gap in healthcare informatics and offers insights that could lead to a more refined application of LLMs in the healthcare sector.
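Token-level NER, the granularity the abstract argues LLMs struggle with, asks for a tag on every token rather than a free-form list of entities. A minimal sketch of the target representation (standard BIO tagging over token-index spans; the example entity is illustrative):

```python
def to_bio(tokens, spans):
    """Convert (start, end, type) token-index spans into token-level BIO
    tags: B- marks the first token of an entity, I- its continuation,
    O everything else. `end` is exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags

tags = to_bio(["Patient", "has", "cystic", "fibrosis"], [(2, 4, "DISEASE")])
```

Document-level NER only needs the string "cystic fibrosis" somewhere in the output; token-level NER requires exactly this aligned tag sequence, which is where generative models tend to drift.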
Updated: 2024-08-17 00:59:55
Subjects: cs.CL,cs.AI,cs.LG
Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference
Large Transformer networks are increasingly used in settings where low inference latency can improve the end-user experience and enable new applications. However, autoregressive inference is resource intensive and requires parallelism for efficiency. Parallelism introduces collective communication that is both expensive and represents a phase when hardware resources are underutilized. Towards mitigating this, Kraken is an evolution of the standard Transformer architecture that is designed to complement existing tensor parallelism schemes for efficient inference on multi-device systems. By introducing a fixed degree of intra-layer model parallelism, the architecture allows collective operations to be overlapped with compute, decreasing latency and increasing hardware utilization. When trained on OpenWebText, Kraken models reach a similar perplexity as standard Transformers while also preserving their language modeling capabilities when evaluated on the SuperGLUE benchmark. Importantly, when tested on multi-GPU systems using TensorRT-LLM engines, Kraken speeds up Time To First Token by a mean of 35.6% across a range of model sizes, context lengths, and degrees of tensor parallelism.
Updated: 2024-08-17 00:58:10
Subjects: cs.LG,cs.DC
Sim2Real in Reconstructive Spectroscopy: Deep Learning with Augmented Device-Informed Data Simulation
This work proposes a deep learning (DL)-based framework, namely Sim2Real, for spectral signal reconstruction in reconstructive spectroscopy, focusing on efficient data sampling and fast inference time. The work focuses on the challenge of reconstructing real-world spectral signals under the extreme setting where only device-informed simulated data are available for training. Such device-informed simulated data are much easier to collect than real-world data but exhibit large distribution shifts from their real-world counterparts. To leverage such simulated data effectively, a hierarchical data augmentation strategy is introduced to mitigate the adverse effects of this domain shift, and a corresponding neural network for the spectral signal reconstruction with our augmented data is designed. Experiments using a real dataset measured from our spectrometer device demonstrate that Sim2Real achieves significant speed-up during the inference while attaining on-par performance with the state-of-the-art optimization-based methods.
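The hierarchical augmentation the abstract mentions is not specified in detail here; one plausible sketch, under assumed perturbation types (additive noise at several scales plus a global gain jitter, both illustrative), widens the simulated-data distribution toward real measurements:

```python
import numpy as np

def hierarchical_augment(sim_signal, rng, scales=(0.01, 0.05, 0.1)):
    """Assumed sketch of a hierarchical augmentation: perturb one
    device-informed simulated spectrum at several noise scales so the
    training set spans a range of sim-to-real distribution shifts."""
    out = []
    for s in scales:
        noisy = sim_signal + rng.normal(0, s, sim_signal.shape)  # additive noise
        gain = 1 + rng.normal(0, s)                              # global gain jitter
        out.append(gain * noisy)
    return np.stack(out)

rng = np.random.default_rng(0)
sig = np.sin(np.linspace(0, np.pi, 64))       # toy "spectrum"
batch = hierarchical_augment(sig, rng)        # 3 augmented variants of one signal
```

Each simulated spectrum thus yields multiple training samples at increasing corruption levels, which is one way to hedge against the unknown severity of the real-world shift.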
Updated: 2024-08-17 00:50:26
Subjects: cs.LG,eess.SP
k* Distribution: Evaluating the Latent Space of Deep Neural Networks using Local Neighborhood Analysis
Most examinations of neural networks' learned latent spaces typically employ dimensionality reduction techniques such as t-SNE or UMAP. These methods distort the local neighborhood in the visualization, making it hard to distinguish the structure of a subset of samples in the latent space. In response to this challenge, we introduce the k* distribution and its corresponding visualization technique. This method uses local neighborhood analysis to guarantee the preservation of the structure of sample distributions for individual classes within the subset of the learned latent space. This facilitates easy comparison of different k* distributions, enabling analysis of how various classes are processed by the same neural network. Our study reveals three distinct distributions of samples within the learned latent space subset: a) Fractured, b) Overlapped, and c) Clustered, providing a more profound understanding of existing contemporary visualizations. Experiments show that the distribution of samples within the network's learned latent space significantly varies depending on the class. Furthermore, we illustrate that our analysis can be applied to explore the latent space of diverse neural network architectures, various layers within neural networks, transformations applied to input samples, and the distribution of training and testing data for neural networks. Thus, the k* distribution should aid in visualizing the structure inside neural networks and further foster their understanding. Project Website is available online at https://shashankkotyan.github.io/k-Distribution/.
Updated: 2024-08-17 00:43:08
Subjects: cs.LG,cs.AI,cs.CV
Fairness-Aware Streaming Feature Selection with Causal Graphs
This paper studies fairness-aware streaming feature selection, whose crux lies in optimizing a tradeoff between accuracy and fairness of the resultant models on the selected feature subset. The technical challenge of our setting is twofold: 1) streaming feature inputs, such that an informative feature may become obsolete or redundant for prediction if its information has been covered by other similar features that arrived prior to it, and 2) non-associational feature correlation, such that bias may be leaked from those seemingly admissible, non-protected features. To overcome this, we propose Streaming Feature Selection with Causal Fairness (SFCF), which builds two causal graphs egocentric to the prediction label and the protected feature, respectively, striving to model the complex correlation structure among streaming features, labels, and protected information. As such, bias can be eradicated from predictive modeling by removing those features that are causally correlated with the protected feature yet independent of the labels. We theorize that features originally redundant for prediction can later become admissible, when learning accuracy is compromised by the large number of removed features (non-protected features that can nonetheless be used to reconstruct bias information). We benchmark SFCF on five datasets widely used in streaming feature research, and the results substantiate its performance superiority over six rival models in terms of efficiency and sparsity of feature selection and equalized odds of the resultant predictive models.
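The admission rule at the heart of the method — drop a feature that is associated with the protected attribute yet carries no information about the label — can be sketched with a deliberately simplified proxy. SFCF uses causal graphs; the sketch below substitutes plain linear correlation with a hypothetical threshold `tau`, so it only illustrates the decision logic, not the causal machinery.

```python
import numpy as np

def admit(feature, label, protected, tau=0.3):
    """Simplified admission rule: reject a streaming feature that is
    correlated with the protected attribute but (near-)independent of
    the label, since it can only leak bias into the model."""
    r_label = abs(np.corrcoef(feature, label)[0, 1])
    r_prot = abs(np.corrcoef(feature, protected)[0, 1])
    return not (r_prot > tau and r_label <= tau)

rng = np.random.default_rng(0)
protected = rng.normal(size=500)
label = rng.normal(size=500)                     # independent of `protected`
biased = protected + 0.1 * rng.normal(size=500)  # a proxy for the protected attr
useful = label + 0.1 * rng.normal(size=500)      # genuinely informative feature
```

Here `biased` is rejected because it reconstructs the protected attribute without predicting the label, while `useful` is admitted; the paper's causal-graph test plays the role that the correlation threshold crudely approximates.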
Updated: 2024-08-17 00:41:02
Subjects: cs.LG,cs.AI,cs.GR
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation
Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging external knowledge, a comprehensive evaluation of RAG systems is still challenging due to the modular nature of RAG, evaluation of long-form responses and reliability of measurements. In this paper, we propose a fine-grained evaluation framework, RAGChecker, that incorporates a suite of diagnostic metrics for both the retrieval and generation modules. Meta evaluation verifies that RAGChecker has significantly better correlations with human judgments than other evaluation metrics. Using RAGChecker, we evaluate 8 RAG systems and conduct an in-depth analysis of their performance, revealing insightful patterns and trade-offs in the design choices of RAG architectures. The metrics of RAGChecker can guide researchers and practitioners in developing more effective RAG systems. This work has been open sourced at https://github.com/amazon-science/RAGChecker.
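The "fine-grained" part of the framework is claim-level scoring rather than whole-response scoring. A minimal sketch under assumed metric definitions (the function name and the exact metrics are illustrative, not RAGChecker's published metric suite): decompose responses into atomic claims, then score the generator and the retriever separately.

```python
def claim_metrics(response_claims, gold_claims, retrieved_claims):
    """Assumed claim-level diagnostics: precision/recall of generated
    claims against gold-answer claims, plus a retriever recall measuring
    how many gold claims were present in the retrieved context."""
    resp, gold, retr = map(set, (response_claims, gold_claims, retrieved_claims))
    precision = len(resp & gold) / len(resp) if resp else 0.0
    recall = len(resp & gold) / len(gold) if gold else 0.0
    retriever_recall = len(retr & gold) / len(gold) if gold else 0.0
    return precision, recall, retriever_recall

# Toy claims: the response hallucinated "c" and missed "d",
# and the retriever never surfaced "b".
p, r, rr = claim_metrics(["a", "b", "c"], ["a", "b", "d"], ["a", "d"])
```

Separating the scores this way is what lets the framework attribute an error to the retrieval module (a gold claim never retrieved) versus the generation module (a retrieved claim dropped or a claim hallucinated).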
Updated: 2024-08-17 00:30:04
Subjects: cs.CL,cs.AI