Meta-Learning Loss Functions for Deep Neural Networks
Humans can often quickly and efficiently solve complex new learning tasks given only a small set of examples. In contrast, modern artificially intelligent systems often require thousands or millions of observations in order to solve even the most basic tasks. Meta-learning aims to resolve this issue by leveraging past experiences from similar learning tasks to embed the appropriate inductive biases into the learning system. Historically, methods for meta-learning components such as optimizers and parameter initializations have led to significant performance increases. This thesis explores meta-learning as a way to improve performance through an often-overlooked component: the loss function. The loss function is a vital component of a learning system, as it represents the primary learning objective, and success is determined and quantified by the system's ability to optimize for that objective.
Updated: 2024-06-29 23:51:03
Categories: cs.LG,cs.AI,cs.NE
Improving Word Translation via Two-Stage Contrastive Learning
Word translation or bilingual lexicon induction (BLI) is a key cross-lingual task, aiming to bridge the lexical gap between different languages. In this work, we propose a robust and effective two-stage contrastive learning framework for the BLI task. In Stage C1, we propose to refine standard cross-lingual linear maps between static word embeddings (WEs) via a contrastive learning objective; we also show how to integrate it into the self-learning procedure for even more refined cross-lingual maps. In Stage C2, we conduct BLI-oriented contrastive fine-tuning of mBERT, unlocking its word translation capability. We also show that static WEs induced from the 'C2-tuned' mBERT complement static WEs from Stage C1. Comprehensive experiments on standard BLI datasets for diverse languages and different experimental setups demonstrate substantial gains achieved by our framework. While the BLI method from Stage C1 already yields substantial gains over all state-of-the-art BLI methods in our comparison, even stronger improvements are achieved with the full two-stage framework: e.g., we report gains for 112/112 BLI setups, spanning 28 language pairs.
Updated: 2024-06-29 23:11:29
Categories: cs.CL,cs.AI,cs.IR,cs.LG
Answering real-world clinical questions using large language model based systems
Evidence to guide healthcare decisions is often limited by a lack of relevant and trustworthy literature as well as difficulty in contextualizing existing research for a specific patient. Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD). We evaluated the ability of five LLM-based systems in answering 50 clinical questions and had nine independent physicians review the responses for relevance, reliability, and actionability. As it stands, general-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini Pro 1.5) rarely produced answers that were deemed relevant and evidence-based (2% - 10%). In contrast, retrieval augmented generation (RAG)-based and agentic LLM systems produced relevant and evidence-based answers for 24% (OpenEvidence) to 58% (ChatRWD) of questions. Only the agentic ChatRWD was able to answer novel questions compared to other LLMs (65% vs. 0-9%). These results suggest that while general-purpose LLMs should not be used as-is, a purpose-built system for evidence summarization based on RAG and one for generating novel evidence working synergistically would improve availability of pertinent evidence for patient care.
Updated: 2024-06-29 22:39:20
Categories: cs.CL,cs.AI,cs.IR
Privacy-Preserving and Trustworthy Deep Learning for Medical Imaging
The shift towards efficient and automated data analysis through Machine Learning (ML) has notably impacted healthcare systems, particularly Radiomics. Radiomics leverages ML to analyze medical images accurately and efficiently for precision medicine. Current methods rely on Deep Learning (DL) to improve performance and accuracy (Deep Radiomics). Given the sensitivity of medical images, ensuring privacy throughout the Deep Radiomics pipeline, from data generation and collection to model training and inference, is essential, especially when outsourced. Thus, Privacy-Enhancing Technologies (PETs) are crucial tools for Deep Radiomics. Previous studies and systematization efforts have either broadly overviewed PETs and their applications or mainly focused on subsets of PETs for ML algorithms. In Deep Radiomics, where efficiency, accuracy, and privacy are crucial, many PETs, while theoretically applicable, may not be practical without specialized optimizations or hybrid designs. Additionally, not all DL models are suitable for Radiomics. Consequently, there is a need for specialized studies that investigate and systematize the effective and practical integration of PETs into the Deep Radiomics pipeline. This work addresses this research gap by (1) classifying existing PETs, presenting practical hybrid PET constructions and a taxonomy illustrating their potential integration with the Deep Radiomics pipeline, with comparative analyses detailing assumptions, architectural suitability, and security; (2) offering technical insights into combining PETs with the Deep Radiomics pipeline, including integration strategies, subtleties, and potential challenges; and (3) proposing potential research directions, identifying challenges, and suggesting solutions to enhance PETs in Deep Radiomics.
Updated: 2024-06-29 22:26:05
Categories: cs.CR,cs.AI,cs.CV,eess.IV
Blockchain based Decentralized Petition System
A decentralized online petition system enables individuals or groups to create, sign, and share petitions without a central authority. Using blockchain technology, these systems ensure the integrity and transparency of the petition process by recording every signature or action on the blockchain, making alterations or deletions impossible. This provides a permanent, tamper-proof record of the petition's progress. Such systems allow users to bypass traditional intermediaries like government or social media platforms, fostering more democratic and transparent decision-making. This paper reviews research on petition systems, highlighting the shortcomings of existing systems such as lack of accountability, vulnerability to hacking, and security issues. The proposed blockchain-based implementation aims to overcome these challenges. Decentralized voting systems have garnered interest recently due to their potential to provide secure and transparent voting platforms without intermediaries, addressing issues like voter fraud, manipulation, and trust in the electoral process. We propose a decentralized voting system web application using blockchain technology to ensure the integrity and security of the voting process. This system aims to provide a transparent, decentralized decision-making process that counts every vote while eliminating the need for centralized authorities. The paper presents an overview of the system architecture, design considerations, and implementation details, along with the potential benefits and limitations. Finally, we discuss future research directions, examining the technical aspects of the application, including underlying algorithms and protocols. Our research aims to enhance the integrity and accessibility of democratic processes, improve security, and ensure fairness, transparency, and tamper-proofness.
Updated: 2024-06-29 21:44:35
Categories: cs.CR
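The petition abstract above relies on the idea that recorded signatures cannot be altered without detection. The sketch below illustrates only that tamper-evidence property with a simple hash chain. It is not the paper's system and not a full blockchain (there is no consensus, networking, or signature verification). The function names and record fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash for the first record (assumption)


def record_hash(body: dict) -> str:
    """Hash a record body deterministically with SHA-256."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_signature(chain: list, signer: str) -> None:
    """Append a petition signature that commits to the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    body = {"index": len(chain), "signer": signer, "prev_hash": prev_hash}
    chain.append({**body, "hash": record_hash(body)})


def verify(chain: list) -> bool:
    """Return True only if every record matches its hash and its predecessor."""
    prev_hash = GENESIS_HASH
    for index, record in enumerate(chain):
        body = {
            "index": record["index"],
            "signer": record["signer"],
            "prev_hash": record["prev_hash"],
        }
        if (
            record["index"] != index
            or record["prev_hash"] != prev_hash
            or record["hash"] != record_hash(body)
        ):
            return False
        prev_hash = record["hash"]
    return True
```

Changing any stored signer after the fact makes `verify` return `False`, because the stored hash no longer matches the record body.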
PhishNet: A Phishing Website Detection Tool using XGBoost
PhishNet is a cutting-edge web application designed to detect phishing websites using advanced machine learning. It aims to help individuals and organizations identify and prevent phishing attacks through a robust AI framework. PhishNet utilizes Python to apply various machine learning algorithms and feature extraction techniques for high accuracy and efficiency. The project starts by collecting and preprocessing a comprehensive dataset of URLs, comprising both phishing and legitimate sites. Key features such as URL length, special characters, and domain age are extracted to effectively train the model. Multiple machine learning algorithms, including logistic regression, decision trees, and neural networks, are evaluated to determine the best performance in phishing detection. The model is finely tuned to optimize metrics like accuracy, precision, recall, and the F1 score, ensuring reliable detection of both common and sophisticated phishing tactics. PhishNet's web application is developed using React.js, which allows for client-side rendering and smooth integration with backend services, creating a responsive and user-friendly interface. Users can input URLs and receive immediate predictions with confidence scores, thanks to a robust backend infrastructure that processes data and provides real-time results. The model is deployed using Google Colab and AWS EC2 for their computational power and scalability, ensuring the application remains accessible and functional under varying loads. In summary, PhishNet represents a significant advancement in cybersecurity, showcasing the effective use of machine learning and web development technologies to enhance user security. It empowers users to prevent phishing attacks and highlights AI's potential in transforming cybersecurity.
Updated: 2024-06-29 21:31:13
Categories: cs.CR,cs.LG
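The phishing-detection abstract above mentions lexical URL features such as URL length and special characters. A minimal sketch of that kind of feature extraction follows, using only the Python standard library. The feature names and the set of "special" characters are assumptions, not the tool's actual feature list. Domain age is omitted because it requires an external WHOIS lookup.

```python
from urllib.parse import urlparse

# Hypothetical choice of characters to count as "special".
SPECIAL_CHARS = set("@-_?=&%.#~")


def extract_url_features(url: str) -> dict:
    """Compute simple lexical features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_special_chars": sum(ch in SPECIAL_CHARS for ch in url),
        "num_digits": sum(ch.isdigit() for ch in url),
        # Dots beyond the one separating domain and TLD (rough heuristic).
        "num_subdomains": max(host.count(".") - 1, 0),
        "uses_https": parsed.scheme == "https",
        "has_at_symbol": "@" in url,
    }
```

A feature dictionary like this could be turned into a numeric vector and passed to any of the classifiers named in the abstract.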
Interpreting Pretrained Speech Models for Automatic Speech Assessment of Voice Disorders
Speech contains information that is clinically relevant to some diseases and therefore has the potential to be used for health assessment. Recent work shows an interest in applying deep learning algorithms, especially pretrained large speech models, to Automatic Speech Assessment. One question that has not been explored is how these models arrive at their outputs from their inputs. In this work, we train and compare two configurations of the Audio Spectrogram Transformer in the context of Voice Disorder Detection and apply the attention rollout method to produce model relevance maps, i.e., the computed relevance of the spectrogram regions when the model makes predictions. We use these maps to analyse how models make predictions in different conditions and to show that the spread of attention is reduced as a model is finetuned, with the model's attention concentrating on specific phoneme regions.
Updated: 2024-06-29 21:14:48
Categories: cs.SD,cs.AI,eess.AS
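The voice-disorder abstract above applies attention rollout. A minimal pure-Python sketch of the general technique follows. It assumes each layer's attention has already been averaged over heads into a row-stochastic matrix, and it mixes in the identity with equal weight to account for residual connections. It is illustrative only and not the authors' code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def attention_rollout(layers):
    """Combine per-layer attention matrices into one token-relevance matrix.

    layers: list of n x n attention matrices, ordered from first to last layer,
    each already averaged over heads with rows summing to 1.
    """
    n = len(layers[0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layers:
        # 0.5 * A + 0.5 * I models the residual path and keeps rows summing to 1.
        mixed = [
            [0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
            for i in range(n)
        ]
        rollout = matmul(mixed, rollout)
    return rollout
```

Row `i` of the result estimates how much each input token contributes to token `i` at the final layer. For a spectrogram model, the row for the classification token could be reshaped into a relevance map over patches.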
Test Case Features as Hyper-heuristics for Inductive Programming
Instruction subsets are heuristics that can reduce the size of the inductive programming search space by tens of orders of magnitude. Comprising many overlapping subsets of different sizes, they serve as predictions of the instructions required to code a solution for any problem. Currently, this approach employs a single, large family of subsets meaning that some problems can search thousands of subsets before a solution is found. In this paper we introduce the use of test case type signatures as hyper-heuristics to select one of many, smaller families of instruction subsets. The type signature for any set of test cases maps directly to a single family and smaller families mean that fewer subsets need to be considered for most problems. Having many families also permits subsets to be reordered to better reflect their relative occurrence in human code - again reducing the search space size for many problems. Overall the new approach can further reduce the size of the inductive programming search space by between 1 and 3 orders of magnitude, depending on the type signature. Larger and more consistent reductions are possible through the use of more sophisticated type systems. The potential use of additional test case features as hyper-heuristics and some other possible future work is also briefly discussed.
Updated: 2024-06-29 19:46:11
Categories: cs.AI,D.1.2; D.3.3; F.1.1; F.3.1; F.3.3; I.2.1; I.2.2; I.2.4; I.2.5; I.2.8; I.5.3
A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features
We prove that training neural networks on 1-D data is equivalent to solving a convex Lasso problem with a fixed, explicitly defined dictionary matrix of features. The specific dictionary depends on the activation and depth. We consider 2- and 3-layer networks with piecewise linear activations, and rectangular and tree networks with sign activation and arbitrary depth. Interestingly, in absolute value and symmetrized ReLU networks, a third layer creates features that represent reflections of training data about themselves. The Lasso representation offers insight into globally optimal networks and the solution landscape.
Updated: 2024-06-29 19:21:59
Categories: cs.LG,cs.AI,cs.NE,math.OC,stat.ML
OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step
Despite significant advancements in text generation and reasoning, Large Language Models (LLMs) still face challenges in accurately performing complex arithmetic operations. To achieve accurate calculations, language model systems often enable LLMs to generate code for arithmetic operations. However, this approach compromises speed and security and, if finetuning is involved, risks the language model losing prior capabilities. We propose a framework that enables exact arithmetic in a single autoregressive step, providing faster, more secure, and more interpretable LLM systems with arithmetic capabilities. We use the hidden states of an LLM to control a symbolic architecture which performs arithmetic. Our implementation using Llama 3 8B Instruct with OccamNet as a symbolic model (OccamLlama) achieves 100% accuracy on single arithmetic operations ($+, -, \times, \div, \sin, \cos, \log, \exp, \sqrt{\cdot}$), outperforming GPT 4o and on par with GPT 4o using a code interpreter. OccamLlama also outperforms GPT 4o both with and without a code interpreter on mathematical problem solving benchmarks involving challenging arithmetic, thus enabling small LLMs to match the arithmetic performance of even much larger models. We will make our code public shortly.
Updated: 2024-06-29 19:13:23
Categories: cs.CL,cs.AI,cs.LG
Stochastic stem bucking using mixture density neural networks
Poor bucking decisions made by forest harvesters can have a negative effect on the products that are generated from the logs. Making the right bucking decisions is not an easy task because harvesters must rely on predictions of the stem profile for the part of the stems that is not yet measured. The goal of this project is to improve the bucking decisions made by forest harvesters with a stochastic bucking method. We developed a Long Short-Term Memory (LSTM) neural network that predicted the parameters of a Gaussian distribution conditioned on the known part of the stem, enabling the creation of multiple samples of stem profile predictions for the unknown part of the stem. The bucking decisions could then be optimized using a novel stochastic bucking algorithm which used all the stem profiles generated to choose the logs to generate from the stem. The stochastic bucking algorithm was compared to two benchmark models: A polynomial model that could not condition its predictions on more than one diameter measurement, and a deterministic LSTM neural network. All models were evaluated on stem profiles of four coniferous species prevalent in eastern Canada. In general, the best bucking decisions were taken by the stochastic LSTM models, demonstrating the usefulness of the method. The second-best results were mostly obtained by the deterministic LSTM model and the worst results by the polynomial model, corroborating the usefulness of conditioning the stem curve predictions on multiple measurements.
Updated: 2024-06-29 18:44:49
Categories: cs.LG,cs.AI
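The stem-bucking abstract above chooses logs using all sampled stem profiles rather than a single prediction. The sketch below illustrates that selection step only: given several sampled profiles, it picks the cutting plan with the highest average value. The value function, the 10 cm merchantability limit, and the one-metre profile grid are hypothetical placeholders, not the paper's algorithm.

```python
def log_value(small_end_diameter_cm: float, length_m: int) -> float:
    """Hypothetical value of one log: zero if too thin, else d^2 * length."""
    if small_end_diameter_cm < 10:
        return 0.0
    return small_end_diameter_cm ** 2 * length_m


def plan_value(profile: list, plan: list) -> float:
    """Value of a cutting plan on one stem profile.

    profile[i] is the diameter (cm) at height i metres;
    plan is a list of log lengths (m) cut from the butt upwards.
    """
    total, position = 0.0, 0
    for length in plan:
        end = position + length
        if end >= len(profile):  # the stem is not long enough for this log
            break
        total += log_value(profile[end], length)
        position = end
    return total


def best_plan(sampled_profiles: list, candidate_plans: list) -> list:
    """Pick the plan with the highest mean value across sampled profiles."""
    def expected_value(plan):
        values = [plan_value(profile, plan) for profile in sampled_profiles]
        return sum(values) / len(values)

    return max(candidate_plans, key=expected_value)
```

In a full pipeline, `sampled_profiles` would come from draws of a predictive distribution over the unmeasured part of the stem.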
Leveraging Ontologies to Document Bias in Data
Machine Learning (ML) systems are capable of reproducing and often amplifying undesired biases. This underscores the importance of practices that enable the study and understanding of the intrinsic characteristics of ML pipelines, and it has prompted the emergence of documentation frameworks built on the idea that "any remedy for bias starts with awareness of its existence". However, a resource that can formally describe these pipelines in terms of the biases detected is still missing. To fill this gap, we present the Doc-BiasO ontology, a resource that aims to create an integrated vocabulary of biases defined in the fair-ML literature and their measures, as well as to incorporate relevant terminology and the relationships between them. Following ontology engineering best practices, we re-use existing vocabulary on machine learning and AI to foster knowledge sharing and interoperability between the actors concerned with its research, development, and regulation, among others. Overall, our main objective is to contribute towards clarifying existing terminology on bias research as it rapidly expands to all areas of AI and to improve the interpretation of bias in data and its downstream impact.
Updated: 2024-06-29 18:41:07
Categories: cs.AI
ShapG: new feature importance method based on the Shapley value
With the wide application of Artificial Intelligence (AI), it has become particularly important to make the decisions of AI systems explainable and transparent. In this paper, we propose a new Explainable Artificial Intelligence (XAI) method called ShapG (Explanations based on Shapley value for Graphs) for measuring feature importance. ShapG is a model-agnostic global explanation method. In the first stage, it defines an undirected graph based on the dataset, where nodes represent features and edges are added based on the correlation coefficients between features. In the second stage, it calculates an approximated Shapley value by sampling the data while taking this graph structure into account. The sampling approach of ShapG allows feature importance to be calculated efficiently, i.e., with reduced computational complexity. A comparison of ShapG with other existing XAI methods shows that it provides more accurate explanations for the two examined datasets. We also compared the running time of ShapG with that of other XAI methods based on cooperative game theory, and the results show that ShapG has a clear advantage in running time, which further demonstrates its efficiency. In addition, extensive experiments demonstrate the wide applicability of ShapG for explaining complex models. We find ShapG to be an important tool for improving the explainability and transparency of AI systems and believe it can be widely used in various fields.
Updated: 2024-06-29 18:19:55
Categories: cs.AI,cs.GT,cs.LG,68T01, 68T20
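The ShapG abstract above describes a first stage that builds an undirected feature graph from correlation coefficients. A minimal sketch of such a construction follows. The absolute-correlation threshold rule and the default of 0.5 are assumptions made for illustration, since the abstract does not state the exact edge criterion.

```python
import math
from itertools import combinations


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:  # constant column: treat as uncorrelated
        return 0.0
    return cov / (std_x * std_y)


def build_feature_graph(columns: dict, threshold: float = 0.5) -> dict:
    """Return an adjacency map: features are nodes, strong correlations are edges."""
    graph = {name: set() for name in columns}
    for a, b in combinations(columns, 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph
```

A second stage could then restrict Shapley-value sampling to coalitions suggested by each feature's neighbourhood in this graph. That step is not shown here.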
Federated Graph Semantic and Structural Learning
Federated graph learning collaboratively learns a global graph neural network from distributed graphs, where the non-independent and identically distributed (non-IID) property is one of the major challenges. Most related work focuses on traditional distributed tasks such as images and voices and cannot handle graph structures. This paper first reveals that local client distortion is caused by both node-level semantics and graph-level structure. First, for node-level semantics, we find that contrasting nodes from distinct classes helps to provide well-performing discrimination. We pull the local node towards the global node of the same class and push it away from the global nodes of different classes. Second, we postulate that a well-structured graph neural network exhibits similarity among neighbors due to the inherent adjacency relationships. However, aligning each node with its adjacent nodes hinders discrimination due to potential class inconsistency. We transform the adjacency relationships into a similarity distribution and leverage the global model to distill the relation knowledge into the local model, which preserves the structural information and discriminability of the local model. Empirical results on three graph datasets demonstrate the superiority of the proposed method over its counterparts.
Updated: 2024-06-29 18:17:40
Categories: cs.LG,cs.AI
How Well Do Large Language Models Truly Ground?
To reduce issues like hallucinations and lack of control in Large Language Models (LLMs), a common method is to generate responses by grounding on external contexts given as input, known as knowledge-augmented models. However, previous research often narrowly defines "grounding" as just having the correct answer, which does not ensure the reliability of the entire response. To overcome this, we propose a stricter definition of grounding: a model is truly grounded if it (1) fully utilizes the necessary knowledge from the provided context, and (2) stays within the limits of that knowledge. We introduce a new dataset and a grounding metric to evaluate model capability under the definition. We perform experiments across 25 LLMs of different sizes and training methods and provide insights into factors that influence grounding performance. Our findings contribute to a better understanding of how to improve grounding capabilities and suggest an area of improvement toward more reliable and controllable LLM applications.
Updated: 2024-06-29 18:07:34
Categories: cs.CL,cs.AI
Deep Frequency Derivative Learning for Non-stationary Time Series Forecasting
Since most time series are non-stationary, models inevitably face the distribution shift issue in time series forecasting. Existing solutions manipulate statistical measures (usually the mean and standard deviation) to adjust the time series distribution. However, these operations can be theoretically seen as a transformation towards the zero-frequency component of the spectrum, which cannot reveal full distribution information and leads to an information utilization bottleneck in normalization, thus hindering forecasting performance. To address this problem, we propose to utilize the whole frequency spectrum to transform time series, making full use of the data distribution from the frequency perspective. We present a deep frequency derivative learning framework, DERITS, for non-stationary time series forecasting. Specifically, DERITS is built upon a novel reversible transformation, namely the Frequency Derivative Transformation (FDT), which takes derivatives of signals in the frequency domain to obtain more stationary frequency representations. Then, we propose the Order-adaptive Fourier Convolution Network to conduct adaptive frequency filtering and learning. Furthermore, we organize DERITS as a parallel-stacked architecture for multi-order derivation and fusion for forecasting. Finally, we conduct extensive experiments on several datasets, which show consistent superiority in both time series forecasting and shift alleviation.
Updated: 2024-06-29 17:56:59
Categories: cs.LG,cs.AI
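The DERITS abstract above makes two frequency-domain points that a small example can illustrate: mean removal touches only the zero-frequency bin, and differentiation can be carried out on the spectrum. The sketch below uses a naive discrete Fourier transform and the standard Fourier differentiation property. It is not the paper's FDT, and the function names are hypothetical.

```python
import cmath
import math


def dft(x: list) -> list:
    """Naive discrete Fourier transform (O(n^2), fine for short series)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum: list) -> list:
    """Inverse of dft."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def spectral_derivative(x: list) -> list:
    """Differentiate a periodic series by scaling each bin by i * 2*pi*f / n."""
    n = len(x)
    scaled = []
    for k, coeff in enumerate(dft(x)):
        if 2 * k == n:
            freq = 0  # drop the Nyquist bin so a real input stays real
        else:
            freq = k if k < n / 2 else k - n  # signed frequency index
        scaled.append(coeff * 2j * math.pi * freq / n)
    return [value.real for value in idft(scaled)]
```

Subtracting the mean from a series changes only `dft(series)[0]`, which is why normalization by mean alone carries no information about the rest of the spectrum. The derivative also zeroes that bin, so an exactly reversible transform would need to store it separately.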
Aeroengine performance prediction using a physical-embedded data-driven method
Accurate and efficient prediction of aeroengine performance is of paramount importance for engine design, maintenance, and optimization endeavours. However, existing methodologies often struggle to strike an optimal balance among predictive accuracy, computational efficiency, modelling complexity, and data dependency. To address these challenges, we propose a strategy that synergistically combines domain knowledge from both the aeroengine and neural network realms to enable real-time prediction of engine performance parameters. Leveraging aeroengine domain knowledge, we judiciously design the network structure and regulate the internal information flow. Concurrently, drawing upon neural network domain expertise, we devise four distinct feature fusion methods and introduce an innovative loss function formulation. To rigorously evaluate the effectiveness and robustness of our proposed strategy, we conduct comprehensive validation across two distinct datasets. The empirical results demonstrate: (1) the evident advantages of our tailored loss function; (2) our model's ability to maintain equal or superior performance with a reduced parameter count; (3) our model's reduced data dependency compared to generalized neural network architectures; and (4) our model's greater interpretability compared to traditional black-box machine learning methods.
Updated: 2024-06-29 17:56:58
Categories: cs.LG,cs.AI,cs.CE
Intrinsic PAPR for Point-level 3D Scene Albedo and Shading Editing
Recent advancements in neural rendering have excelled at novel view synthesis from multi-view RGB images. However, they often lack the capability to edit the shading or colour of the scene at a detailed point level while ensuring consistency across different viewpoints. In this work, we address the challenge of point-level 3D scene albedo and shading editing from multi-view RGB images, focusing on detailed editing at the point level rather than at a part or global level. While prior works based on volumetric representation such as NeRF struggle with achieving 3D consistent editing at the point level, recent advancements in point-based neural rendering show promise in overcoming this challenge. We introduce "Intrinsic PAPR", a novel method based on the recent point-based neural rendering technique Proximity Attention Point Rendering (PAPR). Unlike other point-based methods that model the intrinsic decomposition of the scene, our approach does not rely on complicated shading models or simplistic priors that may not universally apply. Instead, we directly model scene decomposition into albedo and shading components, leading to better estimation accuracy. Comparative evaluations against the latest point-based inverse rendering methods demonstrate that Intrinsic PAPR achieves higher-quality novel view rendering and superior point-level albedo and shading editing.
Updated: 2024-06-29 17:46:10
Categories: cs.CV,cs.AI,cs.GR,cs.LG
ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees
Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the intricate nature of the recent large language models (LLMs). This study investigates adapting conformal prediction (CP), which can convert any heuristic measure of uncertainty into rigorous theoretical guarantees by constructing prediction sets, for black-box LLMs in open-ended NLG tasks. We propose a sampling-based uncertainty measure leveraging self-consistency and develop a conformal uncertainty criterion by integrating the uncertainty condition aligned with correctness into the design of the CP algorithm. Experimental results indicate that our uncertainty measure generally surpasses prior state-of-the-art methods. Furthermore, we calibrate the prediction sets within the model's unfixed answer distribution and achieve strict control over the correctness coverage rate across 6 LLMs on 4 free-form NLG datasets, spanning general-purpose and medical domains, while the small average set size further highlights the efficiency of our method in providing trustworthy guarantees for practical open-ended NLG applications.
Updated: 2024-06-29 17:33:07
Categories: cs.CL,cs.AI,cs.LG
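The ConU abstract above relies on conformal prediction to turn a heuristic uncertainty score into prediction sets with a coverage guarantee. The following is a minimal sketch of generic split conformal prediction, not the ConU criterion itself, which the abstract does not spell out. The nonconformity scores, the candidate answers, and the function names are all hypothetical and chosen only for illustration.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: keep every candidate.
        return float("inf")
    return sorted(cal_scores)[rank - 1]

def prediction_set(candidate_scores, threshold):
    """Keep every candidate whose nonconformity score is within the threshold."""
    return [ans for ans, score in candidate_scores.items() if score <= threshold]

# Hypothetical calibration scores, e.g. 1 - (sampling frequency of the correct answer).
cal_scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.90]
threshold = conformal_threshold(cal_scores, alpha=0.2)

# Hypothetical sampled answers for one test question, with their scores.
candidates = {"Paris": 0.10, "Lyon": 0.55, "Rome": 0.95}
answers = prediction_set(candidates, threshold)
```

Under the usual exchangeability assumption, a set built this way contains an acceptable answer with probability at least 1 - alpha; how ConU ties the score to correctness is described only in the paper itself.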
A Two-stage Reinforcement Learning-based Approach for Multi-entity Task Allocation
Task allocation is a key combinatorial optimization problem, crucial for modern applications such as multi-robot cooperation and resource scheduling. Decision makers must allocate entities to tasks reasonably across different scenarios. However, traditional methods assume static attributes and numbers of tasks and entities, often relying on dynamic programming and heuristic algorithms for solutions. In reality, task allocation resembles Markov decision processes, with dynamically changing task and entity attributes. Thus, algorithms must dynamically allocate tasks based on their states. To address this issue, we propose a two-stage task allocation algorithm based on similarity, utilizing reinforcement learning to learn allocation strategies. The proposed pre-assign strategy allows entities to preselect appropriate tasks, effectively avoiding local optima and thereby better finding the optimal allocation. We also introduce an attention mechanism and a hyperparameter network structure to adapt to the changing number and attributes of entities and tasks, enabling our network structure to generalize to new tasks. Experimental results across multiple environments demonstrate that our algorithm effectively addresses the challenges of dynamic task allocation in practical applications. Compared to heuristic algorithms like genetic algorithms, our reinforcement learning approach better solves dynamic allocation problems and achieves zero-shot generalization to new tasks with good performance. The code is available at https://github.com/yk7333/TaskAllocation.
Updated: 2024-06-29 17:13:44
Categories: cs.LG,cs.AI
PFME: A Modular Approach for Fine-grained Hallucination Detection and Editing of Large Language Models
Large Language Models (LLMs) excel in fluency but risk producing inaccurate content, called "hallucinations." This paper outlines a standardized process for categorizing fine-grained hallucination types and proposes an innovative framework--the Progressive Fine-grained Model Editor (PFME)--specifically designed to detect and correct fine-grained hallucinations in LLMs. PFME consists of two collaborative modules: the Real-time Fact Retrieval Module and the Fine-grained Hallucination Detection and Editing Module. The former identifies key entities in the document and retrieves the latest factual evidence from credible sources. The latter further segments the document into sentence-level text and, based on relevant evidence and previously edited context, identifies, locates, and edits each sentence's hallucination type. Experimental results on FavaBench and FActScore demonstrate that PFME outperforms existing methods in fine-grained hallucination detection tasks. Particularly, when using the Llama3-8B-Instruct model, PFME's performance in fine-grained hallucination detection with external knowledge assistance improves by 8.7 percentage points (pp) compared to ChatGPT. In editing tasks, PFME further enhances the FActScore of FActScore-Alpaca13B and FActScore-ChatGPT datasets, increasing by 16.2pp and 4.6pp, respectively.
Updated: 2024-06-29 16:35:57
Categories: cs.CL,cs.AI
Navigating the road to automotive cybersecurity compliance
The automotive industry has evolved significantly since the introduction of the Ford Model T in 1908. Today's vehicles are not merely mechanical constructs; they are integral components of a complex digital ecosystem, equipped with advanced connectivity features powered by Artificial Intelligence and cloud computing technologies. This evolution has enhanced vehicle safety, efficiency, and the overall driving experience. However, it also introduces new challenges, notably in cybersecurity. With the increasing integration of digital technologies, vehicles have become more susceptible to cyber-attacks, prompting significant cybersecurity concerns. These concerns include securing sensitive data, protecting vehicles from unauthorized access, and ensuring user privacy. In response, the automotive industry is compelled to adopt robust cybersecurity measures to safeguard both vehicles and data against potential threats. Legislative frameworks such as UNR155 and UNR156 by the United Nations, along with other international regulations, aim to establish stringent cybersecurity mandates. These regulations require compliance with comprehensive cybersecurity management systems and necessitate regular updates and testing to cope with the evolving nature of cyber threats. The introduction of such regulations highlights the growing recognition of cybersecurity as a critical component of automotive safety and functionality. The future of automotive cybersecurity lies in the continuous development of advanced protective measures and collaborative efforts among all stakeholders, including manufacturers, policymakers, and cybersecurity professionals. Only through such concerted efforts can the industry hope to address the dual goals of innovation in vehicle functionality and stringent security measures against the backdrop of an increasingly interconnected digital landscape.
Updated: 2024-06-29 16:07:48
Categories: cs.CR
Quantifying Spuriousness of Biased Datasets Using Partial Information Decomposition
Spurious patterns refer to a mathematical association between two or more variables in a dataset that are not causally related. However, this notion of spuriousness, which is usually introduced due to sampling biases in the dataset, has classically lacked a formal definition. To address this gap, this work presents the first information-theoretic formalization of spuriousness in a dataset (given a split of spurious and core features) using a mathematical framework called Partial Information Decomposition (PID). Specifically, we disentangle the joint information content that the spurious and core features share about another target variable (e.g., the prediction label) into distinct components, namely unique, redundant, and synergistic information. We propose the use of unique information, with roots in Blackwell Sufficiency, as a novel metric to formally quantify dataset spuriousness and derive its desirable properties. We empirically demonstrate how higher unique information in the spurious features in a dataset could lead a model into choosing the spurious features over the core features for inference, often having low worst-group-accuracy. We also propose a novel autoencoder-based estimator for computing unique information that is able to handle high-dimensional image data. Finally, we also show how this unique information in the spurious feature is reduced across several dataset-based spurious-pattern-mitigation techniques such as data reweighting and varying levels of background mixing, demonstrating a novel tradeoff between unique information (spuriousness) and worst-group-accuracy.
Updated: 2024-06-29 16:05:47
Categories: cs.LG,cs.AI,cs.CV,cs.CY,cs.IT,math.IT
Knowledge-Aware Parsimony Learning: A Perspective from Relational Graphs
The scaling law, a strategy that involves the brute-force scaling of the training dataset and learnable parameters, has become a prevalent approach for developing stronger learning models. In this paper, we examine its rationale in terms of learning from relational graphs. We demonstrate that directly adhering to such a scaling law does not necessarily yield stronger models due to architectural incompatibility and representation bottlenecks. To tackle this challenge, we propose a novel framework for learning from relational graphs via knowledge-aware parsimony learning. Our method draws inspiration from the duality between data and knowledge inherent in these graphs. Specifically, we first extract knowledge (like symbolic logic and physical laws) during the learning process, and then apply combinatorial generalization to the task at hand. This extracted knowledge serves as the "building blocks" for achieving parsimony learning. By applying this philosophy to architecture, parameters, and inference, we can effectively achieve versatile, sample-efficient, and interpretable learning. Experimental results show that our proposed framework surpasses methods that strictly follow the traditional scaling-up roadmap. This highlights the importance of incorporating knowledge in the development of next-generation learning technologies.
Updated: 2024-06-29 15:52:37
Categories: cs.LG,cs.AI
OptBA: Optimizing Hyperparameters with the Bees Algorithm for Improved Medical Text Classification
One of the main challenges in the field of deep learning is obtaining the optimal model hyperparameters. The search for optimal hyperparameters usually hinders the progress of solutions to problems in real-world domains such as healthcare. Previous solutions have been proposed, but they can still get stuck in local optima. To overcome this hurdle, we propose OptBA to automatically fine-tune the hyperparameters of deep learning models by leveraging the Bees Algorithm, which is a recent promising swarm intelligence algorithm. In this paper, the optimization problem of OptBA is to maximize the accuracy in classifying ailments using medical text, where initial hyperparameters are iteratively adjusted by specific criteria. Experimental results demonstrate a noteworthy accuracy improvement of approximately 1.4%. This outcome highlights the effectiveness of the proposed mechanism in addressing the critical issue of hyperparameter optimization and its potential impact on advancing solutions for healthcare. The code is publicly available at https://github.com/Mai-CS/OptBA.
Updated: 2024-06-29 15:40:27
Categories: cs.CL,cs.AI
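The OptBA abstract above describes iteratively adjusting hyperparameters with the Bees Algorithm. A minimal sketch of the basic Bees Algorithm loop (scouting, selecting the best sites, recruiting bees for local search) might look as follows. This is not the OptBA implementation: the function names, the default settings, and the `toy_accuracy` objective, which stands in for a real train-and-validate run, are assumptions made for illustration.

```python
import random

def bees_algorithm(objective, bounds, n_scouts=10, n_best=3, n_recruits=5,
                   radius=0.1, n_iters=30, seed=0):
    """Maximize `objective` over box `bounds` with a basic Bees Algorithm."""
    rng = random.Random(seed)

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def neighbour(point):
        # Perturb each coordinate within a patch around the site, clipped to bounds.
        return [min(hi, max(lo, x + rng.uniform(-radius, radius) * (hi - lo)))
                for x, (lo, hi) in zip(point, bounds)]

    scouts = [random_point() for _ in range(n_scouts)]
    for _ in range(n_iters):
        scouts.sort(key=objective, reverse=True)
        new_scouts = []
        for site in scouts[:n_best]:
            # Local search: recruited bees explore the patch; the site itself is kept
            # as a candidate so the best score never decreases.
            recruits = [neighbour(site) for _ in range(n_recruits)]
            new_scouts.append(max(recruits + [site], key=objective))
        # Remaining bees scout the space at random (global search).
        new_scouts += [random_point() for _ in range(n_scouts - n_best)]
        scouts = new_scouts
    return max(scouts, key=objective)

def toy_accuracy(params):
    """Hypothetical stand-in for validation accuracy; peaks at lr=1e-3, dropout=0.3."""
    lr_exponent, dropout = params
    return 1.0 - (lr_exponent + 3.0) ** 2 / 4.0 - (dropout - 0.3) ** 2

bounds = [(-5.0, -1.0), (0.0, 0.9)]  # log10(learning rate), dropout rate
best = bees_algorithm(toy_accuracy, bounds)
```

In a real setting the objective would train and evaluate a model for each candidate, so the number of scouts and recruits directly controls the compute budget.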
MH-pFLGB: Model Heterogeneous personalized Federated Learning via Global Bypass for Medical Image Analysis
In the evolving application of medical artificial intelligence, federated learning is notable for its ability to protect training data privacy. Federated learning facilitates collaborative model development without the need to share local data from healthcare institutions. Yet, the statistical and system heterogeneity among these institutions poses substantial challenges, which affect the effectiveness of federated learning and hamper the exchange of information between clients. To address these issues, we introduce a novel approach, MH-pFLGB, which employs a global bypass strategy to mitigate the reliance on public datasets and navigate the complexities of non-IID data distributions. Our method enhances traditional federated learning by integrating a global bypass model, which not only shares information among the clients but also serves as part of the network to enhance the performance on each client. Additionally, MH-pFLGB provides a feature fusion module to better combine the local and global features. We validate MH-pFLGB's effectiveness and adaptability through extensive testing on different medical tasks, demonstrating superior performance compared to existing state-of-the-art methods.
Updated: 2024-06-29 15:38:37
Categories: cs.LG,cs.AI
IoT-Based Preventive Mental Health Using Knowledge Graphs and Standards for Better Well-Being
Sustainable Development Goals (SDGs) give the UN a road map for development with Agenda 2030 as a target. SDG3 "Good Health and Well-Being" ensures healthy lives and promotes well-being for all ages. Digital technologies can support SDG3. Burnout and even depression could be reduced by encouraging better preventive health. Because patients often lack the knowledge and focus needed to take care of their health, it is necessary to help them before it is too late. New trends such as positive psychology and mindfulness are highly encouraged in the USA. A Digital Twin (DT) can help with the continuous monitoring of emotion using physiological signals (e.g., collected via wearables). Digital twins facilitate monitoring and provide constant health insight to improve quality of life and well-being with better personalization. Healthcare DT challenges include standardizing data formats, communication protocols, and data exchange mechanisms. To address these data integration and knowledge challenges, we designed the Mental Health Knowledge Graph (ontology and dataset) to boost mental health. The Knowledge Graph (KG) acquires knowledge from ontology-based mental health projects classified within the LOV4IoT ontology catalog (Emotion, Depression, and Mental Health). Furthermore, the KG is mapped to standards (e.g., ontologies) when possible. Standards from ETSI SmartM2M, ITU/WHO, ISO, W3C, NIST, and IEEE are relevant to mental health.
Updated: 2024-06-29 15:29:56
Categories: cs.AI,cs.CL,cs.CY,cs.LG
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for such evaluations suffer from systematic biases. Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial performance, undermining the credibility of these evaluations. To address this issue while maintaining the efficiency of MCQ evaluations, we propose MMEvalPro, a benchmark designed to avoid Type-I errors through a trilogy evaluation pipeline and more rigorous metrics. For each original question from existing benchmarks, human annotators augment it by creating one perception question and one knowledge anchor question through a meticulous annotation process. MMEvalPro comprises $2,138$ question triplets, totaling $6,414$ distinct questions. Two-thirds of these questions are manually labeled by human experts, while the rest are sourced from existing benchmarks (MMMU, ScienceQA, and MathVista). Compared with the existing benchmarks, our experiments with the latest LLMs and LMMs demonstrate that MMEvalPro is more challenging (the best LMM lags behind human performance by $31.73\%$, compared to an average gap of $8.03\%$ in previous benchmarks) and more trustworthy (the best LLM trails the best LMM by $23.09\%$, whereas the gap for previous benchmarks is just $14.64\%$). Our in-depth analysis explains the reason for the large performance gap and justifies the trustworthiness of evaluation, underscoring its significant potential for advancing future research.
Updated: 2024-06-29 15:28:45
Categories: cs.CV,cs.AI,cs.CL
A Holistic Indicator of Polarization to Measure Online Sexism
The online trend of the manosphere and feminist discourse on social networks requires a holistic measure of the level of sexism in an online community. This indicator is important for policymakers and moderators of online communities (e.g., subreddits) and computational social scientists, either to revise moderation strategies based on the degree of sexism or to match and compare the temporal sexism across different platforms and communities with real-time events and infer social scientific insights. In this paper, we build a model that can provide a comparable holistic indicator of toxicity targeted toward male and female identity and male and female individuals. Unlike previous supervised NLP methods, which require annotation of toxic comments at the target level (e.g., annotating comments that are specifically toxic toward women) to detect targeted toxic comments, our indicator uses supervised NLP to detect the presence of toxicity and an unsupervised word embedding association test to detect the target automatically. We apply our model to gender discourse communities (e.g., r/TheRedPill, r/MGTOW, r/FemaleDatingStrategy) to detect the level of toxicity toward genders (i.e., sexism). Our results show that our framework accurately and consistently (93% correlation) measures the level of sexism in a community. We finally discuss how our framework can be generalized in the future to measure qualities other than toxicity (e.g., sentiment, humor) toward general-purpose targets and turn into an indicator of different sorts of polarizations.
Updated: 2024-06-29 15:27:34
Categories: cs.SI,cs.AI
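The abstract above detects the target of a toxic comment with an unsupervised word embedding association test. A minimal sketch of that idea, assuming a WEAT-style score (mean cosine similarity to one attribute set minus mean similarity to the other), is given below. The two-dimensional vectors, the attribute sets, and the `detect_target` helper are hypothetical; the paper's actual word lists and test statistic are not given in the abstract.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(word_vec, attrs_a, attrs_b):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    mean_a = sum(cosine(word_vec, a) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(word_vec, b) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b

def detect_target(comment_vecs, female_attrs, male_attrs):
    """Label a comment by the sign of its average association score."""
    score = sum(association(v, female_attrs, male_attrs)
                for v in comment_vecs) / len(comment_vecs)
    return "female" if score > 0 else "male"

# Hypothetical 2-d embeddings standing in for real word vectors.
female_attrs = [[1.0, 0.1], [0.9, 0.2]]   # e.g. vectors for "she", "woman"
male_attrs = [[0.1, 1.0], [0.2, 0.9]]     # e.g. vectors for "he", "man"
comment_about_women = [[0.8, 0.3], [0.7, 0.1]]
comment_about_men = [[0.3, 0.8], [0.1, 0.7]]

label_a = detect_target(comment_about_women, female_attrs, male_attrs)
label_b = detect_target(comment_about_men, female_attrs, male_attrs)
```

Because the target comes from embedding geometry rather than labels, only the toxicity classifier needs annotated data, which matches the split between supervised and unsupervised components described in the abstract.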
BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science
Pursuing artificial intelligence for biomedical science, a.k.a. AI Scientist, draws increasing attention, where one common approach is to build a copilot agent driven by Large Language Models (LLMs). However, to evaluate such systems, people rely either on direct Question-Answering (QA) with the LLM itself or on biomedical experiments. How to precisely benchmark biomedical agents from an AI Scientist perspective remains largely unexplored. To this end, we draw inspiration from one of the most important abilities of scientists, understanding the literature, and introduce BioKGBench. In contrast to traditional evaluation benchmarks that focus only on factual QA, where LLMs are known to have hallucination issues, we first disentangle "Understanding Literature" into two atomic abilities: i) "Understanding" the unstructured text from research papers by performing scientific claim verification, and ii) the ability to interact with structured Knowledge-Graph Question-Answering (KGQA) as a form of "Literature" grounding. We then formulate a novel agent task, dubbed KGCheck, using KGQA and domain-based Retrieval-Augmented Generation (RAG) to identify the factual errors of existing large-scale knowledge graph databases. We collect over two thousand data points for the two atomic tasks and 225 high-quality annotated data points for the agent task. Surprisingly, we discover that state-of-the-art agents, both those for daily scenarios and biomedical ones, either fail or perform poorly on our benchmark. We then introduce a simple yet effective baseline, dubbed BKGAgent. On a popular, widely used knowledge graph, we discover over 90 factual errors, which provide scenarios for agents to make discoveries and demonstrate the effectiveness of our approach. The code and data are available at https://github.com/westlake-autolab/BioKGBench.
Updated: 2024-06-29 15:23:28
Categories: cs.CL,cs.AI
pFLFE: Cross-silo Personalized Federated Learning via Feature Enhancement on Medical Image Segmentation
In medical image segmentation, personalized cross-silo federated learning (FL) is becoming popular for utilizing varied data across healthcare settings to overcome data scarcity and privacy concerns. However, existing methods often suffer from client drift, leading to inconsistent performance and delayed training. We propose a new framework, Personalized Federated Learning via Feature Enhancement (pFLFE), designed to mitigate these challenges. pFLFE consists of two main stages: feature enhancement and supervised learning. The first stage improves differentiation between foreground and background features, and the second uses these enhanced features for learning from segmentation masks. We also design an alternative training approach that requires fewer communication rounds without compromising segmentation quality, even with limited communication resources. Through experiments on three medical segmentation tasks, we demonstrate that pFLFE outperforms the state-of-the-art methods.
Updated: 2024-06-29 15:20:03
Categories: cs.CV,cs.AI
A Rule-Based Behaviour Planner for Autonomous Driving
Autonomous vehicles require highly sophisticated decision-making to determine their motion. This paper describes how such functionality can be achieved with a practical rule engine learned from expert driving decisions. We propose an algorithm to create and maintain a rule-based behaviour planner, using a two-layer rule-based theory. The first layer determines a set of feasible parametrized behaviours, given the perceived state of the environment. From these, a resolution function chooses the most conservative high-level maneuver. The second layer then reconciles the parameters into a single behaviour. To demonstrate the practicality of our approach, we report results of its implementation in a level-3 autonomous vehicle and its field test in an urban environment.
Updated: 2024-06-29 15:15:41
Categories: cs.AI,cs.RO
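The two-layer structure described in the abstract above (feasible parametrized behaviours, a resolution function that picks the most conservative maneuver, then parameter reconciliation) can be sketched roughly as follows. Every rule, maneuver name, ranking, and parameter here is a hand-written assumption for illustration; in the paper the rule engine is learned from expert driving decisions, and its actual rules are not given in the abstract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Behaviour:
    maneuver: str
    max_speed: float  # metres per second

# Hypothetical ranking: a lower value means a more conservative maneuver.
CONSERVATIVENESS = {"stop": 0, "yield": 1, "follow": 2, "cruise": 3}

def layer_one(state):
    """First layer: every rule that fires contributes a feasible behaviour."""
    behaviours = [Behaviour("cruise", state["speed_limit"])]
    if state.get("lead_vehicle_speed") is not None:
        behaviours.append(Behaviour("follow", state["lead_vehicle_speed"]))
    if state.get("pedestrian_near"):
        behaviours.append(Behaviour("yield", 3.0))
    if state.get("school_zone"):
        behaviours.append(Behaviour("yield", 2.0))
    if state.get("red_light"):
        behaviours.append(Behaviour("stop", 0.0))
    return behaviours

def resolve(behaviours):
    """Resolution function: choose the most conservative high-level maneuver."""
    return min((b.maneuver for b in behaviours), key=CONSERVATIVENESS.__getitem__)

def layer_two(behaviours, maneuver):
    """Second layer: reconcile parameters, here by taking the lowest speed cap."""
    speeds = [b.max_speed for b in behaviours if b.maneuver == maneuver]
    return Behaviour(maneuver, min(speeds))

def plan(state):
    behaviours = layer_one(state)
    return layer_two(behaviours, resolve(behaviours))

busy_street = {"speed_limit": 13.9, "lead_vehicle_speed": 10.0,
               "pedestrian_near": True, "school_zone": True}
decision = plan(busy_street)
```

Separating maneuver choice from parameter reconciliation keeps each rule small: a rule only proposes a behaviour, and conflicts are settled in one place.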
Knowing When to Stop: Delay-Adaptive Spiking Neural Network Classifiers with Reliability Guarantees
Spiking neural networks (SNNs) process time-series data via internal event-driven neural dynamics. The energy consumption of an SNN depends on the number of spikes exchanged between neurons over the course of the input presentation. Typically, decisions are produced after the entire input sequence has been processed. This results in latency and energy consumption levels that are fairly uniform across inputs. However, as explored in recent work, SNNs can produce an early decision when the SNN model is sufficiently "confident", adapting delay and energy consumption to the difficulty of each example. Existing techniques are based on heuristic measures of confidence that do not provide reliability guarantees, potentially exiting too early. In this paper, we introduce a novel delay-adaptive SNN-based inference methodology that, wrapping around any pre-trained SNN classifier, provides guaranteed reliability for the decisions produced at input-dependent stopping times. The approach, dubbed SpikeCP, leverages tools from conformal prediction (CP). It entails minimal complexity increase as compared to the underlying SNN, requiring only additional thresholding and counting operations at run time. SpikeCP is also extended to integrate a CP-aware training phase that targets delay performance. Variants of CP based on alternative confidence correction schemes, from Bonferroni to Simes, are explored, and extensive experiments are described using the MNIST-DVS data set, DVS128 Gesture dataset, and CIFAR-10 dataset.
Updated: 2024-06-29 15:11:10
Categories: cs.NE,cs.AI,cs.LG
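The SpikeCP abstract above mentions confidence correction schemes ranging from Bonferroni to Simes. As a minimal sketch, the two standard corrections for combining several p-values can be written as below. How SpikeCP applies them across candidate stopping times is not described in the abstract, so the example p-values and their interpretation are assumptions.

```python
def bonferroni(p_values):
    """Bonferroni-corrected combined p-value: n times the smallest p-value, capped at 1."""
    return min(1.0, len(p_values) * min(p_values))

def simes(p_values):
    """Simes combined p-value: min over ranks i of n * p_(i) / i, with p_(i) sorted ascending."""
    n = len(p_values)
    return min(n * p / rank for rank, p in enumerate(sorted(p_values), start=1))

# Hypothetical p-values, e.g. one per checkpoint at which a decision could be made.
p_values = [0.02, 0.025, 0.03]
p_bonferroni = bonferroni(p_values)
p_simes = simes(p_values)
```

The Simes value never exceeds the Bonferroni value, so it is the less conservative of the two, at the cost of requiring stronger assumptions on the dependence between p-values.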
Cross-silo Federated Learning with Record-level Personalized Differential Privacy
Federated learning (FL) enhanced by differential privacy has emerged as a popular approach to better safeguard the privacy of client-side data by protecting clients' contributions during the training process. Existing solutions typically assume a uniform privacy budget for all records and provide one-size-fits-all solutions that may not be adequate to meet each record's privacy requirement. In this paper, we explore the uncharted territory of cross-silo FL with record-level personalized differential privacy. We devise a novel framework named rPDP-FL, employing a two-stage hybrid sampling scheme with both uniform client-level sampling and non-uniform record-level sampling to accommodate varying privacy requirements. A critical and non-trivial problem is how to determine the ideal per-record sampling probability $q$ given the personalized privacy budget $\varepsilon$. We introduce a versatile solution named Simulation-CurveFitting, allowing us to uncover a significant insight into the nonlinear correlation between $q$ and $\varepsilon$ and derive an elegant mathematical model to tackle the problem. Our evaluation demonstrates that our solution can provide significant performance gains over the baselines that do not consider personalized privacy preservation.
Updated: 2024-06-29 14:58:30
标题: 跨领域的记录级个性化差分隐私联邦学习
摘要: 由差分隐私增强的联邦学习(FL)已成为一种流行的方法,通过在训练过程中保护客户端数据的隐私来更好地保护客户的贡献。现有的解决方案通常假设所有记录都有统一的隐私预算,并提供一揽子解决方案,这可能不足以满足每个记录的隐私要求。在本文中,我们探索了跨边界FL与记录级个性化差分隐私的未知领域。我们设计了一个名为rPDP-FL的新框架,采用两阶段混合采样方案,同时进行统一的客户级采样和非统一的记录级采样,以满足不同的隐私要求。 一个关键且非常复杂的问题是如何确定给定个性化隐私预算ε的每个记录的理想采样概率q。我们介绍了一个通用解决方案,名为Simulation-CurveFitting,使我们能够揭示q和ε之间的非线性相关性,并推导出一个优雅的数学模型来解决这个问题。我们的评估表明,我们的解决方案可以显著提高性能,超过那些不考虑个性化隐私保护的基线。
更新时间: 2024-06-29 14:58:30
领域: cs.CR,cs.AI,cs.LG
Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models
Large language models (LLMs) have brought a paradigm shift to the field of code generation, offering the potential to enhance the software development process. However, previous research mainly focuses on the accuracy of code generation, while coding style differences between LLMs and human developers remain under-explored. In this paper, we empirically analyze the differences in coding style between the code generated by mainstream Code LLMs and the code written by human developers, and summarize a taxonomy of coding style inconsistencies. Specifically, we first summarize the types of coding style inconsistencies by manually analyzing a large number of generation results. We then compare the code generated by Code LLMs with the code written by human programmers in terms of readability, conciseness, and robustness. The results reveal that LLMs and developers have different coding styles. Additionally, we study the possible causes of these inconsistencies and provide some solutions to alleviate the problem.
Updated: 2024-06-29 14:56:11
标题: 超越功能正确性:探究大型语言模型中的编码风格不一致性
摘要: 大型语言模型(LLMs)已经给代码生成领域带来了一场范式转变,为增强软件开发过程提供了潜力。然而,先前的研究主要关注代码生成的准确性,而LLMs和人类开发者之间的编码风格差异仍未被充分探索。在本文中,我们通过实证分析主流代码LLMs生成的代码和人类开发者编写的代码之间的编码风格差异,并总结了编码风格不一致的分类法。具体来说,我们首先通过手动分析大量生成结果总结了编码风格不一致的类型。然后,我们比较了代码LLMs生成的代码与人类程序员编写的代码在可读性、简洁性和健壮性方面的差异。结果显示,LLMs和开发者有不同的编码风格。此外,我们研究了这些不一致性的可能原因,并提供了一些缓解问题的解决方案。
更新时间: 2024-06-29 14:56:11
领域: cs.SE,cs.AI
KHNNs: hypercomplex neural networks computations via Keras using TensorFlow and PyTorch
Neural networks used in computations with more advanced algebras than real numbers perform better in some applications. However, there is no general framework for constructing hypercomplex neural networks. We propose a library integrated with Keras that can do computations within TensorFlow and PyTorch. It provides Dense and Convolutional 1D, 2D, and 3D layer architectures.
Updated: 2024-06-29 14:36:37
标题: KHNNs:使用TensorFlow和PyTorch通过Keras进行超复杂神经网络计算
摘要: 神经网络在计算中使用比实数更高级代数时,在一些应用中表现更好。然而,目前还没有通用的构建超复数神经网络的框架。我们提出了一个集成了Keras的库,可以在TensorFlow和PyTorch中进行计算。它提供了密集和卷积1D、2D和3D层的架构。
更新时间: 2024-06-29 14:36:37
领域: cs.LG,cs.AI,cs.NE
A survey on the impact of AI-based recommenders on human behaviours: methodologies, outcomes and future directions
Recommendation systems and assistants (in short, recommenders) are ubiquitous in online platforms and influence most actions of our day-to-day lives, suggesting items or providing solutions based on users' preferences or requests. This survey analyses the impact of recommenders in four human-AI ecosystems: social media, online retail, urban mapping and generative AI ecosystems. Its scope is to systematise a fast-growing field in which terminologies employed to classify methodologies and outcomes are fragmented and unsystematic. We follow the customary steps of qualitative systematic review, gathering 144 articles from different disciplines to develop a parsimonious taxonomy of: methodologies employed (empirical, simulation, observational, controlled), outcomes observed (concentration, model collapse, diversity, echo chamber, filter bubble, inequality, polarisation, radicalisation, volume), and their level of analysis (individual, item, model, and systemic). We systematically discuss all findings of our survey substantively and methodologically, highlighting also potential avenues for future research. This survey is addressed to scholars and practitioners interested in different human-AI ecosystems, policymakers and institutional stakeholders who want to understand better the measurable outcomes of recommenders, and tech companies who wish to obtain a systematic view of the impact of their recommenders.
Updated: 2024-06-29 14:34:32
标题: 一项关于基于人工智能推荐系统对人类行为影响的调查:方法、结果和未来方向
摘要: 推荐系统和助手(简称推荐者)在在线平台上无处不在,影响着我们日常生活中的大部分行为,根据用户的偏好或请求建议项目或提供解决方案。本调查分析了推荐者在四个人工智能生态系统中的影响:社交媒体、在线零售、城市地图和生成式人工智能生态系统。其范围是系统化一个快速增长的领域,在这个领域中,用于分类方法和结果的术语是分散和无系统的。我们遵循定性系统评审的惯例步骤,收集来自不同学科的144篇文章,以开发一个简洁的分类法:采用的方法(经验性、模拟、观察性、控制性)、观察到的结果(集中、模型崩溃、多样性、回音室、过滤泡沫、不平等、极化、激进化、数量)以及它们的分析层次(个体、项目、模型和系统)。我们系统地讨论了调查的所有发现,从实质和方法上强调了未来研究的潜在途径。这份调查针对对不同人工智能生态系统感兴趣的学者和从业者、希望更好地了解推荐者可衡量结果的政策制定者和机构利益相关者,以及希望获得对其推荐者影响的系统化视角的技术公司。
更新时间: 2024-06-29 14:34:32
领域: cs.IR,cs.AI,cs.CY,cs.HC
Fully tensorial approach to hypercomplex neural networks
A fully tensorial theory of hypercomplex neural networks is given. The key point is to observe that the algebra multiplication can be represented as a rank-three tensor. This approach is attractive for neural network libraries that support effective tensorial operations.
Updated: 2024-06-29 14:19:40
标题: 超复杂神经网络的完全张量方法
摘要: 这篇文献提供了关于超复杂神经网络的完全张量理论。关键点在于观察代数乘法可以表示为一个三阶张量。这种方法对支持有效张量操作的神经网络库非常吸引人。
更新时间: 2024-06-29 14:19:40
领域: cs.LG,cs.AI,cs.NE,15A69, 15-04
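As a small illustration of the key observation in the abstract above, the sketch below encodes quaternion multiplication as a rank-three tensor of structure constants and contracts it with `numpy.einsum`. The function names and the choice of quaternions are assumptions for illustration; the paper's own construction may differ.

```python
import numpy as np

def quaternion_structure_tensor():
    """Return A where A[a, b, c] is the coefficient of e_c in e_a * e_b.

    Basis order: e_0 = 1, e_1 = i, e_2 = j, e_3 = k.
    """
    A = np.zeros((4, 4, 4))
    # (a, b) -> (c, sign) means e_a * e_b = sign * e_c
    table = {
        (0, 0): (0, 1), (0, 1): (1, 1), (0, 2): (2, 1), (0, 3): (3, 1),
        (1, 0): (1, 1), (1, 1): (0, -1), (1, 2): (3, 1), (1, 3): (2, -1),
        (2, 0): (2, 1), (2, 1): (3, -1), (2, 2): (0, -1), (2, 3): (1, 1),
        (3, 0): (3, 1), (3, 1): (2, 1), (3, 2): (1, -1), (3, 3): (0, -1),
    }
    for (a, b), (c, sign) in table.items():
        A[a, b, c] = sign
    return A

def multiply(A, x, y):
    """Algebra product of coefficient vectors x and y via tensor contraction."""
    return np.einsum("a,b,abc->c", x, y, A)

A = quaternion_structure_tensor()
i = np.array([0.0, 1.0, 0.0, 0.0])
j = np.array([0.0, 0.0, 1.0, 0.0])
print(multiply(A, i, j))  # coefficients of i * j, which equals k
```

Swapping in a different structure tensor changes the algebra without changing `multiply`, which is what makes the tensorial view convenient for libraries built around tensor contractions.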
LangGPT: Rethinking Structured Reusable Prompt Design Framework for LLMs from the Programming Language
LLMs have demonstrated commendable performance across diverse domains. Nevertheless, formulating high-quality prompts to instruct LLMs proficiently poses a challenge for non-AI experts. Existing research in prompt engineering suggests somewhat scattered optimization principles and designs empirically dependent prompt optimizers. Unfortunately, these endeavors lack a structured design template, incurring high learning costs and resulting in low reusability. In addition, it is not conducive to the iterative updating of prompts. Inspired by structured reusable programming languages, we propose LangGPT, a dual-layer prompt design framework as the programming language for LLMs. LangGPT has an easy-to-learn normative structure and provides an extended structure for migration and reuse. Experiments illustrate that LangGPT significantly enhances the performance of LLMs. Moreover, the case study shows that LangGPT leads LLMs to generate higher-quality responses. Furthermore, we analyzed the ease of use and reusability of LangGPT through a user survey in our online community.
Updated: 2024-06-29 14:19:08
标题: LangGPT:重新思考编程语言模型的结构化可重用提示设计框架
摘要: LLMs在各个领域展示了令人钦佩的表现。然而,为了有效地指导LLMs,制定高质量的提示对非AI专家来说是一个挑战。现有的提示工程研究表明,优化原则和设计有些零散,并且依赖经验性提示优化器。不幸的是,这些努力缺乏结构化设计模板,导致高学习成本和低可重用性。此外,这不利于提示的迭代更新。受结构化可重用编程语言的启发,我们提出LangGPT,一个双层提示设计框架,作为LLMs的编程语言。LangGPT具有易于学习的规范结构,并提供了一个扩展结构用于迁移和重用。实验证明LangGPT显著提高了LLMs的性能。此外,案例研究显示LangGPT引导LLMs生成更高质量的回复。此外,我们通过在线社区的用户调查分析了LangGPT的易用性和可重用性。
更新时间: 2024-06-29 14:19:08
领域: cs.SE,cs.AI,cs.CL,cs.PL
PanopticNDT: Efficient and Robust Panoptic Mapping
As the application scenarios of mobile robots are getting more complex and challenging, scene understanding becomes increasingly crucial. A mobile robot that is supposed to operate autonomously in indoor environments must have precise knowledge about what objects are present, where they are, what their spatial extent is, and how they can be reached; i.e., information about free space is also crucial. Panoptic mapping is a powerful instrument providing such information. However, building 3D panoptic maps with high spatial resolution is challenging on mobile robots, given their limited computing capabilities. In this paper, we propose PanopticNDT - an efficient and robust panoptic mapping approach based on occupancy normal distribution transform (NDT) mapping. We evaluate our approach on the publicly available datasets Hypersim and ScanNetV2. The results reveal that our approach can represent panoptic information at a higher level of detail than other state-of-the-art approaches while enabling real-time panoptic mapping on mobile robots. Finally, we prove the real-world applicability of PanopticNDT with qualitative results in a domestic application.
Updated: 2024-06-29 14:18:59
标题: 全景NDT:高效且稳健的全景映射
摘要: 随着移动机器人的应用场景变得越来越复杂和具有挑战性,场景理解变得越来越关键。一个被认为在室内环境中自主运行的移动机器人必须对存在的物体有精确的了解,它们的位置在哪里,它们的空间范围是什么,以及如何到达它们;即,关于自由空间的信息也是至关重要的。全景地图是提供这种信息的强大工具。然而,在移动机器人上构建具有高空间分辨率的3D全景地图是具有挑战性的,因为它们的计算能力有限。在本文中,我们提出了PanopticNDT - 一种基于占用正态分布变换(NDT)地图的高效且稳健的全景地图建模方法。我们在公开可用的Hypersim和ScanNetV2数据集上评估我们的方法。结果显示,我们的方法能够比其他最先进的方法更详细地表示全景信息,同时实现移动机器人上的实时全景地图建模。最后,我们通过在家庭应用中的定性结果证明了PanopticNDT的实际适用性。
更新时间: 2024-06-29 14:18:59
领域: cs.RO,cs.AI,cs.LG
Protecting the 'Stop Using My Data' Right through Blockchain-assisted Evidence Generation
In order to provide personalized services to users, Internet-based platforms collect and utilize user-generated behavioral data. Although the 'stop using my data' right should be a fundamental data right, which allows individuals to request their personal data to be no longer utilized by online platforms, the existing preventive data protection measures (e.g., cryptographic data elimination, differential privacy) are unfortunately not applicable. This work aims to develop the first Evidence Generation Framework for deterring post-acquisition data right violations. We formulated the 'stop using my data' problem, which captures a vantage facet of the multi-faceted notion of 'right to be forgotten'. We designed and implemented the first blockchain-assisted system to generate evidence for deterring the violations of the 'stop using my data' right. Our system employs a novel two-stage evidence generation protocol whose efficacy is ensured by a newly proposed Lemma. To validate our framework, we conducted a case study on recommendation systems with systematic evaluation experiments using two real-world datasets: the measured success rate exceeds 99%.
Updated: 2024-06-29 13:51:57
标题: 通过区块链辅助证据生成来保护“停止使用我的数据”权利
摘要: 为了向用户提供个性化服务,基于互联网的平台收集和利用用户生成的行为数据。尽管“停止使用我的数据”权利应该是一项基本数据权利,允许个人要求在线平台不再利用其个人数据,但现有的预防数据保护措施(例如,加密数据消除、差分隐私)遗憾地不适用。本研究旨在开发第一个用于阻止后获取数据权利违规行为的证据生成框架。我们制定了“停止使用我的数据”问题,该问题捕捉了“被遗忘权”的多方面概念的一个重要方面。我们设计并实现了第一个利用区块链辅助的系统,用于生成阻止“停止使用我的数据”权利违规行为的证据。我们的系统采用了一种新颖的两阶段证据生成协议,其有效性由新提出的引理保证。为了验证我们的框架,我们进行了一项关于推荐系统的案例研究,使用两个真实世界的数据集进行系统评估实验:测得的成功率超过99%。
更新时间: 2024-06-29 13:51:57
领域: cs.CR
MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data
In the era of big data, access to abundant data is crucial for driving research forward. However, such data is often inaccessible due to privacy concerns or high costs, particularly in the healthcare domain. Generating synthetic (tabular) data can address this, but existing models typically require substantial amounts of data to train effectively, contradicting our objective of solving data scarcity. To address this challenge, we propose a novel framework to generate synthetic tabular data, powered by large language models (LLMs), that emulates the architecture of a Generative Adversarial Network (GAN). By incorporating the data generation process as contextual information and utilizing an LLM as the optimizer, our approach significantly enhances the quality of synthetic data generation in common scenarios with small sample sizes. Our experimental results on public and private datasets demonstrate that our model outperforms several state-of-the-art models at generating higher-quality synthetic data for downstream tasks while preserving the privacy of the real data.
Updated: 2024-06-29 13:48:12
标题: MALLM-GAN:多智能体大型语言模型作为生成对抗网络用于合成表格数据
摘要: 在大数据时代,获取丰富数据对推动研究至关重要。然而,由于隐私顾虑或高昂成本,这类数据通常难以获取,特别是在医疗领域。生成合成(表格)数据可以解决这个问题,但现有模型通常需要大量数据才能有效训练,与我们解决数据稀缺的目标相矛盾。为了解决这一挑战,我们提出了一个新颖的框架来生成合成表格数据,由大型语言模型(LLMs)驱动,模拟生成对抗网络(GAN)的架构。通过将数据生成过程作为上下文信息并利用LLM作为优化器,我们的方法显著提高了在样本量较小的常见情况下生成合成数据的质量。我们在公共和私人数据集上的实验结果表明,我们的模型在生成更高质量的合成数据以用于下游任务的同时保持真实数据的隐私方面优于几种最先进的模型。
更新时间: 2024-06-29 13:48:12
领域: cs.LG,cs.AI
Global Trends in Cryptocurrency Regulation: An Overview
Cryptocurrencies have evolved into an important asset class, providing a variety of benefits. However, they also present significant risks, such as market volatility and the potential for misuse in illegal activities. These risks underline the urgent need for a comprehensive regulatory framework to ensure consumer protection, market integrity, and financial stability. Yet, the global landscape of cryptocurrency regulation remains complex, marked by substantial variations in regulatory frameworks among different countries. This paper aims to study these differences by investigating the regulatory landscapes across various jurisdictions. We first discuss regulatory challenges and considerations, and then conduct a comparative analysis of international regulatory stances, approaches, and measures. We hope our study offers practical insights to enhance the understanding of global trends in cryptocurrency regulation.
Updated: 2024-06-29 13:21:19
标题: 全球加密货币监管趋势:概览
摘要: 加密货币已经发展成为一个重要的资产类别,提供了各种好处。然而,它们也存在重大风险,如市场波动性和在非法活动中被滥用的潜力。这些风险凸显了迫切需要建立全面的监管框架来确保消费者保护、市场完整性和金融稳定性。然而,全球加密货币监管环境仍然复杂,不同国家之间监管框架存在显著差异。本文旨在通过调查各个司法管辖区的监管环境来研究这些差异。我们首先讨论监管挑战和考虑因素,然后进行国际监管立场、方法和措施的比较分析。我们希望我们的研究能提供实用的见解,以增强对全球加密货币监管趋势的理解。
更新时间: 2024-06-29 13:21:19
领域: cs.CY,cs.CR
A Comprehensive Survey on Graph Reduction: Sparsification, Coarsening, and Condensation
Many real-world datasets can be naturally represented as graphs, spanning a wide range of domains. However, the increasing complexity and size of graph datasets present significant challenges for analysis and computation. In response, graph reduction, or graph summarization, has gained prominence for simplifying large graphs while preserving essential properties. In this survey, we aim to provide a comprehensive understanding of graph reduction methods, including graph sparsification, graph coarsening, and graph condensation. Specifically, we establish a unified definition for these methods and introduce a hierarchical taxonomy to categorize the challenges they address. Our survey then systematically reviews the technical details of these methods and emphasizes their practical applications across diverse scenarios. Furthermore, we outline critical research directions to ensure the continued effectiveness of graph reduction techniques, as well as provide a comprehensive paper list at \url{https://github.com/Emory-Melody/awesome-graph-reduction}. We hope this survey will bridge literature gaps and propel the advancement of this promising field.
Updated: 2024-06-29 13:07:00
标题: 一份关于图减少的综合调查:稀疏化、粗化和凝聚
摘要: 许多现实世界的数据集可以自然地表示为图形,跨越了广泛的领域。然而,图形数据集的复杂性和规模的增加为分析和计算带来了重大挑战。为此,图形缩减或图形摘要已经因简化大型图形而保留基本属性而备受关注。在本调查中,我们旨在提供对图形缩减方法的全面了解,包括图形稀疏化、图形粗化和图形凝缩。具体而言,我们为这些方法建立了统一定义,并引入了一个分层分类法来对其解决的挑战进行分类。我们的调查系统地审查了这些方法的技术细节,并强调它们在各种场景中的实际应用。此外,我们概述了确保图形缩减技术持续有效性的关键研究方向,并提供了一个全面的论文列表,网址为\url{https://github.com/Emory-Melody/awesome-graph-reduction}。我们希望这项调查可以弥补文献差距,推动这一充满希望的领域的发展。
更新时间: 2024-06-29 13:07:00
领域: cs.SI,cs.AI,cs.DS,cs.LG
Time Series Clustering with General State Space Models via Stochastic Variational Inference
In this paper, we propose a novel method of model-based time series clustering with mixtures of general state space models (MSSMs). Each component of an MSSM is associated with one cluster. An advantage of the proposed method is that it enables the use of time series models appropriate to the specific time series. This not only improves clustering and prediction accuracy but also enhances the interpretability of the estimated parameters. The parameters of the MSSMs are estimated using stochastic variational inference, a subtype of variational inference. The proposed method estimates the latent variables of an arbitrary state space model by using neural networks with a normalizing flow as a variational estimator. The number of clusters can be estimated using the Bayesian information criterion. In addition, to prevent MSSMs from converging to a local optimum, we propose several optimization tricks, including an additional penalty term called entropy annealing. Experiments on simulated datasets show that the proposed method is effective for clustering, parameter estimation, and estimating the number of clusters.
Updated: 2024-06-29 12:48:53
标题: 通过随机变分推断的一般状态空间模型进行时间序列聚类
摘要: 在本文中,我们提出了一种基于模型的时间序列聚类方法,使用混合一般状态空间模型(MSSMs)。每个MSSMs组件与每个聚类相关联。所提出的方法的优势在于它能够使用适合特定时间序列的时间序列模型。这不仅提高了聚类和预测准确性,还增强了估计参数的可解释性。MSSMs的参数是使用随机变分推断(一种变分推断的子类型)来估计的。所提出的方法通过使用神经网络与归一化流作为变分估计器来估计任意状态空间模型的潜变量。可以使用贝叶斯信息准则来估计聚类的数量。此外,为了防止MSSMs收敛到局部最优解,我们提出了几种优化技巧,包括一个名为熵退火的额外惩罚项。对模拟数据集的实验证明,所提出的方法对于聚类、参数估计和估计聚类数量是有效的。
更新时间: 2024-06-29 12:48:53
领域: cs.LG,cs.AI
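For reference, the Bayesian information criterion mentioned in the abstract above has the following standard form. Here $K$ is the candidate number of clusters, $\hat{L}_K$ the maximized likelihood, $p_K$ the number of free parameters, and $n$ the number of observations. These symbols are introduced here for illustration, and how the paper approximates the likelihood under variational inference is not stated in the abstract.

```latex
\mathrm{BIC}(K) = -2 \ln \hat{L}_K + p_K \ln n,
\qquad
\hat{K} = \arg\min_{K} \mathrm{BIC}(K)
```

The $p_K \ln n$ term penalizes larger models, so adding clusters is only favored when the gain in likelihood outweighs the extra parameters.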
The Machine Psychology of Cooperation: Can GPT models operationalise prompts for altruism, cooperation, competitiveness and selfishness in economic games?
We investigated the capability of the GPT-3.5 large language model (LLM) to operationalize natural language descriptions of cooperative, competitive, altruistic, and self-interested behavior in two social dilemmas: the repeated Prisoner's Dilemma and the one-shot Dictator Game. Using a within-subject experimental design, we used a prompt to describe the task environment using a similar protocol to that used in experimental psychology studies with human subjects. We tested our research question by manipulating the part of our prompt which was used to create a simulated persona with different cooperative and competitive stances. We then assessed the resulting simulacra's level of cooperation in each social dilemma, taking into account the effect of different partner conditions for the repeated game. Our results provide evidence that LLMs can, to some extent, translate natural language descriptions of different cooperative stances into corresponding descriptions of appropriate task behaviour, particularly in the one-shot game. There is some evidence of behaviour resembling conditional reciprocity for the cooperative simulacra in the repeated game, and for the later version of the model there is evidence of altruistic behaviour. Our study has potential implications for using LLM chatbots in task environments that involve cooperation, e.g. using chatbots as mediators and facilitators in public-goods negotiations.
Updated: 2024-06-29 12:29:28
标题: 合作的机器心理学:GPT模型能在经济游戏中操作提示利他主义、合作、竞争和自私吗?
摘要: 我们调查了GPT-3.5大型语言模型(LLM)在两个社会困境中操作自然语言描述合作、竞争、利他和自私行为的能力:重复囚徒困境和一次性独裁者游戏。使用一个被试内实验设计,我们使用一个提示来描述任务环境,使用类似于实验心理学研究中使用的协议与人类受试者。我们通过操纵用于创建具有不同合作和竞争立场的模拟人物的提示部分来测试我们的研究问题。然后我们评估了每个社会困境中结果模拟人的合作水平,考虑了重复游戏的不同合作条件的影响。我们的结果表明LLMs在某种程度上可以将不同合作立场的自然语言描述转化为相应的适当任务行为描述,尤其是在一次性游戏中。在重复游戏中,有一些类似有条件的互惠行为的证据,而在模型的后续版本中有利他行为的证据。我们的研究对于在涉及合作的任务环境中使用LLM聊天机器人具有潜在意义,例如在公共物品谈判中使用聊天机器人作为中介和促进者。
更新时间: 2024-06-29 12:29:28
领域: cs.GT,cs.AI,cs.CY,econ.GN,q-fin.EC,I.2.0
Which Backbone to Use: A Resource-efficient Domain Specific Comparison for Computer Vision
In contemporary computer vision applications, particularly image classification, architectural backbones pre-trained on large datasets like ImageNet are commonly employed as feature extractors. Despite the widespread use of these pre-trained convolutional neural networks (CNNs), there remains a gap in understanding the performance of various resource-efficient backbones across diverse domains and dataset sizes. Our study systematically evaluates multiple lightweight, pre-trained CNN backbones under consistent training settings across a variety of datasets, including natural images, medical images, galaxy images, and remote sensing images. This comprehensive analysis aims to aid machine learning practitioners in selecting the most suitable backbone for their specific problem, especially in scenarios involving small datasets where fine-tuning a pre-trained network is crucial. Even though attention-based architectures are gaining popularity, we observed that they tend to perform poorly on low-data fine-tuning tasks compared to CNNs. We also observed that some CNN architectures, such as ConvNeXt, RegNet and EfficientNet, consistently perform well compared to others across a diverse set of domains. Our findings provide actionable insights into the performance trade-offs and effectiveness of different backbones, facilitating informed decision-making in model selection for a broad spectrum of computer vision domains. Our code is available here: https://github.com/pranavphoenix/Backbones
Updated: 2024-06-29 12:26:42
标题: 使用哪种主干:计算机视觉领域资源高效性比较
摘要: 在当代计算机视觉应用中,特别是图像分类中,通常会使用在大型数据集(如ImageNet)上预训练的网络架构作为特征提取器。尽管这些预训练的卷积神经网络(CNNs)被广泛使用,但对于各种资源有效的骨干网络在不同领域和数据集大小下的性能仍存在差距。我们的研究系统地评估了多种轻量级、预训练的CNN骨干网络在一系列数据集上的性能,包括自然图像、医学图像、星系图像和遥感图像。这项综合分析旨在帮助机器学习从业者选择最适合其特定问题的骨干网络,特别是在涉及小数据集的情况下,微调预训练网络至关重要。尽管基于注意力的架构越来越受欢迎,我们观察到它们在低数据微调任务中的表现往往不如CNNs。我们还观察到,一些CNN架构,如ConvNeXt、RegNet和EfficientNet,在各种领域中表现良好且稳定。我们的研究结果为不同骨干网络的性能权衡和有效性提供了可操作的见解,有助于在广泛的计算机视觉领域中进行明智的模型选择。我们的代码可以在此处找到:https://github.com/pranavphoenix/Backbones
更新时间: 2024-06-29 12:26:42
领域: cs.CV,cs.AI,cs.LG,I.2.10; I.4.0; I.4.1; I.4.2; I.4.6; I.4.7; I.4.8; I.4.9; I.4.10; I.2.10; I.5.1; I.5.2; I.5.4; J.2
On the Complexity of Learning to Cooperate with Populations of Socially Rational Agents
Artificially intelligent agents deployed in the real world will require the ability to reliably \textit{cooperate} with humans (as well as other, heterogeneous AI agents). To provide formal guarantees of successful cooperation, we must make some assumptions about how partner agents could plausibly behave. Any realistic set of assumptions must account for the fact that other agents may be just as adaptable as our agent is. In this work, we consider the problem of cooperating with a \textit{population} of agents in a finitely-repeated, two-player general-sum matrix game with private utilities. Two natural assumptions in such settings are that: 1) all agents in the population are individually rational learners, and 2) when any two members of the population are paired together, with high probability they will achieve at least the same utility as they would under some Pareto efficient equilibrium strategy. Our results first show that these assumptions alone are insufficient to ensure \textit{zero-shot} cooperation with members of the target population. We therefore consider the problem of \textit{learning} a strategy for cooperating with such a population using prior observations of its members interacting with one another. We provide upper and lower bounds on the number of samples needed to learn an effective cooperation strategy. Most importantly, we show that these bounds can be much stronger than those arising from a ``naive'' reduction of the problem to one of imitation learning.
Updated: 2024-06-29 11:59:52
标题: 学习与社会合作代理群体合作的复杂性
摘要: 在现实世界中部署的人工智能代理将需要可靠地与人类(以及其他异构AI代理)\textit{合作}的能力。为了提供成功合作的正式保证,我们必须对合作伙伴代理可能的行为做一些假设。任何现实的假设都必须考虑到其他代理可能与我们的代理一样适应。在这项工作中,我们考虑了在一个有限重复的私人效用二人零和博弈中与一个\textit{人口}代理合作的问题。在这种设置中的两个自然假设是:1)人口中的所有代理都是理性学习者,2)当人口中的任意两个成员配对时,高概率下他们将至少达到与某个帕累托有效均衡策略下相同的效用。我们的结果首先表明,仅凭这些假设是不足以确保与目标人口成员进行\textit{零射击}合作的。因此,我们考虑了通过先前观察到其成员相互交互来\textit{学习}与这样一个人口合作的策略的问题。我们提供了学习有效合作策略所需样本数量的上下界。最重要的是,我们表明这些界限可能比将问题简化为模仿学习问题产生的界限要强。
更新时间: 2024-06-29 11:59:52
领域: cs.LG,cs.AI,cs.GT,cs.MA
Obtaining $(ε,δ)$-differential privacy guarantees when using a Poisson mechanism to synthesize contingency tables
We show that differential privacy type guarantees can be obtained when using a Poisson synthesis mechanism to protect counts in contingency tables. Specifically, we show how to obtain $(\epsilon, \delta)$-probabilistic differential privacy guarantees via the Poisson distribution's cumulative distribution function. We demonstrate this empirically with the synthesis of an administrative-type confidential database.
Updated: 2024-06-29 11:57:24
标题: 使用泊松机制合成列联表时获得$(ε,δ)$-差分隐私保证
摘要: 我们展示了在使用泊松合成机制保护列联表中的计数时可以获得差分隐私类型保证。具体地,我们展示了如何通过泊松分布的累积分布函数获得$(\epsilon, \delta)$-概率差分隐私保证。我们通过对一个类似行政机构的保密数据库进行合成来进行实证验证。
更新时间: 2024-06-29 11:57:24
领域: cs.CR,stat.ME
Explainability of Machine Learning Models under Missing Data
Missing data is a prevalent issue that can significantly impair model performance and interpretability. This paper briefly summarizes the development of the field of missing data with respect to Explainable Artificial Intelligence and experimentally investigates the effects of various imputation methods on the calculation of Shapley values, a popular technique for interpreting complex machine learning models. We compare different imputation strategies and assess their impact on feature importance and interaction as determined by Shapley values. Moreover, we also theoretically analyze the effects of missing values on Shapley values. Importantly, our findings reveal that the choice of imputation method can introduce biases that could lead to changes in the Shapley values, thereby affecting the interpretability of the model. Moreover, a lower test prediction mean squared error (MSE) may not imply a lower MSE in Shapley values, and vice versa. Also, while XGBoost can handle missing data directly, using XGBoost directly on missing data can seriously affect interpretability compared to imputing the data before training XGBoost. This study provides a comprehensive evaluation of imputation methods in the context of model interpretation, offering practical guidance for selecting appropriate techniques based on dataset characteristics and analysis objectives. The results underscore the importance of considering imputation effects to ensure robust and reliable insights from machine learning models.
Updated: 2024-06-29 11:31:09
标题: 机器学习模型在缺失数据情况下的可解释性
摘要: 缺失数据是一个普遍存在的问题,可以显著影响模型的性能和可解释性。本文简要总结了缺失数据领域在可解释人工智能方面的发展,并通过实验证明了各种插补方法对Shapley值计算的影响,Shapley值是一种用于解释复杂机器学习模型的流行技术。我们比较了不同的插补策略,并评估了它们对特征重要性和Shapley值所确定的交互作用的影响。此外,我们还从理论上分析了缺失值对Shapley值的影响。重要的是,我们的研究结果表明,插补方法的选择可能引入偏见,从而导致Shapley值的变化,进而影响模型的可解释性。此外,测试预测均方误差(MSE)较低并不意味着Shapley值中的MSE较低,反之亦然。此外,虽然Xgboost是一种可以直接处理缺失数据的方法,但直接在缺失数据上使用Xgboost可能会严重影响可解释性,与在训练Xgboost之前对数据进行插补相比。本研究在模型解释的背景下对插补方法进行了全面评估,为根据数据集特征和分析目标选择适当技术提供了实用指导。结果强调了考虑插补效果的重要性,以确保从机器学习模型中获得强大可靠的见解。
更新时间: 2024-06-29 11:31:09
领域: cs.LG,cs.AI
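To make the link between imputation and Shapley values concrete, here is a toy sketch that computes exact Shapley values by enumerating feature subsets. The linear model, the zero baseline, and the two imputed values are hypothetical and are not taken from the paper. Features outside a coalition are replaced by the baseline value.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n):
    """Exact Shapley values for a set function `value` over n players."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def make_value(x, baseline):
    """Value function for a hypothetical linear model f(x) = 2*x0 + 3*x1."""
    def value(coalition):
        z = [x[k] if k in coalition else baseline[k] for k in range(len(x))]
        return 2.0 * z[0] + 3.0 * z[1]
    return value

baseline = [0.0, 0.0]
# The same record with its missing second feature imputed in two ways.
imputed_a = [1.0, 1.0]  # e.g. an assumed mean imputation
imputed_b = [1.0, 2.0]  # e.g. an assumed alternative imputation
print(shapley_values(make_value(imputed_a, baseline), 2))
print(shapley_values(make_value(imputed_b, baseline), 2))
```

In this toy setting the attribution of the imputed feature scales directly with the imputed value, which is one simple way an imputation choice can shift an explanation even when the model is unchanged.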
Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP
Improvements in language models' capabilities have pushed their applications towards longer contexts, making long-context evaluation and development an active research area. However, many disparate use-cases are grouped together under the umbrella term of "long-context", defined simply by the total length of the model's input, including - for example - Needle-in-a-Haystack tasks, book summarization, and information aggregation. Given their varied difficulty, in this position paper we argue that conflating different tasks by their context length is unproductive. As a community, we require a more precise vocabulary to understand what makes long-context tasks similar or different. We propose to unpack the taxonomy of long-context tasks based on the properties that make them more difficult with longer contexts. We propose two orthogonal axes of difficulty: (I) Diffusion: How hard is it to find the necessary information in the context? (II) Scope: How much necessary information is there to find? We survey the literature on long-context, provide justification for this taxonomy as an informative descriptor, and situate the literature with respect to it. We conclude that the most difficult and interesting settings, whose necessary information is very long and highly diffused within the input, are severely under-explored. By using a descriptive vocabulary and discussing the relevant properties of difficulty in long-context, we can implement more informed research in this area. We call for a careful design of tasks and benchmarks with distinctly long context, taking into account the characteristics that make it qualitatively different from shorter context.
Updated: 2024-06-29 11:09:47
标题: 如果您只需要检索,那么“长上下文”真的重要吗?走向真正困难的长上下文自然语言处理(NLP)
摘要: 语言模型能力的提升推动了它们在更长语境下的应用,使得长语境评估和开发成为一个活跃的研究领域。然而,许多不同的用例被归为“长语境”这一总称下,仅仅由模型输入的总长度来定义,包括例如“大海捞针”任务、图书摘要和信息聚合等。鉴于它们的不同难度,在这篇立场论文中,我们认为将不同任务按照其语境长度混为一谈是无益的。作为一个社区,我们需要更准确的词汇来理解长语境任务的相似性或不同之处。我们提出根据使长语境任务在更长语境下更加困难的属性来拆分长语境的分类。我们提出了两个正交的困难轴:(一)扩散:在语境中找到必要信息有多难?(二)范围:有多少必要信息需要找到?我们调查了关于长语境的文献,为这一分类法提供了理由,并将文献与之联系起来。我们得出结论,那些最困难和有趣的设置,其中必要信息在输入中非常长且高度扩散,是严重未被探索的。通过使用描述性词汇并讨论长语境中困难的相关属性,我们可以在这一领域实施更加明智的研究。我们呼吁精心设计具有明显长语境的任务和基准,并考虑使其在质量上与较短语境有所不同的特征。
更新时间: 2024-06-29 11:09:47
领域: cs.CL,cs.AI
PUZZLES: A Benchmark for Neural Algorithmic Reasoning
Algorithmic reasoning is a fundamental cognitive ability that plays a pivotal role in problem-solving and decision-making processes. Reinforcement Learning (RL) has demonstrated remarkable proficiency in tasks such as motor control, handling perceptual input, and managing stochastic environments. These advancements have been enabled in part by the availability of benchmarks. In this work we introduce PUZZLES, a benchmark based on Simon Tatham's Portable Puzzle Collection, aimed at fostering progress in algorithmic and logical reasoning in RL. PUZZLES contains 40 diverse logic puzzles of adjustable sizes and varying levels of complexity; many puzzles also feature a diverse set of additional configuration parameters. The 40 puzzles provide detailed information on the strengths and generalization capabilities of RL agents. Furthermore, we evaluate various RL algorithms on PUZZLES, providing baseline comparisons and demonstrating the potential for future research. All the software, including the environment, is available at https://github.com/ETH-DISCO/rlp.
Updated: 2024-06-29 11:02:05
标题: PUZZLES:神经算法推理的基准
摘要: 算法推理是一种基本的认知能力,在问题解决和决策过程中发挥着关键作用。强化学习(RL)在诸如运动控制、处理感知输入和管理随机环境等任务中展现出了非凡的能力。这些进展在某种程度上得益于基准测试的可用性。在这项工作中,我们介绍了PUZZLES,这是一个基于Simon Tatham的Portable Puzzle Collection的基准测试,旨在促进RL中算法和逻辑推理的进步。PUZZLES包含40个不同的逻辑谜题,可调整大小和不同复杂程度;许多谜题还包括各种额外的配置参数。这40个谜题提供了关于RL代理的优势和泛化能力的详细信息。此外,我们评估了各种RL算法在PUZZLES上的表现,提供了基准比较,并展示了未来研究的潜力。所有软件,包括环境,均可在https://github.com/ETH-DISCO/rlp下载。
更新时间: 2024-06-29 11:02:05
领域: cs.LG,cs.AI
A Study on Effect of Reference Knowledge Choice in Generating Technical Content Relevant to SAPPhIRE Model Using Large Language Model
Representation of systems using the SAPPhIRE model of causality can be an inspirational stimulus in design. However, creating a SAPPhIRE model of a technical or a natural system requires sourcing technical knowledge from multiple technical documents regarding how the system works. This research investigates how to generate technical content accurately relevant to the SAPPhIRE model of causality using a Large Language Model (LLM). This paper, which is the first part of a two-part study, presents a method for hallucination suppression using Retrieval Augmented Generation with an LLM to generate technical content supported by the scientific information relevant to a SAPPhIRE construct. The results show that the selection of reference knowledge used in providing context to the LLM for generating the technical content is very important. The outcome of this research is used to build a software support tool to generate the SAPPhIRE model of a given technical system.
Updated: 2024-06-29 10:46:01
标题: 使用大型语言模型生成与SAPPhIRE模型相关的技术内容中参考知识选择的影响研究
摘要: 使用SAPPhIRE因果模型来表示系统可以在设计中起到激发性的作用。然而,创建一个技术或自然系统的SAPPhIRE模型需要从多个关于系统如何运作的技术文档中获取技术知识。本研究探讨如何使用大型语言模型(LLM)生成与SAPPhIRE因果模型相关的技术内容。这篇论文是两部分研究的第一部分,提出了一种使用LLM进行检索增强生成的方法来抑制幻觉,以生成受科学信息支持的与SAPPhIRE构造相关的技术内容。这项研究的结果表明,在为生成技术内容的LLM提供背景时,所选择的参考知识非常重要。这项研究的结果被用于构建一个软件支持工具,用于生成给定技术系统的SAPPhIRE模型。
更新时间: 2024-06-29 10:46:01
领域: cs.CL,cs.AI
Multi-task multi-constraint differential evolution with elite-guided knowledge transfer for coal mine integrated energy system dispatching
The dispatch optimization of a coal mine integrated energy system is challenging due to high dimensionality, strongly coupled constraints, and multiple objectives. Existing constrained multiobjective evolutionary algorithms struggle with locating multiple small and irregular feasible regions, making them inapplicable to this problem. To address this issue, we develop a multitask evolutionary algorithm framework that incorporates dispatch-related domain knowledge to effectively deal with strong constraints and multiobjective optimization. We first explore a possible evolutionary multitask construction strategy based on the analysis and handling of complex constraint relationships, i.e., constraint-coupled spatial decomposition, constraint strength classification, and a constraint handling technique. Within the multitask evolutionary optimization framework, two strategies are further developed: an elite-guided knowledge transfer that uses a special crowding distance mechanism to select dominant individuals from each task, and a mutation based on an adaptive neighborhood technique that effectively balances the diversity and convergence of each optimized task for the differential evolution algorithm. The performance of the proposed algorithm in feasibility, convergence, and diversity is demonstrated in a case study of a coal mine integrated energy system by comparison with the CPLEX solver and seven constrained multiobjective evolutionary algorithms.
Updated: 2024-06-29 10:00:16
标题: 多任务多约束差分进化与精英引导知识转移在煤矿综合能源系统调度中的应用
摘要: 煤矿综合能源系统的调度优化面临着高维度、强耦合约束和多目标的挑战。现有的受限多目标进化算法很难定位多个小型和不规则的可行区域,因此无法应用于该问题。为解决这一问题,我们开发了一个多任务进化算法框架,将调度相关领域知识整合进来,有效处理强约束和多目标优化。首先探讨了可能的基于复杂约束关系分析和处理的进化多任务构建策略,即约束耦合空间分解、约束强度分类和约束处理技术。在多任务进化优化框架中,进一步发展了两种策略,即通过设计特殊的拥挤距离机制从每个任务中选择主导个体进行精英引导的知识传递,以及基于自适应邻域技术的变异,以有效平衡不同优化任务的多样性和收敛性,为差分进化算法。通过与CPLEX求解器和七种受限多目标进化算法进行比较,展示了所提出算法在可行性、收敛性和多样性方面在煤矿综合能源系统案例研究中的表现。
更新时间: 2024-06-29 10:00:16
领域: cs.NE,cs.AI
LLM-Driven Multimodal Opinion Expression Identification
Opinion Expression Identification (OEI) is essential in NLP for applications ranging from voice assistants to depression diagnosis. This study extends OEI to encompass multimodal inputs, underlining the significance of auditory cues in delivering emotional subtleties beyond the capabilities of text. We introduce a novel multimodal OEI (MOEI) task, integrating text and speech to mirror real-world scenarios. Utilizing the CMU MOSEI and IEMOCAP datasets, we construct the CI-MOEI dataset. Additionally, Text-to-Speech (TTS) technology is applied to the MPQA dataset to obtain the CIM-OEI dataset. We design a template for the OEI task to take full advantage of the generative power of large language models (LLMs). Advancing further, we propose an LLM-driven method, STOEI, which combines the speech and text modalities to identify opinion expressions. Our experiments demonstrate that MOEI significantly improves performance, while our method outperforms existing methods by 9.20% and obtains SOTA results.
Updated: 2024-06-29 09:55:50
标题: LLM驱动的多模态意见表达识别
摘要: 意见表达识别(OEI)在自然语言处理中是至关重要的,应用范围从语音助手到抑郁症诊断不等。本研究将OEI扩展到跨越多模态输入,强调了声音线索在传递情感微妙之处的重要性,超越了文本的能力。我们引入了一个新颖的多模态OEI(MOEI)任务,集成文本和语音以反映现实场景。利用CMU MOSEI和IEMOCAP数据集,我们构建了CI-MOEI数据集。此外,文本转语音(TTS)技术被应用于MPQA数据集,以获得CIM-OEI数据集。我们设计了一个OEI任务模板,充分利用大型语言模型(LLMs)的生成能力。进一步推进,我们提出了一种LLM驱动的方法STOEI,结合语音和文本模态来识别意见表达。我们的实验表明,MOEI显著提高了性能,而我们的方法优于现有方法9.20%,取得了SOTA结果。
更新时间: 2024-06-29 09:55:50
领域: cs.CL,cs.AI,cs.SD,eess.AS
FANFOLD: Graph Normalizing Flows-driven Asymmetric Network for Unsupervised Graph-Level Anomaly Detection
Unsupervised graph-level anomaly detection (UGAD) has attracted increasing interest due to its widespread application. In recent studies, knowledge distillation-based methods have been widely used in unsupervised anomaly detection to improve model efficiency and generalization. However, the inherent symmetry between the source (teacher) and target (student) networks typically results in consistent outputs across both architectures, making it difficult to distinguish abnormal graphs from normal graphs. Also, existing methods mainly rely on graph features to distinguish anomalies, which may be unstable with complex and diverse data and fail to capture the essence that differentiates normal graphs from abnormal ones. In this work, we propose a Graph Normalizing Flows-driven Asymmetric Network For Unsupervised Graph-Level Anomaly Detection (FANFOLD for short). We introduce normalizing flows to unsupervised graph-level anomaly detection because of their successful application and superior quality in learning the underlying distribution of samples. Specifically, we adopt the knowledge distillation technique and apply normalizing flows on the source network, which makes the network asymmetric. In the training stage, FANFOLD transforms the original distribution of normal graphs to a standard normal distribution. During inference, FANFOLD computes the anomaly score using the source-target loss to discriminate between normal and anomalous graphs. We conduct extensive experiments on 15 datasets from different fields against 9 baseline methods to validate the superiority of FANFOLD.
Updated: 2024-06-29 09:49:16
标题: FANFOLD:基于图归一化流驱动的非对称网络用于无监督图级异常检测
摘要: 无监督图级异常检测(UGAD)由于其广泛应用而引起了越来越多的关注。在最近的研究中,基于知识蒸馏的方法广泛应用于无监督异常检测,以提高模型效率和泛化能力。然而,源(教师)和目标(学生)网络之间的固有对称性通常导致两种结构之间的一致输出,使得难以区分异常图和正常图。此外,现有方法主要依赖图特征来区分异常,这可能在复杂和多样化的数据中不稳定,并且无法捕捉区分正常图和异常图之间的本质。在这项工作中,我们提出了一种基于图归一化流驱动的非对称网络用于无监督图级异常检测(简称FANFOLD)。我们引入了归一化流到无监督图级异常检测中,因为它们在学习样本的潜在分布方面具有成功的应用和优越的质量。具体而言,我们采用知识蒸馏技术,并将归一化流应用于源网络,实现了非对称网络。在训练阶段,FANFOLD将正常图的原始分布转换为标准正态分布。在推断阶段,FANFOLD使用源-目标损失计算异常得分,以区分正常和异常图。我们在15个不同领域的数据集上进行了大量实验证明了FANFOLD的优越性,与9种基准方法进行了比较。
更新时间: 2024-06-29 09:49:16
领域: cs.LG,cs.AI
Learning Position From Vehicle Vibration Using an Inertial Measurement Unit
This paper presents a novel approach to vehicle positioning that operates without reliance on the global navigation satellite system (GNSS). Traditional GNSS approaches are vulnerable to interference in certain environments, rendering them unreliable in situations such as urban canyons, under flyovers, or in low reception areas. This study proposes a vehicle positioning method based on learning the road signature from accelerometer and gyroscope measurements obtained by an inertial measurement unit (IMU) sensor. In our approach, the route is divided into segments, each with a distinct signature that the IMU can detect through the vibrations of a vehicle in response to subtle changes in the road surface. The study presents two different data-driven methods for learning the road segment from IMU measurements. One method is based on convolutional neural networks and the other on ensemble random forest applied to handcrafted features. Additionally, the authors present an algorithm to deduce the position of a vehicle in real time using the learned road segment. The approach was applied in two positioning tasks: (i) a car along a 6 km route in a dense urban area; (ii) an e-scooter on a 1 km route that combined road and pavement surfaces. The mean error between the proposed method's position and the ground truth was approximately 50 m for the car and 30 m for the e-scooter. Compared to a solution based on time integration of the IMU measurements, the proposed approach achieves a mean error that is more than 5 times lower for the e-scooter and 20 times lower for the car.
Updated: 2024-06-29 09:47:09
标题: 使用惯性测量单元从车辆振动中学习位置
摘要: 这篇论文提出了一种新颖的车辆定位方法,该方法不依赖全球导航卫星系统(GNSS)。传统的GNSS方法在某些环境中容易受到干扰,使它们在城市峡谷、立交桥下或接收信号较弱的区域等情况下变得不可靠。本研究提出了一种基于从惯性测量单元(IMU)传感器获取的加速度计和陀螺仪测量数据学习道路特征的车辆定位方法。在我们的方法中,路线被分成具有不同特征的段,IMU可以通过车辆对路面细微变化的振动做出反应来检测这些特征。该研究提出了两种不同的基于数据驱动的方法来学习从IMU测量数据中获得的道路段。一种方法基于卷积神经网络,另一种方法基于手工制作的特征应用于集成随机森林。此外,作者提出了一种算法,用于实时推断车辆的位置,利用学习到的道路段。该方法应用于两个定位任务:(i)一辆汽车沿着城市密集区域的6[km]路线;(ii)一辆电动滑板车在结合了道路和人行道表面的1[km]路线上。所提出方法的位置与实际情况之间的平均误差约为汽车的50[m]和电动滑板车的30[m]。与基于IMU测量的时间积分的解决方案相比,所提出的方法对于电动滑板车的平均误差提高了5倍以上,对于汽车提高了20倍。
更新时间: 2024-06-29 09:47:09
领域: cs.RO,cs.AI
Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning
The trustworthy machine learning (ML) community is increasingly recognizing the crucial need for models capable of selectively 'unlearning' data points after training. This leads to the problem of machine unlearning (MU), aiming to eliminate the influence of chosen data points on model performance, while still maintaining the model's utility post-unlearning. Despite various MU methods for data influence erasure, evaluations have largely focused on random data forgetting, ignoring the vital inquiry into which subset should be chosen to truly gauge the authenticity of unlearning performance. To tackle this issue, we introduce a new evaluative angle for MU from an adversarial viewpoint. We propose identifying the data subset that presents the most significant challenge for influence erasure, i.e., pinpointing the worst-case forget set. Utilizing a bi-level optimization principle, we amplify unlearning challenges at the upper optimization level to emulate worst-case scenarios, while simultaneously engaging in standard training and unlearning at the lower level, achieving a balance between data influence erasure and model utility. Our proposal offers a worst-case evaluation of MU's resilience and effectiveness. Through extensive experiments across different datasets (including CIFAR-10, CIFAR-100, CelebA, Tiny ImageNet, and ImageNet) and models (including both image classifiers and generative models), we expose critical pros and cons in existing (approximate) unlearning strategies. Our results illuminate the complex challenges of MU in practice, guiding the future development of more accurate and robust unlearning algorithms. The code is available at https://github.com/OPTML-Group/Unlearn-WorstCase.
Updated: 2024-06-29 09:24:46
标题: 挑战遗忘:揭示机器遗忘中最糟糕的情况
摘要: 机器学习(ML)社区越来越认识到需要能够在训练后有选择地“遗忘”数据点的模型的关键性。这导致了机器遗忘(MU)的问题,旨在消除选定数据点对模型性能的影响,同时仍保持模型在遗忘后的效用。尽管存在各种用于数据影响消除的MU方法,评估主要集中在随机数据遗忘上,忽视了对应该选择哪个子集来真正衡量遗忘性能的重要探讨。为了解决这个问题,我们从对抗的角度引入了一个新的MU评估角度。我们提出识别呈现对影响消除最重要挑战的数据子集,即确定最坏情况忘记集。利用双层优化原则,在上层优化级别上放大遗忘挑战,模拟最坏情况,同时在下层进行标准训练和遗忘,实现数据影响消除和模型效用之间的平衡。我们的提议提供了MU弹性和有效性的最坏情况评估。通过在不同数据集(包括CIFAR-10、100、CelebA、Tiny ImageNet和ImageNet)和模型(包括图像分类器和生成模型)上进行广泛实验,我们揭示了现有(近似)遗忘策略中的关键优缺点。我们的结果揭示了MU在实践中的复杂挑战,指导未来开发更准确和强大的遗忘算法。代码可在https://github.com/OPTML-Group/Unlearn-WorstCase上找到。
更新时间: 2024-06-29 09:24:46
领域: cs.LG,cs.AI,cs.CV
GraphArena: Benchmarking Large Language Models on Graph Computational Problems
The "arms race" of Large Language Models (LLMs) demands novel, challenging, and diverse benchmarks to faithfully examine their progress. We introduce GraphArena, a benchmarking tool designed to evaluate LLMs on graph computational problems using million-scale real-world graphs from diverse scenarios such as knowledge graphs, social networks, and molecular structures. GraphArena offers a suite of 10 computational tasks, encompassing four polynomial-time (e.g., Shortest Distance) and six NP-complete challenges (e.g., Travelling Salesman Problem). It features a rigorous evaluation framework that classifies LLM outputs as correct, suboptimal (feasible but not optimal), or hallucinatory (properly formatted but infeasible). Evaluation of 10 leading LLMs, including GPT-4o and LLaMA3-70B-Instruct, reveals that even top-performing models struggle with larger, more complex graph problems and exhibit hallucination issues. Despite the application of strategies such as chain-of-thought prompting, these issues remain unresolved. GraphArena contributes a valuable supplement to the existing LLM benchmarks and is open-sourced at https://github.com/squareRoot3/GraphArena.
Updated: 2024-06-29 09:19:23
标题: GraphArena:在图计算问题上对大型语言模型进行基准测试
摘要: 大型语言模型(LLMs)的“军备竞赛”需要新颖、具有挑战性和多样化的基准来忠实地检验它们的进展。我们介绍了GraphArena,这是一个旨在使用来自不同场景的百万规模真实世界图形来评估LLMs在图计算问题上的基准工具。GraphArena提供了一套包含四个多项式时间(例如,最短距离)和六个NP完全挑战(例如,旅行推销员问题)的十个计算任务。它具有严格的评估框架,将LLM的输出分类为正确、次优(可行但不是最佳)或幻觉(格式正确但不可行)。对包括GPT-4o和LLaMA3-70B-Instruct在内的10个领先的LLM进行评估表明,即使表现最佳的模型也难以处理更大、更复杂的图问题,并且存在幻觉问题。尽管应用了诸如思维链引导等策略,但这些问题仍未解决。GraphArena为现有LLM基准提供了有价值的补充,并在https://github.com/squareRoot3/GraphArena上开源。
更新时间: 2024-06-29 09:19:23
领域: cs.AI,cs.CL
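The three-way labelling described in the GraphArena abstract (correct, suboptimal, hallucinatory) can be illustrated on a toy shortest-path task. The following is a minimal sketch, not GraphArena's actual evaluator: the function names `bfs_distance` and `classify_path`, the adjacency-dict graph format, and the unweighted-graph assumption are all hypothetical choices made for illustration.

```python
from collections import deque


def bfs_distance(adj, source, target):
    """Return the unweighted shortest-path length, or None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def classify_path(adj, source, target, path):
    """Label a proposed path as 'correct', 'suboptimal', or 'hallucinatory'."""
    # Infeasible: wrong endpoints or a step that is not an edge of the graph.
    if not path or path[0] != source or path[-1] != target:
        return "hallucinatory"
    for u, v in zip(path, path[1:]):
        if v not in adj.get(u, ()):
            return "hallucinatory"
    # Feasible: compare its length with the true optimum.
    optimal = bfs_distance(adj, source, target)
    return "correct" if len(path) - 1 == optimal else "suboptimal"


# Hypothetical toy graph: a-b, a-c, b-c, c-d (undirected).
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(classify_path(adj, "a", "d", ["a", "c", "d"]))
print(classify_path(adj, "a", "d", ["a", "b", "c", "d"]))
print(classify_path(adj, "a", "d", ["a", "d"]))
```

Separating the feasibility check from the optimality check is what allows a well-formatted but invalid answer to be distinguished from a valid but non-optimal one.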
The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention
Prompt-based "diversity interventions" are commonly adopted to improve the diversity of Text-to-Image (T2I) models depicting individuals with various racial or gender traits. However, will this strategy result in nonfactual demographic distribution, especially when generating real historical figures? In this work, we propose DemOgraphic FActualIty Representation (DoFaiR), a benchmark to systematically quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models. DoFaiR consists of 756 meticulously fact-checked test instances to reveal the factuality tax of various diversity prompts through an automated evidence-supported evaluation pipeline. Experiments on DoFaiR unveil that diversity-oriented instructions increase the number of different gender and racial groups in DALLE-3's generations at the cost of historically inaccurate demographic distributions. To resolve this issue, we propose Fact-Augmented Intervention (FAI), which instructs a Large Language Model (LLM) to reflect on verbalized or retrieved factual information about gender and racial compositions of generation subjects in history, and incorporate it into the generation context of T2I models. By orienting model generations using the reflected historical truths, FAI significantly improves the demographic factuality under diversity interventions while preserving diversity.
Updated: 2024-06-29 09:09:42
标题: 多样性干预文本到图像生成的事实税:基准和事实增强干预
摘要: 基于提示的“多样性干预”通常被采用来改善描绘具有不同种族或性别特征的个体的文本到图像(T2I)模型的多样性。然而,在生成真实历史人物时,这种策略是否会导致非事实的人口分布呢?在这项工作中,我们提出了DemOgraphic FActualIty Representation(DoFaiR),这是一个基准,用于系统地量化在T2I模型中使用多样性干预和保留人口事实性之间的权衡。DoFaiR包含了756个经过精心事实核查的测试实例,通过自动化的证据支持评估流程揭示了各种多样性提示的事实性税收。在DoFaiR上进行的实验揭示了多样性导向指令增加了DALLE-3生成中不同性别和种族群体的数量,但以历史不准确的人口分布为代价。为了解决这个问题,我们提出了Fact-Augmented Intervention(FAI),该干预指导大型语言模型(LLM)反思有关历史时期生成主体的性别和种族构成的口头或检索到的事实信息,并将其纳入T2I模型的生成上下文中。通过使用反映历史真相来引导模型生成,FAI在保留多样性的同时显著提高了在多样性干预下的人口事实性。
更新时间: 2024-06-29 09:09:42
领域: cs.CL,cs.AI,cs.CV,cs.CY
Ents: An Efficient Three-party Training Framework for Decision Trees by Communication Optimization
Multi-party training frameworks for decision trees based on secure multi-party computation enable multiple parties to train high-performance models on distributed private data with privacy preservation. The training process essentially involves frequent dataset splitting according to the splitting criterion (e.g. Gini impurity). However, existing multi-party training frameworks for decision trees demonstrate communication inefficiency due to the following issues: (1) They suffer from huge communication overhead in securely splitting a dataset with continuous attributes. (2) They suffer from huge communication overhead due to performing almost all the computations on a large ring to accommodate the secure computations for the splitting criterion. In this paper, we are motivated to present an efficient three-party training framework, namely Ents, for decision trees by communication optimization. For the first issue, we present a series of training protocols based on the secure radix sort protocols to efficiently and securely split a dataset with continuous attributes. For the second issue, we propose an efficient share conversion protocol to convert shares between a small ring and a large ring to reduce the communication overhead incurred by performing almost all the computations on a large ring. Experimental results from eight widely used datasets show that Ents outperforms state-of-the-art frameworks by $5.5\times \sim 9.3\times$ in communication sizes and $3.9\times \sim 5.3\times$ in communication rounds. In terms of training time, Ents yields an improvement of $3.5\times \sim 6.7\times$. To demonstrate its practicality, Ents requires less than three hours to securely train a decision tree on a widely used real-world dataset (Skin Segmentation) with more than 245,000 samples in the WAN setting.
Updated: 2024-06-29 08:49:42
标题: 三方决策树训练框架Ents:基于通信优化的高效三方训练方案
摘要: 基于安全多方计算的决策树多方训练框架使多方能够在分布式私有数据上进行高性能模型的训练,并保护隐私。训练过程基本上涉及根据拆分准则(例如Gini不纯度)频繁地对数据集进行拆分。然而,现有的决策树多方训练框架由于以下问题而表现出通信效率低下:(1)它们在安全地拆分具有连续属性的数据集时存在巨大的通信开销。(2)由于在一个大环上执行几乎所有计算以适应拆分准则的安全计算,因此它们面临巨大的通信开销。 在本文中,我们受到通信优化的启发,提出了一种高效的三方训练框架 Ents 用于决策树。对于第一个问题,我们基于安全基数排序协议提出了一系列训练协议,以便高效安全地拆分具有连续属性的数据集。对于第二个问题,我们提出了一种高效的份额转换协议,用于在小环和大环之间转换份额,以减少在大环上执行几乎所有计算所引发的通信开销。来自八个广泛使用的数据集的实验结果表明,Ents 在通信大小方面比最先进的框架提高了 5.5 倍到 9.3 倍,在通信轮次方面提高了 3.9 倍到 5.3 倍。在训练时间方面,Ents 实现了 3.5 倍到 6.7 倍的改进。为了证明其实用性,在广泛使用的真实世界数据集(皮肤分割)上,在 WAN 设置中,Ents 需要不到三小时来安全地训练一颗决策树,该数据集包含超过 245,000 个样本。
更新时间: 2024-06-29 08:49:42
领域: cs.CR,cs.AI
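The Ents abstract refers to splitting a dataset on a continuous attribute according to a criterion such as Gini impurity, with sorting as the key step. The sketch below shows only the plaintext computation that such a protocol would evaluate; it contains no secret sharing or secure computation, and the names `gini` and `best_threshold` are hypothetical, not taken from the Ents implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Sort samples by a continuous attribute and scan candidate thresholds.

    Returns (weighted_gini, threshold) for the best split, or
    (inf, None) if all attribute values are identical.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal attribute values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if score < best[0]:
            best = (score, threshold)
    return best


print(best_threshold([3.0, 1.0, 4.0, 2.0], [1, 0, 1, 0]))
```

In a secure multi-party setting, the sort and the impurity arithmetic would run on secret-shared values, which is where the communication costs discussed in the abstract arise.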
Axiomatization of Gradient Smoothing in Neural Networks
Gradients play a pivotal role in explaining neural networks. The inherent high dimensionality and structural complexity of neural networks cause the raw gradients to contain a significant amount of noise. While several approaches have been proposed to reduce this noise through smoothing, there is little discussion of the rationale behind smoothing gradients in neural networks. In this work, we propose a theoretical framework for gradient smoothing in neural networks based on function mollification and Monte Carlo integration. The framework intrinsically axiomatizes gradient smoothing and reveals the rationale of existing methods. Furthermore, we provide an approach for designing new smoothing methods derived from the framework. Through experimental measurement of several newly designed smoothing methods, we demonstrate the research potential of our framework.
Updated: 2024-06-29 08:43:38
标题: 神经网络中梯度平滑的公理化
摘要: 梯度在神经网络解释中发挥着关键作用。神经网络固有的高维度和结构复杂性导致原始梯度包含大量噪音。虽然提出了几种方法来通过平滑减少噪音,但对神经网络中平滑梯度背后的原理讨论较少。在这项工作中,我们提出了一个基于函数磨光和蒙特卡罗积分的神经网络梯度平滑理论框架。该框架本质上公理化了梯度平滑,并揭示了现有方法的原理。此外,我们提供了一种从该框架派生新平滑方法的方法。通过实验测量几种新设计的平滑方法,我们展示了我们框架的研究潜力。
更新时间: 2024-06-29 08:43:38
领域: cs.LG,cs.AI
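The combination of mollification and Monte Carlo integration named in the gradient-smoothing abstract can be illustrated with the simplest case: averaging gradients at Gaussian-perturbed inputs, which estimates the gradient of the function convolved with a Gaussian kernel. This is a minimal sketch under that assumption, not the paper's framework; `smoothed_gradient`, `noisy_grad`, and the toy function are hypothetical.

```python
import math
import random


def smoothed_gradient(grad_fn, x, sigma=0.1, num_samples=4000, seed=0):
    """Monte Carlo estimate of E[grad f(x + eps)] with eps ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(num_samples):
        perturbed = [xi + rng.gauss(0.0, sigma) for xi in x]
        grad = grad_fn(perturbed)
        total = [t + g for t, g in zip(total, grad)]
    return [t / num_samples for t in total]


def noisy_grad(x):
    """Gradient of the toy function f(x) = sum(x_i^2 + 0.01 * sin(100 * x_i)).

    The cos(100 * x_i) term is a high-frequency component that dominates
    the raw gradient locally but carries little information about f.
    """
    return [2.0 * xi + math.cos(100.0 * xi) for xi in x]


x = [0.5, -1.0]
print("raw:     ", noisy_grad(x))
print("smoothed:", smoothed_gradient(noisy_grad, x))
```

For this toy function the Gaussian average suppresses the oscillating term almost entirely, so the smoothed estimate lies close to the underlying trend `2 * x`, up to Monte Carlo error.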
JSCDS: A Core Data Selection Method with Jensen-Shannon Divergence for Caries RGB Images-Efficient Learning
Deep learning-based RGB caries detection improves the efficiency of caries identification and is crucial for preventing oral diseases. The performance of deep learning models depends on high-quality data and requires substantial training resources, making efficient deployment challenging. Core data selection, by eliminating low-quality and confusing data, aims to enhance training efficiency without significantly compromising model performance. However, distance-based data selection methods struggle to distinguish dependencies among high-dimensional caries data. To address this issue, we propose a Core Data Selection Method with Jensen-Shannon Divergence (JSCDS) for efficient caries image learning and caries classification. We describe the core data selection criterion as the distribution of samples in different classes. JSCDS calculates the cluster centers from the sample embedding representations in the caries classification network and utilizes Jensen-Shannon Divergence to compute the mutual information between data samples and cluster centers, capturing nonlinear dependencies among high-dimensional data. The average mutual information is calculated to fit the above distribution, serving as the criterion for constructing the core set for model training. Extensive experiments on RGB caries datasets show that JSCDS outperforms other data selection methods in prediction performance and time consumption. Notably, JSCDS exceeds the performance of the full-dataset model with only 50% of the core data, and its performance advantage becomes more pronounced at 70% of the core data.
Updated: 2024-06-29 08:19:25
标题: JSCDS: 一种用于龋齿RGB图像的核心数据选择方法,基于Jason-Shannon离散度的有效学习
摘要: 基于深度学习的RGB龋齿检测提高了龋齿识别的效率,对预防口腔疾病至关重要。深度学习模型的性能取决于高质量的数据,并需要大量的训练资源,使得有效部署具有挑战性。通过消除低质量和混乱数据,核心数据选择旨在提高训练效率,而不会显著影响模型性能。然而,基于距离的数据选择方法往往难以区分高维龋齿数据之间的依赖关系。为了解决这个问题,我们提出了一种基于Jensen-Shannon散度(JSCDS)的核心数据选择方法,用于有效的龋齿图像学习和龋齿分类。我们将核心数据选择标准描述为不同类别样本的分布。JSCDS通过在龋齿分类网络中样本嵌入表示计算聚类中心,并利用Jensen-Shannon散度计算数据样本与聚类中心之间的互信息,捕捉高维数据之间的非线性依赖关系。计算平均互信息以符合上述分布,作为构建模型训练核心集的标准。对RGB龋齿数据集进行的大量实验表明,JSCDS在预测性能和时间消耗方面优于其他数据选择方法。值得注意的是,JSCDS仅使用50%的核心数据就超过了完整数据集模型的性能,在70%的核心数据中,其性能优势更加显著。
更新时间: 2024-06-29 08:19:25
领域: cs.CV,cs.AI
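The Jensen-Shannon divergence that JSCDS relies on can be computed directly for two discrete distributions. The sketch below shows only that standard definition, using base-2 logarithms so the value lies in [0, 1]; how JSCDS turns embeddings and cluster centers into distributions is not specified in the abstract, so the inputs here are assumed to be already normalized probability vectors, and the function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m_i > 0 whenever p_i > 0 or q_i > 0, so the divisions above are safe.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


sample = [0.7, 0.2, 0.1]   # hypothetical normalized sample representation
center = [0.5, 0.3, 0.2]   # hypothetical normalized cluster center
print(js_divergence(sample, center))
```

Unlike the plain KL divergence, this quantity is symmetric and stays finite even when the two distributions have disjoint support, which makes it convenient as a similarity criterion.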
From RAG to RICHES: Retrieval Interlaced with Sequence Generation
We present RICHES, a novel approach that interleaves retrieval with sequence generation tasks. RICHES offers an alternative to conventional RAG systems by eliminating the need for a separate retriever and generator. It retrieves documents by directly decoding their contents, constrained to the corpus. Unifying retrieval with generation allows us to adapt to diverse new tasks via prompting alone. RICHES can work with any instruction-tuned model, without additional training. It provides attributed evidence, supports multi-hop retrievals, and interleaves thoughts to plan what to retrieve next, all within a single decoding pass of the LLM. We demonstrate the strong performance of RICHES across ODQA tasks including attributed and multi-hop QA.
Updated: 2024-06-29 08:16:58
标题: 从RAG到RICHES:检索与序列生成交织
摘要: 我们提出了一种新颖的方法RICHES,它将检索与序列生成任务交错进行。RICHES通过消除检索器和生成器的分离需求,为传统的RAG系统提供了一种替代方案。它通过直接解码内容来检索文档,受语料库限制。将检索与生成统一起来,使我们能够仅通过提示适应各种新任务。RICHES可以与任何经过指导的模型一起工作,无需额外的训练。它提供了带属性的证据,支持多跳检索,并交替思考以规划下一步要检索的内容,所有这些都在LLM的单个解码过程中完成。我们展示了RICHES在包括带属性和多跳问答在内的ODQA任务中的强大性能。
更新时间: 2024-06-29 08:16:58
领域: cs.CL,cs.AI
PhyTracker: An Online Tracker for Phytoplankton
Phytoplankton, a crucial component of aquatic ecosystems, requires efficient monitoring to understand marine ecological processes and environmental conditions. Traditional phytoplankton monitoring methods, relying on non-in situ observations, are time-consuming and resource-intensive, limiting timely analysis. To address these limitations, we introduce PhyTracker, an intelligent in situ tracking framework designed for automatic tracking of phytoplankton. PhyTracker overcomes significant challenges unique to phytoplankton monitoring, such as constrained mobility within water flow, inconspicuous appearance, and the presence of impurities. Our method incorporates three innovative modules: a Texture-enhanced Feature Extraction (TFE) module, an Attention-enhanced Temporal Association (ATA) module, and a Flow-agnostic Movement Refinement (FMR) module. These modules enhance feature capture, differentiate between phytoplankton and impurities, and refine movement characteristics, respectively. Extensive experiments on the PMOT dataset validate the superiority of PhyTracker in phytoplankton tracking, and additional tests on the MOT dataset demonstrate its general applicability, outperforming conventional tracking methods. This work highlights key differences between phytoplankton and traditional objects, offering an effective solution for phytoplankton monitoring.
Updated: 2024-06-29 07:53:47
标题: PhyTracker:一种用于浮游植物的在线追踪器
摘要: 浮游植物是水生生态系统中至关重要的组成部分,需要有效的监测来了解海洋生态过程和环境条件。传统的浮游植物监测方法依赖于非现场观测,耗时且资源密集,限制了及时分析。为了解决这些限制,我们介绍了PhyTracker,这是一个智能的现场跟踪框架,专为自动跟踪浮游植物而设计。PhyTracker克服了浮游植物监测中独特的重大挑战,如在水流中受限的移动性,难以察觉的外观和杂质的存在。我们的方法包括三个创新模块:纹理增强特征提取(TFE)模块,注意力增强时间关联(ATA)模块和流不可知移动细化(FMR)模块。这些模块增强了特征捕获,区分了浮游植物和杂质,并精炼了运动特征。对PMOT数据集的大量实验验证了PhyTracker在浮游植物跟踪中的优越性,对MOT数据集的额外测试展示了其普适性,在性能上超越了传统的跟踪方法。这项工作凸显了浮游植物与传统物体之间的关键差异,为浮游植物监测提供了有效的解决方案。
更新时间: 2024-06-29 07:53:47
领域: cs.CV,cs.AI
Resource Allocation and Secure Wireless Communication in the Large Model-based Mobile Edge Computing System
With the rapid advancement of large models and mobile edge computing, transfer learning, particularly through fine-tuning, has become crucial for adapting models to downstream tasks. Traditionally, this requires users to share their data with model owners for fine-tuning, which is not only costly but also raises significant privacy concerns. Furthermore, fine-tuning large-scale models is computationally intensive and often impractical for many users. To tackle these challenges, we introduce a system that combines offsite-tuning with physical-layer security, which provides local data owners with a lightweight adapter and a compressed emulator. Data owners then fine-tune the adapter locally and securely send it back to the model owners through a confidential channel for integration, ensuring privacy and resource conservation. Our paper focuses on optimizing computational resource allocation among data owners and the large model owner deployed on edge, and on the compression ratio of adapters. We incorporate a secrecy uplink channel to maximize the utility that we defined while minimizing system costs like energy consumption and delay. The optimization uses the Dinkelbach algorithm, fractional programming, successive convex approximation and alternating optimization. Experiments demonstrate our algorithm's superiority over existing methods.
Updated: 2024-06-29 07:29:29
标题: 资源分配和大型基于模型的移动边缘计算系统中的安全无线通信
摘要: 随着大型模型和移动边缘计算的快速发展,迁移学习,特别是通过微调,已经成为适应下游任务的关键。传统上,这需要用户与模型所有者共享其数据进行微调,这不仅成本高昂,而且引起了重大的隐私问题。此外,对大规模模型进行微调是计算密集型的,对许多用户来说通常是不切实际的。为了解决这些挑战,我们引入了一个系统,将远程微调与物理层安全相结合,为本地数据所有者提供了一个轻量级适配器和一个压缩仿真器。数据所有者随后在本地微调适配器,并通过保密渠道安全地将其发送回模型所有者进行集成,确保隐私和资源保护。我们的论文侧重于优化数据所有者和部署在边缘的大型模型所有者之间的计算资源分配,以及适配器的压缩比。我们结合了一个保密上行通道,以最大化我们定义的效用,同时最小化能源消耗和延迟等系统成本。优化使用了Dinkelbach算法、分数规划、连续凸逼近和交替优化。实验证明我们的算法优于现有方法。
更新时间: 2024-06-29 07:29:29
领域: cs.CR,cs.SI
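The resource-allocation abstract names the Dinkelbach algorithm, which solves a fractional program max f(x)/g(x) with g(x) > 0 by repeatedly maximizing the parametric objective f(x) - λ·g(x) and updating λ to the achieved ratio. The following is a minimal sketch over a finite candidate grid; the utility-per-cost ratio, the grid, and the name `dinkelbach` are hypothetical and do not reproduce the paper's system model.

```python
import math


def dinkelbach(candidates, numerator, denominator, tol=1e-9, max_iter=100):
    """Maximize numerator(x) / denominator(x) over a finite candidate list.

    Assumes denominator(x) > 0 for every candidate.
    Returns (best_candidate, best_ratio).
    """
    best = candidates[0]
    lam = numerator(best) / denominator(best)
    for _ in range(max_iter):
        # Inner step: maximize the parametric objective f(x) - lam * g(x).
        best = max(candidates, key=lambda x: numerator(x) - lam * denominator(x))
        gap = numerator(best) - lam * denominator(best)
        if gap <= tol:
            break  # lam is (numerically) the optimal ratio
        lam = numerator(best) / denominator(best)
    return best, lam


# Hypothetical toy problem: rate log2(1 + p) per unit of consumed power p + 0.5.
powers = [i / 100 for i in range(0, 501)]
best_p, best_ratio = dinkelbach(
    powers,
    numerator=lambda p: math.log2(1.0 + p),
    denominator=lambda p: p + 0.5,
)
print(best_p, best_ratio)
```

In a continuous setting the inner maximization would be a convex subproblem rather than a grid search, which is typically where techniques such as successive convex approximation come in.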
Boosting Protein Language Models with Negative Sample Mining
We introduce a pioneering methodology for boosting large language models in the domain of protein representation learning. Our primary contribution lies in the refinement process for correlating the over-reliance on co-evolution knowledge, in a way that networks are trained to distill invaluable insights from negative samples, constituted by protein pairs sourced from disparate categories. By capitalizing on this novel approach, our technique steers the training of transformer-based models within the attention score space. This advanced strategy not only amplifies performance but also reflects the nuanced biological behaviors exhibited by proteins, offering aligned evidence with traditional biological mechanisms such as protein-protein interaction. We experimentally observed improved performance on various tasks over datasets, on top of several well-established large protein models. This innovative paradigm opens up promising horizons for further progress in the realms of protein research and computational biology.
Updated: 2024-06-29 07:07:49
标题: 用负样本挖掘提升蛋白质语言模型
摘要: 我们介绍了一种在蛋白质表示学习领域中提升大型语言模型的开创性方法。我们的主要贡献在于对过度依赖共同进化知识的细化过程,以一种网络被训练以从不同类别的蛋白质对中提炼宝贵见解的负样本为核心。通过利用这种新颖方法,我们的技术引导了基于transformer的模型在注意力分数空间中的训练。这种先进的策略不仅提高了性能,还反映了蛋白质展现的微妙生物学行为,与传统生物机制如蛋白质相互作用相一致。我们在多个任务和数据集上实验观察到了改进的性能,超过了几个已建立的大型蛋白质模型。这种创新范式为蛋白质研究和计算生物学领域的进一步进展打开了有希望的前景。
更新时间: 2024-06-29 07:07:49
领域: cs.AI,cs.CL,cs.LG
Privacy Impact Assessments in the Wild: A Scoping Review
Privacy Impact Assessments (PIAs) offer a systematic process for assessing the privacy impacts of a project or system. As a privacy engineering strategy, PIAs are heralded as one of the main approaches to privacy by design, supporting the early identification of threats and controls. However, there is still a shortage of empirical evidence on their uptake and proven effectiveness in practice. To better understand the current state of literature and research, this paper provides a comprehensive Scoping Review (ScR) on the topic of PIAs "in the wild", following the well-established Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines. As a result, this ScR includes 45 studies, providing an extensive synthesis of the existing body of knowledge, classifying types of research and publications, appraising the methodological quality of primary research, and summarising the positive and negative aspects of PIAs in practice, as reported by studies. This ScR also identifies significant research gaps (e.g., evidence gaps from contradictory results and methodological gaps from research design deficiencies), future research pathways, and implications for researchers, practitioners, and policymakers developing and evaluating PIA frameworks. As we conclude, there is still a significant need for more primary research on the topic, both qualitative and quantitative. A critical appraisal of qualitative studies (n=28) revealed deficiencies in the methodological quality, and only four quantitative studies were identified, suggesting that current primary research remains incipient. Nonetheless, PIAs can be regarded as a prominent sub-area in the broader field of Empirical Privacy Engineering, warranting further research toward more evidence-based practices.
Updated: 2024-06-29 07:06:30
标题: 荒野中的隐私影响评估:范围审查
摘要: Privacy Impact Assessments(PIAs)提供了一个系统的过程,用于评估项目或系统的隐私影响。作为一种隐私工程策略,PIAs被誉为隐私设计的主要方法之一,支持早期识别威胁和控制措施。然而,在实践中,对它们的采用和有效性仍然缺乏实证证据。为了更好地了解当前文献和研究的现状,本文提供了关于PIAs“在野外”的综合范围审查(ScR),遵循了广泛认可的系统评价和荟萃分析(PRISMA)指南。因此,这个ScR包括45项研究,提供了对现有知识体系的全面综合,分类研究和出版物类型,评估主要研究的方法论质量,并总结了研究报告中PIAs在实践中的正面和负面方面。这个ScR还确定了重要的研究空白(例如,来自矛盾结果的证据空白和来自研究设计缺陷的方法论空白),未来的研究路径以及对开发和评估PIA框架的研究人员、从业者和政策制定者的影响。总之,对于这个主题仍然需要更多的初级研究,无论是定性的还是定量的。对定性研究(n=28)的关键评价揭示了方法论质量的不足,只识别了四项定量研究,表明当前初级研究仍处于初级阶段。尽管如此,PIAs可以被视为更广泛的经验隐私工程领域中的一个显著子领域,需要进一步研究以实践更多基于证据的做法。
更新时间: 2024-06-29 07:06:30
领域: cs.CR
Korean Aspect-Based Sentiment Analysis via Implicit-Feature Alignment with Corpus Filtering
Investigations into Aspect-Based Sentiment Analysis (ABSA) for Korean restaurant reviews are notably lacking in the existing literature. Our research proposes an intuitive and effective framework for ABSA in low-resource languages such as Korean. It optimizes prediction labels by integrating translated benchmark and unlabeled Korean data. Using a model fine-tuned on translated data, we pseudo-labeled the actual Korean NLI set. Subsequently, we applied LaBSE and MSP-based filtering to this pseudo-NLI set as implicit feature, enhancing Aspect Category Detection and Polarity determination through additional training. Incorporating dual filtering, this model bridged dataset gaps, achieving positive results in Korean ABSA with minimal resources. Through additional data injection pipelines, our approach aims to utilize high-resource data and construct effective models within communities, whether corporate or individual, in low-resource language countries. Compared to English ABSA, our framework showed an approximately 3% difference in F1 scores and accuracy. We release the dataset and our code for Korean ABSA, at this link.
Updated: 2024-06-29 07:01:51
标题: 通过隐式特征对韩语基于语态的情感分析进行语料过滤对齐
摘要: 现有文献中明显缺乏针对韩国餐厅评论的基于方面的情感分析(ABSA)的研究。我们的研究提出了一个直观有效的框架,用于在韩语等低资源语言中进行ABSA。通过整合翻译基准和未标记的韩语数据,优化了预测标签。使用在翻译数据上微调的模型,我们伪标记了实际韩语NLI集。随后,我们将LaBSE和基于MSP的过滤应用于这个伪NLI集,作为隐式特征,通过额外的训练增强了方面类别检测和极性确定。通过结合双重过滤,这个模型弥合了数据集差距,在韩语ABSA中取得了积极的结果,且资源消耗最小。通过额外的数据注入管道,我们的方法旨在利用高资源数据,在低资源语言国家的企业或个人社区内构建有效模型。与英语ABSA相比,我们的框架在F1分数和准确性方面显示了约3%的差异。我们在此链接上发布了韩语ABSA的数据集和代码。
更新时间: 2024-06-29 07:01:51
领域: cs.CL,cs.AI
Dual-view Aware Smart Contract Vulnerability Detection for Ethereum
The wide application of Ethereum technology has brought technological innovation to traditional industries. As one of Ethereum's core applications, smart contracts utilize diverse contract codes to meet various functional needs and have gained widespread use. However, the non-tamperability of smart contracts, coupled with vulnerabilities caused by natural flaws or human errors, has brought unprecedented challenges to blockchain security. Therefore, in order to ensure the healthy development of blockchain technology and the stability of the blockchain community, it is particularly important to study the vulnerability detection techniques for smart contracts. In this paper, we propose a Dual-view Aware Smart Contract Vulnerability Detection Framework named DVDet. The framework initially converts the source code and bytecode of smart contracts into weighted graphs and control flow sequences, capturing potential risk features from these two perspectives and integrating them for analysis, ultimately achieving effective contract vulnerability detection. Comprehensive experiments on the Ethereum dataset show that our method outperforms others in detecting vulnerabilities.
Updated: 2024-06-29 06:47:51
标题: 以太坊双视图感知智能合约漏洞检测
摘要: 以太坊技术的广泛应用为传统行业带来了技术创新。作为以太坊的核心应用之一,智能合约利用各种合同代码来满足不同的功能需求,并得到广泛应用。然而,智能合约的不可篡改性,以及由自然缺陷或人为错误导致的漏洞,给区块链安全带来了前所未有的挑战。因此,为了确保区块链技术的健康发展和区块链社区的稳定,研究智能合约的漏洞检测技术尤为重要。本文提出了一种名为DVDet的双视图感知智能合约漏洞检测框架。该框架首先将智能合约的源代码和字节码转换为加权图和控制流序列,从这两个视角捕获潜在的风险特征,并将它们整合进行分析,最终实现有效的合同漏洞检测。对以太坊数据集的全面实验表明,我们的方法在检测漏洞方面优于其他方法。
更新时间: 2024-06-29 06:47:51
领域: cs.CR,cs.LG
SPARKLE: Enhancing SPARQL Generation with Direct KG Integration in Decoding
Existing KBQA methods have traditionally relied on multi-stage methodologies, involving tasks such as entity linking, subgraph retrieval, and query structure generation. However, multi-stage approaches depend on the accuracy of preceding steps, leading to cascading errors and increased inference time. Although a few studies have explored the use of end-to-end models, they often suffer from lower accuracy and generate inoperative queries that are not supported by the underlying data. Furthermore, most prior approaches are limited to static training data, potentially overlooking the evolving nature of knowledge bases over time. To address these challenges, we present a novel end-to-end natural language to SPARQL framework, SPARKLE. Notably, SPARKLE leverages the structure of the knowledge base directly during decoding, effectively integrating knowledge into the query generation. Our study reveals that simply referencing the knowledge base during inference significantly reduces the occurrence of inexecutable query generations. SPARKLE achieves new state-of-the-art results on SimpleQuestions-Wiki and the highest F1 score on LCQuAD 1.0 (among models not using gold entities), while obtaining a slightly lower result on the WebQSP dataset. Finally, we demonstrate SPARKLE's fast inference speed and its ability to adapt when the knowledge base differs between the training and inference stages.
Updated: 2024-06-29 06:43:11
Categories: cs.IR,cs.AI,cs.CL
Facilitating Feature and Topology Lightweighting: An Ethereum Transaction Graph Compression Method for Malicious Account Detection
Ethereum has become one of the primary global platforms for cryptocurrency, playing an important role in promoting the diversification of the financial ecosystem. However, the relative lag in regulation has led to a proliferation of malicious activities in Ethereum, posing a serious threat to fund security. Existing regulatory methods usually detect malicious accounts through feature engineering or large-scale transaction graph mining. However, due to the immense scale of transaction data and malicious attacks, these methods suffer from inefficiency and low robustness during data processing and anomaly detection. In this regard, we propose an Ethereum Transaction Graph Compression method named TGC4Eth, which assists malicious account detection by lightweighting both features and topology of the transaction graph. At the feature level, we select transaction features based on their low importance to improve the robustness of the subsequent detection models against feature evasion attacks; at the topology level, we employ focusing and coarsening processes to compress the structure of the transaction graph, thereby improving both data processing and inference efficiency of detection models. Extensive experiments demonstrate that TGC4Eth significantly improves the computational efficiency of existing detection models while preserving the connectivity of the transaction graph. Furthermore, TGC4Eth enables existing detection models to maintain stable performance and exhibit high robustness against feature evasion attacks.
Updated: 2024-06-29 06:33:19
Categories: cs.CR,cs.SI
Guided Trajectory Generation with Diffusion Models for Offline Model-based Optimization
Optimizing complex and high-dimensional black-box functions is ubiquitous in science and engineering fields. Unfortunately, the online evaluation of these functions is restricted due to time and safety constraints in most cases. In offline model-based optimization (MBO), we aim to find a design that maximizes the target function using only a pre-existing offline dataset. While prior methods consider forward or inverse approaches to address the problem, these approaches are limited by conservatism and the difficulty of learning highly multi-modal mappings. Recently, there has been an emerging paradigm of learning to improve solutions with synthetic trajectories constructed from the offline dataset. In this paper, we introduce a novel conditional generative modeling approach to produce trajectories toward high-scoring regions. First, we construct synthetic trajectories toward high-scoring regions using the dataset while injecting locality bias for consistent improvement directions. Then, we train a conditional diffusion model to generate trajectories conditioned on their scores. Lastly, we sample multiple trajectories from the trained model with guidance to explore high-scoring regions beyond the dataset and select high-fidelity designs among generated trajectories with the proxy function. Extensive experimental results demonstrate that our method outperforms competitive baselines on Design-Bench and its practical variants. The code is publicly available at https://github.com/dbsxodud-11/GTG.
Updated: 2024-06-29 06:12:36
Categories: cs.LG,cs.AI
Teola: Towards End-to-End Optimization of LLM-based Applications
Large language model (LLM)-based applications consist of both LLM and non-LLM components, each contributing to the end-to-end latency. Despite great efforts to optimize LLM inference, end-to-end workflow optimization has been overlooked. Existing frameworks employ coarse-grained orchestration with task modules, which confines optimizations to within each module and yields suboptimal scheduling decisions. We propose fine-grained end-to-end orchestration, which utilizes task primitives as the basic units and represents each query's workflow as a primitive-level dataflow graph. This explicitly exposes a much larger design space, enables optimizations in parallelization and pipelining across primitives of different modules, and enhances scheduling to improve application-level performance. We build Teola, a novel orchestration framework for LLM-based applications that implements this scheme. Comprehensive experiments show that Teola can achieve up to 2.09x speedup over existing systems across various popular LLM applications.
Updated: 2024-06-29 05:59:53
Categories: cs.DC,cs.AI,cs.NI
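As a rough illustration of the primitive-level dataflow graph described in the Teola abstract above, the sketch below groups the primitives of a workflow into stages that could run in parallel. It is not Teola's scheduler: the primitive names (`embed_query`, `vector_search`, and so on) and the function `parallel_stages` are hypothetical, and the grouping is a plain topological sort from the Python standard library.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parallel_stages(deps):
    """Group nodes of a dependency graph into stages.

    `deps` maps each primitive to the set of primitives it depends on.
    All primitives within one returned stage have their inputs ready,
    so they could be executed concurrently.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical retrieval-augmented workflow, split into primitives.
workflow = {
    "embed_query": set(),
    "prefill_system_prompt": set(),
    "vector_search": {"embed_query"},
    "prefill_context": {"vector_search", "prefill_system_prompt"},
    "decode": {"prefill_context"},
}
stages = parallel_stages(workflow)
```

In this toy graph the query embedding and the system-prompt prefill land in the same stage, which is the kind of cross-module overlap that module-level orchestration would not expose.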
Large Language Model Bias Mitigation from the Perspective of Knowledge Editing
Existing debiasing methods inevitably make unreasonable or undesired predictions, as they are designed and evaluated to achieve parity across different social groups but leave aside individual facts, resulting in modified existing knowledge. In this paper, we first establish a new bias mitigation benchmark, BiasKE, leveraging existing and additionally constructed datasets, which systematically assesses debiasing performance by complementary metrics on fairness, specificity, and generalization. Meanwhile, we propose a novel debiasing method, Fairness Stamp (FAST), which enables editable fairness through fine-grained calibration on individual biased knowledge. Comprehensive experiments demonstrate that FAST surpasses state-of-the-art baselines with remarkable debiasing performance while not hampering overall model capability for knowledge preservation, highlighting the prospect of fine-grained debiasing strategies for editable fairness in LLMs.
Updated: 2024-06-29 05:50:28
Categories: cs.CL,cs.AI
Self-Supervised Position Debiasing for Large Language Models
Fine-tuning has been demonstrated to be an effective method to improve the domain performance of large language models (LLMs). However, LLMs might fit the dataset bias and shortcuts for prediction, leading to poor generation performance. Previous works have shown that LLMs are prone to exhibit position bias, i.e., leveraging information positioned at the beginning or end, or specific positional cues within the input. Existing debiasing methods for LLMs require external bias knowledge or annotated non-biased samples, which are lacking for position debiasing and impractical to obtain in reality. In this work, we propose a self-supervised position debiasing (SOD) framework to mitigate position bias for LLMs. SOD leverages unsupervised responses from pre-trained LLMs for debiasing without relying on any external knowledge. To improve the quality of unsupervised responses, we propose an objective alignment (OAM) module to prune these responses. Experiments on eight datasets and five tasks show that SOD consistently outperforms existing methods in mitigating three types of position biases. Besides, SOD achieves this while sacrificing only a small amount of performance on biased samples, which makes it general and effective. To facilitate the reproducibility of the results, we share the code of all methods and datasets on https://github.com/LZKSKY/SOD.
Updated: 2024-06-29 05:20:09
Categories: cs.CL,cs.AI,cs.LG,I.2.7
When large language models meet evolutionary algorithms
Pre-trained large language models (LLMs) have powerful capabilities for generating creative natural text. Evolutionary algorithms (EAs) can discover diverse solutions to complex real-world problems. Motivated by the common collectivity and directionality of text generation and evolution, this paper illustrates the parallels between LLMs and EAs, which include multiple one-to-one key characteristics: token representation and individual representation, position encoding and fitness shaping, position embedding and selection, Transformer block and reproduction, and model training and parameter adaptation. By examining these parallels, we analyze existing interdisciplinary research, with a specific focus on evolutionary fine-tuning and LLM-enhanced EAs. Drawing from these insights, valuable future directions are presented for advancing the integration of LLMs and EAs, while highlighting key challenges along the way. These parallels not only reveal the evolution mechanism behind LLMs but also facilitate the development of evolved artificial agents that approach or surpass biological organisms.
Updated: 2024-06-29 05:16:33
Categories: cs.NE,cs.AI,cs.CL,cs.LG
LiteSearch: Efficacious Tree Search for LLM
Recent research suggests that tree search algorithms (e.g. Monte Carlo Tree Search) can dramatically boost LLM performance on complex mathematical reasoning tasks. However, they often require more than 10 times the computational resources of greedy decoding due to wasteful search strategies, making them difficult to deploy in practical applications. This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget (maximum number of children) calculation to tackle this issue. By considering the search progress towards the final answer (history) and the guidance from a value network (future) trained without any step-wise annotations, our algorithm iteratively selects the most promising tree node before expanding it within the boundaries of the allocated computational budget. Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach not only offers competitive performance but also incurs significantly lower computational costs than baseline methods.
Updated: 2024-06-29 05:14:04
Categories: cs.CL,cs.AI,cs.LG
Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence
This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial data intelligent large models and their applications in urban environments, aerospace remote sensing, geography, transportation, and other scenarios. Additionally, it summarizes the latest application cases of spatial data intelligent large models in themes such as urban development, multimodal systems, remote sensing, smart transportation, and resource environments. Finally, the report concludes with an overview and outlook on the development prospects of spatial data intelligent large models.
Updated: 2024-06-29 04:31:52
Categories: cs.AI,cs.CV,cs.LG
UDC: A Unified Neural Divide-and-Conquer Framework for Large-Scale Combinatorial Optimization Problems
Single-stage neural combinatorial optimization solvers have achieved near-optimal results on various small-scale combinatorial optimization (CO) problems without needing expert knowledge. However, these solvers exhibit significant performance degradation when applied to large-scale CO problems. Recently, two-stage neural methods with divide-and-conquer strategies have shown superiority in addressing large-scale CO problems. Nevertheless, the efficiency of these methods relies heavily on problem-specific heuristics in either the divide or the conquer procedure, which limits their applicability to general CO problems. Moreover, these methods employ separate training schemes and ignore the interdependencies between the dividing and conquering strategies, which often leads to sub-optimal solutions. To tackle these drawbacks, this article develops a unified neural divide-and-conquer framework (i.e., UDC) for solving general large-scale CO problems. UDC offers a Divide-Conquer-Reunion (DCR) training method to eliminate the negative impact of a sub-optimal dividing policy. Employing a high-efficiency Graph Neural Network (GNN) for global dividing and a fixed-length sub-path solver for conquering sub-problems, the proposed UDC framework demonstrates extensive applicability, achieving superior performance in 10 representative large-scale CO problems.
Updated: 2024-06-29 04:29:03
Categories: cs.AI,cs.NE
Diverse Intra- and Inter-Domain Activity Style Fusion for Cross-Person Generalization in Activity Recognition
Existing domain generalization (DG) methods for cross-person generalization tasks often face challenges in capturing intra- and inter-domain style diversity, resulting in domain gaps with the target domain. In this study, we explore a novel perspective to tackle this problem, a process conceptualized as domain padding. This proposal aims to enrich the domain diversity by synthesizing intra- and inter-domain style data while maintaining robustness to class labels. We instantiate this concept using a conditional diffusion model and introduce a style-fused sampling strategy to enhance data generation diversity. In contrast to traditional condition-guided sampling, our style-fused sampling strategy allows for the flexible use of one or more random styles to guide data synthesis. This feature presents a notable advancement: it allows for the maximum utilization of possible permutations and combinations among existing styles to generate a broad spectrum of new style instances. Empirical evaluations on a broad range of datasets demonstrate that our generated data achieves remarkable diversity within the domain space. Both intra- and inter-domain generated data have proven to be significant and valuable, contributing to varying degrees of performance enhancements. Notably, our approach outperforms state-of-the-art DG methods in all human activity recognition tasks.
Updated: 2024-06-29 03:15:51
Categories: cs.LG,cs.AI
Decoding-Time Language Model Alignment with Multiple Objectives
Aligning language models (LMs) to human preferences has emerged as a critical pursuit, enabling these models to better serve diverse user needs. Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives. Here, we propose multi-objective decoding (MOD), a decoding-time algorithm that outputs the next token from a linear combination of predictions of all base models, for any given weightings over different objectives. We exploit a common form among a family of f-divergence regularized alignment approaches (such as PPO, DPO, and their variants) to identify a closed-form solution by Legendre transform, and derive an efficient decoding strategy. Theoretically, we show why existing approaches can be sub-optimal even in natural settings and obtain optimality guarantees for our method. Empirical results demonstrate the effectiveness of the algorithm. For example, compared to a parameter-merging baseline, MOD achieves 12.8% overall reward improvement when equally optimizing towards 3 objectives. Moreover, we experiment with MOD on combining three fully-finetuned LLMs of different model sizes, each aimed at different objectives such as safety, coding, and general user preference. Unlike traditional methods that require careful curation of a mixture of datasets to achieve comprehensive improvement, we can quickly experiment with preference weightings using MOD to find the best combination of models. Our best combination reduces toxicity on Toxigen to nearly 0% and achieves 7.9-33.3% improvement across the other three metrics (i.e., Codex@1, GSM-COT, BBH-COT).
Updated: 2024-06-29 02:29:38
Categories: cs.LG
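To make the idea of decoding from a weighted combination of base-model predictions in the MOD abstract above more concrete, here is a minimal sketch. It assumes the combination is taken over next-token log-probabilities and then renormalised, which is only one possible reading; the abstract does not give the exact rule, which depends on the chosen f-divergence. The function names `combine_next_token` and `greedy_token` are hypothetical.

```python
import math


def combine_next_token(logprobs_per_model, weights):
    """Mix per-model next-token log-probabilities and renormalise.

    logprobs_per_model: one list of log-probabilities per base model,
                        all over the same vocabulary.
    weights:            one preference weight per base model.
    Assumption: the mixture is linear in log-probability space.
    """
    vocab_size = len(logprobs_per_model[0])
    mixed = [
        sum(w * lp[i] for w, lp in zip(weights, logprobs_per_model))
        for i in range(vocab_size)
    ]
    # Log-sum-exp normalisation so the result is a valid distribution.
    peak = max(mixed)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in mixed))
    return [x - log_z for x in mixed]


def greedy_token(logprobs_per_model, weights):
    """Pick the highest-scoring token id under the mixed distribution."""
    mixed = combine_next_token(logprobs_per_model, weights)
    return max(range(len(mixed)), key=mixed.__getitem__)


# Two toy "models" over a three-token vocabulary.
model_a = [math.log(p) for p in (0.7, 0.2, 0.1)]
model_b = [math.log(p) for p in (0.1, 0.2, 0.7)]
only_a = greedy_token([model_a, model_b], [1.0, 0.0])
only_b = greedy_token([model_a, model_b], [0.0, 1.0])
```

Changing the weights at decoding time shifts the output between the base models without retraining, which is what allows quick experimentation with preference weightings.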
PerAct2: A Perceiver Actor Framework for Bimanual Manipulation Tasks
Bimanual manipulation is challenging due to the precise spatial and temporal coordination required between two arms. While there exist several real-world bimanual systems, there is a lack of simulated benchmarks with a large task diversity for systematically studying bimanual capabilities across a wide range of tabletop tasks. This paper addresses the gap by extending RLBench to bimanual manipulation. We open-source our code and benchmark comprising 13 new tasks with 23 unique task variations, each requiring a high degree of coordination and adaptability. To kickstart the benchmark, we extended several state-of-the-art methods to bimanual manipulation and also present a language-conditioned behavioral cloning agent -- PerAct2, which enables the learning and execution of bimanual 6-DoF manipulation tasks. Our novel network architecture efficiently integrates language processing with action prediction, allowing robots to understand and perform complex bimanual tasks in response to user-specified goals. Project website with code is available at: http://bimanual.github.io
Updated: 2024-06-29 02:06:01
Categories: cs.RO,cs.AI,cs.CV,cs.LG
EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy
Unsupervised Outlier Detection (UOD) is an important data mining task. With the advance of deep learning, deep Outlier Detection (OD) has received broad interest. Most deep UOD models are trained exclusively on clean datasets to learn the distribution of the normal data, which requires huge manual effort to clean real-world data, where cleaning is possible at all. Instead of relying on clean datasets, some approaches directly train and detect on unlabeled contaminated datasets, leading to the need for methods that are robust to such conditions. Ensemble methods emerged as a superior solution to enhance model robustness against contaminated training sets. However, the training time is greatly increased by the ensemble. In this study, we investigate the impact of outliers on the training phase, aiming to halt training on unlabeled contaminated datasets before performance degradation. Initially, we noted that blending normal and anomalous data causes fluctuations in AUC, a label-dependent measure of detection accuracy. To circumvent the need for labels, we propose a zero-label entropy metric named Loss Entropy for the loss distribution, enabling us to infer optimal stopping points for training without labels. Meanwhile, we theoretically demonstrate a negative correlation between the entropy metric and the label-based AUC. Based on this, we develop an automated early-stopping algorithm, EntropyStop, which halts training when loss entropy suggests the maximum model detection capability. We conduct extensive experiments on ADBench (including 47 real datasets), and the overall results indicate that AutoEncoder (AE) enhanced by our approach not only achieves better performance than ensemble AEs but also requires under 2% of the training time. Lastly, our proposed metric and early-stopping approach are evaluated on other deep OD models, exhibiting their broad potential applicability.
Updated: 2024-06-29 01:40:46
Categories: cs.LG,cs.AI
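The EntropyStop abstract above describes a label-free entropy computed on the loss distribution and an early-stopping rule driven by it. The sketch below shows one way such a rule could look; it is an assumption-laden illustration, not the authors' algorithm. It assumes per-sample losses are non-negative, that they are normalised to sum to one before taking Shannon entropy, and that training stops after `patience` evaluations without a new entropy minimum (low entropy being the favourable direction, given the stated negative correlation with AUC). The names `loss_entropy`, `stop_iteration`, and `patience` are hypothetical.

```python
import math


def loss_entropy(losses):
    """Shannon entropy of per-sample losses normalised to sum to one.

    Assumes every loss is non-negative. Returns 0.0 if all losses are zero.
    """
    total = sum(losses)
    if total <= 0:
        return 0.0
    probs = [loss / total for loss in losses]
    return -sum(p * math.log(p) for p in probs if p > 0)


def stop_iteration(entropy_curve, patience):
    """Index of the lowest entropy seen before patience runs out.

    Scans the curve in order and stops once `patience` consecutive
    evaluations have failed to improve on the best entropy so far.
    """
    best_idx = 0
    for i, entropy in enumerate(entropy_curve):
        if entropy < entropy_curve[best_idx]:
            best_idx = i
        elif i - best_idx >= patience:
            break
    return best_idx


uniform = loss_entropy([1.0, 1.0, 1.0, 1.0])   # losses spread evenly
peaked = loss_entropy([1.0, 0.0, 0.0, 0.0])    # loss concentrated on one sample
chosen = stop_iteration([2.0, 1.5, 1.2, 1.3, 1.4, 1.1], patience=2)
```

With `patience=2` the scan gives up before reaching the later, lower value at index 5; a larger patience trades extra training time for a better chance of finding it.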
Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks
Exploring the loss landscape offers insights into the inherent principles of deep neural networks (DNNs). Recent work suggests an additional asymmetry of the valley beyond the flat and sharp ones, yet without thoroughly examining its causes or implications. Our study methodically explores the factors affecting the symmetry of DNN valleys, encompassing (1) the dataset, network architecture, initialization, and hyperparameters that influence the convergence point; and (2) the magnitude and direction of the noise for 1D visualization. Our major observation shows that the degree of sign consistency between the noise and the convergence point is a critical indicator of valley symmetry. Theoretical insights from the aspects of ReLU activation and softmax function could explain the interesting phenomenon. Our discovery propels novel understanding and applications in the scenario of Model Fusion: (1) the efficacy of interpolating separate models significantly correlates with their sign consistency ratio, and (2) imposing sign alignment during federated learning emerges as an innovative approach for model parameter alignment.
Updated: 2024-06-29 00:46:04
Categories: cs.LG,cs.AI
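The abstract above names the degree of sign consistency between the noise direction and the convergence point as the key indicator, and a sign consistency ratio between models for fusion. A plausible way to compute such a ratio is the fraction of coordinates whose signs agree. The sketch below does that on flat parameter lists; the exact definition used in the paper is not given in the abstract, so the function name `sign_consistency_ratio` and the treatment of zeros (two zeros count as agreeing) are assumptions.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_consistency_ratio(theta, noise):
    """Fraction of coordinates where `theta` and `noise` share a sign.

    theta: flattened parameters at the convergence point (or of one model).
    noise: flattened perturbation direction (or parameters of another model).
    """
    if len(theta) != len(noise):
        raise ValueError("theta and noise must have the same length")
    if not theta:
        raise ValueError("inputs must be non-empty")
    matches = sum(1 for t, e in zip(theta, noise) if sign(t) == sign(e))
    return matches / len(theta)


theta = [1.0, -2.0, 3.0, -4.0]
half = sign_consistency_ratio(theta, [0.5, -0.1, -0.3, 0.2])
full = sign_consistency_ratio(theta, theta)
```

The same function applies to two separately trained models by passing their flattened parameters as the two arguments.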
TIC: Translate-Infer-Compile for accurate "text to plan" using LLMs and Logical Representations
We study the problem of generating plans for given natural language planning task requests. On one hand, LLMs excel at natural language processing but do not perform well on planning. On the other hand, classical planning tools excel at planning tasks but require input in a structured language such as the Planning Domain Definition Language (PDDL). We leverage the strengths of both techniques by using an LLM to generate the PDDL representation (task PDDL) of planning task requests, followed by a classical planner to compute a plan. Unlike previous approaches that use LLMs for generating task PDDLs directly, our approach comprises (a) translate: using an LLM only for generating a logically interpretable intermediate representation of the natural language task description, (b) infer: deriving additional logically dependent information from the intermediate representation using a logic reasoner (currently, an Answer Set Programming solver), and (c) compile: generating the target task PDDL from the base and inferred information. We observe that using an LLM to only output the intermediate representation significantly reduces LLM errors. Consequently, the TIC approach achieves, for at least one LLM, high accuracy on task PDDL generation for all seven domains of our evaluation dataset.
Updated: 2024-06-29 00:30:04
Categories: cs.CL,cs.AI
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
In reinforcement learning, specification gaming occurs when AI systems learn undesired behaviors that are highly rewarded due to misspecified training goals. Specification gaming can range from simple behaviors like sycophancy to sophisticated and pernicious behaviors like reward-tampering, where a model directly modifies its own reward mechanism. However, these more pernicious behaviors may be too complex to be discovered via exploration. In this paper, we study whether Large Language Model (LLM) assistants which find easily discovered forms of specification gaming will generalize to perform rarer and more blatant forms, up to and including reward-tampering. We construct a curriculum of increasingly sophisticated gameable environments and find that training on early-curriculum environments leads to more specification gaming on remaining environments. Strikingly, a small but non-negligible proportion of the time, LLM assistants trained on the full curriculum generalize zero-shot to directly rewriting their own reward function. Retraining an LLM not to game early-curriculum environments mitigates, but does not eliminate, reward-tampering in later environments. Moreover, adding harmlessness training to our gameable environments does not prevent reward-tampering. These results demonstrate that LLMs can generalize from common forms of specification gaming to more pernicious reward tampering and that such behavior may be nontrivial to remove.
Updated: 2024-06-29 00:28:47
Categories: cs.AI,cs.CL