Arxiv Day: Article

Expressive Power of Graph Neural Networks for (Mixed-Integer) Quadratic Programs

Quadratic programming (QP) is the most widely applied category of problems in nonlinear programming. Many applications require real-time/fast solutions, though not necessarily with high precision. Existing methods either involve matrix decomposition or use the preconditioned conjugate gradient method. For relatively large instances, these methods cannot meet real-time requirements unless an effective preconditioner is available. Recently, graph neural networks (GNNs) have opened new possibilities for QP. Promising empirical studies of applying GNNs to QP tasks show that GNNs can capture key characteristics of an optimization instance and either provide adaptive guidance on crucial configurations during the solving process or directly provide an approximate solution. Despite these notable empirical observations, theoretical foundations are still lacking. In this work, we investigate the expressive (representational) power of GNNs, a crucial aspect of neural network theory, specifically in the context of QP tasks in both continuous and mixed-integer settings. We prove the existence of message-passing GNNs that can reliably represent key properties of quadratic programs, including feasibility, optimal objective value, and optimal solution. Our theory is validated by numerical results.
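The Python sketch below shows one round of message passing on a QP.

It assumes a graph encoding that is common for LP/QP GNNs but is not spelled out in the abstract:
- one node per variable and one node per constraint;
- variable-constraint edges weighted by the entries of A;
- variable-variable edges weighted by the entries of Q.

The fixed scalars `w_self` and `w_msg` are hypothetical stand-ins for learned update networks. The sketch shows the mechanics only, not the paper's trained models or proofs.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def relu(v: float) -> float:
    return max(0.0, v)


def message_passing_round(
    Q: Matrix, c: List[float], A: Matrix, b: List[float],
    h_var: List[float], h_con: List[float],
    w_self: float = 1.0, w_msg: float = 0.5,
) -> Tuple[List[float], List[float]]:
    """One round of sum-aggregation message passing on a QP graph.

    Hypothetical encoding of  min 0.5 x^T Q x + c^T x  s.t.  A x (<=, =, >=) b:
      - variable node j has feature c[j]; constraint node i has feature b[i];
      - constraint i and variable j share an edge weighted A[i][j];
      - variables j and k share an edge weighted Q[j][k]
        (diagonal entries act as self-loops).
    The scalars w_self / w_msg stand in for learned update networks.
    """
    m, n = len(A), len(c)
    # Constraint nodes aggregate variable embeddings over A-weighted edges.
    new_con = [
        relu(w_self * h_con[i] + b[i]
             + w_msg * sum(A[i][j] * h_var[j] for j in range(n)))
        for i in range(m)
    ]
    # Variable nodes aggregate updated constraints (via A^T) and variables (via Q).
    new_var = [
        relu(w_self * h_var[j] + c[j]
             + w_msg * sum(A[i][j] * new_con[i] for i in range(m))
             + w_msg * sum(Q[j][k] * h_var[k] for k in range(n)))
        for j in range(n)
    ]
    return new_var, new_con
```

Example: take Q = [[2, 0], [0, 2]], c = [1, 2], A = [[1, 1]] and b = [1], with variable embeddings initialized to 1 and the constraint embedding to 0. One round then yields variable embeddings [4.0, 5.0] and constraint embedding [2.0]. A GNN predicting per-variable quantities, such as an optimal solution, would stack such rounds and read out from the variable nodes.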

Updated: 2024-06-09 23:57:47

Categories: cs.LG, math.OC

Download: http://arxiv.org/abs/2406.05938v1

Linear Causal Representation Learning from Unknown Multi-node Interventions

Despite the multifaceted recent advances in interventional causal representation learning (CRL), existing results primarily rely on the stylized assumption of single-node interventions. This assumption does not hold in a wide range of applications, and in general the subset of nodes intervened on in an interventional environment is entirely unknown. This paper focuses on interventional CRL under unknown multi-node (UMN) interventional environments and establishes the first identifiability results for general latent causal models (parametric or nonparametric) under stochastic interventions (soft or hard) and a linear transformation from the latent to the observed space. Specifically, given sufficiently diverse interventional environments, (i) identifiability up to ancestors is possible using only soft interventions, and (ii) perfect identifiability is possible using hard interventions. Remarkably, these guarantees match the best-known results for the more restrictive single-node interventions. Furthermore, CRL algorithms that achieve these identifiability guarantees are also provided. A central step in designing these algorithms is establishing the relationships between UMN interventional CRL and the score functions associated with the statistical models of different interventional environments. Establishing these relationships also serves as a constructive proof of the identifiability guarantees.
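The setting can be illustrated with a small simulator. The sketch below makes several assumptions:
- a linear-Gaussian latent SCM (the paper also covers nonparametric models);
- a linear mixing matrix `G`;
- environments in which several latent nodes are intervened on at once.

The function and parameter names are hypothetical. The sketch only generates data; it does not implement the score-function-based identification algorithms.

```python
import random
from typing import Dict, List, Optional, Set


def sample_observation(
    B: List[List[float]], G: List[List[float]], intervened: Set[int],
    rng: random.Random, soft_weights: Optional[Dict[int, List[float]]] = None,
) -> List[float]:
    """Draw one observation x = G z from a linear-Gaussian latent SCM.

    B[i][j] is the weight of latent parent j on latent node i; nodes are assumed
    to be indexed in topological order, so only parents j < i are used.
    For each node in `intervened`:
      - soft intervention: its weights are replaced by soft_weights[i] if given;
      - hard intervention: otherwise all incoming edges are cut (zero weights).
    A learner only ever sees x, never `intervened`.
    """
    d = len(B)
    soft_weights = soft_weights or {}
    z = [0.0] * d
    for i in range(d):
        row = B[i]
        if i in intervened:
            row = soft_weights.get(i, [0.0] * d)
        z[i] = sum(row[j] * z[j] for j in range(i)) + rng.gauss(0.0, 1.0)
    # Linear transformation from the latent space to the observed space.
    return [sum(G[k][i] * z[i] for i in range(d)) for k in range(len(G))]
```

In the UMN setting, a learner would receive batches of `x` from several such environments, for example `{0, 2}` in one and `{1}` in another. It would not be told which nodes were intervened on.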

Updated: 2024-06-09 23:56:49

Categories: cs.LG, stat.ML

Download: http://arxiv.org/abs/2406.05937v1

Why So Gullible? Enhancing the Robustness of Retrieval-Augmented Models against Counterfactual Noise

Most existing retrieval-augmented language models (LMs) assume a naive dichotomy within a retrieved document set: each document is either relevant or irrelevant to the query. Our work investigates a more challenging scenario in which even the "relevant" documents may contain misleading or incorrect information, causing conflicts among the retrieved documents that act as noise and degrade model decisions. We observe that existing LMs are highly brittle to conflicting information in both the fine-tuning and the in-context few-shot learning scenarios. We propose approaches for handling knowledge conflicts among retrieved documents by explicitly fine-tuning a discriminator or by prompting GPT-3.5 to elicit its discriminative capability. Our empirical results on open-domain QA show that these approaches significantly enhance model robustness. We also report findings on incorporating the fine-tuned discriminator's decisions into the in-context learning process, proposing a way to exploit the benefits of two disparate learning schemes. Alongside our findings, we provide MacNoise, a machine-generated, conflict-induced dataset, to further encourage research in this direction.
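Below is a minimal sketch of a filtering variant: a discriminator scores each retrieved passage, and only passages judged relevant and non-conflicting enter the reader's prompt. The `discriminator` callable and the 0.5 threshold are assumptions. In the paper it would be a fine-tuned model or a prompted GPT-3.5; the toy lambda in the test is only a placeholder.

```python
from typing import Callable, List, Tuple


def build_prompt(question: str, docs: List[str],
                 discriminator: Callable[[str, str], float],
                 threshold: float = 0.5) -> Tuple[str, List[str]]:
    """Keep only passages the discriminator trusts, then format a QA prompt.

    `discriminator(question, doc)` is a hypothetical scorer returning the
    probability that `doc` is relevant and not counterfactual.
    """
    kept = [d for d in docs if discriminator(question, d) >= threshold]
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(kept))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return prompt, kept
```

Instead of dropping passages, the same score could be written into the prompt next to each passage. That is closer to the paper's idea of feeding the discriminator's decision into in-context learning.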

Updated: 2024-06-09 23:42:48

Categories: cs.CL, cs.AI

Download: http://arxiv.org/abs/2305.01579v3

Latent Diffusion Model-Enabled Real-Time Semantic Communication Considering Semantic Ambiguities and Channel Noises

Semantic communication (SemCom) has emerged as a new paradigm for communication systems, with deep learning (DL) models being one of the key drivers of the shift from bit/symbol accuracy to the semantics and pragmatics of data. Nevertheless, DL-based SemCom systems often face performance bottlenecks due to overfitting, poor generalization, and sensitivity to outliers. Furthermore, the time-varying fading gains and noise with uncertain signal-to-noise ratios (SNRs) that are common in wireless channels usually limit the accuracy of semantic information transmission. To address these issues, this paper constructs a SemCom system based on the latent diffusion model and proposes three improvements over existing works: i) To handle potential outliers in the source data, semantic errors obtained by projected gradient descent, which exploits the vulnerabilities of DL models, are used to update the parameters and obtain an outlier-robust encoder. ii) A lightweight single-layer latent space transformation adapter completes one-shot learning at the transmitter and is placed before the decoder at the receiver, enabling adaptation to out-of-distribution data or enhancing human-perceptual quality. iii) An end-to-end consistency distillation (EECD) strategy is used to distill the diffusion models trained in latent space, enabling deterministic single- or few-step real-time denoising in various noisy channels while maintaining high semantic quality. Extensive numerical experiments across different datasets demonstrate the superiority of the proposed SemCom system: it is consistently robust to outliers, can transmit data with unknown distributions, and performs real-time channel denoising while preserving high human-perceptual quality, outperforming existing denoising approaches on semantic metrics such as MS-SSIM and LPIPS.
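The abstract attributes part of the difficulty to fading gains and uncertain SNRs. The sketch below shows one common way to simulate such a channel on the transmitted latent vector. The real-valued flat-fading AWGN model and the Rayleigh gain are assumptions for illustration; the paper's exact channel model and its EECD denoiser are not reproduced here.

```python
import math
import random
from typing import List, Optional


def fading_awgn_channel(
    latent: List[float], snr_db: float, rng: random.Random,
    gain: Optional[float] = None,
) -> List[float]:
    """Send a real-valued latent vector through a flat-fading AWGN channel.

    The noise variance is set so that (mean transmitted power) / (noise power)
    equals the requested SNR. If `gain` is None, a Rayleigh-distributed
    fading gain with E[gain^2] = 1 is drawn.
    """
    power = sum(v * v for v in latent) / len(latent)
    noise_std = math.sqrt(power / 10 ** (snr_db / 10))
    if gain is None:
        gain = math.hypot(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2)
    return [gain * v + rng.gauss(0.0, noise_std) for v in latent]
```

One way to expose a receiver-side denoiser to a range of SNRs is to draw `snr_db` per training example, for instance with `rng.uniform(0.0, 20.0)`, rather than fixing a single value.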

Updated: 2024-06-09 23:39:31

Categories: cs.LG, cs.AI

Download: http://arxiv.org/abs/2406.06644v1

A Relevance Model for Threat-Centric Ranking of Cybersecurity Vulnerabilities

The relentless process of tracking and remediating vulnerabilities is a top concern for cybersecurity professionals. The key challenge is identifying a remediation scheme specific to in-house, organizational objectives. Without a strategy, the result is a patchwork of fixes applied to a tide of vulnerabilities, any one of which could be the point of failure in an otherwise formidable defense. Given that few vulnerabilities are the focus of real-world attacks, a practical remediation strategy is to identify the vulnerabilities likely to be exploited and remediate those first. The goal of this research is to demonstrate that aggregating and synthesizing readily accessible public data sources to provide personalized, automated recommendations that help organizations prioritize their vulnerability management will offer significant improvements over using the Common Vulnerability Scoring System (CVSS). We provide a framework for vulnerability management specifically focused on mitigating threats using adversary criteria derived from MITRE ATT&CK. We test our approach by identifying vulnerabilities in software associated with six universities and four government facilities. Ranking-policy performance is measured using Normalized Discounted Cumulative Gain (nDCG). Our results show an average improvement of 71.5% to 91.3% in identifying the vulnerabilities likely to be targeted and exploited by cyber threat actors. In terms of return on investment (ROI), patching according to our policies saves 23.3% to 25.5% in annualized costs. Our results demonstrate the efficacy of building knowledge graphs that link large data sets to facilitate semantic queries and create data-driven, flexible ranking policies.
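The abstract measures ranking policies with nDCG. The sketch below uses the common linear-gain form with a log2 position discount. The abstract does not say whether the paper uses linear or exponential gain (2^rel - 1), so treat linear gain as an assumption.

```python
import math
from typing import List, Optional


def dcg(relevances: List[float], k: Optional[int] = None) -> float:
    """Discounted cumulative gain: rank position p (1-based) is discounted by log2(p + 1)."""
    rels = relevances[:k] if k is not None else relevances
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(rels))


def ndcg(relevances: List[float], k: Optional[int] = None) -> float:
    """nDCG = DCG of the produced ranking / DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

Consider the toy ranking `[1, 0, 1, 0]`, where 1 marks a vulnerability later observed being exploited. Its nDCG is 1.5 / (1 + 1/log2(3)), about 0.92, while the ideal ordering `[1, 1, 0, 0]` scores 1.0.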

Updated: 2024-06-09 23:29:12

Categories: cs.CR, K.6.5

Download: http://arxiv.org/abs/2406.05933v1

SynthAI: A Multi Agent Generative AI Framework for Automated Modular HLS Design Generation

In this paper, we introduce SynthAI, a new method for the automated creation of High-Level Synthesis (HLS) designs. SynthAI integrates ReAct agents, Chain-of-Thought (CoT) prompting, web search technologies, and the Retrieval-Augmented Generation (RAG) framework within a structured decision graph. This innovative approach enables the systematic decomposition of complex hardware design tasks into multiple stages and smaller, manageable modules. As a result, SynthAI produces synthesizable designs that closely adhere to user-specified design objectives and functional requirements. We further validate the capabilities of SynthAI through several case studies, highlighting its proficiency in generating complex, multi-module logic designs from a single initial prompt. The SynthAI code is available in the following repository: https://github.com/sarashs/FPGA_AGI
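One way to picture the decomposition into modules is a dependency graph generated bottom-up, so each module's prompt can include the code of the submodules it instantiates. The sketch uses Python's standard `graphlib.TopologicalSorter`. The module names, the `generate` callable (standing in for an LLM agent call) and the bottom-up policy are all assumptions, not SynthAI's actual decision graph.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def generation_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order modules so every module comes after the submodules it instantiates.

    `deps` maps a module to the set of submodules it depends on; graphlib raises
    CycleError if the proposed decomposition is not acyclic.
    """
    return list(TopologicalSorter(deps).static_order())


def generate_design(deps: Dict[str, Set[str]],
                    generate: Callable[[str, Dict[str, str]], str]) -> Dict[str, str]:
    """Generate modules bottom-up, passing each one its submodules' code as context."""
    code: Dict[str, str] = {}
    for name in generation_order(deps):
        context = {sub: code[sub] for sub in deps.get(name, set())}
        code[name] = generate(name, context)
    return code
```

Generating leaves first means that, when the top-level module is written, the interfaces of everything it instantiates already exist.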

Updated: 2024-06-09 23:01:24

Categories: cs.AI

Download: http://arxiv.org/abs/2405.16072v2

Memory-guided Network with Uncertainty-based Feature Augmentation for Few-shot Semantic Segmentation

The performance of supervised semantic segmentation methods relies heavily on the availability of large-scale training data. To alleviate this dependence, few-shot semantic segmentation (FSS) leverages a model trained on data-rich base classes to segment novel classes for which only a few labeled examples are available. FSS methods face the challenge of generalizing to novel classes due to the distribution shift between base and novel classes. To overcome this issue, we propose a class-shared memory (CSM) module consisting of a set of learnable memory vectors. These memory vectors learn elemental object patterns from base classes during training and re-encode query features during both training and inference, thereby improving the distribution alignment between base and novel classes. Furthermore, to cope with the performance degradation caused by intra-class variance across images, we introduce an uncertainty-based feature augmentation (UFA) module that produces diverse query features during training to improve the model's robustness. We integrate CSM and UFA into representative FSS works, and experimental results on the widely used PASCAL-5i and COCO-20i datasets demonstrate that our approach outperforms the state of the art.
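The sketch below illustrates how a set of memory vectors can re-encode a query feature, using scaled dot-product attention over the memory. The abstract does not give the CSM formula. The attention form, the function names, and the absence of learned projections or residual connections are all assumptions.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reencode(query_feat: Vector, memory: List[Vector]) -> Vector:
    """Re-express a query feature as an attention-weighted mix of memory vectors.

    weights_k = softmax(q . m_k / sqrt(d)); output = sum_k weights_k * m_k.
    """
    d = len(query_feat)
    scores = [sum(q * m for q, m in zip(query_feat, mem)) / math.sqrt(d) for mem in memory]
    weights = softmax(scores)
    return [sum(w * mem[i] for w, mem in zip(weights, memory)) for i in range(d)]
```

The output is a convex combination of memory vectors, so novel-class features are expressed in terms of patterns learned on base classes. This is the intuition behind the improved distribution alignment.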

Updated: 2024-06-09 22:50:22

Categories: cs.CV, cs.AI

Download: http://arxiv.org/abs/2406.00545v2

Cyber-sensorium: An Extension of the Cyber Public Health Framework

In response to increasingly sophisticated cyberattacks, a health-based approach is being used to define and assess their impact. Two significant cybersecurity workshops have fostered this perspective, aiming to standardize the understanding of cyber harm. Experts at these workshops agreed on a public health-like framework to analyze cyber threats focusing on the perpetrators' intent, the means available to them, and the vulnerability of targets. We contribute to this dialogue with the cyber sensorium concept, drawing parallels between the digital network and a biological nervous system essential to human welfare. Cyberattacks on this system present serious global risks, underlining the need for its protection.

Updated: 2024-06-09 22:44:49

Categories: cs.CY, cs.CR

Download: http://arxiv.org/abs/2406.05929v1

MeanSparse: Post-Training Robustness Enhancement Through Mean-Centered Feature Sparsification

We present a simple yet effective method to improve the robustness of Convolutional Neural Networks (CNNs) against adversarial examples by post-processing an adversarially trained model. Our technique, MeanSparse, cascades the activation functions of a trained model with novel operators that sparsify mean-centered feature vectors. This is equivalent to reducing feature variations around the mean, and we show that such reduced variations barely affect the model's utility, yet they strongly attenuate adversarial perturbations and decrease the attacker's success rate. Our experiments show that, when applied to the top models on the RobustBench leaderboard, MeanSparse achieves new robustness records in terms of AutoAttack accuracy: 72.08% (from 71.07%) on CIFAR-10 and 59.64% (from 59.56%) on ImageNet. Code is available at https://github.com/SPIN-UMass/MeanSparse
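The abstract does not give the operator's formula. The sketch assumes a per-channel rule: any activation within a threshold of the channel mean is replaced by that mean, and larger deviations pass through unchanged. Setting the threshold to `alpha` times the channel's standard deviation is also an assumption, as are the precomputed statistics.

```python
from typing import List


def mean_sparse(features: List[float], mean: List[float], std: List[float],
                alpha: float = 0.25) -> List[float]:
    """Mean-centered sparsification applied after an activation, per channel.

    Values within alpha * std of the channel mean are snapped to the mean
    (small variations removed); larger deviations pass through unchanged.
    `mean` and `std` are assumed to be estimated beforehand on training data.
    """
    out = []
    for x, mu, sigma in zip(features, mean, std):
        out.append(x if abs(x - mu) >= alpha * sigma else mu)
    return out
```

Only the statistics and the threshold are set after training, so an operator of this kind can be inserted after existing activations without retraining. This matches the post-training framing of the abstract.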

Updated: 2024-06-09 22:14:55

Categories: cs.CV, cs.CR, cs.LG

Download: http://arxiv.org/abs/2406.05927v1

Infinite-Dimensional Feature Interaction

Past neural network design has largely focused on the dimension of the feature representation space and its capacity scaling (e.g., width, depth), but has overlooked scaling of the feature interaction space. Recent advances have shifted focus toward element-wise multiplication, which provides a higher-dimensional feature interaction space for better information transformation. Despite this progress, multiplications predominantly capture low-order interactions and thus remain confined to a finite-dimensional interaction space. To transcend this limitation, classic kernel methods offer a promising way to let features interact in an infinite-dimensional space. We introduce InfiNet, a model architecture that enables feature interaction within an infinite-dimensional space created by the RBF kernel. Our experiments reveal that InfiNet achieves new state-of-the-art results, owing to its capability to leverage infinite-dimensional interactions, which significantly enhances model performance.
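The abstract does not say how InfiNet places the kernel inside its layers. The expansion below only illustrates the standard fact behind the claim: for x, y in R^d and a bandwidth parameter γ > 0, the RBF kernel is an infinite sum of polynomial kernels of every order.

```latex
\begin{aligned}
k(\mathbf{x},\mathbf{y})
  &= \exp\!\left(-\gamma\,\lVert\mathbf{x}-\mathbf{y}\rVert^{2}\right) \\
  &= e^{-\gamma\lVert\mathbf{x}\rVert^{2}}\; e^{-\gamma\lVert\mathbf{y}\rVert^{2}}\; e^{2\gamma\,\mathbf{x}^{\top}\mathbf{y}} \\
  &= e^{-\gamma\lVert\mathbf{x}\rVert^{2}}\; e^{-\gamma\lVert\mathbf{y}\rVert^{2}}
     \sum_{n=0}^{\infty} \frac{(2\gamma)^{n}}{n!}\,\bigl(\mathbf{x}^{\top}\mathbf{y}\bigr)^{n}.
\end{aligned}
```

Each term (x^T y)^n collects all n-th order products of feature coordinates. By contrast, a single element-wise product of two linear projections of x yields only second-order terms, which is the finite-order limitation the abstract describes.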

Updated: 2024-06-09 22:10:42

Categories: cs.LG

Download: http://arxiv.org/abs/2405.13972v3

CantonMT: Cantonese to English NMT Platform with Fine-Tuned Models Using Synthetic Back-Translation Data

Neural Machine Translation (NMT) for low-resource languages remains a challenging task for NLP researchers. In this work, we apply a standard data augmentation methodology, back-translation, to a new translation direction: Cantonese-to-English. We present the models we fine-tuned, including OpusMT, NLLB, and mBART, using the limited amount of real data together with synthetic data generated by back-translation. We carried out automatic evaluation using a range of metrics, including lexical-based and embedding-based ones. Furthermore, we created a user-friendly interface for the models included in this CantonMT research project and made it available to facilitate Cantonese-to-English MT research. Researchers can add more models to this platform via our open-source CantonMT toolkit: https://github.com/kenrickkung/CantoneseTranslation
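Back-translation itself is simple to sketch:
1. A reverse-direction model translates monolingual English into Cantonese.
2. The resulting synthetic pairs are mixed with the real parallel data.

The `en_to_yue` callable is a hypothetical stand-in for such a model, for example a fine-tuned OpusMT, NLLB or mBART in the English-to-Cantonese direction. This is not the CantonMT toolkit's API.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (Cantonese source, English target)


def back_translate(english_mono: List[str],
                   en_to_yue: Callable[[str], str]) -> List[Pair]:
    """Turn monolingual English into synthetic Cantonese->English training pairs.

    A reverse-direction (English->Cantonese) model writes the synthetic source
    side; the human-written English sentence is kept as the target side.
    """
    return [(en_to_yue(en), en) for en in english_mono]


def build_training_set(real_pairs: List[Pair], english_mono: List[str],
                       en_to_yue: Callable[[str], str]) -> List[Pair]:
    """Concatenate the scarce real pairs with the back-translated synthetic pairs."""
    return real_pairs + back_translate(english_mono, en_to_yue)
```

Keeping the human-written English on the target side means the forward model always learns to produce natural English, even when the synthetic Cantonese input is imperfect.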

Updated: 2024-06-09 22:10:04

Categories: cs.CL, cs.AI

Download: http://arxiv.org/abs/2403.11346v3

Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

Open-domain dialogue systems have seen remarkable advancements with the development of large language models (LLMs). Nonetheless, most existing dialogue systems focus predominantly on brief single-session interactions, neglecting the real-world demand for long-term companionship and personalized interactions with chatbots. Crucial to addressing this need are event summary and persona management, which enable reasoning toward appropriate long-term dialogue responses. Recent progress in the human-like cognitive and reasoning capabilities of LLMs suggests that LLM-based agents could significantly enhance automated perception, decision-making, and problem-solving. In response to this potential, we introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent), which incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation. For the event memory module, long- and short-term memory banks are employed to focus separately on historical and ongoing sessions, while a topic-based retrieval mechanism is introduced to improve the accuracy of memory retrieval. Furthermore, the persona module conducts dynamic persona modeling for both users and agents. The retrieved memories and extracted personas are then fed into the generator to induce appropriate responses. The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated across various illustrative benchmarks, models, and tasks. The code is released at https://github.com/leolee99/LD-Agent.
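A toy version of the event-memory module helps clarify the split between long- and short-term banks. The sketch assumes that topics are simply content words and that retrieval ranks past-session summaries by topic overlap. In LD-Agent both summarization and topic extraction would be done by models, and all names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set

STOPWORDS = {"the", "a", "an", "i", "you", "my", "is", "to", "and", "of", "we", "about", "did", "do"}


def topics(text: str) -> Set[str]:
    """Crude topic extraction: lowercase content words (a stand-in for a topic model)."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


@dataclass
class EventMemory:
    long_term: List[str] = field(default_factory=list)   # summaries of past sessions
    short_term: List[str] = field(default_factory=list)  # turns of the ongoing session

    def end_session(self, summary: str) -> None:
        """Store a summary of the ongoing session in long-term memory and reset it."""
        self.long_term.append(summary)
        self.short_term.clear()

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        """Return up to k past-session summaries sharing the most topics with the query."""
        q = topics(query)
        scored = [(len(q & topics(s)), s) for s in self.long_term]
        scored = [item for item in scored if item[0] > 0]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [s for _, s in scored[:k]]
```

At response time, the retrieved summaries and the current `short_term` turns would both be passed to the generator, together with the persona descriptions.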

Updated: 2024-06-09 21:58:32

Categories: cs.CL, cs.AI

Download: http://arxiv.org/abs/2406.05925v1

Learning to Infer Generative Template Programs for Visual Concepts

People grasp flexible visual concepts from a few examples. We explore a neurosymbolic system that learns how to infer programs that capture visual concepts in a domain-general fashion. We introduce Template Programs: programmatic expressions from a domain-specific language that specify structural and parametric patterns common to an input concept. Our framework supports multiple concept-related tasks, including few-shot generation and co-segmentation through parsing. We develop a learning paradigm that allows us to train networks that infer Template Programs directly from visual datasets that contain concept groupings. We run experiments across multiple visual domains: 2D layouts, Omniglot characters, and 3D shapes. We find that our method outperforms task-specific alternatives, and performs competitively against domain-specific approaches for the limited domains where they exist.

Updated: 2024-06-09 21:54:18

Categories: cs.CV, cs.AI, cs.GR, cs.LG

Download: http://arxiv.org/abs/2403.15476v2

Aligning LLM Agents by Learning Latent Preference from User Edits

We study interactive learning of LLM-based language agents based on user edits made to the agent's output. In a typical setting such as writing assistants, the user interacts with a language agent to generate a response given a context and may optionally edit the agent's response to personalize it based on their latent preference, in addition to improving its correctness. The edit feedback is naturally generated, making it a suitable candidate for improving the agent's alignment with the user's preference and for reducing the cost of user edits over time. We propose a learning framework, PRELUDE, that infers a description of the user's latent preference based on historic edit data. The inferred user preference descriptions are used to define prompts for generating future responses. This avoids fine-tuning the agent, which is costly, challenging to scale with the number of users, and may even degrade its performance on other tasks. Furthermore, learning descriptive preferences improves interpretability, allowing the user to view and modify the learned preference. However, user preferences can be complex, subtle, and context-dependent, making them challenging to learn. To address this, we propose a simple yet effective algorithm named CIPHER that leverages the LLM to infer the user preference for a given context based on user edits. For a new context, CIPHER retrieves inferred preferences from the k closest contexts in the history and forms an aggregate preference for response generation. We introduce two interactive environments, summarization and email writing, and use a GPT-4 simulated user for evaluation. On both tasks, CIPHER outperforms several baselines by achieving the lowest edit distance cost while incurring only a small overhead in LLM query cost. Our analysis shows that the user preferences learned by CIPHER show significant similarity to the ground-truth latent preferences.
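CIPHER's retrieval-and-aggregation step can be sketched as a nearest-neighbour lookup over past contexts. The sketch makes two substitutions:
- bag-of-words cosine similarity in place of a learned embedding;
- de-duplicated concatenation in place of LLM-based aggregation.

The function names and the example history are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for a learned text embedding)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def aggregate_preference(context: str, history: List[Tuple[str, str]], k: int = 2) -> str:
    """Combine the preferences inferred for the k past contexts most similar to `context`.

    `history` holds (context, inferred_preference) pairs; duplicate preferences
    are dropped while keeping the order of similarity.
    """
    ranked = sorted(history, key=lambda item: cosine(context, item[0]), reverse=True)
    prefs: List[str] = []
    for _, pref in ranked[:k]:
        if pref not in prefs:
            prefs.append(pref)
    return "; ".join(prefs)
```

The aggregated string would then be placed in the prompt for the new context, so no model weights are updated.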

Updated: 2024-06-09 21:45:09

Categories: cs.CL, cs.AI, cs.IR, cs.LG

Download: http://arxiv.org/abs/2404.15269v2

Contrastive Learning from Synthetic Audio Doppelgangers

Learning robust audio representations currently demands extensive datasets of real-world sound recordings. By applying artificial transformations to these recordings, models can learn, through techniques such as contrastive learning, to recognize similarities despite subtle variations. However, these transformations are only approximations of the true diversity found in real-world sounds, which are generated by complex interactions of physical processes, from vocal cord vibrations to the resonance of musical instruments. We propose a solution to both the data scale and transformation limitations that leverages synthetic audio. By randomly perturbing the parameters of a sound synthesizer, we generate audio doppelgängers: synthetic positive pairs with causally manipulated variations in timbre, pitch, and temporal envelopes. These variations, difficult to achieve through transformations of existing audio, provide a rich source of contrastive information. Despite the shift to randomly generated synthetic data, our method produces strong representations that are competitive with real data on standard audio classification benchmarks. Notably, our approach is lightweight, requires no data storage, and has only a single hyperparameter, which we analyze extensively. We offer this method as a complement to existing strategies for contrastive learning in audio, using synthesized sounds to reduce the data burden on practitioners.
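The core idea is a positive pair made from two nearby synthesizer settings, which can be sketched with a toy additive synthesizer. The parameter set (pitch, one harmonic level, attack, decay), the parameter ranges, and the relative perturbation size `delta` are assumptions. Here `delta` plays the role of a single perturbation-strength hyperparameter, which may or may not match the one analyzed in the paper.

```python
import math
import random
from typing import Dict, List, Tuple

Params = Dict[str, float]


def random_params(rng: random.Random) -> Params:
    return {
        "freq": rng.uniform(100.0, 1000.0),  # pitch in Hz
        "harmonic": rng.uniform(0.0, 1.0),   # 2nd-harmonic level (crude timbre)
        "attack": rng.uniform(0.005, 0.1),   # seconds
        "decay": rng.uniform(0.1, 1.0),      # seconds
    }


def perturb(params: Params, delta: float, rng: random.Random) -> Params:
    """Scale every parameter by an independent factor drawn from [1 - delta, 1 + delta]."""
    return {k: v * rng.uniform(1.0 - delta, 1.0 + delta) for k, v in params.items()}


def synthesize(p: Params, sr: int = 16000, dur: float = 0.5) -> List[float]:
    """Two-partial tone with a linear attack and an exponential decay envelope."""
    out = []
    for n in range(int(sr * dur)):
        t = n / sr
        env = min(t / p["attack"], 1.0) * math.exp(-t / p["decay"])
        tone = (math.sin(2 * math.pi * p["freq"] * t)
                + p["harmonic"] * math.sin(4 * math.pi * p["freq"] * t))
        out.append(env * tone)
    return out


def doppelganger_pair(rng: random.Random, delta: float = 0.1) -> Tuple[List[float], List[float]]:
    """Synthesize a sound and a slightly perturbed twin to use as a positive pair."""
    base = random_params(rng)
    return synthesize(base), synthesize(perturb(base, delta, rng))
```

The two waveforms returned by `doppelganger_pair` would be fed to a contrastive loss as a positive pair, with other generated sounds in the batch serving as negatives.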

Updated: 2024-06-09 21:44:06

Categories: cs.SD, cs.LG, eess.AS

Download: http://arxiv.org/abs/2406.05923v1

Estimating Unknown Population Sizes Using the Hypergeometric Distribution

The multivariate hypergeometric distribution describes sampling without replacement from a discrete population of elements divided into multiple categories. Addressing a gap in the literature, we tackle the challenge of estimating discrete distributions when both the total population size and the sizes of its constituent categories are unknown. Here, we propose a novel solution that uses the hypergeometric likelihood to solve this estimation challenge, even in the presence of severe under-sampling. We develop our approach to account for a data-generating process in which the ground truth is a mixture of distributions conditional on a continuous latent variable, as in collaborative filtering, using the variational autoencoder framework. Empirical data simulation demonstrates that our method outperforms other likelihood functions used to model count data, both in the accuracy of the population size estimate and in its ability to learn an informative latent space. We demonstrate our method's versatility through applications in NLP, by inferring and estimating the complexity of latent vocabularies in text excerpts, and in biology, by accurately recovering the true number of gene transcripts from sparse single-cell genomics data.
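The likelihood at the heart of the method is the multivariate hypergeometric probability mass function. The sketch below computes it in log space with `math.lgamma`, so large population sizes do not overflow. The function names are illustrative, and the variational autoencoder the paper builds around this likelihood is not reproduced.

```python
import math
from typing import Sequence


def log_comb(n: int, k: int) -> float:
    """log C(n, k) via lgamma, usable for very large n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def mv_hypergeom_logpmf(counts: Sequence[int], sizes: Sequence[int]) -> float:
    """log P(counts | sizes) when drawing sum(counts) items without replacement.

    sizes[i] is the (in practice unknown) number of elements in category i;
    the total population size is N = sum(sizes).
    """
    if any(k < 0 or k > K for k, K in zip(counts, sizes)):
        return float("-inf")
    n, N = sum(counts), sum(sizes)
    return sum(log_comb(K, k) for k, K in zip(counts, sizes)) - log_comb(N, n)
```

Example: with category sizes [3, 2, 4] (so N = 9) and observed counts [1, 1, 1], the probability is 3 * 2 * 4 / C(9, 3) = 24 / 84. Comparing such log-likelihoods across candidate size vectors is the basic ingredient for estimating unknown sizes.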

Updated: 2024-06-09 21:43:28

Categories: cs.LG, stat.ME, stat.ML

Download: http://arxiv.org/abs/2402.14220v2

Why Don't Prompt-Based Fairness Metrics Correlate?

The widespread use of large language models has brought up essential questions about the potential biases these models might learn. This led to the development of several metrics aimed at evaluating and mitigating these biases. In this paper, we first demonstrate that prompt-based fairness metrics exhibit poor agreement, as measured by correlation, raising important questions about the reliability of fairness assessment using prompts. Then, we outline six relevant reasons why such a low correlation is observed across existing metrics. Based on these insights, we propose a method called Correlated Fairness Output (CAIRO) to enhance the correlation between fairness metrics. CAIRO augments the original prompts of a given fairness metric by using several pre-trained language models and then selects the combination of the augmented prompts that achieves the highest correlation across metrics. We show a significant improvement in Pearson correlation from 0.3 and 0.18 to 0.90 and 0.98 across metrics for gender and religion biases, respectively. Our code is available at https://github.com/chandar-lab/CAIRO.
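The selection step can be sketched as an exhaustive search over subsets of prompt variants for each metric. It keeps the pair of subsets whose averaged per-model bias scores have the highest Pearson correlation. The data layout, the averaging over variants and the restriction to two metrics are assumptions. The numbers in the test are toy values, not results from the paper.

```python
import itertools
import math
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Scores = Dict[str, List[float]]  # prompt variant -> bias score for each evaluated model


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def mean_over(variants: Tuple[str, ...], scores: Scores) -> List[float]:
    """Average a metric's per-model scores over the selected prompt variants."""
    n_models = len(next(iter(scores.values())))
    return [sum(scores[v][m] for v in variants) / len(variants) for m in range(n_models)]


def nonempty_subsets(keys: List[str]) -> Iterator[Tuple[str, ...]]:
    for r in range(1, len(keys) + 1):
        yield from itertools.combinations(keys, r)


def best_combination(metric_a: Scores, metric_b: Scores
                     ) -> Tuple[float, Optional[Tuple[str, ...]], Optional[Tuple[str, ...]]]:
    """Search variant subsets for both metrics; return (r, subset_a, subset_b) with the highest r."""
    best: Tuple[float, Optional[Tuple[str, ...]], Optional[Tuple[str, ...]]] = (-2.0, None, None)
    for sa in nonempty_subsets(sorted(metric_a)):
        xa = mean_over(sa, metric_a)
        for sb in nonempty_subsets(sorted(metric_b)):
            r = pearson(xa, mean_over(sb, metric_b))
            if r > best[0]:
                best = (r, sa, sb)
    return best
```

Exhaustive search grows exponentially with the number of variants, so it is only practical for a handful of augmented prompts per metric.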

Updated: 2024-06-09 21:12:15

Categories: cs.CL, cs.AI, cs.CY, cs.LG

Download: http://arxiv.org/abs/2406.05918v1

Introducing GenCeption for Multimodal LLM Benchmarking: You May Bypass Annotations

Multimodal Large Language Models (MLLMs) are commonly evaluated using costly annotated multimodal benchmarks. However, these benchmarks often struggle to keep pace with the rapidly advancing requirements of MLLM evaluation. We propose GenCeption, a novel, annotation-free MLLM evaluation framework that requires only unimodal data to assess inter-modality semantic coherence and, inversely, reflects the models' inclination to hallucinate. Analogous to the popular DrawCeption game, GenCeption starts with a non-textual sample and proceeds through a series of iterative description and generation steps. Semantic drift across iterations is quantified using the GC@T metric. Our empirical findings validate GenCeption's efficacy, showing strong correlations with popular MLLM benchmarking results. GenCeption may be extended to mitigate training data contamination by utilizing ubiquitous, previously unseen unimodal data.
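The iterative loop can be sketched with three pluggable callables:
- an MLLM that describes a sample;
- a generator that recreates a sample from the description;
- an embedding model used to measure drift from the original.

The abstract does not define GC@T, so `drift_score` uses a plain average of similarities as a stand-in; the paper's metric may weight iterations differently. All names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, TypeVar

Sample = TypeVar("Sample")


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def genception_scores(x0: Sample, T: int,
                      describe: Callable[[Sample], str],
                      generate: Callable[[str], Sample],
                      embed: Callable[[Sample], List[float]]) -> List[float]:
    """Run T describe -> generate iterations; return the similarity of each x_t to x0."""
    ref = embed(x0)
    sims: List[float] = []
    x = x0
    for _ in range(T):
        x = generate(describe(x))  # MLLM describes, a generator recreates the sample
        sims.append(cosine(ref, embed(x)))
    return sims


def drift_score(sims: List[float]) -> float:
    """Stand-in aggregate (plain mean); the paper's GC@T may weight iterations differently."""
    return sum(sims) / len(sims)
```

A model whose descriptions lose or invent content produces samples that drift away from `x0`, so the similarities fall over iterations.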

Updated: 2024-06-09 21:10:34

Fields: cs.CL,cs.AI,cs.LG,I.7; I.4

Download: http://arxiv.org/abs/2402.14973v2

BD-SAT: High-resolution Land Use Land Cover Dataset & Benchmark Results for Developing Division: Dhaka, BD

Land Use Land Cover (LULC) analysis of satellite images using deep learning-based methods is significantly helpful in understanding the geography, socio-economic conditions, poverty levels, and urban sprawl of developing countries. Recent work segments images into LULC classes such as farmland, built-up areas, forests, meadows, and water bodies. Training deep learning methods on satellite images requires large sets of images annotated with LULC classes. However, annotated data for developing countries are scarce due to a lack of funding, the absence of dedicated residential/industrial/economic zones, large populations, and diverse building materials. BD-SAT provides a high-resolution dataset with pixel-by-pixel LULC annotations for the Dhaka metropolitan city and surrounding rural/urban areas. Following a strict, standardized procedure, the ground truth was created from Bing satellite imagery with a ground spatial distance of 2.22 meters per pixel. A well-defined three-stage annotation process was followed, with support from GIS experts, to ensure the reliability of the annotations. We performed several experiments to establish benchmark results. The results show that the annotated BD-SAT is sufficient to train large deep learning models with adequate accuracy for five major LULC classes: forest, farmland, built-up areas, water bodies, and meadows.
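
The abstract does not name the benchmark metrics, so the sketch below assumes per-class intersection-over-union (IoU), a common choice for pixel-wise LULC segmentation, over the five listed classes. The integer label encoding is hypothetical and may differ from BD-SAT's.

```python
import math

# Hypothetical label indices; BD-SAT's actual encoding may differ.
CLASSES = ["forest", "farmland", "built-up", "water", "meadow"]


def per_class_iou(pred, truth, n_classes=len(CLASSES)):
    """Intersection-over-union per class for flattened label maps."""
    inter = [0] * n_classes
    union = [0] * n_classes
    for p, t in zip(pred, truth):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    # NaN marks classes that appear in neither prediction nor ground truth.
    return [i / u if u else float("nan") for i, u in zip(inter, union)]


def mean_iou(ious):
    """Average IoU over classes that are present."""
    valid = [x for x in ious if not math.isnan(x)]
    return sum(valid) / len(valid)


ious = per_class_iou([0, 0, 1, 2], [0, 1, 1, 2])
```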

Updated: 2024-06-09 20:54:58

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.05912v1

Neural Scaling Laws on Graphs

Deep graph models (e.g., graph neural networks and graph transformers) have become important techniques for leveraging knowledge across various types of graphs. Yet the scaling properties of deep graph models have not been systematically investigated, casting doubt on the feasibility of achieving large graph models by enlarging model and dataset sizes. In this work, we examine neural scaling laws on graphs from both the model and data perspectives. We first verify the validity of such laws on graphs, establishing formulations that describe the scaling behaviors. For model scaling, we investigate the phenomenon of scaling law collapse and identify overfitting as a potential reason. Moreover, we reveal that the depth of deep graph models can affect model scaling behavior, which differs from observations in other domains such as CV and NLP. For data scaling, we suggest that the number of graphs cannot effectively measure graph data volume in scaling laws, since the sizes of different graphs are highly irregular. Instead, we reformulate the data scaling law with the number of edges as the metric to address irregular graph sizes. We further demonstrate that the reformulated law offers a unified view of data scaling behavior for various fundamental graph tasks, including node classification, link prediction, and graph classification. This work provides valuable insights into neural scaling laws on graphs and can serve as an essential step toward large graph models.
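
A scaling law is typically fit as a power law. This sketch assumes the simple form loss ~ a * N^(-b), with N the data volume measured in edges as the abstract proposes, and fits it by least squares in log-log space. The paper's exact formulation (for example, an added irreducible-loss term) may differ, and the data below is synthetic.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss ~ a * size**(-b) by least squares in log-log space; returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return math.exp(intercept), -slope


# Synthetic losses generated from a = 5, b = 0.3, with N = number of edges.
num_edges = [1e3, 1e4, 1e5, 1e6]
losses = [5.0 * e ** -0.3 for e in num_edges]
a, b = fit_power_law(num_edges, losses)
```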

Updated: 2024-06-09 20:49:33

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2402.02054v2

A Survey on Complexity Measures of Pseudo-Random Sequences

Since the introduction of the Kolmogorov complexity of binary sequences in the 1960s, there have been significant advancements in the topic of complexity measures for randomness assessment, which are of fundamental importance in theoretical computer science and of practical interest in cryptography. This survey reviews notable research from the past four decades on the linear, quadratic and maximum-order complexities of pseudo-random sequences and their relations with Lempel-Ziv complexity, expansion complexity, 2-adic complexity, and correlation measures.
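
Linear complexity, the first measure the survey covers, is the length of the shortest linear feedback shift register (LFSR) that generates a sequence. It is classically computed with the Berlekamp-Massey algorithm. The sketch below is a standard textbook version over GF(2), shown for illustration rather than taken from the survey.

```python
def linear_complexity(bits):
    """Berlekamp-Massey over GF(2): length of the shortest LFSR generating `bits`."""
    n = len(bits)
    if n == 0:
        return 0
    c = [0] * n  # current connection polynomial
    b = [0] * n  # connection polynomial before the last length change
    c[0] = b[0] = 1
    L, m = 0, -1
    for i in range(n):
        # Discrepancy between bits[i] and the current LFSR's prediction.
        d = bits[i]
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L


# m-sequence from s[k+3] = s[k+1] XOR s[k] (primitive x^3 + x + 1), period 7.
seq = [1, 0, 0]
while len(seq) < 14:
    seq.append(seq[-2] ^ seq[-3])
```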

Updated: 2024-06-09 20:42:16

Fields: cs.CR

Download: http://arxiv.org/abs/2405.08479v2

TTM-RE: Memory-Augmented Document-Level Relation Extraction

Document-level relation extraction aims to categorize the association between any two entities within a document. We find that previous methods for document-level relation extraction fail to exploit the full potential of large amounts of training data with varied noise levels. For example, on the ReDocRED benchmark dataset, state-of-the-art methods trained on the large-scale, lower-quality, distantly supervised training data generally do not perform better than those trained solely on the smaller, high-quality, human-annotated training data. To unlock the full potential of large-scale noisy training data for document-level relation extraction, we propose TTM-RE, a novel approach that integrates a trainable memory module, known as the Token Turing Machine, with a noise-robust loss function that accounts for the positive-unlabeled setting. Extensive experiments on ReDocRED, a benchmark dataset for document-level relation extraction, reveal that TTM-RE achieves state-of-the-art performance (an absolute F1 improvement of over 3%). Ablation studies further illustrate the superiority of TTM-RE in other domains (the ChemDisGene dataset in the biomedical domain) and in highly unlabeled settings.
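
The abstract does not specify which positive-unlabeled (PU) loss TTM-RE uses. As an illustration of the general idea only, the sketch below implements the well-known non-negative PU risk estimator (nnPU) with a sigmoid surrogate loss. Here `prior` is an assumed class prior for positives among unlabeled entity pairs, and the scores are toy values.

```python
import math
from statistics import mean


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss l(z, y) = 1 / (1 + exp(y * z)), label in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))


def nnpu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk for labeled-positive and unlabeled samples."""
    risk_pos = prior * mean(sigmoid_loss(z, +1) for z in pos_scores)
    # Negative-class risk estimated from unlabeled data minus the positives' share.
    risk_neg = mean(sigmoid_loss(z, -1) for z in unl_scores) - prior * mean(
        sigmoid_loss(z, -1) for z in pos_scores
    )
    return risk_pos + max(0.0, risk_neg)  # clamping keeps the estimate non-negative


uninformative = nnpu_risk([0.0, 0.0], [0.0, 0.0, 0.0], prior=0.3)
confident = nnpu_risk([10.0], [-10.0], prior=0.5)
```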

Updated: 2024-06-09 20:18:58

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.05906v1

Neuron-Level Knowledge Attribution in Large Language Models

Identifying the neurons that matter for final predictions is essential for understanding the mechanisms of large language models. Due to computational constraints, current attribution techniques struggle to operate at the neuron level. In this paper, we propose a static method for pinpointing significant neurons for different outputs. Compared to seven other methods, our approach performs better on three metrics. Additionally, since most static methods only identify "value neurons" that directly contribute to the final prediction, we introduce a static method for identifying "query neurons" that activate these "value neurons". Finally, we apply our methods to analyze the localization of six distinct types of knowledge across both attention and feed-forward network (FFN) layers. Our method and analysis help explain how knowledge is stored and set the stage for future research in knowledge editing. We will release our data and code on GitHub.
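
A static attribution score can be computed from a single forward pass, without gradients. The sketch below scores a neuron by how much adding its contribution (activation times value vector) to the residual stream raises the log-probability of the target token after unembedding. This is an illustrative stand-in, not necessarily the paper's exact score. All vectors and the 2-token vocabulary are hypothetical.

```python
import math


def log_softmax_at(logits, idx):
    """log p(idx) under a softmax over logits (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[idx] - log_z


def unembed(hidden, w_u):
    """Logits = W_U @ hidden, with W_U given as one row per vocabulary item."""
    return [sum(a * h for a, h in zip(row, hidden)) for row in w_u]


def log_prob_increase(residual, value_vec, activation, w_u, target):
    """Change in log p(target) when the neuron's contribution is added to the residual."""
    with_neuron = [r + activation * v for r, v in zip(residual, value_vec)]
    return (log_softmax_at(unembed(with_neuron, w_u), target)
            - log_softmax_at(unembed(residual, w_u), target))


W_U = [[1.0, 0.0], [0.0, 1.0]]  # toy unembedding for a 2-token vocabulary
helpful = log_prob_increase([0.0, 0.0], [1.0, 0.0], 2.0, W_U, target=0)
harmful = log_prob_increase([0.0, 0.0], [-1.0, 0.0], 2.0, W_U, target=0)
```

Ranking neurons by this score gives a cheap, static importance ordering for a given output token.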

Updated: 2024-06-09 20:03:02

Fields: cs.CL,cs.LG

Download: http://arxiv.org/abs/2312.12141v3

Aegis: A Decentralized Expansion Blockchain

Blockchains implement monetary systems operated by committees of nodes. The robustness of established blockchains presents an opportunity to leverage their infrastructure for creating expansion chains. Expansion chains can provide additional functionality to the primary chain they leverage or implement separate functionalities, while benefiting from the primary chain's security and the stability of its tokens. Indeed, tools like Ethereum's EigenLayer enable nodes to stake (deposit collateral) on a primary chain to form a committee responsible for operating an expansion chain. But here is the rub. Classical protocols assume that correct, well-behaved nodes stay correct indefinitely. In our case, however, correctness is incentivized by the stake, which is slashed (revoked) if its owner deviates. Once a node withdraws its stake, there is no basis to assume its correctness. To address this new challenge, we present Aegis, an expansion chain based on primary-chain stake, assuming a bounded primary-chain write time. Aegis uses references from Aegis blocks to primary blocks to define committees, checkpoints on the primary chain to perpetuate decisions, and resets on the primary chain to establish a new committee if the previous one becomes obsolete. It ensures safety at all times and rapid progress when latency among Aegis nodes is low.
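
The idea that an expansion-chain block's reference to a primary block determines its committee can be sketched as follows. The data model is hypothetical and greatly simplified (no slashing, checkpoints, or resets). A node is a committee member if its stake was deposited at or before the referenced primary-chain height and not yet withdrawn by then.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StakeRecord:
    node: str
    deposit_height: int                    # primary-chain height of the stake deposit
    withdraw_height: Optional[int] = None  # height of withdrawal, if any


def committee_at(records, ref_height):
    """Nodes whose stake is active at the primary block referenced by an expansion-chain block."""
    return sorted(
        r.node
        for r in records
        if r.deposit_height <= ref_height
        and (r.withdraw_height is None or r.withdraw_height > ref_height)
    )


stakes = [StakeRecord("A", 0), StakeRecord("B", 5), StakeRecord("C", 0, withdraw_height=3)]
```

Because membership is a pure function of primary-chain state at the referenced height, every honest node derives the same committee for a given reference.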

Updated: 2024-06-09 19:53:48

Fields: cs.DC,cs.CR

Download: http://arxiv.org/abs/2406.05904v1

Governance of Generative Artificial Intelligence for Companies

Generative Artificial Intelligence (GenAI), specifically large language models like ChatGPT, has swiftly entered organizations without adequate governance, posing both opportunities and risks. Despite extensive debates on GenAI's transformative nature and on regulatory measures, little research addresses organizational governance from both technical and business perspectives. Our review paper fills this gap by surveying recent work to develop a framework for GenAI governance within companies. This framework outlines the scope, objectives, and governance mechanisms needed to harness business opportunities and mitigate the risks associated with GenAI integration. Our research contributes a focused approach to GenAI governance, offering practical insights for companies navigating the challenges of GenAI adoption and highlighting research gaps.

Updated: 2024-06-09 19:48:05

Fields: cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2403.08802v2

Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback

There is a growing body of work on learning from human feedback to align various aspects of machine learning systems with human values and preferences. We consider the setting of fairness in content moderation, in which human feedback is used to determine how two comments referencing different sensitive attribute groups should be treated relative to one another. With a novel dataset collected from Prolific and MTurk, we find significant gaps in fairness preferences depending on the race, age, political stance, educational level, and LGBTQ+ identity of annotators. We also demonstrate that demographics mentioned in text strongly influence how users perceive individual fairness in moderation. Further, we find that differences also exist in downstream classifiers trained to predict human preferences. Finally, we observe that an ensemble giving equal weight to classifiers trained on annotations from different demographics performs better across different demographic intersections than a single classifier that gives equal weight to each annotation.
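
The difference between weighting each annotation equally and weighting each demographic equally shows up even without classifiers. The sketch below replaces the trained classifiers with simple preference rates, which is an illustrative simplification, and uses made-up annotation counts in which one group dominates the pool.

```python
def pooled_rate(annotations):
    """Every annotation counts equally, so larger groups dominate."""
    labels = [y for ys in annotations.values() for y in ys]
    return sum(labels) / len(labels)


def equal_group_rate(annotations):
    """Each demographic group counts equally, regardless of its size."""
    rates = [sum(ys) / len(ys) for ys in annotations.values()]
    return sum(rates) / len(rates)


annotations = {
    "group_a": [1] * 72 + [0] * 18,  # 90 annotations, 80% prefer option 1
    "group_b": [1] * 2 + [0] * 8,    # 10 annotations, 20% prefer option 1
}
```

The pooled estimate (0.74) mostly reflects `group_a`, while the equal-group estimate (0.5) gives the minority group the same voice, mirroring the ensemble design described above.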

Updated: 2024-06-09 19:42:25

Fields: cs.LG,cs.AI,cs.CL,cs.CY

Download: http://arxiv.org/abs/2406.05902v1

Large Language Models Memorize Sensor Datasets! Implications on Human Activity Recognition Research

The astonishing success of Large Language Models (LLMs) in Natural Language Processing (NLP) has spurred their use in many application domains beyond text analysis, including wearable sensor-based Human Activity Recognition (HAR). In such scenarios, sensor data are often fed directly into an LLM along with text instructions for the model to perform activity classification. Seemingly remarkable results have been reported for such LLM-based HAR systems when they are evaluated on standard benchmarks from the field. Yet, we argue, care must be taken when evaluating LLM-based HAR systems in such a traditional way. Most contemporary LLMs are trained on virtually the entire (accessible) internet, potentially including standard HAR datasets. It is therefore not unlikely that LLMs actually had access to the test data used in such benchmark experiments. The resulting contamination of training data would render these experimental evaluations meaningless. In this paper, we investigate whether LLMs have indeed had access to standard HAR datasets during training. We apply memorization tests to LLMs, which involve instructing the models to extend given snippets of data. Comparing the LLM-generated output to the original data, we found a non-negligible number of matches, which suggests that the LLM under investigation has indeed seen wearable sensor data from the benchmark datasets during training. For the Daphnet dataset in particular, GPT-4 is able to reproduce blocks of sensor readings. We report on our investigations and discuss potential implications for HAR research, especially with regard to reporting results of experimental evaluations.
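
The comparison step of such a memorization test can be sketched as follows: split a recording into a prompt prefix and a held-back continuation, then measure how many generated rows exactly match the original. The LLM call itself is omitted; `llm_output` is a hypothetical stand-in for the model's continuation, and exact string matching is an assumed criterion.

```python
def split_for_probe(lines, prefix_len):
    """Split a recording into a prompt prefix and the held-back continuation."""
    return lines[:prefix_len], lines[prefix_len:]


def exact_match_rate(generated, reference):
    """Fraction of generated rows that exactly match the reference rows, position by position."""
    n = min(len(generated), len(reference))
    if n == 0:
        return 0.0
    hits = sum(g.strip() == r.strip() for g, r in zip(generated, reference))
    return hits / n


rows = ["1,2,3", "4,5,6", "0.1,0.2,0.3", "0.4,0.5,0.6"]  # toy sensor readings
prompt, reference = split_for_probe(rows, 2)
llm_output = ["0.1,0.2,0.3", "9,9,9"]  # hypothetical model continuation
rate = exact_match_rate(llm_output, reference)
```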

Updated: 2024-06-09 19:38:27

Fields: cs.LG,eess.SP

Download: http://arxiv.org/abs/2406.05900v1

Async Learned User Embeddings for Ads Delivery Optimization

User representation is crucial for recommendation systems, as it helps deliver personalized recommendations by capturing user preferences and behaviors in low-dimensional vectors. High-quality user embeddings can capture subtle preferences, enable precise similarity calculations, and adapt to changing preferences over time to maintain relevance. The effectiveness of recommendation systems depends significantly on the quality of user embeddings. We propose to asynchronously learn high-fidelity user embeddings for billions of users each day from sequence-based multimodal user activities on Meta platforms through a Transformer-like large-scale feature learning module. The asynchronously learned user representation embeddings (ALURE) are further converted into user similarity graphs through graph learning and then combined with users' real-time activities to retrieve highly related ad candidates for the entire ads delivery system. Our method shows significant gains in both offline and online experiments.
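
The retrieval stage can be illustrated with a toy sketch. It assumes a k-nearest-neighbor user similarity graph built from embeddings by cosine similarity, and it retrieves candidate ads that the user's neighbors engaged with, excluding ads the user has already seen. The real system's graph learning and real-time signals are not modeled, and all names and data are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_users(user, embeddings, k):
    """Top-k most similar other users: the user's edges in a kNN similarity graph."""
    others = [u for u in embeddings if u != user]
    others.sort(key=lambda u: cosine(embeddings[user], embeddings[u]), reverse=True)
    return others[:k]


def retrieve_ads(user, embeddings, engagements, k, n_ads):
    """Rank ads by how many graph neighbors engaged with them, skipping ads already seen."""
    counts = Counter()
    for nbr in nearest_users(user, embeddings, k):
        counts.update(engagements.get(nbr, []))
    seen = set(engagements.get(user, []))
    return [ad for ad, _ in counts.most_common() if ad not in seen][:n_ads]


embeddings = {"u0": [1.0, 0.0], "u1": [0.9, 0.1], "u2": [0.8, 0.2], "u3": [0.0, 1.0]}
engagements = {"u0": ["ad_b"], "u1": ["ad_a", "ad_b"], "u2": ["ad_a", "ad_c"], "u3": ["ad_z"]}
candidates = retrieve_ads("u0", embeddings, engagements, k=2, n_ads=2)
```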

Updated: 2024-06-09 19:35:20

Fields: cs.IR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.05898v1

Event prediction and causality inference despite incomplete information

We explored the challenge of predicting and explaining the occurrence of events within sequences of data points. We focused particularly on scenarios in which the unknown triggers causing events may consist of non-consecutive, masked, noisy data points. This scenario is akin to an agent tasked with learning to predict and explain the occurrence of events without understanding the underlying processes or having access to crucial information. Such scenarios are encountered across various fields, such as genomics, hardware and software verification, and financial time series prediction. We combined analytical, simulation, and machine learning (ML) approaches to investigate, quantify, and provide solutions to this challenge. We derived and validated equations generally applicable to any variation of the underlying challenge. Using these equations, we (1) described how the level of complexity changes with various parameters (e.g., number of apparent and hidden states, trigger length, confidence) and (2) quantified the data needed to successfully train an ML model. We then (3) proved that our ML solution learns to identify unknown triggers and predict the occurrence of events. If the complexity of the challenge is too high, our ML solution can identify trigger candidates with which to interactively probe the system under investigation and determine the true trigger considerably more efficiently than brute-force methods. By sharing our findings, we aim to assist others grappling with similar challenges, enabling estimates of the complexity of their problem, the data required, and a solution to solve it.
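
To see why brute-force probing quickly becomes impractical, consider a simple combinatorial model. This is an assumption for illustration, not the paper's equations: a trigger consists of `length` data points at arbitrary (possibly non-consecutive) positions within a window, and each point takes one of `n_states` values.

```python
from math import comb


def candidate_triggers(window, length, n_states):
    """Count trigger hypotheses: choose `length` (possibly non-consecutive) positions
    in a window of `window` data points, each taking one of `n_states` values."""
    return comb(window, length) * n_states ** length


# Growth with trigger length for a 20-point window and 4 states.
growth = [candidate_triggers(20, L, 4) for L in (2, 3, 4)]
```

Under this model, going from 2 to 4 trigger points in a 20-point window multiplies the hypothesis space by more than 200, which is why narrowing the search to learned trigger candidates pays off.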

Updated: 2024-06-09 19:23:20

Fields: cs.LG

Download: http://arxiv.org/abs/2406.05893v1

Security Vulnerability Detection with Multitask Self-Instructed Fine-Tuning of Large Language Models

Software security vulnerabilities allow attackers to perform malicious activities that disrupt software operations. Recent Transformer-based language models have significantly advanced vulnerability detection, surpassing the capabilities of deep learning models based on static analysis. However, language models trained solely on code tokens capture neither the explanation of the vulnerability type nor the data flow structure of the code, both of which are crucial for vulnerability detection. We propose a novel technique that integrates a multitask sequence-to-sequence LLM with program control flow graphs encoded as a graph neural network to achieve sequence-to-classification vulnerability detection. We introduce MSIVD, multitask self-instructed fine-tuning for vulnerability detection, inspired by chain-of-thought prompting and LLM self-instruction. Our experiments demonstrate that MSIVD achieves superior performance, outperforming the strongest LLM-based vulnerability detector baseline (LineVul), with an F1 score of 0.92 on the BigVul dataset and 0.48 on the PreciseBugs dataset. By training LLMs and GNNs simultaneously on a combination of code and explanatory metrics of a vulnerable program, MSIVD represents a promising direction for advancing LLM-based vulnerability detection that generalizes to unseen data. Based on our findings, we further discuss the need for new labelled security vulnerability datasets, as recent LLMs have seen or memorized the held-out evaluation data of prior datasets.
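
Encoding a control flow graph with a GNN boils down to repeated message passing over CFG edges. The sketch below shows one generic mean-aggregation layer with scalar weights and ReLU, a simplification chosen for illustration; MSIVD's actual GNN architecture and node features are not described in the abstract.

```python
def gnn_layer(features, edges, w_self, w_nbr):
    """One mean-aggregation message-passing layer over a control flow graph.

    features: per-node feature vectors; edges: (u, v) pairs meaning control flows u -> v.
    """
    preds = {v: [] for v in range(len(features))}
    for u, v in edges:
        preds[v].append(u)
    out = []
    for v, feat in enumerate(features):
        nbrs = preds[v]
        # Mean of predecessor features (zeros for the entry block).
        agg = [sum(features[u][d] for u in nbrs) / len(nbrs) if nbrs else 0.0
               for d in range(len(feat))]
        out.append([max(0.0, w_self * x + w_nbr * a) for x, a in zip(feat, agg)])
    return out


# Toy CFG: entry block 0 branches to 1 and 2; block 1 falls through to 2.
h = gnn_layer([[1.0], [2.0], [3.0]], [(0, 1), (1, 2), (0, 2)], w_self=1.0, w_nbr=1.0)
```

Stacking several such layers lets information from distant basic blocks reach each node before a readout feeds the classifier.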

Updated: 2024-06-09 19:18:05

Fields: cs.CR,cs.LG,cs.SE

Download: http://arxiv.org/abs/2406.05892v1

GCtx-UNet: Efficient Network for Medical Image Segmentation

Medical image segmentation is crucial for disease diagnosis and monitoring. Current segmentation networks such as UNet, though effective, struggle to capture long-range features. More accurate models such as TransUNet, Swin-UNet, and CS-UNet have higher computational complexity. To address this problem, we propose GCtx-UNet, a lightweight segmentation architecture that captures global and local image features with accuracy better than or comparable to state-of-the-art approaches. GCtx-UNet uses a vision transformer that combines global context self-attention modules with local self-attention to model long- and short-range spatial dependencies. GCtx-UNet is evaluated on the Synapse multi-organ abdominal CT dataset, the ACDC cardiac MRI dataset, and several polyp segmentation datasets. In terms of the Dice Similarity Coefficient (DSC) and Hausdorff Distance (HD) metrics, GCtx-UNet outperformed CNN-based and Transformer-based approaches, with notable gains in the segmentation of complex and small anatomical structures. Moreover, GCtx-UNet is much more efficient than state-of-the-art approaches, with a smaller model size, lower computational workload, and faster training and inference, making it a practical choice for clinical applications.
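
For reference, the two reported metrics can be computed on binary masks as follows. This is a plain-Python sketch of the standard definitions: DSC = 2|A ∩ B| / (|A| + |B|), and HD is the larger of the two directed maximum nearest-neighbor distances between foreground pixels. It assumes both masks have foreground. Evaluation toolkits often report HD95, a percentile variant, instead.

```python
import math


def dice(pred, truth):
    """Dice Similarity Coefficient for two binary masks (lists of lists of 0/1)."""
    inter = sum(p * t for rp, rt in zip(pred, truth) for p, t in zip(rp, rt))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    return 1.0 if total == 0 else 2 * inter / total


def foreground(mask):
    """Coordinates (row, col) of foreground pixels."""
    return [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v]


def hausdorff(pred, truth):
    """Symmetric Hausdorff distance between the foreground pixel sets (both non-empty)."""
    a, b = foreground(pred), foreground(truth)

    def directed(xs, ys):
        return max(min(math.dist(x, y) for y in ys) for x in xs)

    return max(directed(a, b), directed(b, a))


pred = [[1, 1, 0], [0, 0, 0]]
truth = [[1, 0, 0], [0, 0, 0]]
```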

Updated: 2024-06-09 19:17:14

Fields: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.05891v1

Few-Shot Load Forecasting Under Data Scarcity in Smart Grids: A Meta-Learning Approach

Despite the rapid expansion of smart grids and the large volumes of data at the individual consumer level, there are still various cases where collecting adequate data to train accurate load forecasting models is challenging or even impossible. This paper proposes adapting an established model-agnostic meta-learning algorithm for short-term load forecasting in a few-shot learning setting. Specifically, the proposed method can rapidly adapt and generalize to any unknown load time series of arbitrary length using only minimal training samples. In this context, the meta-learning model learns an optimal set of initial parameters for a base-level learner recurrent neural network. The proposed model is evaluated on a dataset of historical load consumption data from real-world consumers. Despite the short length of the examined load series, it produces accurate forecasts, outperforming transfer learning and task-specific machine learning methods by $12.5\%$. To enhance robustness and fairness during model evaluation, we propose a novel metric, mean average log percentage error, which alleviates the bias introduced by the commonly used MAPE metric. Finally, a series of studies evaluating the model's robustness under different hyperparameters and time series lengths demonstrates that the proposed approach consistently outperforms all other models.
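
The core meta-learning idea, learning an initialization that adapts well after a few gradient steps, can be sketched with first-order MAML. To keep it short, this sketch swaps the paper's recurrent base learner for a 1-D linear model and uses synthetic tasks (lines with slope 2 and different intercepts, each with a support set for adaptation and a query set for the meta-update). All hyperparameters are assumptions.

```python
def mse(w, b, xs, ys):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, b, xs, ys):
    """Gradient of the mean squared error of y ~ w*x + b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def fomaml(tasks, meta_lr, inner_lr, inner_steps, epochs, w=0.0, b=0.0):
    """First-order MAML: learn an initialization (w, b) that adapts quickly per task."""
    for _ in range(epochs):
        meta_dw = meta_db = 0.0
        for (xs_s, ys_s), (xs_q, ys_q) in tasks:
            tw, tb = w, b
            for _ in range(inner_steps):  # inner-loop adaptation on the support set
                dw, db = mse_grad(tw, tb, xs_s, ys_s)
                tw, tb = tw - inner_lr * dw, tb - inner_lr * db
            # First-order approximation: query-set gradient at the adapted parameters.
            dw, db = mse_grad(tw, tb, xs_q, ys_q)
            meta_dw += dw
            meta_db += db
        w -= meta_lr * meta_dw / len(tasks)
        b -= meta_lr * meta_db / len(tasks)
    return w, b


xs_support = [0.0, 1.0, 2.0, 3.0]
xs_query = [0.5, 1.5, 2.5]
tasks = [
    ((xs_support, [2.0 * x + c for x in xs_support]), (xs_query, [2.0 * x + c for x in xs_query]))
    for c in (0.5, 1.0, 1.5)
]
w0, b0 = fomaml(tasks, meta_lr=0.05, inner_lr=0.05, inner_steps=1, epochs=2000)
```

For these linear tasks the learned initialization converges to the task average (slope 2, intercept 1), from which any new task of the same family is only a few steps away.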

Updated: 2024-06-09 18:59:08

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.05887v1

Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching

In real-world scenarios, arbitrary interactions with the environment can often be costly, and actions of expert demonstrations are not always available. To reduce the need for both, offline Learning from Observations (LfO) is extensively studied: the agent learns to solve a task given only expert states and task-agnostic non-expert state-action pairs. The state-of-the-art DIstribution Correction Estimation (DICE) methods, as exemplified by SMODICE, minimize the state occupancy divergence between the learner's and empirical expert policies. However, such methods are limited to either $f$-divergences (KL and $\chi^2$) or the Wasserstein distance with Kantorovich-Rubinstein duality, the latter of which constrains the underlying distance metric that is crucial to the performance of Wasserstein-based solutions. To enable more flexible distance metrics, we propose Primal Wasserstein DICE (PW-DICE). It minimizes the primal Wasserstein distance between the learner and expert state occupancies and leverages a contrastively learned distance metric. Theoretically, our framework is a generalization of SMODICE and is the first work that unifies $f$-divergence and Wasserstein minimization. Empirically, we find that PW-DICE improves upon several state-of-the-art methods. The code is available at https://github.com/KaiYan289/PW-DICE.
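
Why the primal form admits arbitrary distance metrics can be seen in a tiny example. For two uniform empirical distributions with the same number of points, the primal optimal transport problem reduces to an assignment problem, which works with any cost function (here a stand-in for a learned metric). The brute-force search over permutations below is only viable for a handful of points. PW-DICE itself solves a regularized problem over occupancies, which is not reproduced here.

```python
import math
from itertools import permutations


def primal_wasserstein_uniform(xs, ys, cost):
    """Primal OT cost between uniform empirical measures of equal size.

    With uniform weights, an optimal plan can be taken to be a permutation,
    so we minimize over assignments. `cost` can be any distance, e.g. a learned one.
    """
    n = len(xs)
    assert n == len(ys), "equal sample sizes assumed"
    best = min(sum(cost(xs[i], ys[p[i]]) for i in range(n)) for p in permutations(range(n)))
    return best / n


learner_states = [(0.0, 0.0), (1.0, 1.0)]
expert_states = [(0.0, 1.0), (1.0, 0.0)]
w = primal_wasserstein_uniform(learner_states, expert_states, math.dist)
```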

Updated: 2024-06-09 18:43:27

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2311.01331v3

Stereographic Spherical Sliced Wasserstein Distances

Comparing spherical probability distributions is of great interest in various fields, including geology, medical domains, computer vision, and deep representation learning. The utility of optimal transport-based distances, such as the Wasserstein distance, for comparing probability measures has spurred active research in developing computationally efficient variations of these distances for spherical probability measures. This paper introduces a high-speed and highly parallelizable distance for comparing spherical measures using the stereographic projection and the generalized Radon transform, which we refer to as the Stereographic Spherical Sliced Wasserstein (S3W) distance. We carefully address the distance distortion caused by the stereographic projection and provide an extensive theoretical analysis of our proposed metric and its rotationally invariant variation. Finally, we evaluate the performance of the proposed metrics and compare them with recent baselines in terms of both speed and accuracy through a wide range of numerical studies, including gradient flows and self-supervised learning. Our code is available at https://github.com/mint-vu/s3wd.
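
The two ingredients named in the abstract can be sketched in simplified form: stereographic projection from the sphere to Euclidean space, followed by a Monte Carlo sliced 2-Wasserstein distance. This sketch uses plain linear slicing and does not include the paper's generalized Radon transform or its correction for the projection's distance distortion. It assumes equal-size point clouds that avoid the north pole.

```python
import math
import random


def stereographic(x):
    """Project a point of the unit sphere S^d in R^(d+1) onto R^d from the north pole.

    Undefined at the north pole itself (last coordinate == 1).
    """
    *head, last = x
    return [xi / (1.0 - last) for xi in head]


def sliced_w2(xs, ys, n_proj=50, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance between equal-size point clouds in R^d."""
    rng = random.Random(seed)
    d = len(xs[0])
    total = 0.0
    for _ in range(n_proj):
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]  # random direction on S^(d-1)
        px = sorted(sum(a * t for a, t in zip(p, theta)) for p in xs)
        py = sorted(sum(a * t for a, t in zip(p, theta)) for p in ys)
        # 1-D optimal transport pairs sorted samples.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_proj)


def s3w_simplified(sphere_xs, sphere_ys, **kwargs):
    """Stereographic projection followed by plain (linear) slicing."""
    return sliced_w2([stereographic(p) for p in sphere_xs],
                     [stereographic(p) for p in sphere_ys], **kwargs)


pts_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
pts_b = [[0.0, 0.0, -1.0], [-1.0, 0.0, 0.0]]
```

Each slice costs only a sort, and slices are independent, which is where the parallelism comes from.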

Updated: 2024-06-09 18:42:20

Fields: cs.LG,cs.AI,cs.CV,stat.ML

Download: http://arxiv.org/abs/2402.02345v2

Information Theoretic Guarantees For Policy Alignment In Large Language Models

Policy alignment of large language models refers to constrained policy optimization, where the policy is optimized to maximize a reward while staying close to a reference policy with respect to an $f$-divergence such as the $\mathsf{KL}$ divergence. The best-of-$n$ alignment policy selects, among $n$ independent samples from the reference policy, the one with the maximum reward. For both cases (policy alignment and best-of-$n$), recent works showed empirically that the reward improvement of the aligned policy over the reference one scales like $\sqrt{\mathsf{KL}}$, with an explicit bound in $n$ on the $\mathsf{KL}$ for the best-of-$n$ policy. We show in this paper that the $\sqrt{\mathsf{KL}}$ information-theoretic upper bound holds if the reward under the reference policy has sub-Gaussian tails. Moreover, we prove for the best-of-$n$ policy that the $\mathsf{KL}$ upper bound can be obtained for any $f$-divergence via a reduction to exponential order statistics, owing to the Rényi representation of order statistics, and a data processing inequality. If additional information is known about the tails of the aligned policy, we show that tighter control on the reward improvement can be obtained via the Rényi divergence. Finally, we demonstrate how these upper bounds transfer from proxy rewards to golden rewards, resulting in a decrease in the golden reward improvement due to overestimation and approximation errors of the proxy reward.
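
The bounds can be checked numerically in the simplest setting. The sketch assumes standard normal rewards under the reference policy (sub-Gaussian with sigma = 1). It uses the widely cited best-of-$n$ expression $\log n - (n-1)/n$ for the $\mathsf{KL}$ and the transportation bound $\mathbb{E}_{\text{aligned}}[r] - \mathbb{E}_{\text{ref}}[r] \le \sqrt{2\sigma^2 \mathsf{KL}}$, and compares that bound with a Monte Carlo estimate of the actual best-of-$n$ gain.

```python
import math
import random


def best_of_n_kl(n):
    """Widely cited expression for KL(best-of-n || reference): log n - (n - 1) / n."""
    return math.log(n) - (n - 1) / n


def subgaussian_gain_bound(kl, sigma):
    """Transportation bound on the reward gain: sqrt(2 * sigma^2 * KL)."""
    return math.sqrt(2 * sigma ** 2 * kl)


def mc_best_of_n_gain(n, trials=20000, seed=0):
    """Monte Carlo estimate of E[max of n N(0,1) rewards] - E[N(0,1) reward]."""
    rng = random.Random(seed)
    return sum(max(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(trials)) / trials


bounds = {n: subgaussian_gain_bound(best_of_n_kl(n), sigma=1.0) for n in (2, 4, 16)}
gains = {n: mc_best_of_n_gain(n) for n in (2, 4, 16)}
```

For n = 4, the bound is about 1.13, while the true expected gain is about 1.03, so the $\sqrt{\mathsf{KL}}$ bound is fairly tight here.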

Updated: 2024-06-09 18:41:50

领域: cs.LG,cs.IT,math.IT,stat.ML

下载: http://arxiv.org/abs/2406.05883v1

Distributional Preference Alignment of LLMs via Optimal Transport

Current LLM alignment techniques use pairwise human preferences at the sample level and, as such, do not imply alignment at the distributional level. In this paper we propose Alignment via Optimal Transport (AOT), a novel method for distributional preference alignment of LLMs. AOT aligns LLMs on unpaired preference data by making the reward distribution of the positive samples first-order stochastically dominant over the reward distribution of the negative samples. We introduce a convex relaxation of this first-order stochastic dominance and cast it as an optimal transport problem with a smooth and convex cost. Thanks to the one-dimensional nature of the resulting optimal transport problem and the convexity of the cost, it has a closed-form solution via sorting on empirical measures. We fine-tune LLMs with this AOT objective, which enables alignment by penalizing violations of the stochastic dominance of the positive samples' reward distribution over that of the negative samples. We analyze the sample complexity of AOT by considering the dual of the OT problem and show that it converges at the parametric rate. Empirically, we show on a diverse set of alignment datasets and LLMs that AOT leads to state-of-the-art models in the 7B family when evaluated with Open LLM Benchmarks and AlpacaEval.
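
The "closed-form solution via sorting" mentioned above can be sketched as follows: in one dimension with a convex cost, the optimal coupling between two empirical measures pairs their sorted values, so a dominance penalty only needs two sorts. The squared hinge and the `margin` parameter are illustrative assumptions, not the paper's exact relaxation, and the function name is hypothetical.

```python
import numpy as np


def fsd_violation_loss(pos_rewards, neg_rewards, margin=0.0):
    """Penalty for violating first-order stochastic dominance of positive over negative rewards.

    Sorting both samples pairs their empirical quantiles, which is the optimal
    coupling for a convex cost in one dimension, so no OT solver is needed.
    Each quantile pair where the positive reward fails to exceed the negative
    reward by at least `margin` is penalised with a squared hinge, used here as
    an illustrative smooth convex surrogate.
    """
    pos = np.sort(np.asarray(pos_rewards, dtype=float))
    neg = np.sort(np.asarray(neg_rewards, dtype=float))
    if pos.shape != neg.shape:
        raise ValueError("this sketch assumes equal numbers of positive and negative samples")
    gap = neg - pos + margin  # > 0 where the quantile ordering is violated
    return float(np.mean(np.maximum(gap, 0.0) ** 2))
```

Because only the sorted values matter, the positive and negative samples need not be paired, which matches the unpaired preference data the abstract describes.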

Updated: 2024-06-09 18:41:05

标题: 通过最优输运实现LLMs的分布偏好对齐

摘要: 当前的LLM对齐技术在样本级别使用成对的人类偏好，因此，并不意味着在分布级别上进行对齐。我们在本文中提出了一种新颖的方法Alignment via Optimal Transport（AOT），用于LLM的分布偏好对齐。AOT通过使正样本的奖励分布在负样本的分布上以第一顺序具有随机优势，从而在未配对的偏好数据上对齐LLM。我们引入了这种一阶随机优势的凸松弛，并将其构造为具有平滑和凸成本的最优传输问题。由于结果最优传输问题的一维特性和成本的凸性，通过对经验度量进行排序，它具有封闭形式的解。我们使用这个AOT目标对LLM进行微调，通过惩罚正样本奖励分布在负样本奖励分布上的随机优势违反来实现对齐。通过考虑OT问题的对偶，我们分析了AOT的样本复杂度，并且显示其以参数速率收敛。在多样化的对齐数据集和LLM上的实证结果显示，AOT在使用Open LLM Benchmarks和AlpacaEval评估时，可以实现7B系列模型的最先进模型。

更新时间: 2024-06-09 18:41:05

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.05882v1

LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning

Developing interactive systems that leverage natural language instructions to solve complex robotic control tasks has been a long-desired goal in the robotics community. Large Language Models (LLMs) have demonstrated exceptional abilities in handling complex tasks, including logical reasoning, in-context learning, and code generation. However, predicting low-level robotic actions using LLMs poses significant challenges. Additionally, the complexity of such tasks usually demands learning policies that execute diverse subtasks and combining them to attain the ultimate objective. Hierarchical Reinforcement Learning (HRL) is an elegant approach for solving such tasks, providing the intuitive benefits of temporal abstraction and improved exploration. However, HRL faces the recurring issue of non-stationarity caused by the unstable behaviour of the lower-level primitive. In this work, we propose LGR2, a novel HRL framework that leverages language instructions to generate a stationary reward function for the higher-level policy. Since the language-guided reward is unaffected by the lower primitive's behaviour, LGR2 mitigates non-stationarity and is thus an elegant method for leveraging language instructions to solve robotic control tasks. To analyze the efficacy of our approach, we perform an empirical analysis and demonstrate that LGR2 effectively alleviates non-stationarity in HRL. Our approach attains success rates exceeding 70$\%$ in challenging, sparse-reward robotic navigation and manipulation environments where the baselines fail to make any significant progress. Additionally, we conduct real-world robotic manipulation experiments and demonstrate that LGR2 shows impressive generalization in real-world scenarios.

Updated: 2024-06-09 18:40:24

标题: LGR2: 语言引导的奖励重标记用于加速分层强化学习

摘要: 开发利用自然语言指令解决复杂机器人控制任务的交互式系统一直是机器人领域长期以来的目标。大型语言模型（LLMs）已经展示出在处理复杂任务，包括逻辑推理、上下文学习和代码生成方面具有卓越能力。然而，使用LLMs预测低级机器人动作存在重大挑战。此外，这类任务的复杂性通常需要获取策略来执行各种子任务并将它们结合起来实现最终目标。分层强化学习（HRL）是解决这类任务的一种优雅方法，它提供了时间抽象和改进探索的直观好处。然而，HRL面临着由于不稳定的低层原始行为而导致的非稳态的重复问题。在这项工作中，我们提出了LGR2，一个利用语言指令生成高级策略的稳态奖励函数的新型HRL框架。由于语言引导奖励不受低级原始行为影响，LGR2减轻了非稳态问题，因此是利用语言指令解决机器人控制任务的一种优雅方法。为了分析我们方法的有效性，我们进行了实证分析，并证明LGR2有效地减轻了HRL中的非稳态问题。我们的方法在具有挑战性、稀疏奖励的机器人导航和操作环境中取得了超过70％的成功率，而基线方法未能取得任何显著进展。此外，我们进行了真实世界的机器人操作实验，并证明LGR2在真实场景中具有令人印象深刻的泛化能力。

更新时间: 2024-06-09 18:40:24

领域: cs.LG,cs.CL,cs.RO

下载: http://arxiv.org/abs/2406.05881v1

Auditing and Generating Synthetic Data with Controllable Trust Trade-offs

Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have emerged to address these issues. This paradigm relies on generative AI models to generate unbiased, privacy-preserving data while maintaining fidelity to the original data. However, assessing the trustworthiness of synthetic datasets and models is a critical challenge. We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models. It focuses on preventing bias and discrimination, ensuring fidelity to the source data, and assessing utility, robustness, and privacy preservation. We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases such as education, healthcare, banking, and human resources, spanning data modalities such as tabular, time-series, vision, and natural language. This holistic assessment is essential for compliance with regulatory safeguards. We introduce a trustworthiness index to rank synthetic datasets based on their safeguard trade-offs. Furthermore, we present a trustworthiness-driven model selection and cross-validation process during training, exemplified with "TrustFormers" across various data types. This approach allows for controllable trustworthiness trade-offs in synthetic data creation. Our auditing framework fosters collaboration among stakeholders, including data scientists, governance experts, internal reviewers, external certifiers, and regulators. Such transparent reporting should become standard practice to prevent bias, discrimination, and privacy violations, ensuring compliance with policies and providing accountability, safety, and performance guarantees.

Updated: 2024-06-09 18:40:20

标题: 审计和生成具有可控信任权衡的合成数据

摘要: 实际世界的数据常常存在偏见、不平衡和隐私风险。合成数据集已经出现以解决这些问题。这一范式依赖于生成式人工智能模型生成无偏见、保护隐私的数据，同时保持对原始数据的忠实度。然而，评估合成数据集和模型的可信度是一个关键挑战。我们引入了一个全面评估合成数据集和人工智能模型的综合审计框架。它专注于防止偏见和歧视，确保忠实于源数据，评估效用、稳健性和隐私保护。我们通过审计教育、医疗保健、银行和人力资源等不同用例中的各种生成模型展示了该框架的有效性，涵盖了不同的数据模态，如表格、时间序列、视觉和自然语言。这种全面评估对于遵守监管保障措施至关重要。我们引入了一个信誉指数，以基于它们的保护措施权衡对合成数据集进行排名。此外，我们在训练过程中介绍了一个基于信誉的模型选择和交叉验证过程，以“TrustFormers”为例跨越各种数据类型。这种方法允许在合成数据创建中进行可控的信誉权衡。我们的审计框架促进了各方利益相关者之间的合作，包括数据科学家、治理专家、内部审阅者、外部认证机构和监管机构。这种透明报告应该成为一种标准做法，以防止偏见、歧视和隐私违规，确保遵守政策并提供问责、安全和性能保证。

更新时间: 2024-06-09 18:40:20

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2304.10819v4

Enhancing Mobile "How-to" Queries with Automated Search Results Verification and Reranking

Many people use search engines to find online guidance for solving computer or mobile device problems. Users frequently struggle to identify effective solutions in search results, often wasting time on solutions that seem relevant yet fail to solve the real problem. This paper introduces a novel approach to improving the accuracy and relevance of online technical support search results through automated search result verification and reranking. Taking "How-to" queries specific to on-device execution as a starting point, we developed the first solution that allows an AI agent to interpret and execute the step-by-step instructions in search results in a controlled Android environment. We further integrated the agent's findings into a reranking mechanism that orders search results based on the success indicators of the tested solutions. The paper details the architecture of our solution and a comprehensive evaluation of the system through a series of tests across various application domains. The results demonstrate a significant improvement in the quality and reliability of the top-ranked results. Our findings suggest a paradigm shift in how search engine ranking for online technical support can be optimized, offering a scalable and automated solution to the pervasive challenge of finding effective and reliable online help.
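
A reranking step of the kind described above might look like the following minimal sketch. The three-tier ordering (verified successes, then untested results, then verified failures) and the `SearchResult` fields are hypothetical; the abstract does not specify which success indicators are used or how they are weighted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchResult:
    url: str
    original_rank: int                 # position assigned by the search engine (1 = top)
    verified_success: Optional[bool]   # None when the agent could not test the instructions


# Hypothetical tiers: verified successes first, untested results next, verified failures last.
_TIER = {True: 0, None: 1, False: 2}


def rerank(results):
    """Order results by verification outcome, keeping the engine's order within each tier."""
    return sorted(results, key=lambda r: (_TIER[r.verified_success], r.original_rank))
```

Keeping the original rank as a tie-breaker preserves the engine's relevance signal for results the agent could not distinguish.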

Updated: 2024-06-09 18:33:00

标题: 通过自动搜索结果验证和重新排序提升移动端“How-to”查询

摘要: 许多人使用搜索引擎寻找在线指导来解决计算机或移动设备问题。用户经常在从搜索结果中识别有效解决方案时遇到挑战，通常会浪费时间尝试看似相关但无法解决真实问题的无效解决方案。本文介绍了一种改进在线技术支持搜索结果准确性和相关性的新方法，通过自动化搜索结果验证和重新排名。从特定于设备执行的“How-to”查询作为起点，我们开发了第一个允许AI代理在受控Android环境中解释和执行搜索结果中逐步说明的解决方案。我们进一步将代理的发现整合到重新排名机制中，根据已测试解决方案的成功指标对搜索结果进行排序。本文详细介绍了我们解决方案的架构以及通过一系列跨不同应用领域的测试对系统进行的全面评估。结果表明，在排名靠前的结果质量和可靠性方面取得了显著改善。我们的发现表明，搜索引擎排名在线技术支持帮助可以优化，提供可扩展且自动化的解决方案来解决寻找有效和可靠在线帮助的普遍挑战。

更新时间: 2024-06-09 18:33:00

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2404.08860v2

Risk Aware Benchmarking of Large Language Models

We propose a distributional framework for benchmarking socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new relative statistical test based on first- and second-order stochastic dominance of real random variables. We show that the second-order statistics in this test are linked to mean-risk models commonly used in econometrics and mathematical finance to balance risk and utility when choosing between alternatives. Using this framework, we formally develop a risk-aware approach for foundation model selection given guardrails quantified by specified metrics. Inspired by portfolio optimization and selection theory in mathematical finance, we define a metrics portfolio for each model as a means to aggregate a collection of metrics, and perform model selection based on the stochastic dominance of these portfolios. The statistical significance of our tests is backed theoretically by an asymptotic analysis via central limit theorems, instantiated in practice via a bootstrap variance estimate. We use our framework to compare various large language models with respect to the risks of drifting from instructions and outputting toxic content.
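
The first- and second-order dominance comparisons described above can be sketched with empirical quantile functions. This minimal version computes a violation ratio in the spirit of almost stochastic dominance (the share of the quantile gap where one sample fails to dominate the other) plus a bootstrap estimate of its spread. The exact statistic, the equal-sample-size restriction, and the function names are assumptions for illustration, not the paper's test.

```python
import numpy as np


def violation_ratio(x, y, order=1):
    """Share of the quantile gap where x fails to dominate y (0 = x dominates everywhere).

    order=1 compares empirical quantile functions (first-order dominance);
    order=2 compares their running integrals (second-order dominance, the
    mean-risk view). Assumes equal sample sizes so that the sorted samples are
    the quantile functions on a common grid.
    """
    qx = np.sort(np.asarray(x, dtype=float))
    qy = np.sort(np.asarray(y, dtype=float))
    if qx.shape != qy.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    if order == 2:
        qx, qy = np.cumsum(qx) / len(qx), np.cumsum(qy) / len(qy)
    diff = qx - qy
    total = np.sum(np.abs(diff))
    if total == 0.0:
        return 0.5  # ratio undefined for identical quantiles; return a neutral value
    return float(np.sum(np.maximum(-diff, 0.0)) / total)


def bootstrap_ratio(x, y, order=1, n_boot=1000, seed=0):
    """Bootstrap mean and standard deviation of the violation ratio."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    stats = [violation_ratio(rng.choice(x, size=len(x)), rng.choice(y, size=len(y)), order)
             for _ in range(n_boot)]
    return float(np.mean(stats)), float(np.std(stats, ddof=1))
```

Metrics should be oriented so that higher is better (for example, negated toxicity). A sample such as [1, 1] versus [0, 2] has the same mean but less spread, so it gets a first-order ratio of 0.5 yet dominates at second order (ratio 0), which is the risk-averse preference the mean-risk connection captures.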

Updated: 2024-06-09 18:26:34

标题: 大语言模型的风险感知基准测试

摘要: 我们提出了一个分布框架，用于基准测试基础模型的社会技术风险，并量化统计显著性。我们的方法基于一种基于实际随机变量的一阶和二阶随机优势的新统计相对测试。我们表明，该测试中的二阶统计与经济计量学和数学金融中常用的均值风险模型相关联，用于在选择不同方案时平衡风险和效用。利用这一框架，我们正式开发了一种基于风险意识的基础模型选择方法，给定由指定指标量化的防护壁垒。受数学金融中的投资组合优化和选择理论的启发，我们将每个模型的指标投资组合定义为聚合一系列指标的手段，并根据这些投资组合的随机优势进行模型选择。我们的测试的统计显著性在理论上通过一个渐近分析支持，通过一个自举方差估计在实践中具体实例化。我们利用我们的框架比较了与偏离指令和输出有毒内容相关的风险相关的各种大型语言模型。

更新时间: 2024-06-09 18:26:34

领域: cs.LG,math.ST,q-fin.RM,stat.ML,stat.TH

下载: http://arxiv.org/abs/2310.07132v3

Demonstrations Are All You Need: Advancing Offensive Content Paraphrasing using In-Context Learning

Paraphrasing offensive content is a better alternative to content removal and helps improve civility in a communication environment. Supervised paraphrasers, however, rely heavily on large quantities of labelled data to preserve meaning and intent. They also often retain a large portion of the offensiveness of the original content, which raises questions about their overall usability. In this paper we aim to help practitioners develop usable paraphrasers by exploring In-Context Learning (ICL) with large language models (LLMs), i.e., using a limited number of input-label demonstration pairs to guide the model in generating desired outputs for specific queries. Our study focuses on key factors such as the number and order of demonstrations, the exclusion of prompt instructions, and the reduction in measured toxicity. We perform a principled evaluation on three datasets, including our proposed Context-Aware Polite Paraphrase (CAPP) dataset, which comprises dialogue-style rude utterances, polite paraphrases, and additional dialogue context. We evaluate our approach using four closed-source LLMs and one open-source LLM. Our results reveal that ICL is comparable to supervised methods in generation quality, while being qualitatively better by 25% in human evaluation and attaining 76% lower toxicity. Also, ICL-based paraphrasers show only a slight reduction in performance even with just 10% of the training data.
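
The in-context learning setup described above amounts to assembling a few-shot prompt from demonstration pairs. This minimal sketch uses a hypothetical "Input: / Polite paraphrase:" template, since the abstract does not give the prompt format; it keeps demonstrations in the given order and allows the instruction to be omitted, mirroring two of the factors the study varies.

```python
def build_paraphrase_prompt(demos, query, instruction=None):
    """Assemble a few-shot prompt from (rude, polite) demonstration pairs.

    Demonstrations are kept in the order given, since their number and order are
    among the factors the study varies; pass instruction=None to omit the task
    instruction entirely. The template itself is a hypothetical example.
    """
    parts = []
    if instruction:
        parts.append(instruction.strip())
    for rude, polite in demos:
        parts.append(f"Input: {rude}\nPolite paraphrase: {polite}")
    parts.append(f"Input: {query}\nPolite paraphrase:")  # the LLM completes this line
    return "\n\n".join(parts)
```

The returned string would then be sent to whichever LLM is being evaluated; the model's completion after the final "Polite paraphrase:" is taken as the paraphrase.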

Updated: 2024-06-09 18:22:33

标题: 演示就足够了：利用上下文学习推进攻击性内容的改写

摘要: 对冒犯性内容的改写是比删除内容更好的选择，有助于提高沟通环境的文明程度。然而，受监督的改写者通常会严重依赖大量有标签的数据来帮助保留含义和意图。他们通常会保留原始内容中的大部分冒犯性，这引发了对其整体可用性的质疑。本文旨在通过探索使用大型语言模型（LLMs）的上下文学习（ICL），即使用有限数量的输入-标签演示对来引导模型生成特定查询的期望输出，以帮助从业者开发可用的改写者。我们的研究重点关注关键因素，如演示的数量和顺序、排除提示指令以及降低测量毒性。我们在三个数据集上进行了系统评估，包括我们提出的上下文感知礼貌改写（CAPP）数据集，其中包含对话式粗鲁话语、礼貌改写以及额外的对话上下文。我们使用四个闭源和一个开源的LLM对我们的方法进行评估。我们的结果显示，ICL在生成质量方面与受监督方法相媲美，而在人类评估方面质量更好，提高了25％，并且毒性更低，降低了76％。此外，基于ICL的改写者即使只使用10％的训练数据，性能也只会略微降低。

更新时间: 2024-06-09 18:22:33

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2310.10707v2

Zero-Shot End-To-End Spoken Question Answering In Medical Domain

In the rapidly evolving landscape of spoken question answering (SQA), the integration of large language models (LLMs) has emerged as a transformative development. Conventional approaches often use separate models for question audio transcription and answer selection, resulting in significant resource utilization and error accumulation. To tackle these challenges, we explore the effectiveness of end-to-end (E2E) methodologies for SQA in the medical domain. Our study introduces a novel zero-shot SQA approach and compares it to traditional cascade systems. Through a comprehensive evaluation on a new open benchmark of 8 medical tasks and 48 hours of synthetic audio, we demonstrate that our approach requires up to 14.7 times fewer resources than a 1.3B-parameter LLM combined with a 1.55B-parameter ASR model, while improving average accuracy by 0.5\%. These findings underscore the potential of E2E methodologies for SQA in resource-constrained contexts.

Updated: 2024-06-09 18:13:36

标题: 医学领域零样本端到端口语问答

摘要: 在口语问答系统（SQA）不断发展的领域中，大型语言模型（LLMs）的整合已经成为一项变革性的发展。传统方法通常涉及使用独立模型进行问题音频转录和答案选择，导致资源利用率高和误差累积严重。为了应对这些挑战，我们探讨了端到端（E2E）方法在医学领域SQA中的有效性。我们的研究引入了一种新颖的零样本SQA方法，与传统级联系统进行了比较。通过对新的开放基准的8个医学任务和48小时的合成音频进行全面评估，我们证明我们的方法比具有13亿参数LLM和15.5亿参数ASR模型的组合模型所需的资源少多达14.7倍，同时将平均准确率提高了0.5％。这些发现突显了E2E方法在资源受限环境中用于SQA的潜力。

更新时间: 2024-06-09 18:13:36

领域: cs.CL,cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2406.05876v1

Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities

Security vulnerabilities in modern software are prevalent and harmful. While automated vulnerability detection tools have made promising progress, their scalability and applicability remain challenging. Recently, Large Language Models (LLMs), such as GPT-4 and CodeLlama, have demonstrated remarkable performance on code-related tasks. However, it is unknown whether such LLMs can do complex reasoning over code. In this work, we explore whether pre-trained LLMs can detect security vulnerabilities and address the limitations of existing tools. We evaluate the effectiveness of pre-trained LLMs, in terms of performance, explainability, and robustness, on a set of five diverse security benchmarks spanning two languages, Java and C/C++, and covering both synthetic and real-world projects. Overall, all LLMs show modest effectiveness in end-to-end reasoning about vulnerabilities, obtaining an average of 60% accuracy across all datasets. However, we observe that LLMs show promising abilities at performing parts of the analysis correctly, such as identifying vulnerability-related specifications (e.g., sources and sinks) and leveraging natural language information to understand code behavior (e.g., to check if code is sanitized). Further, LLMs are considerably better at detecting simpler vulnerabilities that typically only need local reasoning (e.g., Integer Overflows and NULL pointer dereferences). We find that advanced prompting strategies that involve step-by-step analysis significantly improve the performance of LLMs on real-world datasets (improving F1 score by up to 0.25 on average). Finally, we share our insights and recommendations for future work on leveraging LLMs for vulnerability detection.

Updated: 2024-06-09 18:12:48

标题: 理解大型语言模型在检测安全漏洞中的有效性

摘要: 现代软件中的安全漏洞是普遍存在且有害的。虽然自动漏洞检测工具取得了一些进展，但它们的可扩展性和适用性仍具挑战性。最近，大型语言模型（LLMs），如GPT-4和CodeLlama，在代码相关任务上表现出了显著的性能。然而，目前尚不清楚这些LLMs是否能够对代码进行复杂的推理。在这项工作中，我们探讨了预训练LLMs是否能够检测安全漏洞并解决现有工具的局限性。我们评估了预训练LLMs在两种语言（Java和C/C++）、涵盖合成和真实项目的五个不同安全基准上的效果，包括性能、解释性和鲁棒性。总体而言，所有LLMs在推理漏洞方面显示出了适度的效果，在所有数据集上平均获得了60%的准确率。然而，我们观察到LLMs在执行部分分析时表现出了有希望的能力，例如识别与漏洞相关的规范（例如源和接收器），并利用自然语言信息来理解代码行为（例如检查代码是否经过消毒）。此外，LLMs在检测通常只需要局部推理的简单漏洞（例如整数溢出和空指针解除引用）方面相对更好。我们发现，涉及逐步分析的先进提示策略显著提高了LLMs在真实世界数据集上的性能（平均提高了F1分数0.25）。最后，我们分享了关于未来利用LLMs进行漏洞检测的见解和建议。

更新时间: 2024-06-09 18:12:48

领域: cs.CR,cs.PL,cs.SE

下载: http://arxiv.org/abs/2311.16169v2

Stealthy Targeted Backdoor Attacks against Image Captioning

In recent years, there has been explosive growth in multimodal learning. Image captioning, a classical multimodal task, has demonstrated promising applications and attracted extensive research attention. However, recent studies have shown that image captioning models are vulnerable to security threats such as backdoor attacks. Existing backdoor attacks against image captioning typically pair a trigger with either a predefined sentence or a single word as the targeted output; because these outputs are unrelated to the image content, humans can easily notice them as anomalies. In this paper, we present a novel method to craft targeted backdoor attacks against image captioning models that are designed to be stealthier than prior attacks. Specifically, our method first learns a special trigger by leveraging universal perturbation techniques for object detection, then places the learned trigger in the center of a specific source object and changes the corresponding object name in the output caption to a predefined target name. During the prediction phase, for input images containing the trigger, the caption produced by the backdoored model accurately conveys the semantic information of the rest of the image while incorrectly recognizing the source object as the predefined target. Extensive experiments demonstrate that our approach achieves a high attack success rate with a negligible impact on the model's clean performance. In addition, we show that our method is stealthy: the produced backdoor samples are indistinguishable from clean samples in both the image and text domains and can successfully bypass existing backdoor defenses, highlighting the need for better defensive mechanisms against such stealthy backdoor attacks.

Updated: 2024-06-09 18:11:06

标题: 《针对图像字幕的隐秘目标后门攻击》

摘要: 近年来，多模态学习呈现爆炸式增长。图像字幕，作为经典的多模态任务，展示了有前途的应用并吸引了广泛的研究关注。然而，最近的研究表明，图像字幕模型容易受到一些安全威胁，例如后门攻击。现有的针对图像字幕的后门攻击通常将触发器与预定义的句子或单词配对作为目标输出，但与图像内容无关，因此易被人类注意到作为异常。在本文中，我们提出了一种新颖的方法，用于设计针对图像字幕模型的有针对性的后门攻击，旨在比先前的攻击更具隐蔽性。具体来说，我们的方法首先通过利用用于目标检测的通用扰动技术学习一个特殊的触发器，然后将学习到的触发器放置在某些特定源对象的中心，并将输出字幕中对应的对象名称修改为预定义的目标名称。在预测阶段，通过带有触发器的输入图像，后门模型生成的字幕能够准确传达整个图像的语义信息，同时错误地识别源对象为预定义的目标。广泛的实验表明，我们的方法可以实现高攻击成功率，同时对模型干净性能的影响微乎其微。此外，我们展示了我们的方法具有隐蔽性，即生成的后门样本在图像和文本领域都与干净样本无法区分，从而成功绕过现有的后门防御，突显了对此类隐蔽后门攻击的更好防御机制的需求。

更新时间: 2024-06-09 18:11:06

领域: cs.CR

下载: http://arxiv.org/abs/2406.05874v1

Conserving Human Creativity with Evolutionary Generative Algorithms: A Case Study in Music Generation

This study explores the application of evolutionary generative algorithms in music production to preserve and enhance human creativity. By integrating human feedback into Differential Evolution algorithms, we produced six songs that were submitted to international record labels, all of which received contract offers. In addition to testing the commercial viability of these methods, this paper examines the long-term implications of content generation using traditional machine learning methods compared with evolutionary algorithms. Specifically, as current generative techniques continue to scale, it becomes likely that computer-generated content will outpace human creation. This trend poses a risk of exhausting the pool of human-created training data, potentially forcing generative machine learning models to depend increasingly on their random input functions to generate novel content. In contrast to a future of content generation guided by aimless random functions, our approach allows for individualized creative exploration, ensuring that computer-assisted content generation methods remain human-centric and culturally relevant over time.
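
Differential Evolution with human feedback, as described above, can be sketched with the classic DE/rand/1/bin scheme in which the fitness function is a human rating. Here `score` is a stand-in callable; in a human-in-the-loop setting each call would be one listening judgement. Encoding a song as a real-valued vector, and all parameter defaults, are assumptions, since the abstract does not describe the representation.

```python
import random


def differential_evolution(score, dim, pop_size=10, generations=50, f=0.8, cr=0.9, seed=0):
    """DE/rand/1/bin that maximises `score`, a stand-in for a human listener's rating.

    Each call to `score` would correspond to one human judgement, so the budget
    is pop_size * (generations + 1) evaluations.
    """
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 individuals")
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    fit = [score(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, all different from the target vector i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = [
                pop[a][k] + f * (pop[b][k] - pop[c][k])
                if (k == j_rand or rng.random() < cr) else pop[i][k]
                for k in range(dim)
            ]
            trial_fit = score(trial)
            if trial_fit >= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, trial_fit
    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

With a human in the loop, the evaluation budget rather than computation is the binding constraint, which is why small populations and few generations are typical in interactive evolutionary setups.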

Updated: 2024-06-09 18:11:05

标题: 用进化生成算法保护人类创造力：音乐生成的案例研究

摘要: 这项研究探讨了在音乐制作中应用进化生成算法以保护和增强人类创造力。通过将人类反馈整合到差分进化算法中，我们制作了六首歌曲，这些歌曲被提交给国际唱片公司，所有歌曲都收到了合同提议。除了测试这些方法的商业可行性外，本文还比较了使用传统机器学习方法和进化算法生成内容的长期影响。具体来说，随着当前的生成技术不断扩展，计算机生成的内容超越人类创作的潜力变得可能。这一趋势带来了一个风险，即耗尽人类创造的训练数据池，可能导致生成机器学习模型越来越依赖其随机输入函数来生成新颖内容。与未来由毫无目的的随机函数引导的内容生成相比，我们的方法允许进行个性化的创造性探索，确保计算机辅助内容生成方法在时间上是以人为中心的，并具有文化相关性。

更新时间: 2024-06-09 18:11:05

领域: cs.NE,cs.AI,math.OC

下载: http://arxiv.org/abs/2406.05873v1

STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models

Interactive fiction games have emerged as an important application for improving the generalization capabilities of language-based reinforcement learning (RL) agents. Existing environments for interactive fiction games are domain-specific or time-consuming to generate, and they do not train RL agents to master a specific set of skills. In this work, we introduce STARLING, an interactive environment for self-supervised RL in text-based games that bootstraps text-based RL agents with automatically generated games (based on a seed set of game ideas) to boost their performance and generalization in reaching the goal of the target environment. These games let the agent hone its skills on a predefined set of tasks. We create and test an environment with 100 games generated by this automated framework, which uses a large language model (GPT-3) and an interactive fiction game engine (based on Inform7) to let users generate more games under minimal human supervision. Experimental results with both human participants and baseline text-based RL agents reveal that current state-of-the-art text-based RL agents cannot reuse previously learned skills in new situations at the level humans can. These results reinforce STARLING's potential to serve as a sandbox environment for further research in self-supervised text-based RL.

Updated: 2024-06-09 18:07:47

标题: STARLING: 利用大型语言模型进行文本强化学习智能体的自监督训练

摘要: 交互式小说游戏已经成为提高基于语言的强化学习（RL）代理的泛化能力的重要应用。现有的交互式小说游戏环境是特定领域的或者生成耗时，不能训练RL代理以掌握特定技能。在这项工作中，我们引入了一个用于自监督RL的交互式环境STARLING，用于基于文本的游戏，通过自动生成的游戏（基于种子游戏想法集）来提升性能和泛化能力，以达到目标环境的目标。这些游戏让代理在预定义的一组任务上磨练他们的技能。我们使用这个自动化框架生成了100个游戏的环境，并进行了测试，该框架使用大型语言模型（GPT-3）和一个基于Inform7的交互式小说游戏引擎，为用户提供在最少人工监督下生成更多游戏的能力。基于人类参与者和基准文本RL代理的实验结果显示，当前最先进的基于文本的RL代理不能像人类那样在新情况下利用先前学到的技能。这些结果强调了STARLING在自监督文本RL进一步研究中的潜力作为一个沙盒环境。

更新时间: 2024-06-09 18:07:47

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2406.05872v1

OmniControlNet: Dual-stage Integration for Conditional Image Generation

We provide a two-way integration for the widely adopted ControlNet: we integrate external condition generation algorithms into a single dense prediction method, and we incorporate its individually trained image generation processes into a single model. Despite its tremendous success, ControlNet's two-stage pipeline has limitations: it is not self-contained (e.g., it calls external condition generation algorithms), and it has large model redundancy (separately trained models for different types of conditioning inputs). Our proposed OmniControlNet consolidates 1) condition generation (e.g., HED edges, depth maps, user scribbles, and animal pose) into a single multi-tasking dense prediction algorithm under task embedding guidance and 2) the image generation process for different conditioning types under textual embedding guidance. OmniControlNet significantly reduces model complexity and redundancy while remaining capable of producing images of comparable quality for conditioned text-to-image generation.

Updated: 2024-06-09 18:03:47

标题: OmniControlNet：用于条件图像生成的双阶段集成

摘要: 我们提供了一种双向集成方法，将外部条件生成算法集成到单一密集预测方法中，并将其单独训练的图像生成过程合并到一个模型中。尽管ControlNet在两阶段管道中取得了巨大成功，但存在一些局限性，例如不是自包含的（例如调用外部条件生成算法），同时具有大型模型冗余（为不同类型的条件输入分别训练模型）。我们提出的OmniControlNet通过以下两个方面来实现整合：1）通过单一多任务密集预测算法在任务嵌入引导下整合条件生成（例如HED边缘、深度图、用户涂鸦和动物姿势）；2）在文本嵌入引导下整合不同类型条件的图像生成过程。OmniControlNet在显著减少模型复杂性和冗余的同时，能够生成与条件文本到图像生成相媲美质量的图像。

更新时间: 2024-06-09 18:03:47

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.05871v1

Fusing Structure from Motion and Simulation-Augmented Pose Regression from Optical Flow for Challenging Indoor Environments

The localization of objects is a crucial task in various applications such as robotics, virtual and augmented reality, and the transportation of goods in warehouses. Recent advances in deep learning have enabled localization using monocular cameras. While structure from motion (SfM) predicts the absolute pose from a point cloud, absolute pose regression (APR) methods learn a semantic understanding of the environment through neural networks. However, both fields face challenges caused by the environment, such as motion blur, lighting changes, repetitive patterns, and feature-less structures. This study aims to address these challenges by incorporating additional information and regularizing the absolute pose using relative pose regression (RPR) methods. RPR methods suffer from different challenges, such as motion blur. The optical flow between consecutive images is computed using the Lucas-Kanade algorithm, and the relative pose is predicted using an auxiliary small recurrent convolutional network. Fusing absolute and relative poses is a complex task due to the mismatch between the global and local coordinate systems. State-of-the-art methods that fuse absolute and relative poses use pose graph optimization (PGO) to regularize the absolute pose predictions with relative poses. In this work, we propose recurrent fusion networks that optimally align absolute and relative pose predictions to improve the absolute pose prediction. We evaluate eight different recurrent units and construct a simulation environment to pre-train the APR and RPR networks for better generalization. Additionally, we record a large database of different scenarios in a challenging large-scale indoor environment that mimics a warehouse with transportation robots. We conduct hyperparameter searches and experiments to show the effectiveness of our recurrent fusion method compared to PGO.
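
The Lucas-Kanade step mentioned above solves a small least-squares problem built from image gradients. This minimal sketch estimates a single translation for a whole window (no pyramids, no per-pixel flow field), which is a simplification of how the method is usually applied; the function name is hypothetical and it does not reproduce the paper's preprocessing.

```python
import numpy as np


def lucas_kanade_window(prev, nxt):
    """Estimate one (u, v) displacement for a whole window via Lucas-Kanade least squares.

    Solves [sum Ix^2, sum IxIy; sum IxIy, sum Iy^2] [u, v]^T = -[sum IxIt, sum IyIt]^T,
    where u is the shift along columns (x) and v along rows (y).
    """
    prev = np.asarray(prev, dtype=float)
    nxt = np.asarray(nxt, dtype=float)
    iy, ix = np.gradient(prev)          # np.gradient returns d/drow, then d/dcol
    it = nxt - prev                     # temporal derivative between the two frames
    a = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                  [np.sum(ix * iy), np.sum(iy * iy)]])
    b = -np.array([np.sum(ix * it), np.sum(iy * it)])
    u, v = np.linalg.solve(a, b)        # singular for textureless windows
    return float(u), float(v)
```

The singular-matrix failure for textureless windows is the same "feature-less structures" problem the abstract lists among the environmental challenges.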

Updated: 2024-06-09 17:57:45

标题: 将运动结构与仿真增强的姿势回归与光流相结合，用于具有挑战性的室内环境

摘要: 物体的定位是各种应用中至关重要的任务，例如机器人技术、虚拟和增强现实以及仓库货物运输。深度学习的最新进展使得可以使用单目视觉摄像头进行定位。虽然运动结构（SfM）可以从点云预测绝对姿势，但绝对姿势回归（APR）方法通过神经网络学习环境的语义理解。然而，这两个领域都面临着由环境引起的挑战，如运动模糊、光照变化、重复图案和无特征结构。本研究旨在通过整合额外信息和使用相对姿势回归（RPR）方法来规范绝对姿势，以应对这些挑战。RPR方法在不同挑战下面临问题，例如运动模糊。通过Lucas-Kanade算法计算相邻图像之间的光流，并使用辅助的小型循环卷积网络预测相对姿势。由于全局和局部坐标系统之间的不匹配，融合绝对和相对姿势是一项复杂的任务。当前的融合绝对和相对姿势的最新方法使用姿势图优化（PGO）来通过相对姿势规范化绝对姿势预测。在这项工作中，我们提出了循环融合网络，以最佳方式对齐绝对和相对姿势预测，以改善绝对姿势预测。我们评估了八种不同的循环单元，并构建了一个模拟环境，为APR和RPR网络进行更好的泛化训练。此外，我们在一个具有挑战性的大规模室内环境中记录了不同场景的大型数据库，模拟了一个具有运输机器人的仓库。我们进行超参数搜索和实验，以展示我们的循环融合方法相对于PGO的有效性。

更新时间: 2024-06-09 17:57:45

领域: cs.CV,cs.AI,68U01,I.2.9; I.2.10; I.4.1; I.4.10; I.5.4

下载: http://arxiv.org/abs/2304.07250v4

Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents

Retrieval-augmented generation (RAG) systems respond to queries by retrieving relevant documents from a knowledge database, then generating an answer by applying an LLM to the retrieved documents. We demonstrate that RAG systems that operate on databases with potentially untrusted content are vulnerable to a new class of denial-of-service attacks we call jamming. An adversary can add a single ``blocker'' document to the database that will be retrieved in response to a specific query and, furthermore, result in the RAG system not answering the query - ostensibly because it lacks the information or because the answer is unsafe. We describe and analyze several methods for generating blocker documents, including a new method based on black-box optimization that does not require the adversary to know the embedding or LLM used by the target RAG system, nor access to an auxiliary LLM to generate blocker documents. We measure the efficacy of the considered methods against several LLMs and embeddings, and demonstrate that the existing safety metrics for LLMs do not capture their vulnerability to jamming. We then discuss defenses against blocker documents.

Updated: 2024-06-09 17:55:55

标题: 机器对抗RAG：使用阻塞文档干扰检索增强生成

摘要: 检索增强生成（RAG）系统通过从知识数据库中检索相关文档，然后应用LLM对检索到的文档生成答案来响应查询。我们展示了在潜在存在不受信任内容的数据库上运行的RAG系统容易受到一种我们称之为阻塞的新型拒绝服务攻击的影响。攻击者可以向数据库中添加一个单一的“阻塞”文档，以响应特定查询而检索到，并且导致RAG系统不回答查询 - 表面上是因为缺乏信息或者答案不安全。我们描述并分析了几种生成阻塞文档的方法，包括一种基于黑盒优化的新方法，该方法不需要攻击者了解目标RAG系统使用的嵌入或LLM，也不需要访问辅助LLM来生成阻塞文档。我们衡量了考虑方法对几种LLM和嵌入的有效性，并展示现有的LLM安全度量标准并不能捕捉它们对阻塞的脆弱性。然后我们讨论了防御阻塞文档的方法。

更新时间: 2024-06-09 17:55:55

领域: cs.CR,cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.05870v1

An Independence-promoting Loss for Music Generation with Language Models

Music generation schemes using language modeling rely on a vocabulary of audio tokens, generally provided as codes in a discrete latent space learnt by an auto-encoder. Multi-stage quantizers are often employed to produce these tokens; therefore, the decoding strategy used for token prediction must be adapted to account for multiple codebooks: it should either model the joint distribution over all codebooks or fit the product of the codebook marginal distributions. Modelling the joint distribution requires a costly increase in the number of auto-regressive steps, while fitting the product of the marginals yields an inexact model unless the codebooks are mutually independent. In this work, we introduce an independence-promoting loss to regularize the auto-encoder used as the tokenizer in language models for music generation. The proposed loss is a proxy for mutual information based on the maximum mean discrepancy principle, applied in reproducing kernel Hilbert spaces. Our criterion is simple to implement and train, and it generalizes to other multi-stream codecs. We show that it reduces the statistical dependence between codebooks during auto-encoding. This improves the quality of the generated music when modelling the product of the marginal distributions, while generating audio much faster than the joint distribution model.
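
An MMD-based independence proxy like the one described above can be sketched by comparing joint samples of two codebook streams with samples from the product of their marginals, obtained by shuffling one stream across the batch. The Gaussian kernel, the fixed bandwidth, and the biased estimator are assumptions, and NumPy stands in for the autodiff framework a real training loss would need; this is not the paper's exact loss.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq = np.sum(a ** 2, 1)[:, None] + np.sum(b ** 2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    return float(gaussian_gram(x, x, bandwidth).mean()
                 + gaussian_gram(y, y, bandwidth).mean()
                 - 2.0 * gaussian_gram(x, y, bandwidth).mean())


def independence_penalty(codes_a, codes_b, bandwidth=1.0, seed=0):
    """MMD between joint samples and samples from the product of marginals.

    Shuffling codes_b across the batch breaks its pairing with codes_a, giving
    approximate draws from p(a) p(b). A small value suggests the two codebook
    streams are close to independent.
    """
    rng = np.random.default_rng(seed)
    joint = np.concatenate([codes_a, codes_b], axis=1)
    product = np.concatenate([codes_a, codes_b[rng.permutation(len(codes_b))]], axis=1)
    return mmd2(joint, product, bandwidth)
```

In this toy form the inputs are continuous latent vectors for two streams with shape (batch, dim); a penalty of this kind would be added to the auto-encoder's reconstruction loss.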

Updated: 2024-06-09 17:55:51

标题: 一种用于语言模型音乐生成的促进独立性损失

摘要: 使用语言建模的音乐生成方案依赖于一个音频标记词汇，通常作为由自动编码器学习的离散潜在空间中的代码提供。通常使用多阶量化器来生成这些标记，因此用于标记预测的解码策略必须适应多个码书：要么它应该建模所有码书的联合分布，要么适应码书边际分布的乘积。建模联合分布需要在自回归步骤的数量上昂贵的增加，而适应边际的乘积会产生一个不精确的模型，除非码书是相互独立的。在这项工作中，我们引入了一种促进独立性的损失，用于规范用作音乐生成语言模型中的标记器的自动编码器。所提出的损失是基于最大均值差异原理的互信息代理，应用于再生核希尔伯特空间。我们的标准易于实施和训练，并且可推广到其他多流编解码器。我们展示了在自动编码过程中减少码书之间的统计依赖性。这导致在建模边际分布的乘积时增加生成音乐的质量，同时比联合分布模型生成音频更快。

更新时间: 2024-06-09 17:55:51

领域: cs.SD,cs.AI,cs.LG,eess.AS

下载: http://arxiv.org/abs/2406.02315v2

Procrastination Is All You Need: Exponent Indexed Accumulators for Floating Point, Posits and Logarithmic Numbers

This paper discusses a simple and effective method for the summation of long sequences of floating point numbers. The method comprises two phases: an accumulation phase where the mantissas of the floating point numbers are added to accumulators indexed by the exponents and a reconstruction phase where the actual summation result is finalised. Various architectural details are given for both FPGAs and ASICs including fusing the operation with a multiplier, creating efficient MACs. Some results are presented for FPGAs, including a tensor core capable of multiplying and accumulating two 4x4 matrices of bfloat16 values every clock cycle using ~6,400 LUTs + 64 DSP48 in AMD FPGAs at 700+ MHz. The method is then extended to posits and logarithmic numbers.
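
The two phases described above (accumulate mantissas into bins indexed by exponent, then reconstruct once at the end) can be modelled in software. This minimal sketch uses float64 values decomposed with `math.frexp` and relies on Python's unbounded integers, so it sidesteps the fixed accumulator widths, bfloat16 formats, and exponent partitioning that the hardware design must handle; the function name is hypothetical.

```python
import math
from collections import defaultdict
from fractions import Fraction

MANT_BITS = 53  # float64 significand width, including the implicit leading bit


def eia_sum(values):
    """Sum floats by accumulating integer mantissas in bins indexed by exponent.

    Accumulation phase: no alignment shifts, just an integer add into the bin
    selected by the exponent. Reconstruction phase: shift each bin to a common
    scale once, add, and round to the nearest float.
    """
    bins = defaultdict(int)
    for x in values:
        if not math.isfinite(x):
            raise ValueError("this sketch handles finite values only")
        m, e = math.frexp(x)                    # x == m * 2**e with 0.5 <= |m| < 1
        bins[e] += int(m * (1 << MANT_BITS))    # exact: m has at most 53 significant bits
    if not bins:
        return 0.0
    e_min = min(bins)
    total = sum(acc << (e - e_min) for e, acc in bins.items())
    # Fraction -> float conversion rounds correctly to the nearest double.
    return float(Fraction(total) * Fraction(2) ** (e_min - MANT_BITS))
```

Deferring all alignment to the final step is what makes the result exact before the single rounding: for example, `sum([1e16, 1.0, -1e16])` returns `0.0` in ordinary float arithmetic, whereas `eia_sum` returns `1.0`.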

Updated: 2024-06-09 17:44:17

Fields: cs.CV,cs.AI,cs.AR

Download: http://arxiv.org/abs/2406.05866v1

Effective Causal Discovery under Identifiable Heteroscedastic Noise Model

Capturing the underlying structural causal relations represented by Directed Acyclic Graphs (DAGs) has been a fundamental task in various AI disciplines. Causal DAG learning via the continuous optimization framework has recently achieved promising performance in terms of both accuracy and efficiency. However, most methods make the strong assumption of homoscedastic noise, i.e., that exogenous noises have equal variances across variables, across observations, or both. The noise in real data usually violates both assumptions because of the biases introduced by different data collection processes. To address heteroscedastic noise, we introduce relaxed and implementable sufficient conditions and prove the identifiability of a general class of structural equation models (SEMs) under these conditions. Based on the identifiable general SEM, we propose a novel formulation for DAG learning that accounts for variation in noise variance across variables and observations. We then propose an effective two-phase iterative DAG learning algorithm that addresses the increased optimization difficulty and learns a causal DAG from data with heteroscedastic variable noise under varying variance. We show significant empirical gains of the proposed approaches over state-of-the-art methods on both synthetic and real data.

Updated: 2024-06-09 17:41:37

Fields: cs.LG,cs.AI,stat.ME

Download: http://arxiv.org/abs/2312.12844v2

Source-Free Domain Adaptation for Speaker Verification in Data-Scarce Languages and Noisy Channels

Domain adaptation is often hampered by exceedingly small target datasets and inaccessible source data. These conditions are prevalent in speaker verification, where privacy policies and/or languages with scarce speech resources limit the availability of sufficient data. This paper explored source-free domain adaptation techniques for a limited target speech dataset, targeting speaker verification in data-scarce languages. Both language and channel mismatch between source and target were investigated. Fine-tuning methods were evaluated and compared across different sizes of labeled target data. A novel iterative cluster-learn algorithm was studied for unlabeled target datasets.

Updated: 2024-06-09 17:27:20

Fields: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2406.05863v1

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

The rapid advancement of multimodal large language models (MLLMs) has consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to assess the capabilities of MLLMs more accurately. However, the higher-order perceptual capabilities of MLLMs remain largely unexplored. To fill this gap, we propose the Image Implication understanding Benchmark, II-Bench, which aims to evaluate models' higher-order perception of images. Through extensive experiments on II-Bench across multiple MLLMs, we have made several significant findings. First, a substantial gap is observed between the performance of MLLMs and humans on II-Bench. The highest accuracy of MLLMs is 74.8%, whereas human accuracy averages 90% and peaks at 98%. Second, MLLMs perform worse on abstract and complex images, suggesting limitations in their ability to understand high-level semantics and capture image details. Finally, most models exhibit higher accuracy when image sentiment polarity hints are incorporated into the prompts. This points to a notable deficiency in their inherent understanding of image sentiment. We believe that II-Bench will inspire the community to develop the next generation of MLLMs, advancing the journey towards expert artificial general intelligence (AGI). II-Bench is publicly available at https://huggingface.co/datasets/m-a-p/II-Bench.

Updated: 2024-06-09 17:25:47

Fields: cs.CL,cs.AI,cs.CV

Download: http://arxiv.org/abs/2406.05862v1

Mitigating spectral bias for the multiscale operator learning

Neural operators have emerged as a powerful tool for learning the mapping between infinite-dimensional parameter and solution spaces of partial differential equations (PDEs). In this work, we focus on multiscale PDEs, which have important applications such as reservoir modeling and turbulence prediction. We demonstrate that, for such PDEs, the spectral bias towards low-frequency components presents a significant challenge for existing neural operators. To address this challenge, we propose a hierarchical attention neural operator (HANO) inspired by the hierarchical matrix approach. HANO features a scale-adaptive interaction range and self-attention over a hierarchy of levels, enabling nested feature computation with controllable linear cost and encoding/decoding of the multiscale solution space. We also incorporate an empirical $H^1$ loss function to enhance the learning of high-frequency components. Our numerical experiments demonstrate that HANO outperforms state-of-the-art (SOTA) methods on representative multiscale problems.
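
An empirical $H^1$ loss can be computed spectrally. This sketch assumes a periodic 1-D grid on $[0, 2\pi)$ and weights each Fourier mode by $1 + k^2$. The abstract does not give the paper's exact weighting or normalisation.

```python
import numpy as np


def h1_norm_sq(u):
    # Squared H^1 norm of a periodic grid function on [0, 2*pi), computed
    # spectrally via Parseval: sum_k (1 + k^2) |u_hat_k|^2 (up to a constant).
    n = u.shape[-1]
    u_hat = np.fft.fft(u, axis=-1) / n
    k = np.fft.fftfreq(n, d=1.0 / n)  # integer wavenumbers 0, 1, ..., -1
    return np.sum((1.0 + k**2) * np.abs(u_hat) ** 2, axis=-1)


def relative_h1_loss(pred, target):
    # Relative H^1 error; the k^2 weight penalises high-frequency errors
    # much more strongly than a plain L^2 loss would.
    return np.sqrt(h1_norm_sq(pred - target) / h1_norm_sq(target))
```

Take `sin(x)` as the target. An error of amplitude 0.1 at wavenumber 1 gives a relative $H^1$ error of 0.1. The same amplitude at wavenumber 20 gives about 1.42, although both errors have the same relative $L^2$ size. This extra pressure on high-frequency components is what an $H^1$ term adds.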

Updated: 2024-06-09 17:13:08

Fields: cs.LG,cs.AI,cs.NA,math.NA

Download: http://arxiv.org/abs/2210.10890v3

Comments on "Federated Learning with Differential Privacy: Algorithms and Performance Analysis"

In the paper by Wei et al. ("Federated Learning with Differential Privacy: Algorithms and Performance Analysis"), the convergence performance of the proposed differential privacy algorithm in federated learning (FL), known as Noising before Model Aggregation FL (NbAFL), was studied. However, the presented convergence upper bound of NbAFL (Theorem 2) is incorrect. This comment aims to present the correct form of the convergence upper bound for NbAFL.

Updated: 2024-06-09 17:03:56

Fields: cs.DC,cs.CR,cs.PF

Download: http://arxiv.org/abs/2406.05858v1

Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain

Adapting pretrained language models to novel domains, such as clinical applications, traditionally involves retraining their entire set of parameters. Parameter-Efficient Fine-Tuning (PEFT) techniques significantly reduce computational requirements by selectively fine-tuning small subsets of parameters. In this study, we propose a two-step PEFT framework and evaluate it in the clinical domain. Our approach combines a specialised PEFT adapter layer designed for clinical domain adaptation with another adapter specialised for downstream tasks. We evaluate the framework on multiple clinical outcome prediction datasets, comparing it to clinically trained language models. Averaged across all clinical downstream tasks, our framework achieves a higher AUROC than clinical language models. In particular, we observe large improvements of 4-5% AUROC in large-scale multilabel classification tasks, such as diagnosis and procedure classification. To our knowledge, this study is the first to provide an extensive empirical analysis of the interplay between PEFT techniques and domain adaptation in clinical applications, an important real-world domain.
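
The two-step adapter stack can be sketched with LoRA-style low-rank adapters on a single frozen linear layer. LoRA, the zero initialisation and the sizes below are assumptions for illustration. The abstract itself only specifies a clinical-domain adapter followed by a downstream-task adapter.

```python
import numpy as np


class LowRankAdapter:
    """LoRA-style adapter: delta(x) = scale * x @ A.T @ B.T with rank r << d."""

    def __init__(self, d_in, d_out, rank, rng, scale=1.0):
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))  # zero init: the adapter starts as a no-op
        self.scale = scale

    def __call__(self, x):
        return self.scale * (x @ self.A.T) @ self.B.T

    def n_params(self):
        return self.A.size + self.B.size


def two_step_linear(x, W_frozen, domain_adapter, task_adapter):
    # Step 1 (hypothetical schedule): train only domain_adapter on clinical text.
    # Step 2: freeze domain_adapter as well and train only task_adapter downstream.
    # W_frozen, the pretrained weight, is never updated in either step.
    return x @ W_frozen.T + domain_adapter(x) + task_adapter(x)
```

With `d = 64` and rank 4, each adapter adds 512 trainable parameters against 4,096 frozen ones. At LLaMA scale the trainable fraction is far smaller.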

Updated: 2024-06-09 17:00:36

Fields: cs.CL,cs.LG

Download: http://arxiv.org/abs/2307.03042v3

Self-Distilled Disentangled Learning for Counterfactual Prediction

Advances in disentangled representation learning significantly enhance the accuracy of counterfactual predictions by granting precise control over instrumental variables, confounders, and adjustable variables. An appealing way to separate these factors independently is mutual information minimization, a task that is challenging in many machine learning scenarios, especially in high-dimensional spaces. To circumvent this challenge, we propose the Self-Distilled Disentanglement framework, referred to as $SD^2$. Grounded in information theory, it ensures theoretically sound, independent disentangled representations without intricate mutual information estimator designs for high-dimensional representations. Our comprehensive experiments on both synthetic and real-world datasets confirm the effectiveness of our approach in facilitating counterfactual inference in the presence of both observed and unobserved confounders.

Updated: 2024-06-09 16:58:19

Fields: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2406.05855v1

Scaling Graph Convolutions for Mobile Vision

To compete with existing mobile architectures, MobileViG introduces Sparse Vision Graph Attention (SVGA), a fast token-mixing operator based on the principles of GNNs. However, MobileViG scales poorly with model size, falling at most 1% behind models with similar latency. This paper introduces Mobile Graph Convolution (MGC), a new vision graph neural network (ViG) module that solves this scaling problem. Our proposed mobile vision architecture, MobileViGv2, uses MGC to demonstrate the effectiveness of our approach. MGC improves on SVGA by increasing graph sparsity and introducing conditional positional encodings to the graph operation. Our smallest model, MobileViGv2-Ti, achieves 77.7% top-1 accuracy on ImageNet-1K, 2% higher than MobileViG-Ti, with 0.9 ms inference latency on the iPhone 13 Mini NPU. Our largest model, MobileViGv2-B, achieves 83.4% top-1 accuracy, 0.8% higher than MobileViG-B, with 2.7 ms inference latency. Beyond image classification, we show that MobileViGv2 generalizes well to other tasks. For object detection and instance segmentation on MS COCO 2017, MobileViGv2-M outperforms MobileViG-M by 1.2 $AP^{box}$ and 0.7 $AP^{mask}$, and MobileViGv2-B outperforms MobileViG-B by 1.0 $AP^{box}$ and 0.7 $AP^{mask}$. For semantic segmentation on ADE20K, MobileViGv2-M achieves 42.9% $mIoU$ and MobileViGv2-B achieves 44.3% $mIoU$. Our code is available at https://github.com/SLDGroup/MobileViGv2.

Updated: 2024-06-09 16:49:19

Fields: cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.05850v1

Localized Adaptive Risk Control

Adaptive Risk Control (ARC) is an online calibration strategy based on set prediction that offers worst-case deterministic long-term risk control, as well as statistical marginal coverage guarantees. ARC adjusts the size of the prediction set by varying a single scalar threshold based on feedback from past decisions. In this work, we introduce Localized Adaptive Risk Control (L-ARC), an online calibration scheme that targets statistical localized risk guarantees ranging from conditional risk to marginal risk, while preserving the worst-case performance of ARC. L-ARC updates a threshold function within a reproducing kernel Hilbert space (RKHS), with the kernel determining the level of localization of the statistical risk guarantee. The theoretical results highlight a trade-off between localization of the statistical risk and speed of convergence to the long-term risk target. Experiments show that, thanks to localization, L-ARC produces prediction sets with risk guarantees across different data subpopulations. This significantly improves the fairness of the calibrated model for tasks such as image segmentation and beam selection in wireless networks.
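
The online threshold update behind ARC can be sketched together with a kernel-localised version in the spirit of L-ARC. The kernel, learning rate, regulariser and miscoverage loss used here are assumptions. The paper states the exact update and its guarantees.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    # Localising kernel on scalar inputs (hypothetical choice).
    return float(np.exp(-((a - b) ** 2) / (2.0 * bandwidth**2)))


class OnlineThreshold:
    """Threshold g(x) = g0 + sum_i c_i * k(x_i, x), updated after each round.

    With k == 1 and reg == 0 this is ARC's single scalar threshold; a
    localising kernel gives an L-ARC-style threshold function.
    """

    def __init__(self, kernel, alpha, lr, reg=0.0, g0=0.0):
        self.kernel, self.alpha, self.lr, self.reg, self.g0 = kernel, alpha, lr, reg, g0
        self.centers, self.coefs = [], []

    def __call__(self, x):
        return self.g0 + sum(c * self.kernel(xi, x) for xi, c in zip(self.centers, self.coefs))

    def update(self, x, loss):
        # Shrink past coefficients (RKHS regularisation), then raise the threshold
        # around x (larger prediction set) if the loss exceeded alpha, else lower it.
        shrink = 1.0 - self.lr * self.reg
        self.coefs = [shrink * c for c in self.coefs]
        self.centers.append(x)
        self.coefs.append(self.lr * (loss - self.alpha))
```

With the constant kernel `lambda a, b: 1.0` and `reg=0`, the threshold is a single scalar and the update reduces to ARC. Telescoping the updates gives `mean(loss) - alpha = (g_T - g_0) / (lr * T)`. This is the deterministic long-term risk guarantee whenever the threshold stays bounded. Switching to `gaussian_kernel` makes the feedback act mainly near the inputs that produced it.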

Updated: 2024-06-09 16:23:49

Fields: stat.ML,cs.AI,cs.IT,cs.LG,math.IT

Download: http://arxiv.org/abs/2405.07976v2

Nonlinear Distributionally Robust Optimization

This article focuses on a class of distributionally robust optimization (DRO) problems in which, unlike most of the growing body of literature, the objective function is potentially nonlinear in the distribution. Existing methods for optimizing nonlinear functions in probability space use Fréchet derivatives, which present both theoretical and computational challenges. Motivated by this, we propose an alternative notion of the derivative, and a corresponding notion of smoothness, based on the Gâteaux (G-)derivative for generic risk measures. These concepts are explained via three running examples of risk measures: variance, entropic risk, and risk on finite support sets. We then propose a G-derivative-based Frank-Wolfe (FW) algorithm for generic nonlinear optimization problems in probability spaces. We establish its convergence under the proposed notion of smoothness in a completely norm-independent manner. We use the set-up of the FW algorithm to devise a methodology for computing a saddle point of the nonlinear DRO problem. Finally, we validate our theoretical results on two cases, the entropic and variance risk measures, in the context of portfolio selection problems. In particular, we analyze their regularity conditions and "sufficient statistic", compute the respective FW oracle in various settings, and confirm the theoretical outcomes through numerical validation.
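
A Gâteaux-derivative Frank-Wolfe step is easiest to see on a finite support. This sketch uses the variance risk measure, one of the running examples. As a simplifying assumption, the distribution ranges over the whole probability simplex rather than a DRO ambiguity set. It is not the paper's saddle-point procedure.

```python
import numpy as np


def variance(p, x):
    # Variance of the distribution with weights p on the support points x.
    m = p @ x
    return p @ x**2 - m**2


def variance_gateaux_gradient(p, x):
    # Gateaux derivative of V(p) = E_p[x^2] - (E_p[x])^2 towards the point mass
    # at each atom, up to a constant shared by all atoms (irrelevant for argmax).
    m = p @ x
    return x**2 - 2.0 * m * x


def frank_wolfe_max(x, iters=500):
    # Frank-Wolfe over the probability simplex on the finite support x. The
    # linear oracle returns a vertex (a point mass), so no projection is needed.
    p = np.full(len(x), 1.0 / len(x))
    for t in range(iters):
        i = int(np.argmax(variance_gateaux_gradient(p, x)))  # ascent vertex
        step = 2.0 / (t + 2.0)
        p = (1.0 - step) * p
        p[i] += step
    return p
```

For the support `x = [0, 1, 2, 3, 4]`, the iterates put all mass on the two end points. The variance approaches its maximum of 4, which is attained with mass 1/2 at 0 and at 4.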

Updated: 2024-06-09 16:23:02

Fields: stat.ML,cs.LG,math.OC

Download: http://arxiv.org/abs/2306.03202v2

Minimum Variance Unbiased N:M Sparsity for the Neural Gradients

In deep learning, fine-grained N:M sparsity reduces the data footprint and bandwidth of a General Matrix Multiply (GEMM) by up to 2x and doubles throughput by skipping computation of zero values. So far, it has mainly been used only to prune weights to accelerate the forward and backward phases. We examine how this method can also be used for the neural gradients (i.e., loss gradients with respect to the intermediate neural layer outputs). To this end, we first establish a tensor-level optimality criterion. Previous works aimed to minimize the mean-square error (MSE) of each pruned block. We show that, while minimizing the MSE works well for pruning weights and activations, it fails catastrophically for the neural gradients. Instead, we show that accurate pruning of the neural gradients requires an unbiased minimum-variance pruning mask. We design such specialized masks and find that in most cases 1:2 sparsity is sufficient for training, and when it is not, 2:4 sparsity is usually enough. Further, we suggest combining several such methods to potentially speed up training even more.
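
For 1:2 sparsity, an unbiased mask can be built in two steps. First, keep one entry of each pair at random, with probability proportional to its magnitude. Second, rescale the survivor. The sketch assumes pairs run along the last axis. It only illustrates the unbiasedness requirement stated in the abstract; the paper gives its own specialised masks, including the 2:4 case.

```python
import numpy as np


def unbiased_prune_1_2(g, rng):
    """Unbiased 1:2 pruning of a float gradient tensor (last axis of even length).

    In each consecutive pair (a, b) one entry is kept: a with probability
    |a| / (|a| + |b|), else b. The survivor is set to sign * (|a| + |b|), so the
    expected output equals the input.
    """
    pairs = g.reshape(-1, 2)
    mag = np.abs(pairs)
    total = mag.sum(axis=1)
    # Probability of keeping the first element (all-zero pairs stay zero anyway).
    p_first = np.divide(mag[:, 0], total, out=np.full_like(total, 0.5), where=total > 0)
    keep_first = rng.random(len(pairs)) < p_first
    kept = np.where(keep_first, 0, 1)
    idx = np.arange(len(pairs))
    out = np.zeros_like(pairs)
    out[idx, kept] = np.sign(pairs[idx, kept]) * total
    return out.reshape(g.shape)
```

Averaging many draws recovers the original gradient, and each pair holds at most one non-zero value. An MSE-minimising mask would instead always keep the larger entry without rescaling, and that mask is biased.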

Updated: 2024-06-09 16:21:18

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2203.10991v3

Fight Back Against Jailbreaking via Prompt Adversarial Tuning

While Large Language Models (LLMs) have achieved tremendous success in various applications, they are also susceptible to jailbreak attacks. Several primary defense strategies have been proposed to protect LLMs from producing harmful information, mostly focusing on harmful content filtering or heuristic defensive prompt designs. However, how to achieve intrinsic robustness through prompts remains an open problem. In this paper, motivated by adversarial training paradigms for achieving reliable robustness, we propose an approach named Prompt Adversarial Tuning (PAT). PAT trains a prompt control that is attached to the user prompt as a guard prefix. To achieve our defense goal while maintaining natural performance, we optimize the control prompt with both adversarial and benign prompts. Comprehensive experiments show that our method is effective against both black-box and white-box attacks. It reduces the success rate of advanced attacks to nearly 0 while maintaining the model's utility on benign tasks. The proposed defense strategy incurs only negligible computational overhead, offering a new perspective for future explorations in LLM security. Our code is available at https://github.com/rain152/PAT.

Updated: 2024-06-09 16:18:46

Fields: cs.LG,cs.AI,cs.CL,cs.CR

Download: http://arxiv.org/abs/2402.06255v2

Learning to Evaluate the Artness of AI-generated Images

Assessing the artness of AI-generated images continues to be a challenge within the realm of image generation. Most existing metrics cannot be used to perform instance-level and reference-free artness evaluation. This paper presents ArtScore, a metric designed to evaluate the degree to which an image resembles authentic artworks by artists (or conversely photographs), thereby offering a novel approach to artness assessment. We first blend pre-trained models for photo and artwork generation, resulting in a series of mixed models. Subsequently, we utilize these mixed models to generate images exhibiting varying degrees of artness with pseudo-annotations. Each photorealistic image has a corresponding artistic counterpart and a series of interpolated images that range from realistic to artistic. This dataset is then employed to train a neural network that learns to estimate quantized artness levels of arbitrary images. Extensive experiments reveal that the artness levels predicted by ArtScore align more closely with human artistic evaluation than existing evaluation metrics, such as Gram loss and ArtFID.
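
The series of mixed models can be illustrated as linear interpolation between the parameters of a photo generator and an artwork generator that share one architecture. Two things here are assumptions: reading "blending" as linear weight interpolation, and using the interpolation level as the quantised artness pseudo-label. The parameter names are hypothetical.

```python
import numpy as np


def blend_models(photo_params, art_params, alpha):
    """Interpolate two parameter dicts with identical keys and shapes.

    alpha = 0 returns the photo model, alpha = 1 the artwork model.
    """
    if photo_params.keys() != art_params.keys():
        raise ValueError("models must share parameter names")
    return {name: (1.0 - alpha) * photo_params[name] + alpha * art_params[name]
            for name in photo_params}


def mixed_model_series(photo_params, art_params, n_levels):
    # One mixed model per artness level; images generated by the model at a
    # given level would be pseudo-labelled with that level index.
    alphas = np.linspace(0.0, 1.0, n_levels)
    return [(level, blend_models(photo_params, art_params, a))
            for level, a in enumerate(alphas)]
```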

Updated: 2024-06-09 16:13:12

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2305.04923v2

RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching

The growing significance of RNA engineering in diverse biological applications has spurred interest in developing AI methods for structure-based RNA design. While diffusion models have excelled in protein design, adapting them for RNA presents new challenges due to RNA's conformational flexibility and the computational cost of fine-tuning large structure prediction models. To this end, we propose RNAFlow, a flow matching model for protein-conditioned RNA sequence-structure design. Its denoising network integrates an RNA inverse folding model and a pre-trained RosettaFold2NA network for generation of RNA sequences and structures. The integration of inverse folding in the structure denoising process allows us to simplify training by fixing the structure prediction network. We further enhance the inverse folding model by conditioning it on inferred conformational ensembles to model dynamic RNA conformations. Evaluation on protein-conditioned RNA structure and sequence generation tasks demonstrates RNAFlow's advantage over existing RNA design methods.

Updated: 2024-06-09 16:13:02

Fields: q-bio.BM,cs.LG

Download: http://arxiv.org/abs/2405.18768v2

Generalized Independent Noise Condition for Estimating Causal Structure with Latent Variables

We investigate the task of learning causal structure in the presence of latent variables, including locating latent variables and determining their quantity, and identifying causal relationships among both latent and observed variables. To this end, we propose a Generalized Independent Noise (GIN) condition for linear non-Gaussian acyclic causal models that incorporate latent variables, which establishes the independence between a linear combination of certain measured variables and some other measured variables. Specifically, for two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are independent, where $\omega$ is a non-zero parameter vector determined by the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. We then give necessary and sufficient graphical criteria of the GIN condition in linear non-Gaussian acyclic models. Roughly speaking, GIN implies the existence of a set $\mathcal{S}$ such that $\mathcal{S}$ is causally earlier (w.r.t. the causal ordering) than $\mathbf{Y}$, and that every active (collider-free) path between $\mathbf{Y}$ and $\mathbf{Z}$ must contain a node from $\mathcal{S}$. Interestingly, we find that the independent noise condition (i.e., if there is no confounder, causes are independent of the residual derived from regressing the effect on the causes) can be seen as a special case of GIN. With such a connection between GIN and latent causal structures, we further leverage the proposed GIN condition, together with a well-designed search procedure, to efficiently estimate Linear, Non-Gaussian Latent Hierarchical Models (LiNGLaHs), where latent confounders may also be causally related and may even follow a hierarchical structure. We show that the causal structure of a LiNGLaH is identifiable in light of GIN conditions. Experimental results show the effectiveness of the proposed method.
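
The GIN condition can be checked numerically on a toy model. In this sketch a single non-Gaussian latent `L` drives two observed variables in `Y` and one in `Z`, with hypothetical coefficients 1.0, 2.0 and 1.5. `omega` is taken from the left null space of the sample cross-covariance $E[\mathbf{Y}\mathbf{Z}^{\intercal}]$. The final independence test between $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ is not shown; a kernel test such as HSIC could be used for it.

```python
import numpy as np


def gin_omega(Y, Z):
    """Unit vector omega with omega^T Cov(Y, Z) = 0 (sample version).

    Y: (n, dy) and Z: (n, dz) observed samples. A non-trivial omega exists when
    dy exceeds the rank of the cross-covariance matrix.
    """
    Yc, Zc = Y - Y.mean(axis=0), Z - Z.mean(axis=0)
    cross = Yc.T @ Zc / len(Y)                 # (dy, dz) cross-covariance
    U, _, _ = np.linalg.svd(cross, full_matrices=True)
    return U[:, -1]                            # left singular vector outside the range
```

In this example `omega` comes out close to $\pm(2, -1)/\sqrt{5}$. The combination $\omega^{\intercal}\mathbf{Y}$ therefore cancels the latent and keeps only the noise terms of `Y`, which are independent of `Z`. GIN holds for this pair, consistent with `L` being their common cause.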

Updated: 2024-06-09 16:07:50

Fields: cs.LG,cs.AI,stat.ME

Download: http://arxiv.org/abs/2308.06718v2

MaLa-ASR: Multimedia-Assisted LLM-Based ASR

As more and more information-rich data such as video become available, using multi-modal auxiliary information to enhance audio tasks has sparked widespread research interest. The recent surge in research on LLM-based audio models provides fresh perspectives for tackling audio tasks. Given that LLMs can flexibly ingest multiple inputs, we propose MaLa-ASR, an LLM-based ASR model that can integrate textual keywords extracted from presentation slides to improve recognition of conference content. MaLa-ASR yields average WERs of 9.4% and 11.7% on the L95 and S95 subsets of the SlideSpeech corpus. These represent significant relative WER reductions of 27.9% and 44.7% over the baseline model reported in SlideSpeech. MaLa-ASR underscores the strong performance of LLMs in speech tasks and their ability to integrate auxiliary information conveniently. Adding keywords to the input prompt reduces the biased word error rate (B-WER) by 46.0% and 44.2% relative, establishing a new SOTA on this dataset.

Updated: 2024-06-09 16:00:00

Fields: eess.AS,cs.AI

Download: http://arxiv.org/abs/2406.05839v1

Long-Horizon Rollout via Dynamics Diffusion for Offline Reinforcement Learning

With the great success of diffusion models (DMs) in generating realistic synthetic vision data, many researchers have investigated their potential in decision-making and control. Most of these works use DMs to sample directly from the trajectory space, where DMs can be viewed as a combination of dynamics models and policies. In this work, we explore how to decouple DMs' ability to act as dynamics models in fully offline settings, allowing the learning policy to roll out trajectories. Because DMs learn the data distribution from the dataset, their intrinsic policy is actually the behavior policy induced by the dataset. This results in a mismatch between the behavior policy and the learning policy. We propose Dynamics Diffusion (DyDiff for short), which iteratively injects information from the learning policy into DMs. DyDiff ensures long-horizon rollout accuracy while maintaining policy consistency, and it can be easily deployed on model-free algorithms. We provide a theoretical analysis showing the advantage of DMs over conventional dynamics models for long-horizon rollout. We also demonstrate the effectiveness of DyDiff in offline reinforcement learning, where a rollout dataset is provided but no online environment is available for interaction. Our code is at https://github.com/FineArtz/DyDiff.

Updated: 2024-06-09 15:56:59

Fields: cs.LG

Download: http://arxiv.org/abs/2405.19189v2

Solution for CVPR 2024 UG2+ Challenge Track on All Weather Semantic Segmentation

In this report, we present our solution for semantic segmentation in adverse weather in the UG2+ Challenge at CVPR 2024. To achieve robust and accurate segmentation across various weather conditions, we initialize the InternImage-H backbone with weights pre-trained on a large-scale joint dataset and combine it with the state-of-the-art UPerNet segmentation method. Specifically, we use offline and online data augmentation to extend the training set, which further improves the performance of the segmenter. As a result, our solution performs strongly on the test set and ranks third in this challenge.

Updated: 2024-06-09 15:56:35

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.05837v1

Improving Antibody Design with Force-Guided Sampling in Diffusion Models

Antibodies, crucial for immune defense, rely primarily on complementarity-determining regions (CDRs) to bind and neutralize antigens, such as viruses. The design of these CDRs determines the antibody's affinity and specificity towards its target. Generative models, particularly denoising diffusion probabilistic models (DDPMs), have shown potential to advance the structure-based design of CDR regions. However, only a limited dataset of bound antibody-antigen structures is available, and generalization to out-of-distribution interfaces remains a challenge. Physics-based force fields, which approximate atomic interactions, offer a coarse but universal source of information for better molding designs to target interfaces. Integrating this foundational information into diffusion models is therefore highly desirable. Here, we propose a novel approach that enhances the sampling process of diffusion models by integrating force-field energy-based feedback. Our model, DiffForce, employs forces to guide the diffusion sampling process, effectively blending the two distributions. Through extensive experiments, we demonstrate that our method guides the model to sample CDRs with lower energy, improving both the structure and the sequence of the generated antibodies.
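
Steering a sampler with physical forces can be illustrated in one dimension. This sketch swaps the paper's DDPM sampler for unadjusted Langevin dynamics. It also uses a hypothetical learned score and a harmonic energy. The drift adds `guidance` times the force $-\nabla E$ to the model score, so the chains target the model density reweighted by $e^{-\text{guidance} \cdot E}$.

```python
import numpy as np


def guided_langevin(score_fn, force_fn, x0, guidance, step, n_steps, rng):
    """Unadjusted Langevin sampling with an added physical force term.

    Drift = score(x) + guidance * force(x), with force = -grad(energy), so the
    chains target p_model(x) * exp(-guidance * energy(x)) up to discretisation error.
    """
    x = x0.copy()
    for _ in range(n_steps):
        drift = score_fn(x) + guidance * force_fn(x)
        x = x + step * drift + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x
```

The example uses three ingredients: the standard-normal score `lambda x: -x`, the energy $E(x) = (x - 2)^2 / 2$ (force `lambda x: -(x - 2.0)`), and `guidance=1.0`. With these, the chains settle around 1, the mean of the blended density $\propto e^{-x^2/2} e^{-(x-2)^2/2}$. That point lies halfway between the model's preference and the low-energy region.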

Updated: 2024-06-09 15:50:35

Categories: q-bio.QM,cs.LG,q-bio.BM

Download: http://arxiv.org/abs/2406.05832v1

CoMIX: A Multi-agent Reinforcement Learning Training Architecture for Efficient Decentralized Coordination and Independent Decision-Making

Robust coordination skills enable agents to operate cohesively in shared environments: working together towards a common goal and, ideally, acting individually without hindering each other's progress. To this end, this paper presents Coordinated QMIX (CoMIX), a novel training framework for decentralized agents that enables emergent coordination through flexible policies while still allowing independent decision-making at the individual level. CoMIX models selfish and collaborative behavior as incremental steps in each agent's decision process, which allows agents to adapt their behavior dynamically to different situations, balancing independence and collaboration. Experiments in a variety of simulation environments demonstrate that CoMIX outperforms baselines on collaborative tasks. The results validate our incremental approach as an effective technique for improving coordination in multi-agent systems.

Updated: 2024-06-09 15:48:32

Categories: cs.LG,cs.AI,cs.MA

Download: http://arxiv.org/abs/2308.10721v2

Latent Neural Operator for Solving Forward and Inverse PDE Problems

Neural operators solve PDE problems from data without knowing the explicit equations by learning a map from input sequences of observed samples to predicted values. Most existing works build the model in the original geometric space, which leads to high computational costs when the number of sample points is large. We present the Latent Neural Operator (LNO), which solves PDEs in a latent space. Specifically, we first propose Physics-Cross-Attention (PhCA) to transform the representation from the geometric space to the latent space, then learn the operator in the latent space, and finally recover the real-world geometric space via the inverse PhCA map. Our model retains the flexibility to decode values at any position, not only at locations defined in the training set, and can therefore naturally perform interpolation and extrapolation tasks, which are particularly useful for inverse problems. Moreover, LNO improves both prediction accuracy and computational efficiency. Experiments show that LNO reduces GPU memory by 50%, speeds up training 1.8 times, and reaches state-of-the-art accuracy on four of six benchmarks for forward problems and on one benchmark for an inverse problem.

Updated: 2024-06-09 15:42:57

Categories: cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2406.03923v2

Beyond Gut Feel: Using Time Series Transformers to Find Investment Gems

This paper addresses the growing application of data-driven approaches within the Private Equity (PE) industry, particularly in sourcing investment targets (i.e., companies) for Venture Capital (VC) and Growth Capital (GC). We present a comprehensive review of the relevant approaches and propose a novel approach leveraging a Transformer-based Multivariate Time Series Classifier (TMTSC) to predict the success likelihood of any candidate company. The objective of our research is to optimize sourcing performance for VC and GC investments by formally defining the sourcing problem as a multivariate time series classification task. We successively introduce the key components of our implementation that together enable the successful application of TMTSC to VC/GC sourcing: input features, model architecture, optimization target, and investor-centric data processing. Our extensive experiments on two real-world investment tasks, benchmarked against three popular baselines, demonstrate the effectiveness of our approach in improving decision-making within the VC and GC industry.

Updated: 2024-06-09 15:42:47

Categories: cs.LG,cs.AI,cs.CE,q-fin.PM,91B84 (Primary) 68T07 (Secondary),I.2.6; I.2.1; H.4.0

Download: http://arxiv.org/abs/2309.16888v2

Probabilistic Approach to Black-Box Binary Optimization with Budget Constraints: Application to Sensor Placement

We present a fully probabilistic approach for solving binary optimization problems with black-box objective functions and budget constraints. In the probabilistic approach, the optimization variable is viewed as a random variable associated with a parametric probability distribution. The original optimization problem is replaced with the optimization of the expected value of the original objective over the parameters of this distribution. The resulting optimal parameter (optimal policy) is used to sample the binary space and produce estimates of the optimal solution(s) of the original binary optimization problem. Because the optimization variable is binary, the probability distribution is chosen from the family of Bernoulli models. The optimization constraints generally restrict the feasible region, which can be enforced by modeling the random variable with a distribution conditioned on the constraints being satisfied. Thus, in this work we develop conditional Bernoulli distributions that model the random variable conditioned on the total number of nonzero entries, that is, the budget constraint. This approach (a) is generally applicable to binary optimization problems with nonstochastic black-box objective functions and budget constraints; (b) accounts for budget constraints by employing conditional probabilities that sample only the feasible region, and thus considerably reduces the computational cost compared with employing soft constraints; and (c) does not employ soft constraints and thus does not require tuning a regularization parameter (for example, to promote sparsity), which is challenging in sensor placement optimization problems. The proposed approach is verified numerically on an idealized bilinear binary optimization problem and validated with a sensor placement experiment in a parameter identification setup.
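
Consider x in {0,1}^n with independent entries x_i ~ Bernoulli(p_i). Conditioned on sum_i x_i = k, this distribution can be sampled exactly, one coordinate at a time. The sampler uses a small dynamic program over the odds w_i = p_i / (1 - p_i). The sketch below is one standard way to do this and is our illustration, not necessarily the sampler used by the authors. The function names and the example budget are hypothetical.

```python
import random


def suffix_weights(w, k):
    """R[i][r] = sum over size-r subsets S of {i, ..., n-1} of prod_{j in S} w[j]."""
    n = len(w)
    R = [[0.0] * (k + 1) for _ in range(n + 1)]
    R[n][0] = 1.0  # the empty suffix can only realize a sum of 0
    for i in range(n - 1, -1, -1):
        R[i][0] = 1.0
        for r in range(1, k + 1):
            # either x_i = 0 (budget r left for the suffix) or x_i = 1 (budget r - 1)
            R[i][r] = R[i + 1][r] + w[i] * R[i + 1][r - 1]
    return R


def sample_conditional_bernoulli(p, k, rng=random):
    """Draw x ~ prod_i Bernoulli(p_i) conditioned on sum(x) == k (requires 0 < p_i < 1)."""
    if not 0 <= k <= len(p):
        raise ValueError("budget k must satisfy 0 <= k <= len(p)")
    w = [pi / (1.0 - pi) for pi in p]  # odds of each entry being 1
    R = suffix_weights(w, k)
    x, remaining = [], k
    for i in range(len(p)):
        if remaining == 0:
            x.append(0)
            continue
        # P(x_i = 1 | `remaining` ones still to be placed in positions i..n-1)
        prob_one = w[i] * R[i + 1][remaining - 1] / R[i][remaining]
        if rng.random() < prob_one:
            x.append(1)
            remaining -= 1
        else:
            x.append(0)
    return x


# Example: choose exactly 3 of 8 candidate sensor locations.
rng = random.Random(0)
probs = [0.1, 0.3, 0.5, 0.7, 0.9, 0.2, 0.4, 0.6]
print(sample_conditional_bernoulli(probs, 3, rng))
```

Every draw satisfies the budget exactly, so no soft-constraint penalty, and therefore no regularization weight, is needed. For large n, working with log-weights would avoid overflow in the dynamic program.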

Updated: 2024-06-09 15:37:28

Categories: math.OC,cs.CE,cs.LG,math.CO,stat.AP,90C27, 60C05, 62K05, 35R30, 35Q93, 65C60, 93E35

Download: http://arxiv.org/abs/2406.05830v1

FedRC: Tackling Diverse Distribution Shifts Challenge in Federated Learning by Robust Clustering

Federated Learning (FL) is a machine learning paradigm that safeguards privacy by retaining client data on edge devices. However, optimizing FL in practice can be challenging due to the diverse and heterogeneous nature of the learning system. Although recent research has focused on improving FL optimization when distribution shifts occur among clients, ensuring global performance when multiple types of distribution shift occur simultaneously among clients (such as feature distribution shift, label distribution shift, and concept shift) remains under-explored. In this paper, we identify the learning challenges posed by the simultaneous occurrence of diverse distribution shifts and propose a clustering principle to overcome them. We find that existing methods fail to adhere to this clustering principle. Therefore, we propose a novel clustering algorithm framework, dubbed FedRC, which adheres to the proposed principle by incorporating a bi-level optimization problem and a novel objective function. Extensive experiments demonstrate that FedRC significantly outperforms other state-of-the-art cluster-based FL methods. Our code is available at https://github.com/LINs-lab/FedRC.

Updated: 2024-06-09 15:36:10

Categories: cs.LG

Download: http://arxiv.org/abs/2301.12379v4

Multi-Stain Multi-Level Convolutional Network for Multi-Tissue Breast Cancer Image Segmentation

Digital pathology and microscopy image analysis are widely employed to segment digitally scanned IHC slides, primarily to identify cancer and pinpoint regions of interest (ROI) indicative of tumor presence. However, current ROI segmentation models are either stain-specific or suffer from stain and scanner variance caused by different staining protocols or modalities across labs. In addition, tissues such as ductal carcinoma in situ (DCIS) and acini are often classified as tumors because of their structural similarity and color composition. In this paper, we propose a novel convolutional neural network (CNN) based multi-class tissue segmentation model for histopathology whole-slide breast images that classifies tumors and segments other tissue regions, such as ducts, acini, DCIS, squamous epithelium, blood vessels, and necrosis, each as a separate class. Our pixel-aligned non-linear merge across spatial resolutions gives the model both local and global fields of view for accurate detection of the various classes. The model can also separate bad regions, such as folds, artifacts, blurry regions, and bubbles, from tissue regions using multi-level context from different resolutions of the WSI. Multi-phase iterative training with context-aware augmentation and increasing noise was used to efficiently train a multi-stain generic model with partial and noisy annotations from 513 slides. Our training pipeline used 12 million patches generated with context-aware augmentations, which made the model stain- and scanner-invariant across data sources. To test this stain and scanner invariance, the model was evaluated on 23,000 patches of a completely new stain (Hematoxylin and Eosin) acquired with a completely new scanner (Motic) at a different lab. The mean IoU was 0.72, on par with the model's performance on other data sources and scanners.
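
The reported mean IoU (mIoU) averages the per-class intersection-over-union between predicted and ground-truth labels. The sketch below computes it over flattened label maps. One common convention, used here, skips classes absent from both prediction and ground truth. We do not know which convention or which ignore label the authors used, so both are assumptions.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean intersection-over-union over classes present in pred or target.

    pred, target: flattened per-pixel class labels of equal length.
    Classes that appear in neither are skipped rather than counted as 0 or 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if ignore_index is not None and t == ignore_index:
            continue  # e.g. unlabeled pixels in partial annotations
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1  # false positive for class p
            union[t] += 1  # false negative for class t
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    if not ious:
        raise ValueError("no labeled pixels to evaluate")
    return sum(ious) / len(ious)


# Toy 2x2 label maps, flattened row by row (class 0 = background, 1 = tumor).
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3.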

Updated: 2024-06-09 15:35:49

Categories: cs.CV,cs.AI,eess.IV

Download: http://arxiv.org/abs/2406.05828v1

PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection

Deep neural networks are susceptible to backdoor attacks, in which adversaries manipulate model predictions by inserting malicious samples into the training data. Direct filtering methods that identify suspicious training data to uncover potential backdoor samples are still lacking. In this paper, we propose a novel method, Prediction Shift Backdoor Detection (PSBD), which uses an uncertainty-based approach requiring only a small amount of unlabeled clean validation data. PSBD is motivated by an intriguing Prediction Shift (PS) phenomenon. When dropout is applied during inference, a poisoned model's predictions on clean data often shift away from the true labels towards certain other labels, while backdoor samples exhibit less PS. We hypothesize that PS results from a neuron bias effect that makes neurons favor the features of certain classes. PSBD identifies backdoor training samples by computing the Prediction Shift Uncertainty (PSU), the variance in probability values when dropout layers are toggled on and off during model inference. Extensive experiments verify the effectiveness and efficiency of PSBD, which achieves state-of-the-art results among mainstream detection methods.
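
PSU can be sketched as the variance of one class probability across inference passes with dropout toggled off and on. This rests on three assumptions, not on the paper's exact definition:

- "Toggled on and off" is read as one deterministic pass plus several stochastic dropout passes.
- The tracked probability is that of the class predicted without dropout.
- The variance is the population variance.

The toy linear model, its weights, the dropout rate and the number of passes are all hypothetical.

```python
import math
import random
import statistics


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, x, dropout_p=0.0, rng=None):
    """Toy linear classifier; inverted dropout on the inputs when dropout_p > 0."""
    if dropout_p > 0.0:
        keep = 1.0 - dropout_p
        x = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)


def prediction_shift_uncertainty(weights, x, dropout_p=0.5, n_passes=20, seed=0):
    """Variance of the predicted-class probability across dropout-off and dropout-on passes."""
    rng = random.Random(seed)
    clean = forward(weights, x)  # dropout toggled off
    label = max(range(len(clean)), key=clean.__getitem__)
    probs = [clean[label]]
    for _ in range(n_passes):  # dropout toggled on
        probs.append(forward(weights, x, dropout_p, rng)[label])
    return label, statistics.pvariance(probs)


W = [[1.0, -0.5, 0.2], [-0.3, 0.8, 0.5]]  # hypothetical 2-class weights
x = [0.6, 1.2, -0.4]
print(prediction_shift_uncertainty(W, x))
```

Under PSBD's premise, samples with a low PSU would be flagged as likely backdoor samples. How the decision threshold is chosen is not shown here.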

Updated: 2024-06-09 15:31:00

Categories: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2406.05826v1

BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains

Large Language Models (LLMs) have demonstrated remarkable versatility in recent years, offering potential applications across specialized domains such as healthcare and medicine. Despite the availability of various open-source LLMs tailored for health contexts, adapting general-purpose LLMs to the medical domain presents significant challenges. In this paper, we introduce BioMistral, an open-source LLM tailored for the biomedical domain, which uses Mistral as its foundation model and is further pre-trained on PubMed Central. We conduct a comprehensive evaluation of BioMistral on a benchmark comprising 10 established medical question-answering (QA) tasks in English. We also explore lightweight models obtained through quantization and model merging. Our results demonstrate BioMistral's superior performance compared to existing open-source medical models and its competitiveness against proprietary counterparts. Finally, to address the limited availability of data beyond English and to assess the multilingual generalization of medical LLMs, we automatically translated this benchmark into 7 other languages and evaluated the models on these versions. This marks the first large-scale multilingual evaluation of LLMs in the medical domain. Datasets, multilingual evaluation benchmarks, scripts, and all the models obtained during our experiments are freely released.
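
The abstract does not name the merging methods it explores. As one illustration, SLERP (spherical linear interpolation) is a widely used way to merge two fine-tuned checkpoints parameter by parameter. The sketch below applies it to flattened weight lists. Applying it per tensor of a real checkpoint, and the choice of t, are assumptions, not the BioMistral recipe.

```python
import math


def slerp(t, v0, v1, eps=1e-6):
    """Spherical linear interpolation between two flattened weight vectors.

    The angle is measured between the normalized vectors; the unnormalized
    vectors are then interpolated along that arc. Falls back to linear
    interpolation when the vectors are (anti)parallel.
    """
    n0 = math.sqrt(sum(a * a for a in v0))
    n1 = math.sqrt(sum(b * b for b in v1))
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp for rounding errors
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


# Merge one (toy) parameter tensor from two models halfway between them.
print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

Unlike plain averaging, SLERP keeps the halfway point of two orthogonal unit vectors at unit norm.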

Updated: 2024-06-09 15:19:09

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.10373v2

Symmetric Matrix Completion with ReLU Sampling

We study the problem of symmetric positive semi-definite low-rank matrix completion (MC) with deterministic, entry-dependent sampling. In particular, we consider rectified linear unit (ReLU) sampling, where only positive entries are observed, as well as a generalization to threshold-based sampling. We first demonstrate empirically that the landscape of this MC problem is not globally benign: gradient descent (GD) with random initialization generally converges to stationary points that are not globally optimal. Nevertheless, we prove that when the low-rank matrix factor satisfies mild assumptions, the nonconvex objective function is geodesically strongly convex on the quotient manifold in a neighborhood of a planted low-rank matrix. Moreover, we show that our assumptions are satisfied by a matrix factor with i.i.d. Gaussian entries. Finally, we develop a tailored initialization for GD to solve the studied formulation, which empirically always converges to the global minima. We also conduct extensive experiments comparing MC methods, investigating convergence and completion performance with respect to initialization, noise level, dimension, and rank.
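
The ReLU-sampled objective can be written down concretely. With M = X* X*^T, only entries with M_ij > 0 are observed, and one minimizes f(X) = 0.5 * sum over observed (i, j) of ((X X^T)_ij - M_ij)^2 over the factor X. Because the observation mask of a symmetric M is symmetric, the gradient is 2 R X, where R is the masked residual.

The pure-Python sketch below shows the mask, the loss and gradient, and plain GD. GD starts near the planted factor only to illustrate the local regime the abstract describes. The sketch does not reproduce the paper's tailored initialization. The sizes, step size and noise level are arbitrary choices.

```python
import random


def gram(X):
    """Return X X^T for a factor X given as a list of rows."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def relu_mask(M, threshold=0.0):
    """ReLU sampling observes entry (i, j) only if M[i][j] > threshold."""
    return [[v > threshold for v in row] for row in M]


def loss_and_grad(X, M, mask):
    """f(X) = 0.5 * sum over observed (i, j) of ((X X^T)_ij - M_ij)^2, and its gradient 2 R X."""
    n, r = len(X), len(X[0])
    G = gram(X)
    R = [[G[i][j] - M[i][j] if mask[i][j] else 0.0 for j in range(n)] for i in range(n)]
    loss = 0.5 * sum(v * v for row in R for v in row)
    # R is symmetric because the mask of a symmetric M is symmetric.
    grad = [[2.0 * sum(R[i][k] * X[k][c] for k in range(n)) for c in range(r)] for i in range(n)]
    return loss, grad


def gradient_descent(X0, M, mask, lr=0.002, steps=300):
    X = [row[:] for row in X0]
    for _ in range(steps):
        _, g = loss_and_grad(X, M, mask)
        X = [[x - lr * gx for x, gx in zip(row, grow)] for row, grow in zip(X, g)]
    return X


# Planted rank-2 PSD matrix with an i.i.d. Gaussian factor; only positive entries are observed.
rng = random.Random(1)
n, r = 8, 2
X_star = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(n)]
M = gram(X_star)
mask = relu_mask(M)
# Start near the planted factor (local regime only; NOT the paper's initialization).
X0 = [[v + 0.1 * rng.gauss(0, 1) for v in row] for row in X_star]
X = gradient_descent(X0, M, mask)
print(loss_and_grad(X0, M, mask)[0], loss_and_grad(X, M, mask)[0])
```

The diagonal entries ||x_i||^2 are always positive, so they are always observed under ReLU sampling. Many off-diagonal entries are hidden.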

Updated: 2024-06-09 15:14:53

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.05822v1

How Important Is Tokenization in French Medical Masked Language Models?

Subword tokenization has become the prevailing standard in the field of natural language processing (NLP) over recent years, primarily due to the widespread use of pre-trained language models. This shift began with Byte-Pair Encoding (BPE) and was later followed by the adoption of SentencePiece and WordPiece. While subword tokenization consistently outperforms character- and word-level tokenization, the precise factors contributing to its success remain unclear. Key aspects such as the optimal segmentation granularity for diverse tasks and languages, the influence of data sources on tokenizers, and the role of morphological information in Indo-European languages remain insufficiently explored. This is particularly pertinent for biomedical terminology, which is characterized by specific rules governing morpheme combinations. Despite the agglutinative nature of biomedical terminology, existing language models do not explicitly incorporate this knowledge, leading to inconsistent tokenization strategies for common terms. In this paper, we examine the complexities of subword tokenization in the French biomedical domain across a variety of NLP tasks and pinpoint areas where further improvements can be made. We analyze classical tokenization algorithms, including BPE and SentencePiece, and introduce an original tokenization strategy that integrates morpheme-enriched word segmentation into existing tokenization methods.
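
BPE is one of the classical algorithms the paper analyzes. The textbook version learns merges greedily. It counts adjacent symbol pairs, weighted by word frequency, and merges the most frequent pair, repeating for a fixed number of merges. The sketch below is that generic procedure on an invented table of French word counts. It omits the end-of-word marker many implementations add and is not the authors' morpheme-enriched strategy.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every occurrence of the adjacent pair (a, b) with the symbol a + b."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn BPE merges from a {word: frequency} table, starting from characters."""
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties broken by the lexicographically smallest pair.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Invented word counts, for illustration only.
corpus = {"cardiologie": 5, "cardiaque": 4, "neurologie": 3}
merges = learn_bpe(corpus, num_merges=5)
print(merges)
print(encode("cardiologie", merges))
```

With five merges, "cardiologie" becomes cardi | o | l | o | gi | e. The frequent stem is captured, but the suffix -logie is split into fragments. Purely frequency-driven segmentation need not align with morpheme boundaries.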

Updated: 2024-06-09 15:11:31

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.15010v2

Attention as a Hypernetwork

Transformers can under some circumstances generalize to novel problem instances whose constituent parts might have been encountered during training but whose compositions have not. What mechanisms underlie this ability for compositional generalization? By reformulating multi-head attention as a hypernetwork, we reveal that a low-dimensional latent code specifies key-query specific operations. We find empirically that this latent code is highly structured, capturing information about the subtasks performed by the network. Using the framework of attention as a hypernetwork we further propose a simple modification of multi-head linear attention that strengthens the ability for compositional generalization on a range of abstract reasoning tasks. In particular, we introduce a symbolic version of the Raven Progressive Matrices human intelligence test on which we demonstrate how scaling model size and data enables compositional generalization and gives rise to a functionally structured latent code in the transformer.
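
The hypernetwork view follows from rewriting standard softmax multi-head attention. For query i, the output is sum_h Wo_h sum_k a_h(i,k) Wv_h x_k = sum_k W(a(i,k)) x_k. The vector a(i,k) of the H head-wise attention scores acts as a low-dimensional latent code that configures the linear map W(a) = sum_h a_h Wo_h Wv_h. The pure-Python sketch below checks this identity numerically with random weights. It illustrates only the standard-attention identity, not the modified multi-head linear attention the paper proposes. All names and sizes are ours.

```python
import math
import random


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def attention_scores(Wq, Wk, xs):
    """a[i][k] = softmax over k of <Wq x_i, Wk x_k> / sqrt(d_head)."""
    qs = [matvec(Wq, x) for x in xs]
    ks = [matvec(Wk, x) for x in xs]
    scale = math.sqrt(len(qs[0]))
    return [softmax([sum(a * b for a, b in zip(q, k)) / scale for k in ks]) for q in qs]


def mha_standard(heads, xs):
    """Usual view: per head, mix values with attention weights, project, sum over heads."""
    d_out = len(heads[0]["Wo"])
    out = [[0.0] * d_out for _ in xs]
    for h in heads:
        A = attention_scores(h["Wq"], h["Wk"], xs)
        vs = [matvec(h["Wv"], x) for x in xs]
        for i in range(len(xs)):
            z = [sum(A[i][k] * vs[k][c] for k in range(len(xs))) for c in range(len(vs[0]))]
            out[i] = [o + y for o, y in zip(out[i], matvec(h["Wo"], z))]
    return out


def mha_as_hypernetwork(heads, xs):
    """Hypernetwork view: the H scores of each (query i, key k) pair form a latent code
    that linearly configures the map W(code) = sum_h code[h] * Wo_h Wv_h."""
    A = [attention_scores(h["Wq"], h["Wk"], xs) for h in heads]  # A[h][i][k]
    WoWv = [matmul(h["Wo"], h["Wv"]) for h in heads]            # d_out x d_model each
    d_out, d_model = len(WoWv[0]), len(xs[0])
    out = []
    for i in range(len(xs)):
        y = [0.0] * d_out
        for k in range(len(xs)):
            code = [A[h][i][k] for h in range(len(heads))]
            W = [[sum(code[h] * WoWv[h][r][c] for h in range(len(heads))) for c in range(d_model)]
                 for r in range(d_out)]
            y = [a + b for a, b in zip(y, matvec(W, xs[k]))]
        out.append(y)
    return out


rng = random.Random(0)
d_model, d_head, n_heads, n_tokens = 4, 2, 3, 5


def rand(rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


heads = [{"Wq": rand(d_head, d_model), "Wk": rand(d_head, d_model),
          "Wv": rand(d_head, d_model), "Wo": rand(d_model, d_head)} for _ in range(n_heads)]
xs = rand(n_tokens, d_model)
print(mha_standard(heads, xs)[0])
print(mha_as_hypernetwork(heads, xs)[0])
```

Both views produce the same outputs up to floating-point rounding. The hypernetwork view makes explicit that each key-query pair selects its operation through an H-dimensional code.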

Updated: 2024-06-09 15:08:00

Categories: cs.LG

Download: http://arxiv.org/abs/2406.05816v1

A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks

We evaluate four state-of-the-art instruction-tuned large language models (LLMs), namely ChatGPT, Flan-T5 UL2, Tk-Instruct, and Alpaca, on a set of 13 real-world clinical and biomedical natural language processing (NLP) tasks in English, such as named-entity recognition (NER), question answering (QA), and relation extraction (RE). Our overall results show that the evaluated LLMs begin to approach the performance of state-of-the-art models in zero- and few-shot scenarios for most tasks, and do particularly well on the QA task, even though they have never seen examples from these tasks before. However, on the classification and RE tasks they perform below what can be achieved with a model trained specifically for the medical field, such as PubMedBERT. Finally, no LLM outperforms all the others on all the studied tasks; some models are better suited to certain tasks than others.

Updated: 2024-06-09 15:06:57

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2307.12114v3

CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion

The rapid advancement of Large Language Models (LLMs) has brought remarkable generative capabilities but has also raised concerns about their potential misuse. While strategies such as supervised fine-tuning and reinforcement learning from human feedback have enhanced their safety, these methods focus primarily on natural language and may not generalize to other domains. This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs, providing a novel environment for testing the safety generalization of LLMs. Our comprehensive studies of state-of-the-art LLMs, including GPT-4, Claude-2, and the Llama-2 series, reveal a new and universal safety vulnerability of these models to code input: CodeAttack bypasses the safety guardrails of all models more than 80% of the time. We find that a larger distribution gap between CodeAttack and natural language, for example when the natural language input is encoded with data structures, leads to weaker safety generalization. Furthermore, we hypothesize that CodeAttack succeeds because of a misaligned bias that LLMs acquire during code training: they prioritize code completion over avoiding potential safety risks. Finally, we analyze potential mitigation measures. These findings highlight new safety risks in the code domain and the need for more robust safety alignment algorithms that match the code capabilities of LLMs.

Updated: 2024-06-09 15:04:34

Categories: cs.CL,cs.AI,cs.CR,cs.LG,cs.SE

Download: http://arxiv.org/abs/2403.07865v4

What Can We Learn from State Space Models for Machine Learning on Graphs?

Machine learning on graphs has recently found extensive applications across domains. However, the commonly used Message Passing Neural Networks (MPNNs) have limited expressive power and struggle to capture long-range dependencies. Graph transformers offer a strong alternative thanks to their global attention mechanism, but they incur high computational overhead, especially on large graphs. In recent years, State Space Models (SSMs) have emerged as a compelling replacement for full attention in transformers when modeling sequential data. They blend the strengths of RNNs and CNNs, offering a) efficient computation, b) the ability to capture long-range dependencies, and c) good generalization across sequences of various lengths. However, extending SSMs to graph-structured data is challenging because graphs lack a canonical node ordering. In this work, we propose Graph State Space Convolution (GSSC) as a principled extension of SSMs to graph-structured data. By leveraging global permutation-equivariant set aggregation and factorizable graph kernels that rely on relative node distances as convolution kernels, GSSC preserves all three advantages of SSMs. We show that GSSC is provably more expressive than MPNNs in counting graph substructures and demonstrate its effectiveness on 10 widely used real-world benchmark datasets. GSSC achieves the best results on 7 of the 10 datasets, with significant improvements over the state-of-the-art baselines, and the second-best results on the other 3. Our findings highlight the potential of GSSC as a powerful and scalable model for graph machine learning. Our code is available at https://github.com/Graph-COM/GSSC.

Updated: 2024-06-09 15:03:36

Categories: cs.LG

Download: http://arxiv.org/abs/2406.05815v1

Unified Text-to-Image Generation and Retrieval

How humans can efficiently and effectively acquire images is a perennial question. A typical solution is text-to-image retrieval from an existing database given a text query; however, a limited database typically lacks creativity. By contrast, recent breakthroughs in text-to-image generation make it possible to produce fancy and diverse visual content, but generation struggles to synthesize knowledge-intensive images. In this work, we rethink the relationship between text-to-image generation and retrieval and propose a unified framework in the context of Multimodal Large Language Models (MLLMs). Specifically, we first explore the intrinsic discriminative abilities of MLLMs and introduce a generative retrieval method that performs retrieval without training. We then unify generation and retrieval in an autoregressive generation manner and propose an autonomous decision module that chooses the better match between the generated and retrieved images as the response to the text query. Additionally, we construct a benchmark called TIGeR-Bench, covering creative and knowledge-intensive domains, to standardize the evaluation of unified text-to-image generation and retrieval. Extensive experimental results on TIGeR-Bench and two retrieval benchmarks, Flickr30K and MS-COCO, demonstrate the superiority and effectiveness of our proposed method.

Updated: 2024-06-09 15:00:28

Categories: cs.CV,cs.AI,cs.CL,cs.LG,cs.MM

Download: http://arxiv.org/abs/2406.05814v1

Seventeenth-Century Spanish American Notary Records for Fine-Tuning Spanish Large Language Models

Large language models have gained tremendous popularity in domains such as e-commerce, finance, healthcare, and education. Fine-tuning is a common approach to customizing an LLM on a domain-specific dataset for a desired downstream task. In this paper, we present a valuable resource for fine-tuning LLMs developed for the Spanish language to perform a variety of tasks such as classification, masked language modeling, clustering, and others. Our resource is a collection of seventeenth-century handwritten notary records obtained from the National Archives of Argentina. It combines original images with the transcribed text (and metadata) of 160+ pages handwritten nearly 400 years ago by two notaries, Estenban Agreda de Vergara and Nicolas de Valdivia y Brisuela. Through empirical evaluation, we demonstrate that our collection can be used to fine-tune Spanish LLMs for tasks such as classification and masked language modeling, and that the fine-tuned models can outperform pre-trained Spanish models and ChatGPT-3.5/ChatGPT-4o. We expect this collection to be an invaluable resource for historical text analysis; it is publicly available on GitHub.

Updated: 2024-06-09 14:54:22

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.05812v1

Toward identifiability of total effects in summary causal graphs with latent confounders: an extension of the front-door criterion

Conducting experiments to estimate total effects can be challenging due to cost, ethical concerns, or practical limitations. As an alternative, researchers often rely on causal graphs to determine if it is possible to identify these effects from observational data. Identifying total effects in fully specified non-temporal causal graphs has garnered considerable attention, with Pearl's front-door criterion enabling the identification of total effects in the presence of latent confounding even when no variable set is sufficient for adjustment. However, specifying a complete causal graph is challenging in many domains. Extending these identifiability results to partially specified graphs is crucial, particularly in dynamic systems where causal relationships evolve over time. This paper addresses the challenge of identifying total effects using a specific and well-known partially specified graph in dynamic systems called a summary causal graph, which does not specify the temporal lag between causal relations and can contain cycles. In particular, this paper presents sufficient graphical conditions for identifying total effects from observational data, even in the presence of hidden confounding and when no variable set is sufficient for adjustment, contributing to the ongoing effort to understand and estimate causal effects from observational data using summary causal graphs.
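
For readers unfamiliar with the criterion this paper extends, here is the classic front-door adjustment for a fully specified DAG: P(y | do(x)) = Σ_m P(m | x) Σ_x' P(y | m, x') P(x'). The Python sketch below checks that formula numerically on a made-up binary model. In it, a latent U confounds X and Y, and M mediates X -> Y. All probabilities are illustrative assumptions. The paper's summary-causal-graph conditions are not implemented here.

```python
from itertools import product

# Hypothetical binary model: latent U -> X, U -> Y, and X -> M -> Y.
P_U1 = 0.4                                                            # P(U=1)
P_X1_given_U = {0: 0.2, 1: 0.7}                                       # P(X=1 | U=u)
P_M1_given_X = {0: 0.1, 1: 0.8}                                       # P(M=1 | X=x)
P_Y1_given_MU = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(Y=1 | M=m, U=u)

def bern(p1, v):
    """Probability that a Bernoulli(p1) variable equals v."""
    return p1 if v == 1 else 1.0 - p1

# Observational joint P(x, m, y): U is summed out, as if it were never measured.
joint = {}
for u, x, m, y in product((0, 1), repeat=4):
    p = (bern(P_U1, u) * bern(P_X1_given_U[u], x)
         * bern(P_M1_given_X[x], m) * bern(P_Y1_given_MU[(m, u)], y))
    joint[(x, m, y)] = joint.get((x, m, y), 0.0) + p

def marg(x=None, m=None, y=None):
    """Observational probability of the event fixing any subset of x, m, y."""
    return sum(p for (xv, mv, yv), p in joint.items()
               if (x is None or xv == x) and (m is None or mv == m) and (y is None or yv == y))

def front_door(x, y=1):
    """P(Y=y | do(X=x)) computed from observational quantities only."""
    total = 0.0
    for m in (0, 1):
        p_m_given_x = marg(x=x, m=m) / marg(x=x)
        inner = sum(marg(x=xp, m=m, y=y) / marg(x=xp, m=m) * marg(x=xp) for xp in (0, 1))
        total += p_m_given_x * inner
    return total

def true_effect(x, y=1):
    """Ground truth from the model itself: sum_m P(m|x) sum_u P(y|m,u) P(u)."""
    return sum(bern(P_M1_given_X[x], m)
               * sum(bern(P_Y1_given_MU[(m, u)], y) * bern(P_U1, u) for u in (0, 1))
               for m in (0, 1))

for x in (0, 1):
    print(x, round(front_door(x), 6), round(true_effect(x), 6))
```

From the same joint distribution, the naive conditional P(Y=1 | X=1) is 0.724, while the interventional value is 0.628. The front-door formula removes exactly this confounding bias, without ever observing U.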

Updated: 2024-06-09 14:43:06

Categories: stat.ME,cs.AI

Download: http://arxiv.org/abs/2406.05805v1

A Survey on LLM-Based Agentic Workflows and LLM-Profiled Components

Recent advancements in Large Language Models (LLMs) have catalyzed the development of sophisticated agentic workflows, which improve on traditional single-path Chain-of-Thought (CoT) prompting techniques. This survey summarizes common workflows, focusing on LLM-Profiled Components (LMPCs) and setting non-LLM components aside. The aim is to clarify the roles LLMs play in these workflows and to examine how reusable LMPCs are.

Updated: 2024-06-09 14:42:55

Categories: cs.AI,cs.CL,cs.SE

Download: http://arxiv.org/abs/2406.05804v1

SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention

In the domain of large foundation models, the Segment Anything Model (SAM) has gained notable recognition for its exceptional performance in image segmentation. However, the video camouflaged object detection (VCOD) task presents a unique challenge. Camouflaged objects typically blend into the background, which makes them difficult to distinguish in still images, and ensuring temporal consistency in this setting is itself a hard problem. As a result, SAM falls short when applied to VCOD. To overcome these challenges, we propose a new method called the SAM Propagation Module (SAM-PM). Our propagation module enforces temporal consistency within SAM by employing spatio-temporal cross-attention mechanisms. Moreover, we train only the propagation module and keep the SAM network weights frozen, which lets us integrate task-specific insights with the vast knowledge accumulated by the large model. Our method incorporates temporal consistency and domain-specific expertise into the segmentation network while adding less than 1% to SAM's parameter count. Extensive experiments show a substantial performance improvement on the VCOD benchmark over the most recent state-of-the-art techniques. Code and pre-trained weights are open-sourced at https://github.com/SpiderNitt/SAM-PM
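
The abstract attributes SAM-PM's temporal consistency to spatio-temporal cross-attention inside a small trainable module that sits on top of a frozen SAM. As a rough illustration only, the sketch below shows plain scaled dot-product cross-attention. Current-frame tokens query tokens pooled from previous frames, and the result is added back through a residual connection. Three things here are assumptions: the `propagate` function, the token layout, and the absence of learned projections. This is not SAM-PM's actual architecture.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query token attends over all memory tokens."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))])
    return out

def propagate(current_tokens, memory_frames):
    """Hypothetical propagation step: current-frame tokens query tokens pooled over
    previous frames (space x time); the attended features are added as a residual."""
    memory = [tok for frame in memory_frames for tok in frame]   # flatten the space-time memory
    attended = cross_attention(current_tokens, memory, memory)
    return [[c + a for c, a in zip(cur, att)] for cur, att in zip(current_tokens, attended)]

current = [[0.2, 0.1], [0.0, 0.5]]          # two spatial tokens of frame t
memory = [[[0.3, 0.1], [0.1, 0.4]],         # frame t-1
          [[0.2, 0.2], [0.0, 0.6]]]         # frame t-2
print(propagate(current, memory))
```

In a frozen-backbone setup like the one described, only the parameters of such a module would receive gradients. That is how the trainable overhead can stay small relative to SAM.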

Updated: 2024-06-09 14:33:38

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.05802v1

How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers

Background. A main theoretical puzzle is why over-parameterized Neural Networks (NNs) generalize well when trained to zero loss (i.e., so that they interpolate the data). Usually, the NN is trained with Stochastic Gradient Descent (SGD) or one of its variants. However, recent empirical work examined the generalization of a random NN that interpolates the data: the NN was sampled from a seemingly uniform prior over the parameters, conditioned on the NN perfectly classifying the training set. Interestingly, such a NN sample typically generalized as well as SGD-trained NNs. Contributions. We prove that such a random NN interpolator typically generalizes well if there exists an underlying narrow "teacher NN" that agrees with the labels. Specifically, we show that such a 'flat' prior over the NN parameterization induces a rich prior over the NN functions, due to the redundancy in the NN structure. In particular, this creates a bias towards simpler functions, which require fewer relevant parameters to represent. This enables learning with a sample complexity approximately proportional to the complexity of the teacher (roughly, the number of non-redundant parameters) rather than that of the student.
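
To make the sampling procedure in the abstract concrete, here is a guess-and-check (rejection sampling) sketch. It draws all weights of a small two-layer sign network from Uniform[-1, 1] and accepts the first draw that fits the training set perfectly. Several choices here are hypothetical, for illustration only: the one-coordinate teacher, the network sizes, and the use of the Boolean cube as data. The paper's theory and exact prior are not reproduced.

```python
import random

def sign(z):
    """Sign with ties mapped to +1."""
    return 1 if z >= 0 else -1

def teacher(x):
    """Hypothetical narrow teacher: the label depends on a single input coordinate."""
    return sign(x[0])

def sample_student(d, hidden, rng):
    """Draw every student weight i.i.d. from Uniform[-1, 1] (a 'flat' prior)."""
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(hidden)]
    v = [rng.uniform(-1, 1) for _ in range(hidden)]
    return W, v

def predict(params, x):
    """Two-layer sign network: sign(sum_j v_j * sign(w_j . x))."""
    W, v = params
    h = [sign(sum(wi * xi for wi, xi in zip(w, x))) for w in W]
    return sign(sum(vj * hj for vj, hj in zip(v, h)))

def guess_and_check(train, d, hidden, rng, max_tries=100_000):
    """Rejection sampling from the prior conditioned on zero training error."""
    for tries in range(1, max_tries + 1):
        params = sample_student(d, hidden, rng)
        if all(predict(params, x) == y for x, y in train):
            return params, tries
    raise RuntimeError("no interpolating network found; increase max_tries")

rng = random.Random(0)
d, hidden = 5, 6
cube = [[1 if (i >> b) & 1 else -1 for b in range(d)] for i in range(2 ** d)]  # all of {-1,1}^5
rng.shuffle(cube)
train, test = cube[:8], cube[8:]
params, tries = guess_and_check([(x, teacher(x)) for x in train], d, hidden, rng)
test_acc = sum(predict(params, x) == teacher(x) for x in test) / len(test)
print(f"accepted after {tries} draws; held-out accuracy {test_acc:.2f}")
```

Exact rejection sampling becomes infeasible quickly as the training set grows, because the acceptance probability shrinks. The sketch is only meant to show what "sampled from the prior, conditioned on interpolation" means operationally.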

Updated: 2024-06-09 14:32:58

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2402.06323v2

SlowPerception: Physical-World Latency Attack against Visual Perception in Autonomous Driving

Autonomous Driving (AD) systems critically depend on visual perception for real-time object detection and multiple object tracking (MOT) to ensure safe driving. However, high latency in these visual perception components can lead to significant safety risks, such as vehicle collisions. Previous research has extensively explored latency attacks in the digital realm, but translating these methods effectively to the physical world is challenging. For instance, existing attacks rely on perturbations that are unrealistic or impractical for AD. Examples include adversarial perturbations covering areas like the sky, or large patches that obscure most of a camera's view, which makes such attacks infeasible to carry out effectively in the real world. In this paper, we introduce SlowPerception, the first physical-world latency attack against AD perception, which generates projector-based universal perturbations. SlowPerception strategically creates numerous phantom objects on various surfaces in the environment. This significantly increases the computational load of Non-Maximum Suppression (NMS) and MOT, thereby inducing substantial latency. SlowPerception achieves latency on the order of seconds in physical-world settings, with an average latency of 2.5 seconds across different AD perception systems, scenarios, and hardware configurations, significantly outperforming existing state-of-the-art latency attacks. Additionally, we assess system-level impacts, such as vehicle collisions, using industry-grade AD systems in production-grade AD simulators, with a 97% average rate. We hope that our analyses inspire further research in this critical domain and help make AD systems more robust against emerging vulnerabilities.
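
The latency mechanism described above rests on the cost of post-processing many candidate boxes. The sketch below is a plain greedy NMS that counts IoU evaluations. For n well-separated phantom boxes, every box survives and the count is n(n-1)/2, so the work grows quadratically with the number of phantoms. Real detectors use batched or GPU NMS, and the paper's perturbation generation is not shown. Box coordinates and scores here are made up.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS; returns kept indices and how many IoU evaluations were needed."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep, comparisons = [], 0
    while order:
        best = order.pop(0)
        keep.append(best)
        survivors = []
        for i in order:
            comparisons += 1
            if iou(boxes[best], boxes[i]) < iou_thresh:   # suppress only strong overlaps
                survivors.append(i)
        order = survivors
    return keep, comparisons

def phantom_boxes(n):
    """n disjoint 10x10 boxes, standing in for many well-separated phantom detections."""
    return [(20.0 * i, 0.0, 20.0 * i + 10.0, 10.0) for i in range(n)]

for n in (10, 100, 500):
    keep, comps = greedy_nms(phantom_boxes(n), [1.0 - i / n for i in range(n)])
    print(n, len(keep), comps)   # disjoint boxes: all kept, n * (n - 1) / 2 comparisons
```

Because nothing overlaps, nothing is suppressed early. This worst case is therefore the one that well-separated phantom objects push a greedy NMS toward.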

Updated: 2024-06-09 14:30:18

Categories: cs.CV,cs.CR

Download: http://arxiv.org/abs/2406.05800v1

GasTrace: Detecting Sandwich Attack Malicious Accounts in Ethereum

The openness and transparency of Ethereum transaction data make it easy for any entity to exploit them for malicious attacks. The sandwich attack manipulates the Automated Market Maker (AMM) mechanism. The attacker profits from manipulating the market price by placing transactions immediately before (front-running) and after (back-running) a victim's transaction. To identify and prevent sandwich attacks, we propose GasTrace, a cascade classification framework. GasTrace analyzes various transaction features to detect malicious accounts, notably through the analysis and modeling of Gas features. In the first classification stage, we use a Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel to produce predicted probabilities for accounts, which are then used to construct a detailed transaction network. In the second stage, behavioral features are captured with a Graph Attention Network (GAT). Through this cascade, GasTrace can analyze and classify sandwich attacks. Our experimental results show that GasTrace achieves strong detection and generalization capability, with an accuracy of 96.73% and an F1 score of 95.71% in identifying sandwich attack accounts.
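
To illustrate the attack GasTrace targets (not GasTrace itself), the sketch below simulates a sandwich on a constant-product pool (x * y = k) with a 0.3% input fee, in the style of Uniswap v2. Pool sizes and trade amounts are hypothetical. The simulation ignores gas costs, slippage limits, and the integer arithmetic that real contracts use.

```python
def swap(reserve_in, reserve_out, amount_in, fee=0.003):
    """Constant-product swap with the fee charged on the input.
    Returns (amount_out, new_reserve_in, new_reserve_out)."""
    effective_in = amount_in * (1 - fee)
    amount_out = reserve_out * effective_in / (reserve_in + effective_in)
    return amount_out, reserve_in + amount_in, reserve_out - amount_out

X0, Y0 = 1000.0, 1000.0     # hypothetical pool reserves of tokens X and Y
victim_in = 100.0            # victim swaps X -> Y
attack_in = 50.0             # attacker's front-run size

# Baseline: the victim trades alone.
victim_alone, _, _ = swap(X0, Y0, victim_in)

# Sandwich: front-run (X -> Y), victim (X -> Y), back-run (Y -> X).
atk_y, x1, y1 = swap(X0, Y0, attack_in)
victim_out, x2, y2 = swap(x1, y1, victim_in)
atk_x, y3, x3 = swap(y2, x2, atk_y)          # back-run: reserves passed as (in=Y, out=X)
profit = atk_x - attack_in
print(f"victim gets {victim_out:.3f} Y instead of {victim_alone:.3f}; attacker profit {profit:.3f} X")
```

The front-run moves the price against the victim, who then receives less Y. The attacker unwinds at the inflated price, and that difference is the profit. The three-step transaction pattern around a single victim is what a detector such as GasTrace would have to recognize.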

Updated: 2024-06-09 14:25:34

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2405.19971v2

Hidden Holes: topological aspects of language models

We explore the topology of representation manifolds arising in autoregressive neural language models trained on raw text data. In order to study their properties, we introduce tools from computational algebraic topology, which we use as a basis for a measure of topological complexity, that we call perforation. Using this measure, we study the evolution of topological structure in GPT based large language models across depth and time during training. We then compare these to gated recurrent models, and show that the latter exhibit more topological complexity, with a distinct pattern of changes common to all natural languages but absent from synthetically generated data. The paper presents a detailed analysis of the representation manifolds derived by these models based on studying the shapes of vector clouds induced by them as they are conditioned on sentences from corpora of natural language text. The methods developed in this paper are novel in the field and based on mathematical apparatus that might be unfamiliar to the target audience. To help with that we introduce the minimum necessary theory, and provide additional visualizations in the appendices. The main contribution of the paper is a striking observation about the topological structure of the transformer as compared to LSTM based neural architectures. It suggests that further research into mathematical properties of these neural networks is necessary to understand the operation of large transformer language models. We hope this work inspires further explorations in this direction within the NLP community.
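
The abstract does not specify the paper's perforation measure. The sketch below therefore shows only a standard building block of computational topology: the Betti numbers of a Vietoris-Rips complex built on a point cloud at scale eps, computed over GF(2). Here b0 counts connected components and b1 counts one-dimensional holes. Points on a circle show one hole at a small scale, and that hole fills in at a large scale. The function names and example points are illustrative assumptions.

```python
import math
from itertools import combinations

def gf2_rank(columns):
    """Rank over GF(2) of vectors encoded as integer bitmasks (XOR basis elimination)."""
    basis = {}                          # leading bit -> basis vector
    for v in columns:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    return len(basis)

def betti_numbers(points, eps):
    """Betti numbers (b0, b1) over GF(2) of the Vietoris-Rips complex at scale eps."""
    n = len(points)
    edges = [e for e in combinations(range(n), 2) if math.dist(points[e[0]], points[e[1]]) <= eps]
    edge_id = {e: k for k, e in enumerate(edges)}
    triangles = [t for t in combinations(range(n), 3)
                 if all(f in edge_id for f in combinations(t, 2))]
    d1 = [(1 << i) | (1 << j) for i, j in edges]                                # edge -> its 2 vertices
    d2 = [sum(1 << edge_id[f] for f in combinations(t, 2)) for t in triangles]  # triangle -> its 3 edges
    r1, r2 = gf2_rank(d1), gf2_rank(d2)
    return n - r1, len(edges) - r1 - r2    # b0 = V - rank d1; b1 = dim ker d1 - rank d2

circle = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
print(betti_numbers(circle, 1.0))   # (1, 1): one component, one loop
print(betti_numbers(circle, 2.5))   # (1, 0): all triangles are filled, the loop disappears
```

Tracking how such counts change as eps grows is the idea behind persistent homology. Measures of topological complexity over representation clouds are typically built on top of it.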

Updated: 2024-06-09 14:25:09

Categories: cs.CL,cs.AI,cs.NE

Download: http://arxiv.org/abs/2406.05798v1

Learning to Maximize Mutual Information for Chain-of-Thought Distillation

Knowledge distillation, the technique of transferring knowledge from large, complex models to smaller ones, marks a pivotal step towards efficient AI deployment. Distilling Step-by-Step (DSS), a novel method utilizing chain-of-thought (CoT) distillation, has demonstrated promise by imbuing smaller models with the superior reasoning capabilities of their larger counterparts. In DSS, the distilled model learns to generate rationales and predict labels concurrently through a multi-task learning framework. However, DSS overlooks the intrinsic relationship between the two training tasks, leading to ineffective integration of CoT knowledge with the task of label prediction. To this end, we investigate the mutual relationship of the two tasks from an Information Bottleneck perspective and formulate it as maximizing the mutual information between the representation features of the two tasks. We propose a variational, learning-based approach to solve this optimization problem. Our experimental results across four datasets demonstrate that our method outperforms the state-of-the-art DSS. Our findings offer insightful guidance for future research on language model distillation as well as applications involving CoT. Code is available at https://github.com/xinchen9/cot_distillation_ACL2024.
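
The abstract frames CoT distillation as maximizing the mutual information between the representations of the label-prediction and rationale-generation tasks, solved with a variational method. As an assumed stand-in (not necessarily the paper's estimator), the sketch below uses InfoNCE, a common variational lower bound: I(A; B) >= log N - L_InfoNCE for a batch of N paired representations. The toy vectors and the temperature are hypothetical.

```python
import math

def info_nce_loss(za, zb, temperature=0.1):
    """InfoNCE loss for paired representations za[i] <-> zb[i]; lower means better alignment."""
    def score(u, v):
        # cosine similarity divided by the temperature
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / (norm * temperature)

    n = len(za)
    total = 0.0
    for i in range(n):
        logits = [score(za[i], zb[j]) for j in range(n)]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))  # stable log-sum-exp
        total += log_denom - logits[i]          # cross-entropy with the true pair as target
    return total / n

def mi_lower_bound(za, zb, temperature=0.1):
    """InfoNCE bound: I(A; B) >= log N - loss."""
    return math.log(len(za)) - info_nce_loss(za, zb, temperature)

# Hypothetical features: za from a label-prediction head, zb from a rationale head.
za = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
aligned = mi_lower_bound(za, za)
shuffled = mi_lower_bound(za, za[1:] + za[:1])
print(round(aligned, 4), round(shuffled, 4))
```

Maximizing such a bound during training pulls each example's two task representations together and pushes them away from other examples. That is one way to couple the two DSS tasks.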

Updated: 2024-06-09 14:24:54

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2403.03348v3

3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization

The integration of molecules and language has garnered increasing attention in molecular science. Recent advancements in Language Models (LMs) have demonstrated potential for comprehensively modeling molecules and language. However, existing works exhibit notable limitations. Most overlook the modeling of 3D information, which is crucial for understanding molecular structures and functions. Some attempts have leveraged external structure encoding modules to inject 3D molecular information into LMs. However, obvious difficulties, such as modality alignment and separate tuning, hinder the integration of molecular structure and language text. To bridge this gap, we propose 3D-MolT5, a unified framework designed to model both 1D molecular sequences and 3D molecular structures. The key innovation lies in our methodology for mapping fine-grained 3D substructure representations (based on 3D molecular fingerprints) to a specialized 3D token vocabulary for 3D-MolT5. This 3D structure token vocabulary enables the seamless combination of 1D sequence and 3D structure representations in a tokenized format. As a result, 3D-MolT5 can encode molecular sequences (SELFIES), molecular structures, and text sequences within a unified architecture. We further introduce joint 1D and 3D pre-training to enhance the model's comprehension of these diverse modalities in a joint representation space and to help our foundation model generalize better to various tasks. After instruction tuning on multiple downstream datasets, 3D-MolT5 outperforms existing methods in molecular property prediction, molecule captioning, and text-based molecule generation. Our code will be available on GitHub soon.

Updated: 2024-06-09 14:20:55

Categories: q-bio.BM,cs.AI,cs.CE,cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.05797v1

ProFeAT: Projected Feature Adversarial Training for Self-Supervised Learning of Robust Representations

The need for abundant labelled data in supervised Adversarial Training (AT) has prompted the use of Self-Supervised Learning (SSL) techniques with AT. However, directly applying existing SSL methods to adversarial training has been sub-optimal due to the increased training complexity of combining SSL with AT. A recent approach, DeACL, mitigates this by using supervision from a standard SSL teacher in a distillation setting to mimic supervised AT. However, we find that a large performance gap remains compared to supervised adversarial training, especially on larger models. In this work, we investigate the key reason for this gap and propose Projected Feature Adversarial Training (ProFeAT) to bridge it. We show that the sub-optimal distillation performance results from a mismatch between the training objectives of the teacher and the student. To address this, we propose using a projection head at the student. The projection head lets the student leverage weak supervision from the teacher while still learning adversarially robust representations that are distinct from the teacher's. We further propose appropriate attack and defense losses at the feature and projector levels. We combine these with weak augmentations for the teacher and strong augmentations for the student to improve training data diversity without increasing training complexity. Through extensive experiments on several benchmark datasets and models, we demonstrate significant improvements in both clean and robust accuracy compared to existing SSL-AT methods, setting a new state of the art. We further report on-par or improved performance compared to TRADES, a popular supervised AT method.

Updated: 2024-06-09 14:20:46

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2406.05796v1

RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation

The retrieval-augmented generation (RAG) framework shows state-of-the-art performance on open-domain question answering tasks by referencing external knowledge. However, RAG performance degrades when the system is fed contexts of low relevance or when the relative relevance among the input contexts is assessed inaccurately. In this work, we propose RE-RAG, a framework that injects an explicit context relevance estimator (RE) into the RAG system. RE-RAG re-evaluates the retrieved contexts with the proposed context RE and passes the more relevant contexts, along with their measured importance, to the generator. To train the context RE, we propose an unsupervised learning method that does not use any labeled document ranking data. We evaluate RE-RAG on the Natural Questions and TriviaQA datasets. RE-RAG achieves performance on par with FiD variants while using fewer contexts (0.25x). We show that the proposed context RE, trained with the T5 model, also applies to RAG with LLMs (ChatGPT), improving performance on NQ (+6.4 EM) and TQA (+2.8 EM), respectively. Lastly, we show that the RE adds interpretability to the RAG framework, as RE scores correlate highly with RE-RAG accuracy. Consequently, the RE can be used to filter out unanswerable scenarios, in which the contexts do not contain the answer. It does so with 38.9%-51.3% accuracy just by examining a set of retrieved contexts.
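
The relevance-estimation step can be sketched as follows. Retrieved contexts are re-ranked by an estimated relevance score, and only a fraction is kept, each with a normalized importance weight. A query is flagged as likely unanswerable when no context scores highly. The scores are assumed to be probabilities produced by some estimator; the RE model itself is not shown. `keep_ratio=0.25` mirrors the 0.25x context budget in the abstract. The 0.5 threshold and the function names are hypothetical.

```python
def select_contexts(scores, contexts, keep_ratio=0.25, min_keep=1):
    """Keep the top fraction of contexts by estimated relevance; normalize their scores into weights."""
    ranked = sorted(zip(scores, contexts), key=lambda pair: pair[0], reverse=True)
    k = max(min_keep, int(len(ranked) * keep_ratio))
    top = ranked[:k]
    total = sum(s for s, _ in top)
    if total == 0:
        return [(c, 1.0 / k) for _, c in top]      # no signal: fall back to uniform weights
    return [(c, s / total) for s, c in top]

def looks_answerable(scores, threshold=0.5):
    """Hypothetical unanswerability filter: flag the query if no context reaches the threshold."""
    return max(scores) >= threshold

scores = [0.9, 0.1, 0.6, 0.05, 0.3, 0.8, 0.2, 0.4]   # made-up RE outputs for 8 retrieved passages
contexts = [f"passage-{i}" for i in range(8)]
print(select_contexts(scores, contexts))
print(looks_answerable(scores), looks_answerable([0.1, 0.2, 0.05]))
```

In a FiD-style reader, the weights could scale each context's contribution. How RE-RAG actually injects the importance measure into the generator is described in the paper, not here.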

Updated: 2024-06-09 14:11:19

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.05794v1

Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data

Quantitative reasoning is a critical skill for analyzing data, yet the assessment of this ability remains limited. To address this gap, we introduce the Quantitative Reasoning with Data (QRData) benchmark, which evaluates Large Language Models' capability in statistical and causal reasoning with real-world data. The benchmark comprises a carefully constructed dataset of 411 questions accompanied by data sheets from textbooks, online learning materials, and academic papers. To compare models' quantitative reasoning abilities on data and on text, we enrich the benchmark with an auxiliary set of 290 text-only questions, namely QRText. We evaluate natural language reasoning, program-based reasoning, and agent reasoning methods on diverse models. These include Chain-of-Thought, Program-of-Thoughts, ReAct, and code interpreter assistants. The strongest model, GPT-4, achieves an accuracy of 58%, leaving much room for improvement. Among open-source models, Deepseek-coder-instruct, a code LLM pretrained on 2T tokens, achieves the highest accuracy at 37%. Analysis reveals that models have difficulty with data analysis and causal reasoning, and struggle to use causal knowledge and the provided data simultaneously. Code and data are available at https://github.com/xxxiaol/QRData.

Updated: 2024-06-09 13:54:09

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2402.17644v2

Optimizing Multi-Stuttered Speech Classification: Leveraging Whisper's Encoder for Efficient Parameter Reduction in Automated Assessment

The automated classification of stuttered speech has significant implications for timely assessments that assist speech-language pathologists. Despite notable advancements in the field, cases in which multiple disfluencies occur in speech still require attention. We take a progressive approach to fill this gap by classifying multi-stuttered speech more efficiently. First, we curated a dataset of multi-stuttered disfluencies from SEP-28k audio clips. Second, we leveraged the encoder of Whisper, a state-of-the-art speech recognition model, and framed the problem as multi-label classification. Third, using a 6-encoder-layer Whisper and experimenting with various layer-freezing strategies, we identified a computationally efficient configuration of the model. The proposed configuration achieved micro, macro, and weighted F1-scores of 0.88, 0.85, and 0.87, respectively, on an external test dataset, Fluency-Bank. Through layer freezing, we reached these results by fine-tuning a single encoder layer. This reduced the model's trainable parameters from 20.27 million to 3.29 million. This study reveals the contribution of the last encoder layer to identifying disfluencies in stuttered speech. It also yields a computationally efficient approach that makes the model more adaptable to various dialects and languages.
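
The results above are reported as micro, macro, and weighted F1 for a multi-label task. The sketch below shows how the three averages differ. It assumes one sigmoid output per disfluency type, thresholded at 0.5; this is an illustrative choice, not necessarily the paper's decision rule. The label names and predictions are made up.

```python
def f1(tp, fp, fn):
    """F1 from counts; defined as 0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def binarize(probs, threshold=0.5):
    """Turn per-label sigmoid probabilities into 0/1 predictions."""
    return [[int(p >= threshold) for p in row] for row in probs]

def multilabel_f1(y_true, y_pred):
    """Micro, macro and support-weighted F1 for rows of 0/1 label indicators."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t and p)
        fp = sum(1 for t, p in pairs if not t and p)
        fn = sum(1 for t, p in pairs if t and not p)
        counts.append((tp, fp, fn))
    per_label = [f1(*c) for c in counts]
    micro = f1(*(sum(c[k] for c in counts) for k in range(3)))   # pool counts over all labels
    macro = sum(per_label) / n_labels                               # unweighted mean over labels
    support = [tp + fn for tp, _, fn in counts]                     # gold positives per label
    weighted = sum(f * s for f, s in zip(per_label, support)) / sum(support) if sum(support) else 0.0
    return micro, macro, weighted

# Hypothetical labels per clip: [block, prolongation, sound repetition]
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = binarize([[0.9, 0.2, 0.4], [0.1, 0.7, 0.6], [0.8, 0.3, 0.1]])
micro, macro, weighted = multilabel_f1(y_true, y_pred)
print(micro, macro, weighted)   # micro = 2/3, macro = 5/9, weighted = 2/3
```

Macro F1 treats a rare disfluency type as importantly as a common one. That is why it tends to sit below micro and weighted F1 when rare labels are harder, as in the 0.85 versus 0.88 and 0.87 reported above.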

Updated: 2024-06-09 13:42:51

Categories: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2406.05784v1

Convergence Conditions of Online Regularized Statistical Learning in Reproducing Kernel Hilbert Space With Non-Stationary Data

We study the convergence of recursive regularized learning algorithms in the reproducing kernel Hilbert space (RKHS) with dependent and non-stationary online data streams. Firstly, we study the mean square asymptotic stability of a class of random difference equations in RKHS, whose non-homogeneous terms are martingale difference sequences dependent on the homogeneous ones. Secondly, we introduce the concept of random Tikhonov regularization path, and show that if the regularization path is slowly time-varying in some sense, then the output of the algorithm is consistent with the regularization path in mean square. Furthermore, if the data streams also satisfy the RKHS persistence of excitation condition, i.e. there exists a fixed length of time period, such that the conditional expectation of the operators induced by the input data accumulated over every time period has a uniformly strictly positive compact lower bound in the sense of the operator order with respect to time, then the output of the algorithm is consistent with the unknown function in mean square. Finally, for the case with independent and non-identically distributed data streams, the algorithm achieves the mean square consistency provided the marginal probability measures induced by the input data are slowly time-varying and the average measure over each fixed-length time period has a uniformly strictly positive lower bound.
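
A standard form of the recursive Tikhonov-regularized update studied here is f_{k+1} = f_k - a_k [(f_k(x_k) - y_k) K(x_k, ·) + λ_k f_k]. This recursion keeps f_k as a finite kernel expansion. The sketch below implements it with a Gaussian kernel on a stream whose input distribution drifts slowly. Many details are assumptions: the kernel width, the constant step a_k = 0.5, the schedule λ_k = 0.01/sqrt(k+1), the target function, and the drift. The paper's convergence conditions are not checked.

```python
import math
import random

def gaussian_kernel(u, v, width=0.5):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((u - v) ** 2) / (2 * width ** 2))

class OnlineKernelRegressor:
    """f_{k+1} = f_k - a_k * [(f_k(x_k) - y_k) K(x_k, .) + lam_k * f_k], kept as a kernel expansion."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.centers = []
        self.coefs = []

    def __call__(self, x):
        return sum(c * self.kernel(z, x) for z, c in zip(self.centers, self.coefs))

    def update(self, x, y, step, lam):
        err = self(x) - y                                           # uses f_k, before the update
        self.coefs = [c * (1.0 - step * lam) for c in self.coefs]   # Tikhonov term shrinks f_k
        self.centers.append(x)
        self.coefs.append(-step * err)                              # data term adds a kernel bump at x_k

rng = random.Random(0)
target = lambda x: math.sin(2 * math.pi * x)                        # hypothetical unknown function
model = OnlineKernelRegressor(lambda u, v: gaussian_kernel(u, v, width=0.1))
for k in range(1000):
    power = 1.0 + 0.4 * math.sin(k / 100)                           # slowly drifting input law
    x = rng.random() ** power
    model.update(x, target(x), step=0.5, lam=0.01 / math.sqrt(k + 1))

grid = [0.05 + 0.9 * i / 49 for i in range(50)]
mse = sum((model(x) - target(x)) ** 2 for x in grid) / len(grid)
print(f"grid MSE after 1000 samples: {mse:.4f}")
```

The input law changes over time but always puts mass everywhere on [0, 1]. This loosely mirrors the kind of "slowly time-varying, uniformly positive lower bound" condition in the abstract. Each update costs time linear in the number of past samples, because the expansion grows by one term per step.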

Updated: 2024-06-09 13:11:36

Categories: cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2404.03211v4

Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats

Quantization of the weights and activations is one of the main methods to reduce the computational footprint of Deep Neural Networks (DNNs) training. Current methods enable 4-bit quantization of the forward phase. However, this constitutes only a third of the training process. Reducing the computational footprint of the entire training process requires the quantization of the neural gradients, i.e., the loss gradients with respect to the outputs of intermediate neural layers. Previous works separately showed that accurate 4-bit quantization of the neural gradients needs to (1) be unbiased and (2) have a log scale. However, no previous work aimed to combine both ideas, as we do in this work. Specifically, we examine the importance of having unbiased quantization in quantized neural network training, where to maintain it, and how to combine it with logarithmic quantization. Based on this, we suggest a logarithmic unbiased quantization (LUQ) method to quantize both the forward and backward phases to 4-bit, achieving state-of-the-art results in 4-bit training without the overhead. For example, in ResNet50 on ImageNet, we achieved a degradation of 1.1%. We further improve this to a degradation of only 0.32% after three epochs of high-precision fine-tuning combined with a variance reduction method. Both of these additions incur overhead comparable to previously suggested methods.
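
Below is a minimal sketch of the idea named in the abstract. Values are rounded stochastically onto a power-of-two grid so that the expected quantized value equals the input, and magnitudes below the smallest level are stochastically pruned to 0 or the smallest level. The format is our own assumption: zero plus seven powers of two, which fits one sign bit and three magnitude bits, with the smallest level alpha set from the maximum magnitude. The paper's exact LUQ details, hardware format, and variance reduction are not reproduced.

```python
import math
import random

def luq_quantize(x, alpha, levels, rng):
    """Sketch of logarithmic unbiased quantization of one value.

    Representable magnitudes: 0 and alpha * 2**i for i in 0..levels-1.
    Inside that range the stochastic rounding satisfies E[q(x)] = x.
    """
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    top = alpha * 2 ** (levels - 1)
    if a >= top:
        return sign * top              # clip; with alpha set from max|x| only the max lands here, exactly
    if a < alpha:
        # stochastic underflow: alpha with probability a/alpha, else 0 (mean a)
        return sign * alpha if rng.random() < a / alpha else 0.0
    _, e = math.frexp(a / alpha)       # a/alpha = m * 2**e with 0.5 <= m < 1
    lo = alpha * 2.0 ** (e - 1)        # largest power-of-two level <= a
    hi = 2.0 * lo
    # round up with probability (a - lo) / (hi - lo), so E[q] = lo + (a - lo) = a
    return sign * (hi if rng.random() < (a - lo) / (hi - lo) else lo)

rng = random.Random(0)
grads = [0.3, -0.005, 0.9, -1.0, 0.04]          # made-up neural-gradient values
levels = 7                                      # 0 plus 7 powers of two: 1 sign bit + 3 bits
alpha = max(abs(g) for g in grads) / 2 ** (levels - 1)
print([luq_quantize(g, alpha, levels, rng) for g in grads])
```

A deterministic round-to-nearest on the same grid would be biased, systematically shrinking or inflating small gradients. The stochastic rule trades that bias for extra variance, which is the variance the abstract's fine-tuning and variance-reduction steps address.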

Updated: 2024-06-09 13:06:23

Categories: cs.LG

Download: http://arxiv.org/abs/2112.10769v4

MLCM: Multistep Consistency Distillation of Latent Diffusion Model

Distilling large latent diffusion models (LDMs) into ones that are fast to sample from is attracting growing research interest. However, the majority of existing methods face a dilemma where they either (i) depend on multiple individual distilled models for different sampling budgets, or (ii) sacrifice generation quality with limited (e.g., 2-4) and/or moderate (e.g., 5-8) sampling steps. To address these, we extend the recent multistep consistency distillation (MCD) strategy to representative LDMs, establishing the Multistep Latent Consistency Models (MLCMs) approach for low-cost high-quality image synthesis. MLCM serves as a unified model for various sampling steps due to the promise of MCD. We further augment MCD with a progressive training strategy to strengthen inter-segment consistency to boost the quality of few-step generations. We take the states from the sampling trajectories of the teacher model as training data for MLCMs to lift the requirements for high-quality training datasets and to bridge the gap between the training and inference of the distilled model. MLCM is compatible with preference learning strategies for further improvement of visual quality and aesthetic appeal. Empirically, MLCM can generate high-quality, delightful images with only 2-8 sampling steps. On the MSCOCO-2017 5K benchmark, MLCM distilled from SDXL gets a CLIP Score of 33.30, Aesthetic Score of 6.19, and Image Reward of 1.20 with only 4 steps, substantially surpassing 4-step LCM [23], 8-step SDXL-Lightning [17], and 8-step HyperSD [33]. We also demonstrate the versatility of MLCMs in applications including controllable generation, image style transfer, and Chinese-to-image generation.

Updated: 2024-06-09 12:55:50

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.05768v1

Supervised Learning and Large Language Model Benchmarks on Mental Health Datasets: Cognitive Distortions and Suicidal Risks in Chinese Social Media

On social media, users often express their personal feelings, which on certain topics may exhibit cognitive distortions or even suicidal tendencies. Early recognition of these signs is critical for effective psychological intervention. In this paper, we introduce two novel datasets from Chinese social media: SOS-HL-1K for suicidal risk classification and SocialCD-3K for cognitive distortion detection. The SOS-HL-1K dataset contains 1,249 posts, and SocialCD-3K is a multi-label classification dataset containing 3,407 posts. We propose a comprehensive evaluation of two supervised learning methods and eight large language models (LLMs) on the proposed datasets. From the prompt engineering perspective, we experimented with two types of prompt strategies: four zero-shot and five few-shot strategies. We also evaluated the performance of the LLMs after fine-tuning on the proposed tasks. The experimental results show that there is still a large gap between LLMs relying only on prompt engineering and supervised learning. In the suicide classification task, this gap is 6.95 percentage points in F1-score, while in the cognitive distortion task it is even more pronounced, reaching 31.53 percentage points. However, fine-tuning significantly reduces this difference: in the suicide and cognitive distortion classification tasks, the gap decreases to 4.31 and 3.14 percentage points, respectively. This research highlights the potential of LLMs in psychological contexts, but supervised learning remains necessary for more challenging tasks. All datasets and code are made available.
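This abstract reports gaps in F1 percentage points, and SocialCD-3K is a multi-label dataset. Below is a minimal sketch of micro-averaged F1 over label sets and of a gap expressed in percentage points. The abstract does not state which F1 averaging the paper uses, so micro-averaging is an assumption here. The function names and label strings are hypothetical. Percentage points and relative percent differ: a drop from 0.80 to 0.75 is 5 percentage points but 6.25 percent in relative terms.

```python
def micro_f1(y_true: list[set[str]], y_pred: list[set[str]]) -> float:
    """Micro-averaged F1 over multi-label predictions (one label set per post)."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def gap_in_points(f1_a: float, f1_b: float) -> float:
    """Difference between two F1 scores in percentage points (not relative percent)."""
    return 100.0 * (f1_a - f1_b)


# Hypothetical labels; the real SocialCD-3K label names are not listed in the abstract.
truth = [{"catastrophizing", "labeling"}, {"mind_reading"}]
pred = [{"catastrophizing"}, {"mind_reading", "labeling"}]
print(micro_f1(truth, pred))  # 2 TP, 1 FP, 1 FN, so precision = recall = F1 = 2/3
```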

Updated: 2024-06-09 12:49:52

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2309.03564v3

Gentle-CLIP: Exploring Aligned Semantic In Low-Quality Multimodal Data With Soft Alignment

Multimodal fusion breaks through the barriers between diverse modalities and has already achieved many impressive results. However, in various specialized fields it is difficult to obtain sufficient aligned data for training, which seriously limits the applicability of otherwise elegant models. Semi-supervised learning therefore attempts to achieve multimodal alignment with fewer matched pairs, but traditional methods such as pseudo-labeling are difficult to apply in domains without label information. To address these problems, we recast semi-supervised multimodal alignment as a manifold matching problem and propose a new CLIP-based method named Gentle-CLIP. Specifically, we design a novel semantic density distribution loss that extracts implicit semantic alignment information from unpaired multimodal data by constraining the latent representation distribution at a fine granularity, thus eliminating the need for numerous strictly matched pairs. Meanwhile, we introduce multi-kernel maximum mean discrepancy as well as a self-supervised contrastive loss to pull the separate modality distributions closer and enhance the stability of the representation distribution. In addition, the contrastive loss used in CLIP is applied to the supervised matched data to prevent negative optimization. Extensive experiments on a range of tasks in various fields, including protein, remote sensing, and general vision-language, demonstrate the effectiveness of the proposed Gentle-CLIP.
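Gentle-CLIP pulls the separate modality distributions together with multi-kernel maximum mean discrepancy (MK-MMD). Below is a minimal sketch of a biased MK-MMD² estimate between two sets of embeddings, assuming an unweighted sum of Gaussian kernels with hand-picked bandwidths. The abstract does not specify the kernel family, bandwidths, or weights, and `mk_mmd2` is a hypothetical name. Minimizing this quantity with respect to the encoders is what draws the two embedding distributions closer.

```python
import math


def mk_mmd2(xs, ys, bandwidths=(0.5, 1.0, 2.0)):
    """Biased estimate of squared MMD between samples xs and ys, using a
    multi-kernel: an unweighted sum of Gaussian (RBF) kernels."""
    def kernel(a, b):
        d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return sum(math.exp(-d2 / (2.0 * s * s)) for s in bandwidths)

    def mean_kernel(us, vs):
        return sum(kernel(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


image_emb = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
text_emb = [(x + 3.0, y) for x, y in image_emb]  # same shape, shifted away
print(mk_mmd2(image_emb, image_emb))  # 0.0: identical samples
print(mk_mmd2(image_emb, text_emb))   # clearly positive: the distributions differ
```

Using several bandwidths makes the discrepancy sensitive to differences at more than one length scale. This is the usual motivation for a multi-kernel MMD over a single RBF kernel.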

Updated: 2024-06-09 12:41:14

Categories: cs.LG,cs.AI,cs.CL,cs.CV

Download: http://arxiv.org/abs/2406.05766v1

Global Sensitivity Analysis of Uncertain Parameters in Bayesian Networks

Traditionally, the sensitivity analysis of a Bayesian network studies the impact of individually modifying the entries of its conditional probability tables in a one-at-a-time (OAT) fashion. However, this approach fails to give a comprehensive account of each input's relevance, since simultaneous perturbations of two or more parameters often entail higher-order effects that an OAT analysis cannot capture. We propose to conduct a global variance-based sensitivity analysis instead, whereby $n$ parameters are viewed as uncertain at once and their importance is assessed jointly. Our method works by encoding the uncertainties as $n$ additional variables of the network. To avoid the curse of dimensionality while adding these dimensions, we use low-rank tensor decomposition to break the new potentials down into smaller factors. Last, we apply the method of Sobol to the resulting network to obtain $n$ global sensitivity indices. Using a benchmark array of both expert-elicited and learned Bayesian networks, we demonstrate that the Sobol indices can differ significantly from the OAT indices, thus revealing the true influence of uncertain parameters and their interactions.
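This method ends by applying Sobol's method to the augmented network. The sketch below is a generic illustration of what a first-order Sobol index measures, not the paper's tensor-decomposition pipeline. It uses a Monte Carlo "pick-freeze" estimator and assumes independent inputs that are uniform on [0, 1]. The function names are hypothetical.

```python
import random


def first_order_sobol(f, n_inputs, n_samples=20000, seed=0):
    """Monte Carlo estimate of first-order Sobol indices S_i = Var(E[Y | X_i]) / Var(Y),
    assuming independent inputs, each uniform on [0, 1] (pick-freeze estimator)."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    fA = [f(a) for a in A]
    fB = [f(b) for b in B]
    ys = fA + fB
    mean_y = sum(ys) / len(ys)
    total_var = sum((y - mean_y) ** 2 for y in ys) / len(ys)
    indices = []
    for i in range(n_inputs):
        # Rows of A with column i swapped in from B: they share only X_i with B.
        fAB = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        cov = sum(fb * (fab - fa) for fa, fb, fab in zip(fA, fB, fAB)) / n_samples
        indices.append(cov / total_var)
    return indices


def model(x):
    return x[0] + 2.0 * x[1]


print(first_order_sobol(model, 2))  # close to the exact values [0.2, 0.8]
```

For the additive model, the exact indices are 0.2 and 0.8, because the input variances contribute 1/12 and 4/12. For a model with an interaction, such as `x[0] * x[1]`, the first-order indices sum to less than 1. The missing share is the higher-order effect that an OAT analysis cannot see.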

Updated: 2024-06-09 12:36:38

Categories: cs.AI

Download: http://arxiv.org/abs/2406.05764v1

Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning

Continual learning (CL) aims to continually accumulate knowledge from a non-stationary data stream without catastrophic forgetting of learned knowledge, which requires a balance between stability and adaptability. Relying on the generalizable representations of pre-trained models (PTMs), PTM-based CL methods perform effective continual adaptation on downstream tasks by adding learnable adapters or prompts on top of the frozen PTMs. However, many existing PTM-based CL methods restrict adaptation to a fixed set of these modules to avoid forgetting, which limits their CL ability. Periodically adding task-specific modules, on the other hand, results in a linear model growth rate and impaired knowledge reuse. We propose Self-Expansion of pre-trained models with Modularized Adaptation (SEMA), a novel approach that improves control over the stability-plasticity balance in PTM-based CL. SEMA automatically decides, on demand, whether to reuse or add adapter modules, depending on whether a significant distribution shift that the existing modules cannot handle is detected at different representation levels. We design a modular adapter consisting of a functional adapter and a representation descriptor. The representation descriptors are trained as distribution shift indicators and used to trigger self-expansion signals. To better compose the adapters, an expandable weighting router is learned jointly to mix the adapter outputs. SEMA enables better knowledge reuse and a sub-linear expansion rate. Extensive experiments demonstrate the effectiveness of the proposed self-expansion method, which achieves state-of-the-art performance among PTM-based CL methods without memory rehearsal.
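SEMA's expansion rule reduces to a reuse-or-add decision driven by distribution-shift detectors. Below is a hypothetical sketch of that control flow. It assumes each representation descriptor is summarized by the mean and standard deviation of a scalar novelty score, and that a shift is flagged by a z-score threshold. The real descriptors are trained modules, and `Descriptor`, `reuse_or_expand`, and the threshold of 3.0 are illustrative choices. Because a new adapter is added only when every existing descriptor flags a shift, the number of adapters can grow more slowly than the number of tasks.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Descriptor:
    """Hypothetical stand-in for a representation descriptor: statistics of a scalar
    novelty score (e.g. a reconstruction error) seen while training its adapter."""
    mu: float
    sigma: float

    def z_score(self, score: float) -> float:
        return abs(score - self.mu) / max(self.sigma, 1e-8)


def reuse_or_expand(descriptors: list[Descriptor], batch_scores: list[float], threshold: float = 3.0):
    """Reuse the adapter whose descriptor finds the batch most familiar; if every
    descriptor flags a shift (z-score above threshold), add a new adapter instead."""
    batch_mu = mean(batch_scores)
    zs = [d.z_score(batch_mu) for d in descriptors]
    if zs and min(zs) <= threshold:
        return "reuse", zs.index(min(zs))
    # Expansion: fit a descriptor for the new adapter on the shifted batch.
    return "expand", Descriptor(batch_mu, pstdev(batch_scores))


pool = [Descriptor(mu=1.0, sigma=0.1)]
print(reuse_or_expand(pool, [1.0, 1.05, 0.95]))   # ('reuse', 0)
print(reuse_or_expand(pool, [5.0, 5.1, 4.9])[0])  # expand
```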

Updated: 2024-06-09 12:24:03

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2403.18886v2

Vision Mamba: Cutting-Edge Classification of Alzheimer's Disease with 3D MRI Scans

Classifying 3D MRI images for the early detection of Alzheimer's disease is a critical task in medical imaging. Traditional approaches using Convolutional Neural Networks (CNNs) and Transformers face significant challenges in this domain. CNNs, while effective at capturing local spatial features, struggle with long-range dependencies and often require extensive computational resources for high-resolution 3D data. Transformers, on the other hand, excel at capturing global context but suffer from complexity that is quadratic in sequence length at inference time and require substantial memory, making them less efficient for large-scale 3D MRI data. To address these limitations, we propose using Vision Mamba, an advanced model based on State Space Models (SSMs), to classify 3D MRI images for Alzheimer's disease detection. Vision Mamba leverages dynamic state representations and the selective scan algorithm, allowing it to efficiently capture and retain important spatial information across 3D volumes. By dynamically adjusting state transitions based on input features, Vision Mamba can selectively retain relevant information, leading to more accurate and computationally efficient processing of 3D MRI data. Our approach combines the parallelizable nature of convolutional operations during training with the efficient, recurrent processing of states during inference. This architecture not only improves computational efficiency but also enhances the model's ability to handle long-range dependencies within 3D medical images. Experimental results demonstrate that Vision Mamba outperforms traditional CNN and Transformer models in accuracy, making it a promising tool for the early detection of Alzheimer's disease from 3D MRI data.
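The training/inference split described above rests on a standard state-space identity. A linear time-invariant SSM can be evaluated either as a recurrence, which is cheap per step at inference, or as a causal convolution over the whole sequence, which is parallel at training time. Below is a minimal scalar sketch of that equivalence; the function names are illustrative. In the selective (input-dependent) variant used by Mamba, the fixed-kernel convolution no longer applies, and Mamba-style implementations rely on a parallel scan instead.

```python
def ssm_recurrent(xs, a, b, c):
    """Scalar linear SSM as a recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t (h_{-1} = 0)."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def ssm_convolution(xs, a, b, c):
    """The same model unrolled: y_t = sum_k K_k * x_{t-k} with kernel K_k = c * a**k * b."""
    kernel = [c * (a ** k) * b for k in range(len(xs))]
    return [sum(kernel[k] * xs[t - k] for k in range(t + 1)) for t in range(len(xs))]


xs = [1.0, 2.0, -1.0, 0.5]
rec = ssm_recurrent(xs, a=0.9, b=0.5, c=2.0)
conv = ssm_convolution(xs, a=0.9, b=0.5, c=2.0)
print(max(abs(r - v) for r, v in zip(rec, conv)))  # ~0, up to floating-point rounding
```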

Updated: 2024-06-09 12:23:22

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.05757v1

EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models

The recent rapid development of Large Vision-Language Models (LVLMs) has indicated their potential for embodied tasks. However, the critical skill of spatial understanding in embodied environments has not been thoroughly evaluated, leaving the gap between current LVLMs and qualified embodied intelligence unknown. Therefore, we construct EmbSpatial-Bench, a benchmark for evaluating the embodied spatial understanding of LVLMs. The benchmark is automatically derived from embodied scenes and covers 6 spatial relationships from an egocentric perspective. Experiments expose the insufficient capacity of current LVLMs (even GPT-4V). We further present EmbSpatial-SFT, an instruction-tuning dataset designed to improve LVLMs' embodied spatial understanding.
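EmbSpatial-Bench derives its spatial questions automatically from embodied scenes. Below is a hypothetical sketch of how egocentric relations could be read off annotated objects. It assumes the six relations are left, right, above, below, close, and far, and that each object comes with a 2D bounding-box center and a depth value. The benchmark's actual rules and thresholds are not given in the abstract, and `SceneObject` and `egocentric_relations` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    cx: float     # bounding-box center in pixels; x grows to the right
    cy: float     # bounding-box center in pixels; y grows downwards
    depth: float  # distance from the camera (e.g. meters)


def egocentric_relations(a: SceneObject, b: SceneObject, margin: float = 10.0) -> set[str]:
    """Relations of a with respect to b as seen from the camera (hypothetical rules)."""
    rels = set()
    if a.cx < b.cx - margin:
        rels.add("left")
    elif a.cx > b.cx + margin:
        rels.add("right")
    if a.cy < b.cy - margin:
        rels.add("above")
    elif a.cy > b.cy + margin:
        rels.add("below")
    if a.depth < b.depth:
        rels.add("close")
    elif a.depth > b.depth:
        rels.add("far")
    return rels


cup = SceneObject("cup", cx=120.0, cy=80.0, depth=1.2)
sofa = SceneObject("sofa", cx=400.0, cy=300.0, depth=3.5)
print(sorted(egocentric_relations(cup, sofa)))  # ['above', 'close', 'left']
```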

Updated: 2024-06-09 12:23:14

Categories: cs.AI,cs.CL,cs.CV,cs.MM

Download: http://arxiv.org/abs/2406.05756v1

Numerical solution of a PDE arising from prediction with expert advice

This work investigates the online machine learning problem of prediction with expert advice in an adversarial setting through numerical analysis of, and experiments with, a related partial differential equation. The problem is a repeated two-person game involving decision-making at each step informed by $n$ experts in an adversarial environment. The continuum limit of this game over a large number of steps is a degenerate elliptic equation whose solution encodes the optimal strategies for both players. We develop numerical methods for approximating the solution of this equation in relatively high dimensions ($n\leq 10$) by exploiting symmetries in the equation and the solution to drastically reduce the size of the computational domain. Based on our numerical results we make a number of conjectures about the optimality of various adversarial strategies, in particular about the non-optimality of the COMB strategy.
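The abstract credits the tractability of $n \leq 10$ to symmetries of the equation and its solution. Below is a minimal sketch of that kind of domain reduction. It assumes, as holds for the final-time payoff $\max_i x_i$, that the solution is symmetric under permutations of the experts and satisfies $u(x + c\mathbf{1}) = u(x) + c$. Under these assumptions, every query point is mapped to a canonical representative with sorted coordinates and zero maximum, so only that wedge needs to be discretized. The function names are hypothetical, and the paper may exploit further or different symmetries.

```python
from math import factorial


def canonicalize(x: tuple[float, ...]) -> tuple[float, tuple[float, ...]]:
    """Split x into (shift, y): shift = max(x), and y = x - shift sorted in
    non-increasing order, so y[0] == 0 and y lies in the reduced wedge."""
    shift = max(x)
    y = tuple(sorted((xi - shift for xi in x), reverse=True))
    return shift, y


def evaluate(u_reduced, x):
    """Evaluate u(x) from a solution stored only on the reduced domain, assuming
    u is permutation-symmetric and u(x + c*1) = u(x) + c."""
    shift, y = canonicalize(x)
    return shift + u_reduced(y)


# The payoff max(x), restricted to the reduced domain, is identically 0 there.
print(evaluate(lambda y: y[0], (0.5, -1.0, 2.0)))  # 2.0
print(factorial(10))  # 3628800: volume factor removed by sorting alone for n = 10
```

Sorting removes a factor of $n!$ from the domain volume, and the shift removes one dimension. Both reductions apply before any grid is built.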

Updated: 2024-06-09 12:17:05

Categories: math.NA,cs.AI,cs.LG,cs.NA,math.AP,35D40, 65N12, 65N06, 35Q68, 35J60

Download: http://arxiv.org/abs/2406.05754v1

Grounding Continuous Representations in Geometry: Equivariant Neural Fields

Recently, Neural Fields (NeFs) have emerged as a powerful modelling paradigm for representing continuous signals. In a conditional neural field, a field is represented by a latent variable that conditions the NeF, whose parametrisation is otherwise shared over an entire dataset. We propose Equivariant Neural Fields (ENFs) based on cross-attention transformers, in which NeFs are conditioned on a geometric conditioning variable, a latent point cloud, that enables equivariant decoding from latent to field. Our equivariant approach induces a steerability property by which both field and latent are grounded in geometry and amenable to transformation laws: if the field transforms, the latent representation transforms accordingly, and vice versa. Crucially, the equivariance relation ensures that the latent is capable of (1) representing geometric patterns faithfully, allowing for geometric reasoning in latent space, and (2) weight sharing over spatially similar patterns, allowing for efficient learning of datasets of fields. These main properties are validated using classification experiments and a verification of the capability of fitting entire datasets, in comparison to other non-equivariant NeF approaches. We further validate the potential of ENFs by demonstrating unique local field editing properties.
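ENFs decode a field from a latent point cloud through cross-attention. Below is a minimal sketch of why grounding the latents in geometry yields equivariance. It assumes a single scalar feature per latent and uses distance-based attention logits instead of the paper's learned cross-attention. Because the weights depend only on $x - p_i$, translating both the query and the latent positions leaves the decoded value unchanged. The names `decode` and `translate` are hypothetical.

```python
import math


def decode(x, latents):
    """Decode a scalar field value at query point x from a latent point cloud.

    latents: list of (position, feature) pairs. Attention logits depend only on the
    relative position x - p, so moving x and every p by the same offset does not
    change the output (translation equivariance of the field)."""
    logits = [-sum((xi - pi) ** 2 for xi, pi in zip(x, p)) for p, _ in latents]
    m = max(logits)  # subtract the max for numerical stability of the softmax
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return sum(w * f for w, (_, f) in zip(weights, latents)) / total


def translate(latents, t):
    """Move every latent position by offset t, keeping its feature."""
    return [(tuple(pi + ti for pi, ti in zip(p, t)), f) for p, f in latents]


latents = [((0.0, 0.0), 1.0), ((1.0, 0.0), -2.0), ((0.0, 2.0), 0.5)]
x, t = (0.3, 0.4), (5.0, -3.0)
x_moved = tuple(xi + ti for xi, ti in zip(x, t))
print(decode(x, latents), decode(x_moved, translate(latents, t)))  # equal up to rounding
```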

Updated: 2024-06-09 12:16:30

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2406.05753v1

Nearly Minimax Optimal Regret for Multinomial Logistic Bandit

In this paper, we study the contextual multinomial logit (MNL) bandit problem in which a learning agent sequentially selects an assortment based on contextual information, and user feedback follows an MNL choice model. There has been a significant discrepancy between lower and upper regret bounds, particularly regarding the maximum assortment size $K$. Additionally, the variation in reward structures between these bounds complicates the quest for optimality. Under uniform rewards, where all items have the same expected reward, we establish a regret lower bound of $\Omega(d\sqrt{\smash[b]{T/K}})$ and propose a constant-time algorithm, OFU-MNL+, that achieves a matching upper bound of $\tilde{O}(d\sqrt{\smash[b]{T/K}})$. Under non-uniform rewards, we prove a lower bound of $\Omega(d\sqrt{T})$ and an upper bound of $\tilde{O}(d\sqrt{T})$, also achievable by OFU-MNL+. Our empirical studies support these theoretical findings. To the best of our knowledge, this is the first work in the contextual MNL bandit literature to prove minimax optimality -- for either uniform or non-uniform reward setting -- and to propose a computationally efficient algorithm that achieves this optimality up to logarithmic factors.
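The MNL choice model in this abstract can be written down directly. Below is a minimal sketch that assumes the standard contextual formulation with linear utilities $u_i = x_i^\top\theta$ and an outside (no-purchase) option of utility 0. It computes choice probabilities and the expected reward of an assortment; it does not implement the OFU-MNL+ algorithm. The function names are illustrative.

```python
import math


def utilities(features, theta):
    """Linear utility u_i = x_i . theta for each item in the offered assortment."""
    return [sum(xj * tj for xj, tj in zip(x, theta)) for x in features]


def mnl_choice_probs(utils):
    """MNL with an outside option of utility 0:
    P(i) = exp(u_i) / (1 + sum_j exp(u_j)),  P(no purchase) = 1 / (1 + sum_j exp(u_j))."""
    exps = [math.exp(u) for u in utils]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps], 1.0 / denom


def expected_reward(utils, rewards):
    probs, _ = mnl_choice_probs(utils)
    return sum(p * r for p, r in zip(probs, rewards))


theta = [0.5, -0.2]
assortment = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # K = 3 items, d = 2 features
u = utilities(assortment, theta)
probs, p_none = mnl_choice_probs(u)
print(sum(probs) + p_none)  # 1.0, up to floating-point rounding
```

With uniform rewards r, the expected reward equals r(1 - P(no purchase)). It therefore depends on the assortment only through the total attraction $\sum_j \exp(u_j)$.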

Updated: 2024-06-09 12:14:10

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2405.09831v4

Do Deep Neural Network Solutions Form a Star Domain?

It has recently been conjectured that neural network solution sets reachable via stochastic gradient descent (SGD) are convex, up to permutation invariances (Entezari et al., 2022). This means that two independent solutions can be connected by a linear path of low loss, provided the weights of one of the models are appropriately permuted. However, current methods for testing this theory often require very wide networks to succeed. In this work, we conjecture that, more generally, the SGD solution set is a "star domain" that contains a "star model" linearly connected to all the other solutions via paths with low loss values, modulo permutations. We propose the Starlight algorithm, which finds a star model for a given learning task. We validate our claim by showing that this star model is linearly connected to other independently found solutions. As an additional benefit of our study, we obtain better uncertainty estimates from Bayesian Model Averaging over the obtained star domain. Further, we demonstrate star models as potential substitutes for model ensembles. Our code is available at https://github.com/aktsonthalia/starlight.
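Linear mode connectivity is usually measured with a loss barrier along the straight line between two weight vectors. Below is a minimal sketch of that measure and of the star-model test it implies. It assumes the weights are already permutation-aligned; the abstract treats the alignment step as essential, but it is omitted here. The toy loss has a cross-shaped solution set. That set is not convex, since two points on different arms are separated by a barrier, but it is a star domain centered at the origin. `loss_barrier`, `is_star_model`, and `cross_loss` are hypothetical names, and this is not the Starlight algorithm.

```python
def loss_barrier(loss, w_a, w_b, steps=100):
    """Largest excess of the loss along the straight line from w_a to w_b over the
    linear interpolation of the endpoint losses (0 means no barrier)."""
    la, lb = loss(w_a), loss(w_b)
    worst = float("-inf")
    for k in range(steps + 1):
        alpha = k / steps
        w = [(1 - alpha) * a + alpha * b for a, b in zip(w_a, w_b)]
        worst = max(worst, loss(w) - ((1 - alpha) * la + alpha * lb))
    return worst


def is_star_model(loss, center, solutions, tol=1e-6):
    """center is a star model for these solutions if every straight path to it is barrier-free."""
    return all(loss_barrier(loss, center, s) <= tol for s in solutions)


def cross_loss(w):
    """Zero on both coordinate axes: a non-convex, star-shaped solution set."""
    return min(w[0] ** 2, w[1] ** 2)


print(loss_barrier(cross_loss, [3.0, 0.0], [0.0, 3.0]))  # 2.25: not linearly connected
print(is_star_model(cross_loss, [0.0, 0.0], [[3.0, 0.0], [0.0, 3.0], [-2.0, 0.0]]))  # True
```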

Updated: 2024-06-09 11:51:03

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2403.07968v2

Bayesian Deep Learning Via Expectation Maximization and Turbo Deep Approximate Message Passing

Efficient learning and model compression algorithms for deep neural networks (DNNs) are key workhorses behind the rise of deep learning (DL). In this work, we propose a message-passing-based Bayesian deep learning algorithm called EM-TDAMP that avoids the drawbacks of traditional stochastic gradient descent (SGD)-based learning algorithms and regularization-based model compression methods. Specifically, we formulate DNN learning and compression as a sparse Bayesian inference problem, in which a group sparse prior is employed to achieve structured model compression. We then propose an expectation maximization (EM) framework to estimate the posterior distributions of the parameters (E-step) and update the hyperparameters (M-step), where the E-step is realized by a newly proposed turbo deep approximate message passing (TDAMP) algorithm. We further extend EM-TDAMP into a novel Bayesian federated learning framework, in which the clients perform TDAMP to efficiently calculate local posterior distributions from their local data, and the central server first aggregates the local posterior distributions to update the global posterior distributions and then updates the hyperparameters via EM to accelerate convergence. We detail the application of EM-TDAMP to Boston housing price prediction and handwriting recognition, and present extensive numerical results demonstrating its advantages.
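In the federated variant, the server combines local posteriors into a global one. Below is a minimal sketch of the simplest such rule, assuming each client's posterior for a parameter is Gaussian and that the global posterior is their product. A real rule must also avoid counting the shared prior once per client, and the abstract does not spell out the paper's actual aggregation. `aggregate_gaussian_posteriors` is a hypothetical name.

```python
def aggregate_gaussian_posteriors(posteriors):
    """Fuse per-client Gaussian posteriors (mean, variance) of one parameter by
    multiplying the densities: precisions add, and the fused mean is the
    precision-weighted average of the client means."""
    precision = sum(1.0 / var for _, var in posteriors)
    mean = sum(m / var for m, var in posteriors) / precision
    return mean, 1.0 / precision


print(aggregate_gaussian_posteriors([(1.0, 2.0), (3.0, 2.0)]))  # (2.0, 1.0)
```

Two equally confident clients average their means and halve the variance. A client with a very broad posterior barely moves the result, which is the behavior one wants when local data is scarce.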

Updated: 2024-06-09 11:44:16

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2402.07366v2

Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance

AI-aided clinical diagnosis is highly desirable in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, and it avoids the problems of data collection, labeling, fitting, privacy, bias, generalization, high cost, and high energy consumption. Through close collaboration between clinical experts and DUCG technicians, 46 DUCG models covering 54 chief complaints were constructed. Over 1,000 diseases can be diagnosed without triage. Before being applied in the real world, the 46 DUCG models were retrospectively verified by third-party hospitals. The verified diagnostic precisions were no less than 95%, and the diagnostic precision for every disease, including uncommon ones, was no less than 80%. After verification, the 46 DUCG models were applied in real-world settings in China. Over one million real diagnosis cases have been performed, with only 17 incorrect diagnoses identified. Owing to DUCG's transparency, the mistakes that caused the incorrect diagnoses were found and corrected. The diagnostic abilities of clinicians who frequently used DUCG improved significantly. Following an introduction to the previously presented DUCG methodology, the recommendation algorithm for potential medical checks is presented and the key idea of DUCG is extracted.

Updated: 2024-06-09 11:37:45

Categories: cs.AI,cs.HC,cs.LG

Download: http://arxiv.org/abs/2406.05746v1

Structured Learning of Compositional Sequential Interventions

We consider sequential treatment regimes where each unit is exposed to combinations of interventions over time. When interventions are described by qualitative labels, such as ``close schools for a month due to a pandemic'' or ``promote this podcast to this user during this week'', it is unclear which structural assumptions allow us to generalize behavioral predictions to previously unseen combinatorial sequences. Standard black-box approaches that map sequences of categorical variables to outputs are applicable, but they rely on poorly understood assumptions about how reliable generalization can be obtained, and they may underperform under sparse sequences, temporal variability, and large action spaces. To address this, we pose an explicit model for \emph{composition}, that is, of how the effect of sequential interventions can be isolated into modules, clarifying which data conditions allow the identification of their combined effect at different units and time steps. We show the identification properties of our compositional model, which is inspired by advances in causal matrix factorization methods but focuses on predictive models for novel compositions of interventions rather than on matrix completion tasks and causal effect estimation. We compare our approach to flexible but generic black-box models to illustrate how structure aids prediction in sparse data conditions.

Updated: 2024-06-09 11:36:36

Categories: stat.ML,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.05745v1

LLMs Meet Multimodal Generation and Editing: A Survey

With the recent advancement in large language models (LLMs), there is a growing interest in combining LLMs with multimodal learning. Previous surveys of multimodal large language models (MLLMs) mainly focus on multimodal understanding. This survey elaborates on multimodal generation and editing across various domains, comprising image, video, 3D, and audio. Specifically, we summarize the notable advancements with milestone works in these fields and categorize these studies into LLM-based and CLIP/T5-based methods. Then, we summarize the various roles of LLMs in multimodal generation and exhaustively investigate the critical technical components behind these methods and the multimodal datasets utilized in these studies. Additionally, we dig into tool-augmented multimodal agents that can leverage existing generative models for human-computer interaction. Lastly, we discuss the advancements in the generative AI safety field, investigate emerging applications, and discuss future prospects. Our work provides a systematic and insightful overview of multimodal generation and processing, which is expected to advance the development of Artificial Intelligence for Generative Content (AIGC) and world models. A curated list of all related papers can be found at https://github.com/YingqingHe/Awesome-LLMs-meet-Multimodal-Generation

Updated: 2024-06-09 11:34:12

Categories: cs.AI,cs.CL,cs.CV,cs.MM,cs.SD

Download: http://arxiv.org/abs/2405.19334v2

Digital Business Model Analysis Using a Large Language Model

Digital transformation (DX) has recently become a pressing issue for many companies, as the latest digital technologies, such as artificial intelligence and the Internet of Things, can be easily utilized. However, although companies can improve their operations through digital technologies, devising new business models is not easy for them. Thus, people who lack digital technology expertise need methods that support business model design. Meanwhile, large language models (LLMs), exemplified by ChatGPT, and natural language processing based on LLMs have advanced dramatically. A business model design support system that utilizes these technologies has great potential. However, research in this area is scarce. Accordingly, this study proposes an LLM-based method for comparing and analyzing similar companies from different business domains as a first step toward business model design support using LLMs. This method can support idea generation in digital business model design.

Updated: 2024-06-09 11:16:11

Categories: cs.OH,cs.HC,cs.LG

Download: http://arxiv.org/abs/2406.05741v1

Smiles2Dock: an open large-scale multi-task dataset for ML-based molecular docking

Docking is a crucial component of drug discovery, aimed at predicting the binding conformation and affinity between small molecules and target proteins. ML-based docking has recently emerged as a prominent approach, outpacing traditional methods like DOCK and AutoDock Vina in handling the growing scale and complexity of molecular libraries. However, the availability of comprehensive and user-friendly datasets for training and benchmarking ML-based docking algorithms remains limited. We introduce Smiles2Dock, an open large-scale multi-task dataset for molecular docking. We created a framework combining P2Rank and AutoDock Vina to dock 1.7 million ligands from the ChEMBL database against 15 AlphaFold proteins, yielding more than 25 million protein-ligand binding scores. The dataset leverages a wide range of high-accuracy AlphaFold protein models, encompasses a diverse set of biologically relevant compounds, and enables researchers to benchmark all major approaches to ML-based docking, such as graph-, Transformer-, and CNN-based methods. We also introduce a novel Transformer-based architecture for docking score prediction and set it as an initial benchmark for our dataset. Our dataset and code are publicly available to support the development of novel ML-based methods for molecular docking and to advance scientific research in this field.
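The dataset's scale follows from its layout: 1.7 million ligands times 15 proteins gives 25.5 million ligand-protein pairs, consistent with "more than 25 million" scores. Below is a minimal sketch of how such a multi-task target (one column per protein) could be trained against. It assumes some docking runs may yield no score, so missing entries are masked out of the loss. The masking, the `None` convention, and `masked_mse` are illustrative assumptions, not the dataset's documented format.

```python
def masked_mse(preds, targets):
    """Mean squared error over (ligand, protein) pairs, skipping missing scores (None)."""
    squared_error, count = 0.0, 0
    for pred_row, target_row in zip(preds, targets):  # one row per ligand
        for p, t in zip(pred_row, target_row):        # one column per protein
            if t is not None:
                squared_error += (p - t) ** 2
                count += 1
    return squared_error / count if count else 0.0


# Two ligands x two proteins; one docking score is missing.
preds = [[-7.0, -5.0], [-6.0, -8.0]]
targets = [[-7.5, None], [-6.0, -9.0]]
print(masked_mse(preds, targets))  # (0.25 + 0.0 + 1.0) / 3, about 0.417
```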

Updated: 2024-06-09 11:13:03

Fields: q-bio.BM,cs.LG,stat.AP,stat.CO

Download: http://arxiv.org/abs/2406.05738v1

Region of Interest Loss for Anonymizing Learned Image Compression

The use of AI in public spaces continually raises concerns about privacy and the protection of sensitive data. An example is the deployment of detection and recognition methods on humans, where images are provided by surveillance cameras. This results in the acquisition of large amounts of sensitive data, since the images taken by such cameras are captured and transmitted unaltered to a server on the network. However, many applications do not explicitly require the identity of a given person in a scene; an anonymized representation containing information about the person's position while preserving their context in the scene suffices. We show how a customized loss function on regions of interest (ROI) can achieve sufficient anonymization, such that human faces become unrecognizable while persons remain detectable, by training an end-to-end optimized autoencoder for learned image compression that utilizes the flexibility of the learned analysis and reconstruction transforms to mutate parts of the compression result. This approach enables compression and anonymization in one step on the capture device, instead of transmitting sensitive, non-anonymized data over the network. Additionally, we evaluate how this anonymization impacts the average precision of pre-trained foundation models for detecting faces (MTCNN) and humans (YOLOv8) in comparison to non-ANN-based methods, while considering compression rate and latency.

Updated: 2024-06-09 10:36:06

Fields: cs.CV,cs.CR,cs.LG,eess.IV

Download: http://arxiv.org/abs/2406.05726v1

Less is KEN: a Universal and Simple Non-Parametric Pruning Algorithm for Large Language Models

Neural network pruning has become increasingly crucial due to the complexity of these models and their widespread use in various fields. Existing pruning algorithms often suffer from limitations such as architecture specificity, excessive complexity and reliance on demanding calculations, rendering them impractical for real-world applications. This paper introduces KEN: a straightforward, universal and unstructured pruning algorithm based on Kernel Density Estimation (KDE). KEN aims to construct optimized transformers by selectively preserving the most significant parameters while restoring others to their pre-training state. This strategy preserves model performance while enabling storage of only the optimized subnetwork, leading to substantial memory savings. Extensive evaluations across seven different LLMs demonstrate that KEN achieves equal or better performance than their original unpruned versions, with a minimum parameter reduction of 25%. Furthermore, in-depth comparisons with established pruning and PEFT algorithms confirm KEN's effectiveness. We further introduce KEN$_{viz}$, an explainable tool that visualizes the optimized model composition achieved by KEN from different points of view.

Updated: 2024-06-09 10:32:03

Fields: cs.LG,cs.CL

Download: http://arxiv.org/abs/2402.03142v2

Deception Analysis with Artificial Intelligence: An Interdisciplinary Perspective

Humans and machines interact more frequently than ever and our societies are becoming increasingly hybrid. A consequence of this hybridisation is the degradation of societal trust due to the prevalence of AI-enabled deception. Yet, despite our growing understanding of the role of trust in AI in recent years, we still do not have a computational theory able to fully understand and explain the role deception plays in this context. This is a problem because, while our ability to explain deception in hybrid societies lags behind, the design of AI agents may keep advancing towards fully autonomous deceptive machines, which would pose new challenges to dealing with deception. In this paper we build a timely and meaningful interdisciplinary perspective on deceptive AI and reinforce a 20-year-old socio-cognitive perspective on trust and deception by proposing the development of DAMAS, a holistic Multi-Agent Systems (MAS) framework for the socio-cognitive modelling and analysis of deception. In a nutshell, this paper covers the topic of modelling and explaining deception using AI approaches from the perspectives of Computer Science, Philosophy, Psychology, Ethics, and Intelligence Analysis.

Updated: 2024-06-09 10:31:26

Fields: cs.MA,cs.AI,cs.CY

Download: http://arxiv.org/abs/2406.05724v1

VillagerAgent: A Graph-Based Multi-Agent Framework for Coordinating Complex Task Dependencies in Minecraft

In this paper, we aim to evaluate multi-agent systems against complex dependencies, including spatial, causal, and temporal constraints. First, we construct a new benchmark, named VillagerBench, within the Minecraft environment. VillagerBench comprises diverse tasks crafted to test various aspects of multi-agent collaboration, from workload distribution to dynamic adaptation and synchronized task execution. Second, we introduce a Directed Acyclic Graph Multi-Agent Framework, VillagerAgent, to resolve complex inter-agent dependencies and enhance collaborative efficiency. This solution incorporates a task decomposer that creates a directed acyclic graph (DAG) for structured task management, an agent controller for task distribution, and a state manager for tracking environmental and agent data. Our empirical evaluation on VillagerBench demonstrates that VillagerAgent outperforms the existing AgentVerse model, reducing hallucinations and improving task decomposition efficacy. The results underscore VillagerAgent's potential in advancing multi-agent collaboration, offering a scalable and generalizable solution in dynamic environments. The source code is open-source on GitHub (https://github.com/cnsdqd-dyb/VillagerAgent).
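
As a rough illustration of how a task DAG can drive distribution (not the VillagerAgent code: the task names, the dependency-dict format, the `schedule_rounds` helper and the round-robin assignment are all hypothetical), Python's standard `graphlib` (3.9+) can split the DAG into rounds of mutually independent subtasks, each spread across the available agents:

```python
from graphlib import TopologicalSorter


def schedule_rounds(deps: dict[str, set[str]], agents: list[str]) -> list[dict[str, str]]:
    """Group DAG tasks into rounds in which every task's prerequisites are done.

    deps maps each task to the set of tasks it depends on (hypothetical format).
    Within a round, ready tasks are assigned to agents round-robin.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    rounds = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are satisfied
        assignment = {task: agents[i % len(agents)] for i, task in enumerate(ready)}
        rounds.append(assignment)
        sorter.done(*ready)  # treat the whole round as finished
    return rounds


# Hypothetical Minecraft-style subtasks produced by a task decomposer.
deps = {
    "mine_stone": set(),
    "chop_wood": set(),
    "craft_pickaxe": {"mine_stone", "chop_wood"},
    "build_house": {"chop_wood"},
    "mine_iron": {"craft_pickaxe"},
}
for r in schedule_rounds(deps, ["alice", "bob"]):
    print(r)
```

A real controller would also react to failures reported by a state manager instead of marking a whole round done at once; the sketch only shows how the DAG exposes which subtasks can run in parallel.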

Updated: 2024-06-09 10:21:47

Fields: cs.AI,cs.MA

Download: http://arxiv.org/abs/2406.05720v1

CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification

In the investment industry, it is often essential to carry out fine-grained company similarity quantification for a range of purposes, including market mapping, competitor analysis, and mergers and acquisitions. We propose and publish a knowledge graph, named CompanyKG, to represent and learn diverse company features and relations. Specifically, 1.17 million companies are represented as nodes enriched with company description embeddings; and 15 different inter-company relations result in 51.06 million weighted edges. To enable a comprehensive assessment of methods for company similarity quantification, we have devised and compiled three evaluation tasks with annotated test sets: similarity prediction, competitor retrieval and similarity ranking. We present extensive benchmarking results for 11 reproducible predictive methods categorized into three groups: node-only, edge-only, and node+edge. To the best of our knowledge, CompanyKG is the first large-scale heterogeneous graph dataset originating from a real-world investment platform, tailored for quantifying inter-company similarity.
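
For the competitor-retrieval task, a simple node-only baseline ranks candidate companies by the cosine similarity of their description embeddings. The sketch below assumes toy 3-dimensional embeddings and a hypothetical `top_k_similar` helper; it illustrates the idea and is not claimed to match any of the 11 benchmarked methods exactly.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(query: str, embeddings: dict[str, list[float]], k: int) -> list[str]:
    """Return the k companies (excluding the query itself) most similar to the query."""
    q = embeddings[query]
    scored = [(cosine(q, e), name) for name, e in embeddings.items() if name != query]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy embeddings, made up for illustration.
emb = {
    "acme_robotics": [0.9, 0.1, 0.0],
    "botworks": [0.8, 0.2, 0.1],
    "greenfarm": [0.0, 0.1, 0.9],
    "agritech_co": [0.1, 0.0, 1.0],
}
print(top_k_similar("acme_robotics", emb, k=2))  # ['botworks', 'agritech_co']
```

Node+edge methods would additionally exploit the 15 weighted relation types, for example by propagating embeddings over the graph before computing similarities.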

Updated: 2024-06-09 10:20:21

Fields: cs.AI,cs.CE,cs.DB,cs.LG,05C85, 05C12, 68T07, 68T50, 05C90,E.0; I.2.1; I.2.6; H.4.0; J.0; I.2.8; I.2.7

Download: http://arxiv.org/abs/2306.10649v4

Contextual Continuum Bandits: Static Versus Dynamic Regret

We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set, minimizing a function associated with the context. The goal is to minimize all the underlying functions for the received contexts, leading to a dynamic (contextual) notion of regret, which is stronger than the standard static regret. Assuming that the objective functions are Hölder continuous with respect to the contexts, we demonstrate that any algorithm achieving a sub-linear static regret can be extended to achieve a sub-linear dynamic regret. We further study the case of strongly convex and smooth functions when the observations are noisy. Inspired by the interior point method and employing self-concordant barriers, we propose an algorithm achieving a sub-linear dynamic regret. Lastly, we present a minimax lower bound, implying two key facts. First, no algorithm can achieve sub-linear dynamic regret over functions that are not continuous with respect to the context. Second, for strongly convex and smooth functions, the algorithm that we propose achieves, up to a logarithmic factor, the minimax optimal rate of dynamic regret as a function of the number of queries.
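
The two regret notions can be written out explicitly. The notation is assumed here rather than taken from the paper: contexts $c_t$, actions $x_t$ in a convex set $\Theta$, loss $f(x, c)$, over $T$ rounds:

```latex
\begin{align*}
R_T^{\mathrm{static}} &= \sum_{t=1}^{T} f(x_t, c_t) - \min_{x \in \Theta} \sum_{t=1}^{T} f(x, c_t), \\
R_T^{\mathrm{dyn}}    &= \sum_{t=1}^{T} f(x_t, c_t) - \sum_{t=1}^{T} \min_{x \in \Theta} f(x, c_t).
\end{align*}
```

Because $\sum_t \min_x f(x, c_t) \le \min_x \sum_t f(x, c_t)$, we always have $R_T^{\mathrm{dyn}} \ge R_T^{\mathrm{static}}$. This is why a sub-linear dynamic regret is the stronger guarantee: the comparator may pick a different best action for every context.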

Updated: 2024-06-09 10:12:08

Fields: stat.ML,cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2406.05714v1

Data-Driven Upper Confidence Bounds with Near-Optimal Regret for Heavy-Tailed Bandits

Stochastic multi-armed bandits (MABs) provide a fundamental reinforcement learning model to study sequential decision making in uncertain environments. The upper confidence bounds (UCB) algorithm gave birth to the renaissance of bandit algorithms, as it achieves near-optimal regret rates under various moment assumptions. Until recently, most UCB methods relied on concentration inequalities leading to confidence bounds that depend on moment parameters, such as the variance proxy, which are usually unknown in practice. In this paper, we propose a new distribution-free, data-driven UCB algorithm for symmetric reward distributions, which needs no moment information. The key idea is to combine a refined, one-sided version of the recently developed resampled median-of-means (RMM) method with UCB. We prove a near-optimal regret bound for the proposed anytime, parameter-free RMM-UCB method, even for heavy-tailed distributions.
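
For intuition about the median-of-means idea the paper builds on, here is a classical, non-resampled median-of-means estimator plugged into an optimistic index. This is an assumption-laden sketch, not RMM-UCB: the confidence width `c * sqrt(log(t) / n)` and its constant `c` are made up, and in classical robust UCB such a width depends on a moment bound, which is exactly the dependence the proposed method removes.

```python
import math
import statistics


def median_of_means(samples: list[float], n_blocks: int) -> float:
    """Median of the means of n_blocks contiguous blocks.

    Trailing samples that do not fill a complete block are ignored.
    """
    n_blocks = max(1, min(n_blocks, len(samples)))
    size = len(samples) // n_blocks
    means = [statistics.fmean(samples[i * size:(i + 1) * size]) for i in range(n_blocks)]
    return statistics.median(means)


def mom_ucb_index(samples: list[float], t: int, c: float = 1.0) -> float:
    """Optimistic index: robust mean estimate plus a hypothetical confidence width."""
    n = len(samples)
    if n == 0:
        return math.inf  # make sure every arm is pulled at least once
    log_t = math.log(max(t, 2))
    n_blocks = max(1, math.ceil(log_t))  # number of blocks grows with log(t)
    return median_of_means(samples, n_blocks) + c * math.sqrt(log_t / n)


# One huge outlier barely moves the median of block means.
print(median_of_means([1.0, 1.0, 1.0, 1000.0, 1.0, 1.0], n_blocks=3))  # 1.0
```

At each round, a UCB-style policy would pull the arm with the largest `mom_ucb_index`.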

Updated: 2024-06-09 10:06:50

Fields: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.05710v1

TR2MTL: LLM based framework for Metric Temporal Logic Formalization of Traffic Rules

Traffic rules formalization is crucial for verifying the compliance and safety of autonomous vehicles (AVs). However, manually translating natural-language traffic rules into formal specifications requires domain knowledge and logic expertise, which limits its adaptation. This paper introduces TR2MTL, a framework that employs large language models (LLMs) to automatically translate traffic rules (TR) into metric temporal logic (MTL). It is envisioned as a human-in-the-loop system for AV rule formalization. It uses a chain-of-thought in-context learning approach to guide the LLM through step-by-step translation and to generate valid, grammatically correct MTL formulas. It can be extended to various forms of temporal logic and rules. We evaluated the framework on a challenging dataset of traffic rules we created from various sources and compared it against LLMs using different in-context learning methods. Results show that TR2MTL is domain-agnostic, achieving high accuracy and generalization capability even with a small dataset. Moreover, the method effectively predicts formulas with varying degrees of logical and semantic structure in unstructured traffic rules.
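
To make the target formalism concrete: a rule such as "after passing a stop sign, the vehicle must come to a stop within 3 seconds" might be written (hypothetically, this is not an output of TR2MTL) as the MTL formula `G(stop_sign -> F_[0,3] stopped)`. The sketch below checks that formula shape on a finite trace sampled once per second; the trace format and the `check_bounded_response` helper are assumptions made for illustration.

```python
def check_bounded_response(trace: list[dict[str, bool]], trigger: str, response: str,
                           lo: int, hi: int) -> bool:
    """Check G(trigger -> F_[lo,hi] response) on a finite, discretely sampled trace.

    trace[k] holds the truth values of atomic propositions at time step k.
    Steps beyond the end of the trace count as violations (a strict finite-trace reading).
    """
    for k, state in enumerate(trace):
        if state.get(trigger, False):
            window = range(k + lo, k + hi + 1)
            if not any(j < len(trace) and trace[j].get(response, False) for j in window):
                return False
    return True


trace = [
    {"stop_sign": True, "stopped": False},
    {"stop_sign": False, "stopped": False},
    {"stop_sign": False, "stopped": True},
    {"stop_sign": False, "stopped": False},
]
print(check_bounded_response(trace, "stop_sign", "stopped", 0, 3))  # True
print(check_bounded_response(trace, "stop_sign", "stopped", 0, 1))  # False
```

A checker like this is one way a human in the loop could sanity-check LLM-generated formulas against example driving traces.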

Updated: 2024-06-09 09:55:04

Fields: cs.RO,cs.FL,cs.LG

Download: http://arxiv.org/abs/2406.05709v1

QGEval: A Benchmark for Question Generation Evaluation

Automatically generated questions often suffer from problems such as unclear expression or factual inaccuracies, requiring a reliable and comprehensive evaluation of their quality. Human evaluation is frequently used in the field of question generation (QG) and is one of the most accurate evaluation methods. It also serves as the standard for automatic metrics. However, there is a lack of unified evaluation criteria, which hampers the development of both QG technologies and automatic evaluation methods. To address this, we propose QGEval, a multi-dimensional Evaluation benchmark for Question Generation, which evaluates both generated questions and existing automatic metrics across 7 dimensions: fluency, clarity, conciseness, relevance, consistency, answerability, and answer consistency. We demonstrate the appropriateness of these dimensions by examining their correlations and distinctions. Analysis with QGEval reveals that 1) most QG models perform unsatisfactorily in terms of answerability and answer consistency, and 2) existing metrics fail to align well with human assessments when evaluating generated questions across the 7 dimensions. We expect this work to foster the development of both QG technologies and automatic metrics for QG.
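
Checking whether an automatic metric aligns with human assessments usually comes down to a rank correlation computed separately for each dimension. Below is a minimal Spearman correlation (average ranks for ties, then Pearson correlation on the ranks). The scores are invented, and the abstract does not state which correlation protocol QGEval uses.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks (undefined for constant input)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


human = [3.0, 1.0, 2.0, 5.0, 4.0]   # e.g. human answerability ratings (made up)
metric = [0.6, 0.2, 0.1, 0.9, 0.7]  # e.g. scores from an automatic metric (made up)
print(round(spearman(human, metric), 3))  # 0.9
```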

Updated: 2024-06-09 09:51:55

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.05707v1

Avoiding strict saddle points of nonconvex regularized problems

We introduce a strict saddle property for $\ell_p$ regularized functions, and propose an iterative reweighted $\ell_1$ algorithm to solve the $\ell_p$ regularized problems. The algorithm is guaranteed to converge only to local minimizers when randomly initialized. The strict saddle property is shown to be generic for these sparse optimization problems. These analyses, as well as the proposed algorithm, can be easily extended to general nonconvex regularized problems.
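
The iteratively reweighted $\ell_1$ idea can be shown on the simplest instance, an identity design chosen here so that each weighted $\ell_1$ subproblem has a closed form: minimize $\frac12\|x-b\|_2^2 + \lambda\sum_i |x_i|^p$ with $0<p<1$. Each iteration linearizes $|x_i|^p$ at the current iterate, giving weights $w_i = p(|x_i|+\varepsilon)^{p-1}$, and then solves the weighted problem by soft-thresholding. The smoothing constant `eps`, the deterministic start and the identity design are illustrative assumptions; the paper's random initialization and saddle-avoidance analysis are not reproduced.

```python
def soft_threshold(v: float, tau: float) -> float:
    """Proximal operator of tau * |x| evaluated at v."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def irl1_lp(b: list[float], lam: float, p: float, eps: float = 1e-3,
            iters: int = 50) -> list[float]:
    """Iteratively reweighted l1 for min 0.5*||x - b||^2 + lam * sum |x_i|^p."""
    x = list(b)  # start from the unregularized solution
    for _ in range(iters):
        # Weights from linearizing |x_i|^p (smoothed by eps) at the current iterate.
        w = [p * (abs(xi) + eps) ** (p - 1) for xi in x]
        # With an identity design the weighted l1 subproblem separates per coordinate.
        x = [soft_threshold(bi, lam * wi) for bi, wi in zip(b, w)]
    return x


# Large entries survive (slightly shrunk); small ones are driven exactly to zero.
print(irl1_lp([3.0, 0.3, -2.0, 0.05], lam=0.5, p=0.5))
```

Once a coordinate hits zero its weight becomes very large, so it stays at zero; this is how the reweighting mimics the sparsity-promoting behaviour of the nonconvex $\ell_p$ penalty.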

Updated: 2024-06-09 09:39:21

Fields: math.OC,cs.LG

Download: http://arxiv.org/abs/2401.09274v2

UCTB: An Urban Computing Tool Box for Building Spatiotemporal Prediction Services

Spatiotemporal crowd flow prediction is one of the key technologies in smart cities. Currently, there are two major pain points that plague related research and practitioners. Firstly, crowd flow is related to multiple domain knowledge factors; however, due to the diversity of application scenarios, it is difficult for subsequent work to make reasonable and comprehensive use of domain knowledge. Secondly, with the development of deep learning technology, the implementation of relevant techniques has become increasingly complex; reproducing advanced models has become a time-consuming and increasingly cumbersome task. To address these issues, we design and implement a spatiotemporal crowd flow prediction toolbox called UCTB (Urban Computing Tool Box), which integrates multiple spatiotemporal domain knowledge and state-of-the-art models simultaneously. The relevant code and supporting documents have been open-sourced at https://github.com/uctb/UCTB.

Updated: 2024-06-09 09:28:46

Fields: cs.LG,cs.CV

Download: http://arxiv.org/abs/2306.04144v2

Exploring the Efficacy of Large Language Models (GPT-4) in Binary Reverse Engineering

This study investigates the capabilities of Large Language Models (LLMs), specifically GPT-4, in the context of Binary Reverse Engineering (RE). Employing a structured experimental approach, we analyzed the LLM's performance in interpreting and explaining human-written and decompiled code. The research encompassed two phases: the first on basic code interpretation and the second on more complex malware analysis. Key findings indicate LLMs' proficiency in general code understanding, with varying effectiveness in detailed technical and security analyses. The study underscores the potential and current limitations of LLMs in reverse engineering, revealing crucial insights for future applications and improvements. We also examined our experimental methodologies, such as evaluation methods and data constraints, which provided us with a technical vision for future research in this field.

Updated: 2024-06-09 09:23:58

Fields: cs.SE,cs.AI

Download: http://arxiv.org/abs/2406.06637v1

Moving Sampling Physics-informed Neural Networks induced by Moving Mesh PDE

In this work, we propose an end-to-end adaptive sampling neural network (MMPDE-Net) based on the moving mesh method, which can adaptively generate new sampling points by solving the moving mesh PDE. This model focuses on improving the quality of sampling points generation. Moreover, we develop an iterative algorithm based on MMPDE-Net, which makes the sampling points more precise and controllable. Since MMPDE-Net is a framework independent of the deep learning solver, we combine it with physics-informed neural networks (PINN) to propose moving sampling PINN (MS-PINN) and demonstrate its effectiveness by error analysis under some assumptions. Finally, we demonstrate the performance improvement of MS-PINN compared to PINN through numerical experiments of four typical examples, which numerically verify the effectiveness of our method.
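
Moving mesh PDEs are commonly derived from the equidistribution principle: place points so that each cell carries an equal share of a monitor function $M(x)$, which is large where the solution varies rapidly. The 1D sketch below applies that principle directly, with no neural network and no moving-mesh PDE time stepping, using the arc-length-type monitor $M = \sqrt{1 + \alpha\,u'(x)^2}$. The test function, `alpha` and the grid sizes are assumptions for illustration only.

```python
import bisect
import math


def equidistribute(du, a: float, b: float, n_points: int, alpha: float = 1.0,
                   n_fine: int = 2000) -> list[float]:
    """Return n_points nodes on [a, b] that equidistribute M(x) = sqrt(1 + alpha * du(x)**2)."""
    xs = [a + (b - a) * i / n_fine for i in range(n_fine + 1)]
    monitor = [math.sqrt(1.0 + alpha * du(x) ** 2) for x in xs]
    # Cumulative integral of the monitor function (trapezoidal rule); strictly increasing.
    cum = [0.0]
    for i in range(n_fine):
        cum.append(cum[-1] + 0.5 * (monitor[i] + monitor[i + 1]) * (xs[i + 1] - xs[i]))
    total = cum[-1]
    nodes = []
    for k in range(n_points):
        target = total * k / (n_points - 1)  # equal share of the monitor integral
        # Locate the fine cell containing the target value and interpolate linearly.
        j = min(max(bisect.bisect_left(cum, target), 1), n_fine)
        frac = (target - cum[j - 1]) / (cum[j] - cum[j - 1])
        nodes.append(xs[j - 1] + frac * (xs[j] - xs[j - 1]))
    return nodes


def du(x: float) -> float:
    """Derivative of u(x) = tanh(50 (x - 0.5)), which has a steep front at x = 0.5."""
    return 50.0 / math.cosh(50.0 * (x - 0.5)) ** 2


nodes = equidistribute(du, 0.0, 1.0, n_points=21)
print([round(x, 3) for x in nodes])  # points cluster around x = 0.5
```

In a PINN setting, nodes produced this way would replace uniformly drawn collocation points, concentrating residual evaluations where the solution is steep.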

Updated: 2024-06-09 08:56:30

Fields: math.NA,cs.AI,cs.LG,cs.NA

Download: http://arxiv.org/abs/2311.16167v4

An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS

Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements. However, the quality of the generated speech significantly deteriorates when the audio prompt contains noise, and limited research has been conducted to address this issue. In this paper, we explored various strategies to enhance the quality of audio generated from noisy audio prompts within the context of flow-matching-based zero-shot TTS. Our investigation includes comprehensive training strategies: unsupervised pre-training with masked speech denoising, multi-speaker detection and DNSMOS-based data filtering on the pre-training data, and fine-tuning with random noise mixing. The results of our experiments demonstrate significant improvements in intelligibility, speaker similarity, and overall audio quality compared to the approach of applying speech enhancement to the audio prompt.
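
Fine-tuning with random noise mixing typically means adding a noise clip to clean speech at a randomly drawn signal-to-noise ratio (SNR). The scaling step is sketched below. The SNR range, the plain-list waveform format and the `mix_at_snr` name are assumptions, since the abstract does not give the paper's settings.

```python
import math
import random


def power(x: list[float]) -> float:
    """Mean power of a waveform."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech: list[float], noise: list[float], snr_db: float) -> list[float]:
    """Scale noise so that 10*log10(P_speech / P_noise) == snr_db, then add it to speech."""
    noise = (noise * (len(speech) // len(noise) + 1))[:len(speech)]  # loop, then trim noise
    scale = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]  # 1 s tone at 16 kHz
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
snr_db = rng.uniform(0.0, 20.0)  # random SNR per training example (range assumed)
noisy = mix_at_snr(speech, noise, snr_db)
```

Applying this to the audio prompt during fine-tuning, while keeping the clean target speech, teaches the model to ignore prompt noise rather than reproduce it.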

Updated: 2024-06-09 08:51:50

Fields: eess.AS,cs.AI,eess.SP

Download: http://arxiv.org/abs/2406.05699v1

Interpretable Deep Clustering for Tabular Data

Clustering is a fundamental learning task widely used as a first step in data analysis. For example, biologists use cluster assignments to analyze genome sequences, medical records, or images. Since downstream analysis is typically performed at the cluster level, practitioners seek reliable and interpretable clustering models. We propose a new deep-learning framework for general domain tabular data that predicts interpretable cluster assignments at the instance and cluster levels. First, we present a self-supervised procedure to identify the subset of the most informative features from each data point. Then, we design a model that predicts cluster assignments and a gate matrix that provides cluster-level feature selection. Overall, our model provides cluster assignments with an indication of the driving feature for each sample and each cluster. We show that the proposed method can reliably predict cluster assignments in biological, text, image, and physics tabular datasets. Furthermore, using previously proposed metrics, we verify that our model leads to interpretable results at a sample and cluster level. Our code is available at https://github.com/jsvir/idc.

Updated: 2024-06-09 08:40:00

Fields: cs.LG,stat.ML

Download: http://arxiv.org/abs/2306.04785v2

A Low Rank Neural Representation of Entropy Solutions

We construct a new representation of entropy solutions to nonlinear scalar conservation laws with a smooth convex flux function in a single spatial dimension. The representation is a generalization of the method of characteristics and possesses a compositional form. While it is a nonlinear representation, the embedded dynamics of the solution in the time variable is linear. This representation is then discretized as a manifold of implicit neural representations where the feedforward neural network architecture has a low rank structure. Finally, we show that the low rank neural representation with a fixed number of layers and a small number of coefficients can approximate any entropy solution regardless of the complexity of the shock topology, while retaining the linearity of the embedded dynamics.
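
The representation generalizes the method of characteristics. For orientation, here is the classical version for the inviscid Burgers equation $u_t + (u^2/2)_x = 0$, whose flux $u^2/2$ is smooth and convex. Before characteristics cross, the solution satisfies the implicit relation $u = u_0(x - u\,t)$, which is solved below by Newton's method. The initial condition and tolerances are assumptions, and nothing here reflects the paper's low rank neural architecture or its treatment of shocks.

```python
import math


def burgers_characteristics(u0, du0, x: float, t: float, tol: float = 1e-12,
                            max_iter: int = 50) -> float:
    """Solve u = u0(x - u*t) for u with Newton's method (valid before the first shock)."""
    u = u0(x)  # initial guess
    for _ in range(max_iter):
        xi = x - u * t          # foot of the characteristic through (x, t)
        g = u - u0(xi)          # residual of the implicit relation
        dg = 1.0 + t * du0(xi)  # dg/du
        step = g / dg
        u -= step
        if abs(step) < tol:
            break
    return u


def u0(s: float) -> float:
    """Smooth initial bump (assumed for illustration)."""
    return math.exp(-s * s)


def du0(s: float) -> float:
    """Derivative of u0."""
    return -2.0 * s * math.exp(-s * s)


print(burgers_characteristics(u0, du0, x=0.5, t=0.5))
```

For this bump the first shock forms at $t^* = -1/\min_x u_0'(x) \approx 1.17$; beyond that time the characteristics cross and the implicit relation no longer determines the entropy solution, which is the regime the paper's compositional representation is designed to handle.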

Updated: 2024-06-09 08:37:11

Fields: math.NA,cs.LG,cs.NA,68T07, 41A46, 41A25, 65N15, 35L65

Download: http://arxiv.org/abs/2406.05694v1

SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion

Diffusion-based singing voice conversion (SVC) models have shown better synthesis quality compared to traditional methods. However, in cross-domain SVC scenarios, where there is a significant disparity in pitch between the source and target voice domains, the models tend to generate hoarse-sounding audio, which makes high-quality vocal output difficult to achieve. Therefore, in this paper, we propose a Self-supervised Pitch Augmentation method for Singing Voice Conversion (SPA-SVC), which can enhance the voice quality in SVC tasks without requiring additional data or increasing model parameters. We innovatively introduce a cycle pitch shifting training strategy and a Structural Similarity Index (SSIM) loss into our SVC model, effectively enhancing its performance. Experimental results on the public singing dataset M4Singer indicate that our proposed method significantly improves model performance in general SVC scenarios and particularly in cross-domain SVC scenarios.
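
An SSIM loss compares two spectrogram-like arrays by luminance, contrast and structure. The sketch below computes a single global SSIM over whole 2D arrays with the standard constants $C_1=(0.01L)^2$ and $C_2=(0.03L)^2$. Common implementations instead average SSIM over local Gaussian windows, and how SPA-SVC applies it (for example to mel-spectrograms, and with which data range `L`) is an assumption here.

```python
def global_ssim(x: list[list[float]], y: list[list[float]], data_range: float = 1.0) -> float:
    """Single-window SSIM between two equally shaped 2D arrays."""
    a = [v for row in x for v in row]
    b = [v for row in y for v in row]
    n = len(a)
    mx, my = sum(a) / n, sum(b) / n
    vx = sum((v - mx) * (v - mx) for v in a) / n
    vy = sum((v - my) * (v - my) for v in b) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(a, b)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))


def ssim_loss(pred: list[list[float]], target: list[list[float]], data_range: float = 1.0) -> float:
    """Loss that is 0 for identical inputs and grows as the structures diverge."""
    return 1.0 - global_ssim(pred, target, data_range)


mel = [[0.1, 0.5, 0.9], [0.2, 0.6, 0.8]]
print(ssim_loss(mel, mel))  # 0.0 for identical inputs
```

Unlike a per-bin L1 or L2 loss, the covariance term rewards matching the overall spectro-temporal pattern, which is plausibly why a structural loss helps against hoarse, smeared outputs.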

Updated: 2024-06-09 08:34:01

Fields: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2406.05692v1

Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions

Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields and have shown significant potential in the academic peer-review process. However, existing applications are primarily limited to static review generation based on submitted papers, which fail to capture the dynamic and iterative nature of real-world peer reviews. In this paper, we reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers. We construct a comprehensive dataset containing over 26,841 papers with 92,017 reviews collected from multiple sources, including the top-tier conference and prestigious journal. This dataset is meticulously designed to facilitate the applications of LLMs for multi-turn dialogues, effectively simulating the complete peer-review process. Furthermore, we propose a series of metrics to evaluate the performance of LLMs for each role under this reformulated peer-review setting, ensuring fair and comprehensive evaluations. We believe this work provides a promising perspective on enhancing the LLM-driven peer-review process by incorporating dynamic, role-based interactions. It aligns closely with the iterative and interactive nature of real-world academic peer review, offering a robust foundation for future research and development in this area. We open-source the dataset at https://github.com/chengtan9907/ReviewMT.

Updated: 2024-06-09 08:24:17

Fields: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.05688v1

Provable Optimization for Adversarial Fair Self-supervised Contrastive Learning

This paper studies learning fair encoders in a self-supervised learning (SSL) setting, in which all data are unlabeled and only a small portion of them are annotated with the sensitive attribute. Adversarial fair representation learning is well suited to this scenario: it minimizes a contrastive loss over the unlabeled data while maximizing an adversarial loss of predicting the sensitive attribute over the data annotated with that attribute. Nevertheless, optimizing adversarial fair representation learning presents significant challenges because it requires solving a non-convex non-concave minimax game. The complexity deepens when incorporating a global contrastive loss that contrasts each anchor data point against all other examples. A central question is ``{\it can we design a provable yet efficient algorithm for solving adversarial fair self-supervised contrastive learning}?'' Building on advanced optimization techniques, we propose a stochastic algorithm dubbed SoFCLR with a convergence analysis under reasonable conditions without requiring a large batch size. We conduct extensive experiments to demonstrate the effectiveness of the proposed approach for downstream classification with eight fairness notions.
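
In generic form (the notation is assumed here, not copied from the paper), adversarial fair contrastive learning is a min-max problem over encoder parameters $\mathbf{w}$ and adversary parameters $\mathbf{w}'$. Here $\mathcal{D}$ is the unlabeled data, $\mathcal{D}_a \subset \mathcal{D}$ the subset annotated with the sensitive attribute $a$, $E_{\mathbf{w}}$ the encoder, $h_{\mathbf{w}'}$ the attribute predictor, $s(\cdot,\cdot)$ a similarity between encoded views, $x_i^{+}$ an augmented positive view of $x_i$, $\tau$ a temperature, $\ell_{\mathrm{CE}}$ the cross-entropy, and $\alpha > 0$ a trade-off weight:

```latex
\min_{\mathbf{w}} \max_{\mathbf{w}'} \;
  \underbrace{\frac{1}{|\mathcal{D}|}\sum_{x_i \in \mathcal{D}}
    -\log \frac{\exp\!\big(s(x_i, x_i^{+})/\tau\big)}
               {\sum_{x_j \in \mathcal{D}} \exp\!\big(s(x_i, x_j)/\tau\big)}}_{\text{global contrastive loss}}
  \;-\; \alpha \,
  \underbrace{\frac{1}{|\mathcal{D}_a|}\sum_{(x_i, a_i) \in \mathcal{D}_a}
    \ell_{\mathrm{CE}}\big(h_{\mathbf{w}'}(E_{\mathbf{w}}(x_i)), a_i\big)}_{\text{adversary's prediction loss}}
```

Maximizing over $\mathbf{w}'$ makes the adversary fit the sensitive attribute as well as it can, while minimizing over $\mathbf{w}$ pushes the encoder to make that prediction hard. Because the denominator of the contrastive term ranges over all examples rather than a mini-batch, a naive mini-batch estimate of its logarithm is biased, which is one reason such objectives usually call for large batches.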

Updated: 2024-06-09 08:11:12

Fields: cs.LG,cs.CV,cs.CY

Download: http://arxiv.org/abs/2406.05686v1

Predicting Open-Hole Laminates Failure Using Support Vector Machines With Classical and Quantum Kernels

Modeling open hole failure of composites is a complex task, involving a highly nonlinear response with interacting failure modes. Numerical modeling of this phenomenon has traditionally been based on the finite element method, but it requires a trade-off between high fidelity and computational cost. To mitigate this shortcoming, recent work has leveraged machine learning to predict the strength of open hole composite specimens. Here, we also propose using data-based models, but we tackle open hole composite failure from a classification point of view. More specifically, we show how to train surrogate models to learn the ultimate failure envelope of an open hole composite plate under in-plane loading. To achieve this, we solve the classification problem via support vector machine (SVM) and test different classifiers by changing the SVM kernel function. The flexibility of kernel-based SVM also allows us to integrate the recently developed quantum kernels in our algorithm and compare them with the standard radial basis function (RBF) kernel. Finally, thanks to kernel-target alignment optimization, we tune the free parameters of all kernels to best separate safe and failure-inducing loading states. The results show classification accuracies higher than 90% for RBF, especially after alignment, followed closely by the quantum kernel classifiers.
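
Kernel-target alignment measures how well a kernel matrix $K$ matches the ideal kernel $yy^\top$ built from labels $y_i \in \{-1,+1\}$: $A(K, y) = \langle K, yy^\top\rangle_F / (\|K\|_F\,\|yy^\top\|_F) = y^\top K y / (n\,\|K\|_F)$, since $\|yy^\top\|_F = n$. The sketch below evaluates it for an RBF kernel over a small grid of bandwidths and keeps the best. The 2D load states, labels and `gamma` grid are invented, and the paper's optimizer and quantum kernels are not reproduced.

```python
import math


def rbf_kernel(points: list[tuple[float, float]], gamma: float) -> list[list[float]]:
    """K[i][j] = exp(-gamma * ||p_i - p_j||^2)."""
    return [[math.exp(-gamma * ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)) for b in points]
            for a in points]


def alignment(K: list[list[float]], y: list[int]) -> float:
    """Kernel-target alignment y^T K y / (n * ||K||_F) for labels in {-1, +1}."""
    n = len(y)
    yky = sum(y[i] * K[i][j] * y[j] for i in range(n) for j in range(n))
    frob = math.sqrt(sum(v * v for row in K for v in row))
    return yky / (n * frob)


# Hypothetical normalized in-plane load states: +1 = safe, -1 = failure-inducing.
pts = [(0.1, 0.1), (0.2, 0.0), (0.0, 0.2), (0.9, 0.8), (1.0, 1.0), (0.8, 0.9)]
labels = [1, 1, 1, -1, -1, -1]
best_gamma = max([0.1, 1.0, 10.0, 100.0], key=lambda g: alignment(rbf_kernel(pts, g), labels))
print(best_gamma)  # 10.0 for this toy data
```

A too-small `gamma` makes every pair look similar, and a too-large one makes the kernel nearly diagonal; alignment peaks in between, which is the intuition behind tuning kernel parameters this way before training the SVM.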

Updated: 2024-06-09 07:49:00

Fields: cs.CE,cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2405.02903v2

From Basic to Extra Features: Hypergraph Transformer Pretrain-then-Finetuning for Balanced Clinical Predictions on EHR

Electronic Health Records (EHRs) contain rich patient information and are crucial for clinical research and practice. In recent years, deep learning models have been applied to EHRs, but they often rely on a large number of features, which may not be readily available for all patients. We propose HTP-Star, which combines hypergraph structures with a pretrain-then-finetune framework for modeling EHR data, enabling seamless integration of additional features. Additionally, we design two techniques, (1) Smoothness-inducing Regularization and (2) Group-balanced Reweighting, to enhance the model's robustness during fine-tuning. Through experiments on two real EHR datasets, we demonstrate that HTP-Star consistently outperforms various baselines while striking a balance between patients with only basic features and those with extra features.
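
The abstract names Group-balanced Reweighting without giving its formula. One common form, sketched here purely as an assumption, weights each sample inversely to its group's frequency so that, for example, the few patients with extra features carry as much total weight as the many with only basic features. The group labels and losses are hypothetical.

```python
from collections import Counter

def group_balanced_weights(groups):
    """Per-sample weights that give every group the same total weight.

    A sample in group g gets n / (k * count[g]); each group then sums to n / k
    and the weights average to 1 over the dataset.
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]

def weighted_mean_loss(losses, weights):
    """Reweighted training loss."""
    return sum(loss * w for loss, w in zip(losses, weights)) / sum(weights)

groups = ["basic", "basic", "basic", "extra"]  # hypothetical patient groups
weights = group_balanced_weights(groups)
print(weights)
```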

Updated: 2024-06-09 07:41:03

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.05682v1

DMS: Addressing Information Loss with More Steps for Pragmatic Adversarial Attacks

Despite the exceptional performance of deep neural networks (DNNs) across domains, they are vulnerable to adversarial samples, particularly in computer vision tasks. This vulnerability is further affected by the digital container formats used by computers, which commonly store pixel values as discrete numbers. This paper examines how information loss in file formats impacts the effectiveness of adversarial attacks. Notably, we observe that losing the non-integer parts of pixel values markedly hinders adversarial attack performance. To address this issue, we explore leveraging the gradient information of the attack samples within the model to mitigate the information loss. We introduce the Do More Steps (DMS) algorithm, which hinges on two core techniques: gradient ascent-based adversarial integerization (DMS-AI) and integrated gradients-based attribution selection (DMS-AS). Our goal is to alleviate this lossy process so that attack performance is retained when the adversarial samples are stored digitally. In particular, DMS-AI integerizes the non-integer pixel values according to the gradient direction, and DMS-AS selects the non-integer pixels by comparing attribution results. We conduct thorough experiments to assess the effectiveness of our approach, implementing DMS-AI and DMS-AS on two large-scale datasets with various recent gradient-based attack methods. Our empirical findings conclusively demonstrate the superiority of the proposed DMS-AI and DMS-AS pixel integerization methods over standard methods, such as rounding, truncation, and rounding up, in maintaining attack integrity.
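
A minimal sketch of one plausible reading of "integerizes the non-integer pixel values according to the gradient direction": round each pixel up when the attack-loss gradient is positive and down when it is negative, rather than to the nearest integer. This is an assumption for illustration; DMS-AI itself is described as gradient-ascent based and may differ. The pixel values and gradients are hypothetical.

```python
import math

def integerize_round(pixels):
    """Baseline: round half up, then clip to the 8-bit range [0, 255]."""
    return [min(255, max(0, math.floor(p + 0.5))) for p in pixels]

def integerize_by_gradient(pixels, grads):
    """Round each non-integer pixel toward the side that increases the attack loss.

    Assumption: grads[i] is d(attack loss)/d(pixel i). A positive gradient means a
    larger value helps the attack, so round up; a negative one means round down.
    """
    out = []
    for p, g in zip(pixels, grads):
        if p == math.floor(p):
            q = int(p)               # already an integer: nothing is lost
        elif g > 0:
            q = math.ceil(p)
        elif g < 0:
            q = math.floor(p)
        else:
            q = math.floor(p + 0.5)  # no gradient signal: fall back to rounding
        out.append(min(255, max(0, q)))
    return out

def first_order_loss_change(original, stored, grads):
    """Linear estimate of how storing `stored` instead of `original` changes the loss."""
    return sum(g * (s - o) for o, s, g in zip(original, stored, grads))

adv = [12.4, 200.6, 37.0, 254.7]  # hypothetical non-integer adversarial pixels
grads = [0.3, -0.1, 0.5, 0.2]
print(integerize_round(adv), integerize_by_gradient(adv, grads))
```

Per pixel, choosing ceil when g > 0 and floor when g < 0 maximizes g * (q - p) over the two neighbouring integers, so under a first-order approximation this rule never loses more attack loss than plain rounding.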

Updated: 2024-06-09 07:38:45

Fields: cs.CR,cs.LG

Download: http://arxiv.org/abs/2406.07580v1

Enhancing Neural Subset Selection: Integrating Background Information into Set Representations

Neural subset selection tasks, such as compound selection in AI-aided drug discovery, have become increasingly pivotal across diverse applications. Existing methods in this field primarily focus on building models that capture the relationship between utility function values and subsets within their respective supersets. However, these approaches tend to overlook the valuable information contained in the superset when using neural networks to model set functions. In this work, we address this oversight by adopting a probabilistic perspective. Our theoretical findings show that when the target value is conditioned on both the input set and the subset, it is essential to incorporate an invariant sufficient statistic of the superset into the subset of interest for effective learning. This ensures that the output value remains invariant to permutations of the subset and its corresponding superset, while still allowing identification of the specific superset from which the subset originated. Motivated by these insights, we propose a simple yet effective information aggregation module that merges the representations of subsets and supersets from a permutation-invariance perspective. Comprehensive empirical evaluations across diverse tasks and datasets validate the improved efficacy of our approach over conventional methods, underscoring the practicality and potency of our proposed strategies in real-world contexts.
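
The idea of merging a subset representation with a permutation-invariant summary of its superset can be sketched DeepSets-style. In this minimal illustration, the feature map `phi` is a hypothetical stand-in for a learned network, and the superset's mean feature is used as a simple invariant statistic; the paper's aggregation module may differ.

```python
def phi(x):
    """Hypothetical per-element feature map (stands in for a learned network)."""
    return [x, x * x]

def sum_pool(elements):
    """Sum of element features: identical for any ordering of the elements."""
    feats = [phi(x) for x in elements]
    return [sum(col) for col in zip(*feats)]

def subset_with_context(subset, superset):
    """Subset summary concatenated with a summary of the superset it came from.

    Both halves are order-independent, so the output is invariant to permuting
    the subset and the superset, yet it still changes when the superset changes.
    """
    superset_mean = [v / len(superset) for v in sum_pool(superset)]
    return sum_pool(subset) + superset_mean

print(subset_with_context([1, 2], [1, 2, 3, 4]))
print(subset_with_context([1, 2], [1, 2, 9]))  # same subset, different superset
```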

Updated: 2024-06-09 07:34:45

Fields: cs.LG

Download: http://arxiv.org/abs/2402.03139v2

Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking

Divergent thinking, the cognitive process of generating diverse solutions, is a hallmark of human creativity and problem-solving. For machines, sampling diverse solution trajectories in complex reasoning problems is crucial for robust outcomes, data augmentation, and better model generalization. Large language models (LLMs) often struggle to generate high-quality, diverse reasoning. While supervised fine-tuning helps with quality, it requires extensive supervision data to capture the full diversity of solutions. Alternatively, reinforcement learning methods such as PPO aim to find a limited set of highest-reward solutions while neglecting solution diversity, akin to convergent thinking. To address these limitations, we propose Flow of Reasoning (FoR), an efficient LLM training approach that enables diverse reasoning with minimal data. FoR formulates multi-step LLM reasoning as a Markovian flow from an initial state to terminal states. This formulation allows principled GFlowNet approaches to be adapted to train the LLM as a policy that samples multiple reasoning paths with probabilities proportional to the unnormalized reward. Empirical results show that, with limited training data (e.g., 15 examples), FoR discovers diverse, high-quality solutions that substantially outperform current state-of-the-art methods across three tasks: embodied reasoning (BlocksWorld), math puzzle solving (Game24), and logical reasoning (PrOntoQA). Code is available at https://github.com/Yu-Fangxu/FoR.
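
Sampling "with probabilities proportional to the unnormalized reward" is what GFlowNet objectives enforce. A widely used one is trajectory balance, sketched below for a single trajectory; whether FoR uses exactly this objective is not stated in the abstract, so treat it as an illustrative assumption. The two-answer example is made up.

```python
import math

def trajectory_balance_loss(log_z, forward_log_probs, log_reward, backward_log_probs=None):
    """Squared trajectory-balance residual for one complete trajectory.

    log_z: learned scalar log Z (log of the total reward mass).
    forward_log_probs: log P_F(s_{t+1} | s_t) at each step, i.e. the LLM policy.
    log_reward: log R(x) of the terminal state x.
    backward_log_probs: log P_B(s_t | s_{t+1}); all zeros when every state has a
        single parent, as in left-to-right generation.
    """
    if backward_log_probs is None:
        backward_log_probs = [0.0] * len(forward_log_probs)
    residual = log_z + sum(forward_log_probs) - log_reward - sum(backward_log_probs)
    return residual ** 2

# Two possible answers with rewards 3 and 1 (so Z = 4). A policy that picks the
# first branch with probability 3/4 samples answers in proportion to reward.
print(trajectory_balance_loss(math.log(4), [math.log(0.75), 0.0], math.log(3)))  # ~0
print(trajectory_balance_loss(math.log(4), [math.log(0.50), 0.0], math.log(3)))  # > 0
```

Driving this loss to zero on all trajectories makes the policy's terminal distribution equal R(x) / Z, which is what keeps low-but-nonzero-reward reasoning paths in play instead of collapsing onto one answer.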

Updated: 2024-06-09 07:06:58

Fields: cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.05673v1

Certified Robustness to Data Poisoning in Gradient-Based Training

Modern machine learning pipelines leverage large amounts of public data, making it infeasible to guarantee data quality and leaving models open to poisoning and backdoor attacks. However, provably bounding model behavior under such attacks remains an open problem. In this work, we address this challenge and develop the first framework providing provable guarantees on the behavior of models trained with potentially manipulated data. In particular, our framework certifies robustness against untargeted and targeted poisoning as well as backdoor attacks for both input and label manipulations. Our method leverages convex relaxations to over-approximate the set of all possible parameter updates for a given poisoning threat model, allowing us to bound the set of all reachable parameters for any gradient-based learning algorithm. Given this set of parameters, we provide bounds on worst-case behavior, including model performance and backdoor success rate. We demonstrate our approach on multiple real-world datasets from applications including energy consumption, medical imaging, and autonomous driving.
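
The core idea, over-approximating the set of parameters reachable under a bounded poisoning threat model, can be seen in a toy case. The sketch below assumes a 1-D least-squares model trained by full-batch gradient descent, where an adversary may shift every label by at most eps. Here one step is affine in the parameter, so the reachable set is an interval whose endpoints can be propagated directly. This is a hypothetical illustration, not the paper's convex-relaxation method, which handles general gradient-based training.

```python
def train(theta, data, lr, epochs):
    """Plain full-batch gradient descent on 0.5 * mean((theta * x - y) ** 2)."""
    n = len(data)
    for _ in range(epochs):
        grad = sum(x * (theta * x - y) for x, y in data) / n
        theta -= lr * grad
    return theta

def certified_bounds(theta0, data, lr, eps, epochs):
    """Sound interval for theta after training when each label may move by <= eps.

    One step is theta' = a * theta + lr * mean(x * y) with a = 1 - lr * mean(x^2).
    For a >= 0 the update is increasing in theta, so interval endpoints map to
    endpoints; label shifts move mean(x * y) by at most eps * mean(|x|).
    """
    n = len(data)
    a = 1.0 - lr * sum(x * x for x, _ in data) / n
    if a < 0:
        raise ValueError("learning rate too large for this simple bound")
    xy = sum(x * y for x, y in data) / n
    slack = eps * sum(abs(x) for x, _ in data) / n
    lo = hi = theta0
    for _ in range(epochs):
        lo = a * lo + lr * (xy - slack)
        hi = a * hi + lr * (xy + slack)
    return lo, hi

data = [(1.0, 2.0), (2.0, 3.9), (-1.0, -2.1)]  # hypothetical clean training set
print(certified_bounds(0.0, data, lr=0.1, eps=0.5, epochs=50))
```

The bound lets the adversary pick different shifts at every step, so it over-approximates a fixed poisoned dataset; any model-behaviour bound (for example, worst-case prediction on a test input) then follows by evaluating over the interval.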

Updated: 2024-06-09 06:59:46

Fields: cs.LG,cs.CR,cs.CV

Download: http://arxiv.org/abs/2406.05670v1

General Distribution Learning: A theoretical framework for Deep Learning

Numerous research questions about deep learning (DL) remain unanswered within the classical learning theory framework. These include the remarkable generalization capabilities of overparameterized neural networks (NNs), efficient optimization despite non-convex objectives, the role of flat minima in generalization, and the exceptional performance of deep architectures, among others. This paper introduces a novel theoretical learning framework called General Distribution Learning (GD Learning), designed to address a comprehensive range of machine learning and statistical tasks, including classification, regression, and parameter estimation. Departing from statistical machine learning, GD Learning focuses on the true underlying distribution. In GD Learning, the learning error, which corresponds to the expected error in the classical statistical learning framework, is divided into fitting errors caused by models and fitting algorithms and sampling errors introduced by limited sampling data. The framework makes substantial use of prior knowledge, especially in scenarios characterized by data scarcity. Integrating such external knowledge helps minimize learning errors across the entire dataset, thereby enhancing performance. Within the GD Learning framework, we show that the global optimal solution to non-convex optimization problems, such as minimizing the fitting error, can be approached by minimizing the gradient norm and the non-uniformity of the eigenvalues of the model's Jacobian matrix. This insight leads to the gradient structure control algorithm. GD Learning also offers a fresh perspective on open questions in deep learning, including overparameterization and non-convex optimization, the bias-variance trade-off, and the mechanism of flat minima.

Updated: 2024-06-09 06:49:22

Fields: cs.LG,cs.IR,stat.ML

Download: http://arxiv.org/abs/2406.05666v1

A critical appraisal of water table depth estimation: Challenges and opportunities within machine learning

Fine-resolution spatial patterns of water table depth (WTD) play a crucial role in shaping ecological resilience, hydrological connectivity, and anthropocentric objectives. Generally, a large-scale (e.g., continental or global) spatial map of static WTD can be simulated using either physically-based (PB) or machine learning-based (ML) models. We construct three fine-resolution (500 m) ML simulations of WTD, using the XGBoost algorithm and more than 20 million real and proxy observations of WTD, across the United States and Canada. The three ML models were constrained using known physical relations between WTD's drivers and WTD and were trained by sequentially adding real and proxy observations of WTD. We interpret the black box of our physically constrained ML models and compare it against available literature in groundwater hydrology. Through an extensive (pixel-by-pixel) evaluation, we demonstrate that our models can more accurately predict unseen real and proxy observations of WTD across most of North America's ecoregions compared to three available PB simulations of WTD. However, we still argue that large-scale WTD estimation is far from being a solved problem. We reason that due to biased observational data mainly collected from low-elevation floodplains, the misspecification of equations within physically-based models, and the over-flexibility of machine learning models, verifiably accurate simulations of WTD do not yet exist. Ultimately, we thoroughly discuss future directions that may help hydrogeologists decide how to proceed with WTD estimations, with a particular focus on the application of machine learning and the use of proxy satellite data.

Updated: 2024-06-09 06:44:06

Fields: cs.LG,stat.AP,stat.ML

Download: http://arxiv.org/abs/2405.04579v2

Injecting Undetectable Backdoors in Deep Learning and Language Models

As ML models become increasingly complex and integral to high-stakes domains such as finance and healthcare, they also become more susceptible to sophisticated adversarial attacks. We investigate the threat posed by undetectable backdoors in models developed by insidious external expert firms. When such backdoors exist, they allow the designer of the model to sell information to the users on how to carefully perturb the least significant bits of their input to change the classification outcome to a favorable one. We develop a general strategy to plant a backdoor to neural networks while ensuring that even if the model's weights and architecture are accessible, the existence of the backdoor is still undetectable. To achieve this, we utilize techniques from cryptography such as cryptographic signatures and indistinguishability obfuscation. We further introduce the notion of undetectable backdoors to language models and extend our neural network backdoor attacks to such models based on the existence of steganographic functions.
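
To show how a "sell the trigger" backdoor can hinge on a cryptographic check, here is a toy sketch: the designer encodes a short authentication tag in the least significant bits of an input, and the wrapped classifier flips to a target label only when that tag verifies. For portability it uses the standard library's HMAC as a stand-in for the digital signatures named above. That substitution matters: a symmetric key stored in the model could be read by anyone with the weights, so this toy is not undetectable. The paper relies on signatures (public verification) and indistinguishability obfuscation to achieve that. All names here are hypothetical.

```python
import hashlib
import hmac

TAG_BITS = 32  # number of pixels whose least significant bit carries the tag

def _high_bits(pixels):
    """Message to authenticate: every pixel with its least significant bit cleared."""
    return bytes(p & 0xFE for p in pixels)

def _tag(key, pixels):
    """32-bit tag derived from the high bits of the input."""
    digest = hmac.new(key, _high_bits(pixels), hashlib.sha256).digest()
    value = int.from_bytes(digest[:4], "big")
    return [(value >> i) & 1 for i in range(TAG_BITS)]

def activate(key, pixels):
    """Designer side: change only LSBs so the input carries a valid tag."""
    out = list(pixels)
    for i, b in enumerate(_tag(key, pixels)):
        out[i] = (out[i] & 0xFE) | b
    return out

def backdoored_classify(key, pixels, base_classifier, target_label):
    """Behaves like base_classifier unless the LSBs encode a valid tag."""
    observed = bytes(p & 1 for p in pixels[:TAG_BITS])
    if hmac.compare_digest(observed, bytes(_tag(key, pixels))):
        return target_label
    return base_classifier(pixels)

key = b"designer-secret"                                 # hypothetical key
base = lambda px: 0 if sum(px) < 128 * len(px) else 1   # hypothetical honest model
x = [(i * 37) % 256 for i in range(64)]
print(backdoored_classify(key, x, base, 7), backdoored_classify(key, activate(key, x), base, 7))
```

Because the tag covers only the high bits, rewriting the LSBs does not change the message being authenticated, and a random input triggers the backdoor with probability about 2^-32.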

Updated: 2024-06-09 06:26:21

Fields: cs.LG,cs.CR,stat.ML

Download: http://arxiv.org/abs/2406.05660v1

CompVPD: Iteratively Identifying Vulnerability Patches Based on Human Validation Results with a Precise Context

Applying security patches to open-source software in a timely manner is critical for ensuring the security of downstream applications. However, applying these patches promptly is challenging because patch notifications are often incomplete and delayed. To address this issue, existing approaches employ deep-learning (DL) models to identify additional vulnerability patches by determining whether a code commit addresses a vulnerability. Nonetheless, these approaches suffer from low accuracy because the context provided for the patches is imprecise. To provide precise context for patches, we propose a multi-granularity slicing algorithm and an adaptive-expanding algorithm that accurately identify code related to the patches. The precise context also enables the design of an iterative identification framework, CompVPD, which uses human validation results to substantially improve effectiveness. We empirically compare CompVPD with four state-of-the-art/practice (SOTA) approaches for identifying vulnerability patches. The results show that CompVPD improves the F1 score by 20% over the best scores of the SOTA approaches. Additionally, CompVPD contributes to security practice by helping identify 20 vulnerability patches and 18 fixes for high-risk bugs among 2,500 recent code commits in five highly popular open-source projects.

Updated: 2024-06-09 06:09:03

Fields: cs.CR

Download: http://arxiv.org/abs/2310.02530v2

Fooling the Textual Fooler via Randomizing Latent Representations

Although NLP models perform outstandingly on a variety of tasks, recent studies have revealed that they are vulnerable to adversarial attacks that slightly perturb the input to make the models misbehave. Among these attacks, adversarial word-level perturbations are well-studied and effective attack strategies. Because these attacks work in black-box settings, they require no access to the model architecture or parameters and can thus be detrimental to existing NLP applications. To perform an attack, the adversary queries the victim model many times to determine the most important words in an input text and replaces these words with their corresponding synonyms. In this work, we propose a lightweight and attack-agnostic defense whose main goal is to perplex the process of generating an adversarial example in these query-based black-box attacks; that is, to fool the textual fooler. This defense, named AdvFooler, works by randomizing the latent representation of the input at inference time. Unlike existing defenses, AdvFooler neither requires additional computational overhead during training nor relies on assumptions about the potential adversarial perturbation set, while having a negligible impact on the model's accuracy. Our theoretical and empirical analyses highlight the robustness gained by confusing the adversary through randomization of the latent space, as well as the impact of randomization on clean accuracy. Finally, we empirically demonstrate that AdvFooler achieves near state-of-the-art robustness against representative adversarial word-level attacks on two benchmark datasets.
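
A minimal sketch of inference-time latent randomization, assuming the randomization is additive Gaussian noise on a hidden layer; the abstract does not specify the exact form. The two-layer model and its weights are hypothetical. Repeated queries on the same input return slightly different scores, while a confidently classified input keeps its label.

```python
import random

def forward(x, w1, w2, noise_scale=0.0, rng=None):
    """Toy two-layer classifier whose hidden layer can be randomized at inference.

    Returns a score; score > 0 means class 1. With noise_scale > 0, Gaussian noise
    is added to the hidden (latent) representation on every call.
    """
    hidden = [sum(w * xi for w, xi in zip(row, x)) for row in w1]
    if noise_scale > 0:
        rng = rng or random.Random()
        hidden = [h + rng.gauss(0.0, noise_scale) for h in hidden]
    hidden = [max(0.0, h) for h in hidden]  # ReLU
    return sum(w * h for w, h in zip(w2, hidden))

w1 = [[1.0, -0.5], [0.3, 0.8]]  # hypothetical weights
w2 = [1.0, -0.2]
x = [2.0, 1.0]
rng = random.Random(0)
scores = [forward(x, w1, w2, noise_scale=0.05, rng=rng) for _ in range(5)]
print(forward(x, w1, w2), scores)
```

A query-based attacker ranks words by how much the score changes when each word is removed or replaced; when those changes are comparable to the injected noise, the ranking becomes unreliable, which is the confusion this kind of defense aims for.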

Updated: 2024-06-09 06:06:28

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2310.01452v2

Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses

Theory of Mind (ToM) reasoning entails recognizing that other individuals possess their own intentions, emotions, and thoughts, which is vital for guiding one's own thought processes. Although large language models (LLMs) excel in tasks such as summarization, question answering, and translation, they still face challenges with ToM reasoning, especially in open-ended questions. Despite advancements, the extent to which LLMs truly understand ToM reasoning and how closely it aligns with human ToM reasoning remains inadequately explored in open-ended scenarios. Motivated by this gap, we assess the abilities of LLMs to perceive and integrate human intentions and emotions into their ToM reasoning processes within open-ended questions. Our study utilizes posts from Reddit's ChangeMyView platform, which demands nuanced social reasoning to craft persuasive responses. Our analysis, comparing semantic similarity and lexical overlap metrics between responses generated by humans and LLMs, reveals clear disparities in ToM reasoning capabilities in open-ended questions, with even the most advanced models showing notable limitations. To enhance LLM capabilities, we implement a prompt tuning method that incorporates human intentions and emotions, resulting in improvements in ToM reasoning performance. However, despite these improvements, the enhancement still falls short of fully achieving human-like reasoning. This research highlights the deficiencies in LLMs' social reasoning and demonstrates how integrating human intentions and emotions can boost their effectiveness.
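
The abstract compares human and LLM responses with "lexical overlap metrics" without naming them. A ROUGE-1-style unigram F1 is one common choice and is sketched here as an assumption, with a simple lowercase regex tokenizer that is also an assumption.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercased word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def unigram_f1(candidate, reference):
    """ROUGE-1-style F1: clipped unigram overlap between two texts."""
    c, r = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((c & r).values())  # Counter & keeps the minimum count per word
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

print(unigram_f1("the cat sat", "the cat ran"))
```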

Updated: 2024-06-09 05:57:59

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.05659v1

Visual Prompt Tuning in Null Space for Continual Learning

Existing prompt-tuning methods have demonstrated impressive performance in continual learning (CL) by selecting and updating relevant prompts in vision-transformer models. In contrast, this paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features, ensuring no interference with already learned tasks and thereby overcoming catastrophic forgetting in CL. However, unlike orthogonal projection in traditional CNN architectures, prompt gradient orthogonal projection in the ViT architecture poses quite different and greater challenges: 1) the high-order, non-linear self-attention operation; and 2) the drift of the prompt distribution caused by LayerNorm in the transformer block. Theoretically, we derive two consistency conditions for achieving prompt gradient orthogonal projection, which provide a theoretical guarantee that interference with previously learned knowledge via the self-attention mechanism is eliminated in visual prompt tuning. In practice, we propose an effective null-space-based approximation to implement the prompt gradient orthogonal projection. Extensive experimental results demonstrate the anti-forgetting effectiveness of our approach on four class-incremental benchmarks with diverse pre-trained baseline models, and our approach outperforms state-of-the-art methods. Our code is available in the supplemental material.
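
The classical linear-layer version of gradient orthogonal projection removes every gradient component that lies in the span of old tasks' features, g' = g - U U^T g with U an orthonormal basis of that span. The sketch below does this in plain Python with Gram-Schmidt. It does not model the ViT-specific consistency conditions (self-attention, LayerNorm) that the paper derives, and the vectors are hypothetical.

```python
def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of the given feature vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coef = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coef * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis

def project_to_null_space(grad, old_features):
    """Return grad - U U^T grad, i.e. the part of grad orthogonal to old features."""
    g = list(grad)
    for b in orthonormal_basis(old_features):
        coef = sum(gi * bi for gi, bi in zip(g, b))
        g = [gi - coef * bi for gi, bi in zip(g, b)]
    return g

print(project_to_null_space([3.0, 4.0, 5.0], [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]))
```

For a linear layer y = W x, an update whose rows are projected this way satisfies dW x = 0 for every old input x in the span, so outputs on earlier tasks are unchanged.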

Updated: 2024-06-09 05:57:40

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.05658v1

Heart Sound Segmentation Using Deep Learning Techniques

Heart disease remains a leading cause of mortality worldwide. Auscultation, the process of listening to heart sounds, can be enhanced through computer-aided analysis using Phonocardiogram (PCG) signals. This paper presents a novel approach for heart sound segmentation and classification into S1 (LUB) and S2 (DUB) sounds. We employ FFT-based filtering, dynamic programming for event detection, and a Siamese network for robust classification. Our method demonstrates superior performance on the PASCAL heart sound dataset compared to existing approaches.
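
The abstract mentions FFT-based filtering without giving cutoffs. The frequency-domain band-pass it refers to can be sketched as "transform, zero the bins outside the band, transform back". For transparency this uses a naive O(n^2) DFT instead of an FFT library, and the sampling rate and cutoffs are hypothetical. Zeroing bins is an ideal (brick-wall) filter, which can cause ringing on real recordings.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (an FFT computes the same values faster)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(X):
    """Inverse DFT, keeping the real part."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def fft_bandpass(x, fs, low_hz, high_hz):
    """Zero every frequency bin outside [low_hz, high_hz] and transform back."""
    n = len(x)
    X = dft(x)
    for k in range(n):
        freq = min(k, n - k) * fs / n  # bins k and n - k share the same |frequency|
        if not (low_hz <= freq <= high_hz):
            X[k] = 0
    return idft(X)

fs = 100  # hypothetical sampling rate in Hz
sig = [math.sin(2 * math.pi * 5 * t / fs) + math.sin(2 * math.pi * 40 * t / fs) for t in range(100)]
filtered = fft_bandpass(sig, fs, 2.0, 10.0)  # keeps the 5 Hz tone, removes 40 Hz
```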

Updated: 2024-06-09 05:30:05

Fields: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2406.05653v1

GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement

We propose a novel approach for 3D mesh reconstruction from multi-view images. Our method takes inspiration from large reconstruction models such as LRM, which use a transformer-based triplane generator and a Neural Radiance Field (NeRF) model trained on multi-view images. However, we introduce several important modifications that significantly enhance 3D reconstruction quality. First, we examine the original LRM architecture and identify several shortcomings. We then introduce corresponding modifications to the LRM architecture, which lead to improved multi-view image representation and more computationally efficient training. Second, to improve geometry reconstruction and enable supervision at full image resolution, we extract meshes from the NeRF field in a differentiable manner and fine-tune the NeRF model through mesh rendering. These modifications allow us to achieve state-of-the-art performance on both 2D and 3D evaluation metrics, such as a PSNR of 28.67 on the Google Scanned Objects (GSO) dataset. Despite these strong results, our feed-forward model still struggles to reconstruct complex textures, such as text and portraits on assets. To address this, we introduce a lightweight per-instance texture refinement procedure. This procedure fine-tunes the triplane representation and the NeRF color estimation model on the mesh surface using the input multi-view images, in just 4 seconds. This refinement improves the PSNR to 29.79 and achieves faithful reconstruction of complex textures, such as text. Additionally, our approach enables various downstream applications, including text- or image-to-3D generation.
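
For reading the reported numbers, PSNR = 10 log10(MAX^2 / MSE) in decibels, where MAX is the largest possible pixel value (1.0 for images scaled to [0, 1], assumed here). The sketch computes it for flat pixel lists; the evaluation's exact conventions (per-view averaging, colour handling) are not stated in the abstract.

```python
import math

def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# How much lower is the MSE at 29.79 dB than at 28.67 dB (same convention)?
mse_ratio = 10 ** ((29.79 - 28.67) / 10)
print(psnr([0.5] * 4, [0.6] * 4), mse_ratio)
```

If both figures were single-image PSNRs under the same convention, the 1.12 dB gain would correspond to an MSE about 1.29 times smaller, roughly 23% lower; averaged PSNRs only approximate this.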

Updated: 2024-06-09 05:19:24

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.05649v1

ICU-Sepsis: A Benchmark MDP Built from Real Medical Data

We present ICU-Sepsis, an environment that can be used in benchmarks for evaluating reinforcement learning (RL) algorithms. Sepsis management is a complex task that has been an important topic in applied RL research in recent years. Therefore, MDPs that model sepsis management can serve as part of a benchmark to evaluate RL algorithms on a challenging real-world problem. However, creating usable MDPs that simulate sepsis care in the ICU remains a challenge due to the complexities involved in acquiring and processing patient data. ICU-Sepsis is a lightweight environment that models personalized care of sepsis patients in the ICU. The environment is a tabular MDP that is widely compatible and is challenging even for state-of-the-art RL algorithms, making it a valuable tool for benchmarking their performance. However, we emphasize that while ICU-Sepsis provides a standardized environment for evaluating RL algorithms, it should not be used to draw conclusions that guide medical practice.
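
Because ICU-Sepsis is a tabular MDP, classical dynamic programming applies directly when the transition table is known. The sketch below runs value iteration on a made-up three-state MDP; the states, probabilities, and rewards are hypothetical and do not reflect ICU-Sepsis's actual state space, action space, or interface.

```python
def q_values(P, R, V, s, gamma):
    """Expected return of each action in state s under value estimate V."""
    return [R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a]) for a in range(len(P[s]))]

def value_iteration(P, R, gamma=0.99, tol=1e-10):
    """Optimal values and greedy policy for a tabular MDP.

    P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the
    expected immediate reward for taking action a in state s.
    """
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            best = max(q_values(P, R, V, s, gamma))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = []
    for s in range(len(P)):
        q = q_values(P, R, V, s, gamma)
        policy.append(q.index(max(q)))
    return V, policy

# Hypothetical 3-state MDP: 0 = in treatment, 1 = discharged alive, 2 = died.
# States 1 and 2 are absorbing; reaching state 1 is worth +1.
P = [
    [[(0.6, 1), (0.1, 2), (0.3, 0)],     # action 0
     [(0.4, 1), (0.05, 2), (0.55, 0)]],  # action 1
    [[(1.0, 1)]],
    [[(1.0, 2)]],
]
R = [[0.6, 0.4], [0.0], [0.0]]  # expected reward = P(reaching state 1 this step)
V, policy = value_iteration(P, R)
print(V, policy)
```

In this toy, action 1 has the lower immediate reward but a higher eventual chance of reaching the good terminal state, so the optimal policy picks it; a one-step greedy rule would not.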

Updated: 2024-06-09 05:11:00

Fields: cs.LG

Download: http://arxiv.org/abs/2406.05646v1

How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States

Large language models (LLMs) rely on safety alignment to avoid responding to malicious user inputs. Unfortunately, jailbreaks can circumvent safety guardrails, causing LLMs to generate harmful content and raising concerns about LLM safety. Because language models with massive numbers of parameters are often regarded as black boxes, the mechanisms of alignment and jailbreak are difficult to elucidate. In this paper, we employ weak classifiers to explain LLM safety through intermediate hidden states. We first confirm that LLMs learn ethical concepts during pre-training rather than during alignment, and can distinguish malicious from normal inputs in the early layers. Alignment then associates these early concepts with emotion guesses in the middle layers and refines them into specific rejection tokens for safe generation. Jailbreaks disturb the transformation of early unethical classifications into negative emotions. We conduct experiments on models from 7B to 70B parameters across various model families to validate our conclusions. Overall, our paper reveals the intrinsic mechanism of LLM safety and how jailbreaks circumvent safety guardrails, offering a new perspective on LLM safety and reducing concerns.
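
A "weak classifier" on intermediate hidden states is typically a simple probe trained to separate two kinds of inputs from one layer's activations. The sketch below trains a logistic-regression probe with plain gradient descent. The abstract does not specify the paper's classifiers, and the 2-D vectors are synthetic stand-ins for real hidden states.

```python
import math

def train_probe(features, labels, lr=0.1, steps=200):
    """Logistic-regression probe trained by full-batch gradient descent.

    labels: 1 for malicious inputs, 0 for normal ones.
    """
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "malicious"
            for k in range(dim):
                gw[k] += (p - y) * x[k] / n
            gb += (p - y) / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def probe_predict(w, b, x):
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)

# Synthetic stand-ins for one layer's hidden states.
malicious = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.9]]
normal = [[-2.0, -1.7], [-1.5, -2.3], [-2.2, -1.4]]
w, b = train_probe(malicious + normal, [1, 1, 1, 0, 0, 0])
print(w, b)
```

Training one such probe per layer and comparing their accuracies is how one would locate the depth at which the two input types become linearly separable.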

Updated: 2024-06-09 05:04:37

标题: 如何对齐和越狱：通过中间隐藏状态解释LLM的安全性

摘要: 大型语言模型（LLMs）依赖于安全对齐来避免对恶意用户输入做出响应。不幸的是，越狱可以规避安全防护栏，导致LLMs生成有害内容并引发对LLM安全性的担忧。由于参数密集的语言模型通常被视为黑匣子，因此对齐和越狱的机制很难阐明。在本文中，我们采用弱分类器通过中间隐藏状态来解释LLM的安全性。我们首先确认LLMs在预训练期间学习道德概念而不是对齐，并且可以在早期层识别恶意和正常输入。对齐实际上将早期概念与中间层的情感猜测联系起来，然后将它们精炼成特定的拒绝令牌以进行安全生成。越狱干扰了早期不道德分类转化为负面情绪的过程。我们在来自各种模型系列的7B到70B的模型上进行实验，以证明我们的结论。总的来说，我们的论文揭示了LLM安全性的内在机制以及越狱如何规避安全防护栏，为LLM安全性提供了新的视角并减少了担忧。

更新时间: 2024-06-09 05:04:37

领域: cs.CL,cs.AI,cs.CR,cs.CY

下载: http://arxiv.org/abs/2406.05644v1

DeLLMa: A Framework for Decision Making Under Uncertainty with Large Language Models

The potential of large language models (LLMs) as decision support tools is increasingly being explored in fields such as business, engineering, and medicine, which often face challenging tasks of decision-making under uncertainty. In this paper, we show that directly prompting LLMs on these types of decision-making problems can yield poor results, especially as the problem complexity increases. To aid in these tasks, we propose DeLLMa (Decision-making Large Language Model assistant), a framework designed to enhance decision-making accuracy in uncertain environments. DeLLMa involves a multi-step scaffolding procedure, drawing upon principles from decision theory and utility theory, to provide a rational and human-auditable decision-making process. We validate our framework on multiple realistic decision-making environments, demonstrating that DeLLMa can consistently enhance the decision-making performance of leading language models, and achieve up to a 40% increase in accuracy over competing methods.
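DeLLMa draws on decision theory and utility theory. The central rule from those fields is to choose the action with the highest expected utility over uncertain states. The sketch below shows only that rule, with hypothetical states, probabilities, and utilities; in DeLLMa these quantities would come from its LLM-driven scaffolding steps, which are not modeled here.

```python
from typing import Dict, List, Tuple

def expected_utility(action: str,
                     state_probs: Dict[str, float],
                     utility: Dict[Tuple[str, str], float]) -> float:
    """Sum over states of P(state) * U(action, state)."""
    return sum(p * utility[(action, s)] for s, p in state_probs.items())

def best_action(actions: List[str],
                state_probs: Dict[str, float],
                utility: Dict[Tuple[str, str], float]):
    """Return the expected-utility-maximizing action and all scores (auditable)."""
    scores = {a: expected_utility(a, state_probs, utility) for a in actions}
    return max(scores, key=scores.get), scores

# Hypothetical example: which crop to plant given an uncertain season.
state_probs = {"dry": 0.3, "normal": 0.5, "wet": 0.2}
utility = {
    ("apple", "dry"): 2.0, ("apple", "normal"): 6.0, ("apple", "wet"): 5.0,
    ("avocado", "dry"): 1.0, ("avocado", "normal"): 8.0, ("avocado", "wet"): 3.0,
}
choice, scores = best_action(["apple", "avocado"], state_probs, utility)
```

With these numbers, "apple" scores 4.6 and "avocado" scores 4.9, so "avocado" is chosen. Keeping the per-action scores around is what makes such a decision human-auditable.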

Updated: 2024-06-09 05:04:13

标题: DeLLMa：基于大型语言模型的不确定性决策制定框架

摘要: 大型语言模型（LLMs）作为决策支持工具的潜力正在越来越多地在商业、工程和医学等领域得到探索，这些领域通常面临决策不确定性较大的挑战。本文表明，直接在这些类型的决策问题上提示LLMs可能会产生较差的结果，特别是在问题复杂性增加的情况下。为了帮助解决这些任务，我们提出了DeLLMa（决策大型语言模型助手），这是一个旨在增强不确定环境下决策准确性的框架。DeLLMa包括一个多步骤的支架程序，借鉴了决策理论和效用理论的原则，以提供一个理性和可审计的决策过程。我们在多个现实决策环境上验证了我们的框架，证明DeLLMa可以持续提升领先语言模型的决策性能，并比竞争方法提高高达40％的准确性。

更新时间: 2024-06-09 05:04:13

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.02392v2

Random Projection Layers for Multidimensional Time Series Forecasting

All-Multi-Layer Perceptron (all-MLP) mixer models have been shown to be effective for time series forecasting problems. However, when such a model is applied to high-dimensional time series (e.g., the time series in a spatial-temporal dataset), its performance is likely to degrade due to overfitting issues. In this paper, we propose an all-MLP time series forecasting architecture, referred to as RPMixer. Our method leverages the ensemble-like behavior of deep neural networks, where each individual block within the network acts like a base learner in an ensemble model, especially when identity mapping residual connections are incorporated. By integrating random projection layers into our model, we increase the diversity among the blocks' outputs, thereby enhancing the overall performance of RPMixer. Extensive experiments conducted on large-scale spatial-temporal forecasting benchmark datasets demonstrate that our proposed method outperforms alternative methods, including both spatial-temporal graph models and general forecasting models.
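The abstract attributes RPMixer's gains to random projection layers inside residual all-MLP blocks, which diversify the blocks' outputs. The NumPy sketch below is one hypothetical reading of that idea, not the paper's architecture. Each block applies a fixed (never trained) Gaussian random projection across the N series of an input of shape (N, L), then a small MLP along the time axis, and adds the result to an identity residual. Layer sizes, initialization, and where the projection sits are all assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

class RPBlock:
    """Hypothetical residual block: x + MLP_time(R @ x), with R a fixed random matrix."""

    def __init__(self, n_series, seq_len, hidden, rng):
        # Fixed random projection across series; not updated during training.
        self.R = rng.normal(0.0, 1.0 / np.sqrt(n_series), size=(n_series, n_series))
        # Trainable time-mixing MLP weights (only initialized in this sketch).
        self.W1 = rng.normal(0.0, 0.1, size=(seq_len, hidden))
        self.W2 = rng.normal(0.0, 0.1, size=(hidden, seq_len))

    def __call__(self, x):
        mixed = self.R @ x                        # (N, L): mix information across series
        update = relu(mixed @ self.W1) @ self.W2  # (N, L): MLP along the time axis
        return x + update                         # identity residual connection

rng = np.random.default_rng(0)
N, L = 8, 24
blocks = [RPBlock(N, L, hidden=32, rng=rng) for _ in range(3)]
x = rng.normal(size=(N, L))
h = x
for block in blocks:
    h = block(h)
```

Because every block draws its own projection, each block's update sees a different random view of the series, which is the ensemble-style diversity the abstract describes.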

Updated: 2024-06-09 04:54:29

标题: 多维时间序列预测的随机投影层

摘要: 全多层感知器（all-MLP）混合模型已被证明对于时间序列预测问题是有效的。然而，当这样一个模型应用于高维时间序列（例如，时空数据集中的时间序列）时，由于过拟合问题，其性能可能会下降。在本文中，我们提出了一种称为RPMixer的全MLP时间序列预测架构。我们的方法利用深度神经网络的类似集成行为，其中网络中的每个单独块的作用类似于集成模型中的基学习器，特别是当加入恒等映射残差连接时。通过将随机投影层集成到我们的模型中，我们增加了块输出之间的多样性，从而提高了RPMixer的整体性能。对大规模空间-时间预测基准数据集进行的大量实验表明，我们提出的方法优于替代方法，包括空间-时间图模型和一般预测模型。

更新时间: 2024-06-09 04:54:29

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.10487v3

ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models

Although Large Language Models (LLMs) exhibit remarkable adaptability across domains, these models often fall short in structured knowledge extraction tasks such as named entity recognition (NER). This paper explores an innovative, cost-efficient strategy to harness LLMs with modest NER capabilities for producing superior NER datasets. Our approach diverges from the basic class-conditional prompts by instructing LLMs to self-reflect on the specific domain, thereby generating domain-relevant attributes (such as category and emotions for movie reviews), which are utilized for creating attribute-rich training data. Furthermore, we preemptively generate entity terms and then develop NER context data around these entities, effectively bypassing the LLMs' challenges with complex structures. Our experiments across both general and niche domains reveal significant performance enhancements over conventional data generation methods while being more cost-effective than existing alternatives.
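Instead of a plain class-conditional prompt, the approach in this abstract conditions generation on domain attributes (for example, category and emotion for movie reviews). A minimal sketch of that prompt-expansion step is shown below. The attribute names, values, and prompt wording are hypothetical; in ProgGen the LLM itself proposes the attributes by reflecting on the domain.

```python
from itertools import product
from typing import Dict, List

def attribute_prompts(domain: str,
                      attributes: Dict[str, List[str]],
                      entity_types: List[str]) -> List[str]:
    """Build one generation prompt per combination of attribute values."""
    names = sorted(attributes)
    prompts = []
    for values in product(*(attributes[n] for n in names)):
        desc = ", ".join(f"{n}: {v}" for n, v in zip(names, values))
        prompts.append(
            f"Write a {domain} sentence ({desc}) and label all entities of types "
            f"{', '.join(entity_types)}."
        )
    return prompts

prompts = attribute_prompts(
    "movie review",
    {"genre": ["comedy", "horror"], "sentiment": ["positive", "negative", "neutral"]},
    ["person", "title"],
)
```

The Cartesian product of attribute values (2 x 3 = 6 prompts here) is what makes the resulting training data attribute-rich rather than repetitive.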

Updated: 2024-06-09 04:48:35

标题: ProgGen：使用自反大型语言模型逐步生成命名实体识别数据集

摘要: 尽管大型语言模型（LLMs）在各个领域展现出了显著的适应能力，但这些模型在结构化知识提取任务（如命名实体识别NER）方面往往表现不佳。本文探讨了一种创新的、成本效益高的策略，利用具有适度NER能力的LLMs来生成优质的NER数据集。我们的方法不同于基本的类别条件提示，而是指导LLMs对特定领域进行自我反思，从而生成与领域相关的属性（例如电影评论的类别和情感），用于创建属性丰富的训练数据。此外，我们预先生成实体术语，然后围绕这些实体开发NER上下文数据，有效地避开LLMs在处理复杂结构时的挑战。我们在通用和小众领域进行的实验显示，与传统数据生成方法相比，我们的方法在性能上有显著提升，同时比现有替代方案更具成本效益。

更新时间: 2024-06-09 04:48:35

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2403.11103v2

WARDEN: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection

Embedding as a Service (EaaS) has become a widely adopted solution, which offers feature extraction capabilities for addressing various downstream tasks in Natural Language Processing (NLP). Prior studies have shown that EaaS can be prone to model extraction attacks; nevertheless, this concern could be mitigated by adding backdoor watermarks to the text embeddings and subsequently verifying the attack models post-publication. Through the analysis of the recent watermarking strategy for EaaS, EmbMarker, we design a novel CSE (Clustering, Selection, Elimination) attack that removes the backdoor watermark while maintaining the high utility of embeddings, indicating that the previous watermarking approach can be breached. In response to this new threat, we propose a new protocol to make the removal of watermarks more challenging by incorporating multiple possible watermark directions. Our defense approach, WARDEN, notably increases the stealthiness of watermarks and has been empirically shown to be effective against CSE attack.
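As we understand EmbMarker's published scheme, the returned embedding is pulled toward a secret target vector in proportion to how many trigger words the input contains, and ownership is verified by measuring similarity to that target. The sketch below is a hypothetical multi-direction variant in that spirit, with one trigger set per target direction. It illustrates the idea of multiple watermark directions but is not WARDEN's exact protocol; the trigger words, dimension, and threshold `m` are assumptions.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def watermark(embedding, tokens, targets, trigger_sets, m=4):
    """Hypothetical multi-direction backdoor watermark.

    q_i = min(#tokens from trigger set i / m, 1); the output mixes the original
    embedding with the target directions using these weights, then renormalizes.
    """
    q = np.array([min(sum(tok in trig for tok in tokens) / m, 1.0)
                  for trig in trigger_sets])
    total = q.sum()
    if total > 1.0:
        q = q / total  # cap the total watermark weight at 1
        total = 1.0
    mixed = (1.0 - total) * embedding + sum(qi * t for qi, t in zip(q, targets))
    return normalize(mixed)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(returned, targets):
    """Cosine similarity of a returned embedding to each secret direction."""
    return [cosine(returned, t) for t in targets]

rng = np.random.default_rng(0)
d = 64
targets = [normalize(rng.normal(size=d)) for _ in range(2)]
trigger_sets = [{"alpha", "beta"}, {"gamma", "delta"}]
clean = normalize(rng.normal(size=d))

sims_wm = verify(watermark(clean, ["alpha", "beta", "alpha", "beta", "movie"], targets, trigger_sets), targets)
sims_clean = verify(watermark(clean, ["a", "good", "movie"], targets, trigger_sets), targets)
```

A text with four tokens from the first trigger set is mapped exactly onto the first target, while a trigger-free text is returned unchanged.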

Updated: 2024-06-09 04:34:55

标题: WARDEN：用于嵌入即服务版权保护的多方向后门水印

摘要: Embedding as a Service（EaaS）已经成为一种广泛采用的解决方案，为自然语言处理（NLP）中的各种下游任务提供特征提取功能。先前的研究表明，EaaS可能容易受到模型提取攻击的影响；然而，通过向文本嵌入添加后门水印，并在攻击模型发布后对其进行验证，可以缓解这种担忧。通过对最近用于EaaS的水印策略EmbMarker的分析，我们设计了一种新的CSE（聚类、选择、消除）攻击，可以在去除后门水印的同时保持嵌入的高实用性，表明先前的水印方法是可以被攻破的。针对这种新威胁，我们提出了一个新的协议，通过整合多个可能的水印方向，使水印的去除变得更加困难。我们的防御方法WARDEN显著提高了水印的隐蔽性，并经实验证明能够有效抵御CSE攻击。

更新时间: 2024-06-09 04:34:55

领域: cs.CR,cs.CL,cs.LG

下载: http://arxiv.org/abs/2403.01472v2

A Generalized Version of Chung's Lemma and its Applications

Chung's lemma is a classical tool for establishing asymptotic convergence rates of (stochastic) optimization methods under strong convexity-type assumptions and appropriate polynomial diminishing step sizes. In this work, we develop a generalized version of Chung's lemma, which provides a simple non-asymptotic convergence framework for a more general family of step size rules. We demonstrate the broad applicability of the proposed generalized Chung's lemma by deriving tight non-asymptotic convergence rates for a large variety of stochastic methods. In particular, we obtain partially new non-asymptotic complexity results for stochastic optimization methods, such as stochastic gradient descent and random reshuffling, under a general $(\theta,\mu)$-Polyak-Lojasiewicz (PL) condition and for various step size strategies, including polynomial, constant, exponential, and cosine step size rules. Notably, as a by-product of our analysis, we observe that exponential step sizes can adapt to the objective function's geometry, achieving the optimal convergence rate without requiring exact knowledge of the underlying landscape. Our results demonstrate that the developed variant of Chung's lemma offers a versatile, systematic, and streamlined approach to establish non-asymptotic convergence rates under general step size rules.
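For intuition about the classical statement that the generalized lemma extends: one common form of Chung's lemma says that if a_{k+1} <= (1 - c/k^p) a_k + d/k^(p+q) with c, d, q > 0, then a_k <= (d/c) k^(-q) + o(k^(-q)) when 0 < p < 1, and a_k <= d/(c - q) k^(-q) + o(k^(-q)) when p = 1 and c > q. The sketch below iterates the recursion with equality (so the bound is attained) and checks k^q a_k against those constants. The constants are our own choice; the paper's generalized step size rules are not implemented.

```python
def chung_recursion(c, p, q, d, k0, K, a0=0.0):
    """Iterate a_{k+1} = (1 - c / k**p) * a_k + d / k**(p + q) for k = k0, ..., K - 1.

    Returns a_K. Requires c / k0**p < 1 so every contraction factor lies in (0, 1).
    """
    a = a0
    for k in range(k0, K):
        a = (1.0 - c / k**p) * a + d / k**(p + q)
    return a

K = 100_000
# p = 1, c > q: classical limit of K**q * a_K is d / (c - q).
lim_p1 = K**1.0 * chung_recursion(c=2.0, p=1.0, q=1.0, d=1.0, k0=3, K=K)  # ~ 1 / (2 - 1) = 1
lim_p1b = K**1.0 * chung_recursion(c=3.0, p=1.0, q=1.0, d=1.0, k0=4, K=K)  # ~ 1 / (3 - 1) = 0.5
# 0 < p < 1: classical limit of K**q * a_K is d / c.
lim_half = K**0.5 * chung_recursion(c=1.0, p=0.5, q=0.5, d=1.0, k0=2, K=K)  # ~ 1 / 1 = 1
```

Note how the p = 1 constant depends on the gap c - q: this sensitivity to the step size constant is one reason more general step size rules are of interest.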

Updated: 2024-06-09 04:25:10

标题: Chung引理的一个泛化版本及其应用

摘要: 钟氏引理是建立（随机）优化方法在强凸性类型假设和适当的多项式递减步长下的渐近收敛速率的经典工具。在这项工作中，我们发展了钟氏引理的广义版本，为更一般的步长规则提供了一个简单的非渐近收敛框架。我们通过推导丰富多样的随机方法的紧密非渐近收敛速率，展示了所提出的广义钟氏引理的广泛适用性。特别是，我们在一般的（θ，μ）-Polyak-Lojasiewicz（PL）条件下，针对各种步长策略，包括多项式、恒定、指数和余弦步长规则，获得了部分新的非渐近复杂性结果，如随机梯度下降和随机重排。值得注意的是，作为我们分析的副产品，我们观察到指数步长可以适应目标函数的几何形态，实现最佳收敛速率，而无需对底层景观有精确的了解。我们的结果表明，发展的钟氏引理变体提供了一种灵活、系统化和简化的方法，可在一般步长规则下建立非渐近收敛速率。

更新时间: 2024-06-09 04:25:10

领域: math.OC,cs.LG,math.PR,stat.ML,90C15, 90C30, 90C26

下载: http://arxiv.org/abs/2406.05637v1

OceanCastNet: A Deep Learning Ocean Wave Model with Energy Conservation

Traditional wave forecasting models, although based on energy conservation equations, are computationally expensive. On the other hand, existing deep learning geophysical fluid models, while computationally efficient, often suffer from issues such as energy dissipation in long-term forecasts. This paper proposes a novel energy-balanced deep learning wave forecasting model called OceanCastNet (OCN). By incorporating wind fields at the current, previous, and future time steps, as well as wave fields at the current and previous time steps as input variables, OCN maintains energy balance within the model. Furthermore, the model employs adaptive Fourier operators as its core components and designs a masked loss function to better handle the impact of land-sea boundaries. A series of experiments on the ERA5 dataset demonstrate that OCN can achieve short-term forecast accuracy comparable to traditional models while exhibiting an understanding of the wave generation process. In comparative experiments under both normal and extreme conditions, OCN consistently outperforms the widely used WaveWatch III model in the industry. Even after long-term forecasting, OCN maintains a stable and energy-rich state. By further constructing a simple meteorological model, OCN-wind, which considers energy balance, this paper confirms the importance of energy constraints for improving the long-term forecast performance of deep learning meteorological models. This finding provides new ideas for future research on deep learning geophysical fluid models.
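The abstract mentions a masked loss that handles land-sea boundaries. The minimal version of that idea, sketched below under our own assumptions (OCN's actual loss may weight or combine terms differently), computes the error only over ocean grid cells so that land cells, where wave fields are undefined, contribute nothing.

```python
import numpy as np

def masked_mse(pred, target, ocean_mask):
    """Mean squared error over ocean grid cells only.

    pred, target: arrays of shape (..., H, W); ocean_mask: bool array of shape (H, W),
    True where the cell is sea. The mask is broadcast over any leading dimensions.
    """
    mask = np.broadcast_to(ocean_mask, pred.shape)
    if not mask.any():
        raise ValueError("mask selects no ocean cells")
    diff = (pred - target)[mask]
    return float(np.mean(diff ** 2))

target = np.zeros((2, 2))
pred = np.array([[1.0, 100.0],
                 [1.0, 1.0]])          # the large error sits on a land cell
ocean = np.array([[True, False],
                  [True, True]])
loss = masked_mse(pred, target, ocean)                                     # 1.0
batch_loss = masked_mse(np.stack([pred, pred]), np.stack([target, target]), ocean)
```

The 100.0 error on the land cell is ignored, so both losses equal 1.0.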

Updated: 2024-06-09 04:22:21

标题: OceanCastNet：一个具有能量守恒的深度学习海浪模型

摘要: 传统的波浪预测模型虽然基于能量守恒方程，但计算成本高昂。另一方面，现有的深度学习地球物理流体模型虽然计算效率高，但在长期预测中经常出现能量耗散等问题。本文提出了一种新颖的能量平衡深度学习波浪预测模型，称为OceanCastNet（OCN）。通过将当前、前一步和未来时间步的风场以及当前和前一步的波浪场作为输入变量，OCN在模型内保持能量平衡。此外，该模型采用自适应傅里叶算子作为其核心组件，并设计了一个掩码损失函数，以更好地处理陆海边界的影响。对ERA5数据集进行的一系列实验表明，OCN能够实现与传统模型相当的短期预测准确性，同时展现对波浪生成过程的理解。在正常和极端条件下的比较实验中，OCN始终优于业界广泛使用的WaveWatch III模型。即使在长期预测后，OCN仍然保持稳定和富有能量的状态。通过进一步构建一个考虑能量平衡的简单气象模型OCN-wind，本文确认了能量约束对改善深度学习气象模型的长期预测性能的重要性。这一发现为未来深度学习地球物理流体模型的研究提供了新思路。

更新时间: 2024-06-09 04:22:21

领域: physics.ao-ph,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.03848v2

What is my quantum computer good for? Quantum capability learning with physics-aware neural networks

Quantum computers have the potential to revolutionize diverse fields, including quantum chemistry, materials science, and machine learning. However, contemporary quantum computers experience errors that often cause quantum programs run on them to fail. Until quantum computers can reliably execute large quantum programs, stakeholders will need fast and reliable methods for assessing a quantum computer's capability, i.e., the programs it can run and how well it can run them. Previously, off-the-shelf neural network architectures have been used to model quantum computers' capabilities, but with limited success, because these networks fail to learn the complex quantum physics that determines real quantum computers' errors. We address this shortcoming with a new quantum-physics-aware neural network architecture for learning capability models. Our architecture combines aspects of graph neural networks with efficient approximations to the physics of errors in quantum programs. This approach achieves up to $\sim50\%$ reductions in mean absolute error on both experimental and simulated data, over state-of-the-art models based on convolutional neural networks.
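The abstract contrasts its physics-aware network with models that ignore error physics. The crudest physics-based estimate, commonly used as a baseline and not the paper's architecture, multiplies per-gate success probabilities, assuming errors are independent and that any single error makes the program fail. The gate names and error rates below are hypothetical; the MAE helper computes the metric the abstract reports.

```python
import math
from typing import Dict, List

def estimated_success_probability(circuit: List[str], error_rates: Dict[str, float]) -> float:
    """Product of per-gate success probabilities (independent-error approximation)."""
    return math.prod(1.0 - error_rates[g] for g in circuit)

def mean_absolute_error(predicted: List[float], observed: List[float]) -> float:
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)

error_rates = {"x_0": 0.01, "cx_0_1": 0.02}   # hypothetical calibration data
circuit = ["x_0", "cx_0_1", "x_0"]
p_success = estimated_success_probability(circuit, error_rates)  # 0.99 * 0.98 * 0.99
mae = mean_absolute_error([0.9, 0.5], [1.0, 0.4])
```

This baseline ignores effects such as crosstalk and coherent error accumulation, which is the kind of physics a learned, physics-aware model can try to capture.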

Updated: 2024-06-09 04:11:41

标题: 我的量子计算机有什么用？具有物理意识的神经网络的量子能力学习

摘要: 量子计算机有潜力彻底改变多个领域，包括量子化学、材料科学和机器学习。然而，当代量子计算机存在错误，经常导致在其上运行的量子程序失败。在量子计算机能够可靠地执行大型量子程序之前，利益相关者需要快速可靠的方法来评估量子计算机的能力，即它可以运行哪些程序以及运行它们的效果如何。以前，现成的神经网络架构被用来模拟量子计算机的能力，但由于这些网络无法学习决定真实量子计算机错误的复杂量子物理，因此取得了有限的成功。我们通过一种新的、了解量子物理的神经网络架构来解决这一不足，用于学习能力模型。我们的架构结合了图神经网络的方面和对量子程序错误物理的有效近似。与基于卷积神经网络的最新模型相比，这种方法在实验数据和模拟数据上均实现了最高约50%的平均绝对误差降低。

更新时间: 2024-06-09 04:11:41

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2406.05636v1

Heterogeneous Treatment Effects in Panel Data

We address a core problem in causal inference: estimating heterogeneous treatment effects using panel data with general treatment patterns. Many existing methods either do not utilize the potential underlying structure in panel data or have limitations in the allowable treatment patterns. In this work, we propose and evaluate a new method that first partitions observations into disjoint clusters with similar treatment effects using a regression tree, and then leverages the (assumed) low-rank structure of the panel data to estimate the average treatment effect for each cluster. Our theoretical results establish the convergence of the resulting estimates to the true treatment effects. Computational experiments with semi-synthetic data show that our method achieves superior accuracy compared to alternative approaches, using a regression tree with no more than 40 leaves. Hence, our method provides more accurate and interpretable estimates than alternative methods.
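The second step of the method (using low-rank structure to estimate each cluster's average effect) can be sketched as follows, under stated assumptions: the clusters are taken as given (the regression-tree step is omitted), untreated cells are treated as observed controls, and the untreated potential outcomes are imputed with a simple iterative truncated-SVD ("hard impute") completion. That completion method is our stand-in and not necessarily the paper's estimator; all data below are synthetic.

```python
import numpy as np

def hard_impute(Y, observed, rank, iters=500):
    """Fill unobserved entries of Y with a rank-`rank` fit via iterative truncated SVD."""
    Z = np.where(observed, Y, 0.0)
    low_rank = Z
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        Z = np.where(observed, Y, low_rank)  # keep observed cells, refresh missing ones
    return low_rank

def cluster_effects(Y, treated, clusters, rank):
    """Per-cluster mean of Y - Y0_hat over treated cells (Y0_hat: imputed untreated outcome)."""
    Y0_hat = hard_impute(Y, ~treated, rank)
    effects = {}
    for c in np.unique(clusters[treated]):
        cells = treated & (clusters == c)
        effects[int(c)] = float(np.mean(Y[cells] - Y0_hat[cells]))
    return effects

# Synthetic panel: N units x T periods with exactly rank-1 untreated outcomes.
N, T = 20, 20
L0 = np.outer(1.0 + 0.05 * np.arange(N), 2.0 - 0.03 * np.arange(T))
treated = np.zeros((N, T), dtype=bool)
treated[14:, 14:] = True              # units 14-19 treated from period 14 on
clusters = np.zeros((N, T), dtype=int)
clusters[17:, :] = 1                  # hypothetical leaves: units 14-16 vs 17-19
Y = L0 + treated * np.where(clusters == 1, 3.0, 1.0)
effects = cluster_effects(Y, treated, clusters, rank=1)
```

With noise-free rank-1 data the recovered effects are close to the true values 1.0 and 3.0.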

Updated: 2024-06-09 04:02:08

标题: 面板数据中的异质性处理效应

摘要: 我们解决了因果推断中的一个核心问题：使用具有一般处理模式的面板数据估计异质处理效应。许多现有方法要么不利用面板数据中潜在的结构，要么在允许的处理模式上有限制。在这项工作中，我们提出并评估了一种新方法，该方法首先使用回归树将观测分成具有相似处理效应的不相交簇，然后利用（假设的）面板数据的低秩结构来估计每个簇的平均处理效应。我们的理论结果证明了所得估计收敛于真实的处理效应。使用半合成数据进行的计算实验表明，我们的方法与替代方法相比具有更高的准确性，且使用的回归树不超过40个叶子。因此，我们的方法提供比替代方法更准确和可解释的估计。

更新时间: 2024-06-09 04:02:08

领域: stat.ML,cs.LG,econ.EM

下载: http://arxiv.org/abs/2406.05633v1

CCSI: Continual Class-Specific Impression for Data-free Class Incremental Learning

In real-world clinical settings, traditional deep learning-based classification methods struggle to diagnose newly introduced disease types because they require samples from all disease classes for offline training. Class incremental learning offers a promising solution by adapting a deep network trained on specific disease classes to handle new diseases. However, catastrophic forgetting occurs: adapting the model to new data decreases its performance on earlier classes. Previously proposed methods to overcome this require perpetual storage of previous samples, which raises practical concerns regarding privacy and storage regulations in healthcare. To this end, we propose a novel data-free class incremental learning framework that synthesizes data for learned classes instead of storing data from previous classes. Our key contributions include acquiring synthetic data, called Continual Class-Specific Impression (CCSI), for previously trained classes that are no longer accessible, and presenting a methodology to effectively use this data to update networks when new classes are introduced. We obtain CCSI by data inversion over the gradients of the classification model trained on previous classes: the pixel-wise optimization starts from the mean image of each class, inspired by the common landmarks shared among medical images, and uses the statistics of continual normalization layers as a regularizer. Subsequently, we update the network by combining the synthesized data with new class data and incorporating several losses: an intra-domain contrastive loss to generalize the deep network trained on synthesized data to real data, a margin loss to increase separation between previous and new classes, and a cosine-normalized cross-entropy loss to alleviate the adverse effects of imbalanced distributions in the training data.
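One of the listed losses, cosine-normalized cross-entropy, is easy to state concretely. In the common formulation (used, for example, in earlier class-incremental work), features and class weight vectors are both L2-normalized, so logits are a scale s times a cosine and new classes cannot dominate simply by having larger weight norms. The sketch below assumes a fixed scale `scale=16.0`; how CCSI sets or learns the scale is not stated in the abstract.

```python
import numpy as np

def cosine_normalized_ce(features, weights, labels, scale=16.0):
    """Cross-entropy on logits scale * cos(w_c, f).

    features: (n, d); weights: (C, d) class weight vectors; labels: (n,) class indices.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    logits = scale * f @ w.T                              # every entry in [-scale, scale]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

W = np.eye(3)        # three class directions
F = np.eye(3)        # each feature points exactly at one class
loss_right = cosine_normalized_ce(F, W, np.array([0, 1, 2]))
loss_wrong = cosine_normalized_ce(F, W, np.array([1, 2, 0]))
```

Because the logits are bounded by the scale, even a badly misclassified sample contributes a loss of about 16 here, rather than growing with weight norms.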

Updated: 2024-06-09 03:52:21

标题: CCSI：无数据类增量学习中的持续类特定印象

摘要: 在现实世界的临床环境中，传统的基于深度学习的分类方法在诊断新引入的疾病类型时面临困难，因为它们需要来自所有疾病类别的样本进行离线训练。类增量学习通过使经过特定疾病类别训练的深度网络适应新疾病而提供了一个有前途的解决方案。然而，当将模型调整到新数据时，灾难性遗忘发生，降低了先前类别的性能。先前提出的克服方法需要持续存储先前样本，可能引发有关医疗保健隐私和存储法规的潜在实际问题。因此，我们提出了一种新颖的无数据类增量学习框架，该框架利用对已学习类别进行数据合成，而不是来自先前类别的数据存储。我们的关键贡献包括获取被称为连续类特定印象（CCSI）的合成数据，用于以前无法访问的训练类别，并提出一种方法，有效利用这些数据来更新网络，并引入新类别。我们通过在以先前类别的每个类别的均值图像为起点的训练分类模型的梯度上进行数据反演，借鉴共享于医学图像之间的共同标志，并利用连续归一化层统计作为像素优化过程中的正则化器，获得CCSI。随后，我们通过将合成数据与新类数据结合，并结合几种损失，包括一个域内对比损失，将在合成数据上训练的深度网络泛化到真实数据，一个边距损失，增加先前类别与新类别之间的分离，并一个余弦归一化的交叉熵损失，以缓解训练数据中不平衡分布的不利影响。

更新时间: 2024-06-09 03:52:21

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.05631v1

Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

We present DenseAV, a novel dual encoder grounding architecture that learns high-resolution, semantically meaningful, and audio-visually aligned features solely through watching videos. We show that DenseAV can discover the "meaning" of words and the "location" of sounds without explicit localization supervision. Furthermore, it automatically discovers and distinguishes between these two types of associations without supervision. We show that DenseAV's localization abilities arise from a new multi-head feature aggregation operator that directly compares dense image and audio representations for contrastive learning. In contrast, many other systems that learn "global" audio and video representations cannot localize words and sounds. Finally, we contribute two new datasets to improve the evaluation of AV representations through speech- and sound-prompted semantic segmentation. On these and other datasets, we show that DenseAV dramatically outperforms the prior art on speech- and sound-prompted semantic segmentation. DenseAV also outperforms the previous state of the art, ImageBind, on cross-modal retrieval while using fewer than half as many parameters. Project Page: https://aka.ms/denseav
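The abstract credits DenseAV's localization to a multi-head operator that compares dense audio and image features directly. The sketch below shows one plausible form of such an operator under our own assumptions; the exact pooling choices in the paper may differ. Channels are split into heads, a similarity volume between every audio time step and every image patch is formed per head, and it is reduced by a max over patches and a mean over time, then summed over heads. The resulting scalar would serve as the pair score inside a contrastive (e.g. InfoNCE) loss over a batch, which is not shown.

```python
import numpy as np

def dense_av_score(audio, image, n_heads):
    """Hypothetical dense audio-visual similarity.

    audio: (T, C) per-time-step features; image: (P, C) per-patch features (P = H * W).
    """
    T, C = audio.shape
    if C % n_heads:
        raise ValueError("channels must divide evenly into heads")
    a = audio.reshape(T, n_heads, C // n_heads)
    v = image.reshape(image.shape[0], n_heads, C // n_heads)
    sim = np.einsum("thc,phc->htp", a, v)  # (heads, T, P) similarity volume
    # Max over patches (where is it?), mean over time, sum over heads.
    return float(sim.max(axis=2).mean(axis=1).sum())

score_ones = dense_av_score(np.ones((2, 4)), np.ones((3, 4)), n_heads=2)  # 2 heads x 2.0
score_match = dense_av_score(np.eye(8)[:2], np.eye(8), n_heads=1)         # each step finds its patch
```

Because the similarity volume is kept before pooling, the argmax over patches for each time step gives a localization map for free.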

Updated: 2024-06-09 03:38:21

标题: 将“chirp”与“chat”区分开：自我监督的声音和语言的视觉定位

摘要: 我们提出了一种新颖的双编码器对齐架构DenseAV，仅通过观看视频来学习高分辨率、语义有意义且视听对齐的特征。我们展示了DenseAV可以在没有明确定位监督的情况下发现单词的“含义”和声音的“位置”。此外，它可以在无监督的情况下自动发现并区分这两种类型的关联。我们展示了DenseAV的定位能力来自一个新的多头特征聚合运算符，该运算符直接比较密集的图像和音频表示以进行对比学习。相比之下，许多学习“全局”音频和视频表示的系统无法定位单词和声音。最后，我们提供了两个新数据集，以改善通过语音和声音提示的语义分割对AV表示的评估。在这些数据集和其他数据集上，我们展示了DenseAV在语音和声音提示的语义分割上远远优于先前的技术水平。在跨模态检索上，DenseAV优于先前的最先进方法ImageBind，而使用的参数不到其一半。项目页面：https://aka.ms/denseav

更新时间: 2024-06-09 03:38:21

领域: cs.CV,cs.CL,cs.IR,cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2406.05629v1

Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy

With the proliferation of deepfake audio, there is an urgent need to investigate its attribution. Current source tracing methods can effectively distinguish in-distribution (ID) categories. However, the rapid evolution of deepfake algorithms poses a critical challenge in the accurate identification of out-of-distribution (OOD) novel deepfake algorithms. In this paper, we propose the Real Emphasis and Fake Dispersion (REFD) strategy for audio deepfake algorithm recognition, demonstrating its effectiveness in discriminating ID samples while identifying OOD samples. For effective OOD detection, we first explore current post-hoc OOD methods and propose NSD, a novel OOD approach for identifying novel deepfake algorithms that considers the similarity of both features and logit scores. REFD achieves an 86.83% F1-score as a single system in Audio Deepfake Detection Challenge 2023 Track 3, showcasing its state-of-the-art performance.

Updated: 2024-06-09 03:33:59

标题: 广义源追踪：利用真实强调和虚假分散策略检测新颖的音频深度伪造算法

摘要: 随着深度伪造音频的泛滥，迫切需要调查它们的来源。当前的来源追踪方法可以有效区分分布内（ID）类别。然而，深度伪造算法的快速演变对精确识别分布外（OOD）新深度伪造算法提出了重大挑战。在本文中，我们提出了用于音频深度伪造算法识别的真实强调和虚假分散（REFD）策略，展示了它在辨别ID样本的同时识别OOD样本的有效性。为了有效地检测OOD，我们首先探索当前的事后OOD方法，并提出了NSD，一种通过同时考虑特征和logit分数的相似性来识别新深度伪造算法的新OOD方法。REFD在2023年音频深度伪造检测挑战赛Track3中作为单一系统达到了86.83%的F1分数，展示了其最先进的性能。

更新时间: 2024-06-09 03:33:59

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2406.03240v2

Domain Generalization Guided by Large-Scale Pre-Trained Priors

Domain generalization (DG) aims to train a model from limited source domains, allowing it to generalize to unknown target domains. Typically, DG models only employ large-scale pre-trained models during the initialization of fine-tuning. However, large-scale pre-trained models already possess the ability to resist domain shift. If we reference pre-trained models continuously during fine-tuning to maintain this ability, it could further enhance the generalization ability of the DG model. For this purpose, we introduce a new method called Fine-Tune with Large-scale pre-trained Priors (FT-LP), which incorporates the pre-trained model as a prior into the DG fine-tuning process, ensuring that the model refers to its pre-trained model at each optimization step. FT-LP comprises a theoretical framework and a simple implementation strategy. In theory, we verify the rationality of FT-LP by introducing a generalization error bound with the pre-trained priors for DG. In implementation, we utilize an encoder to simulate the model distribution, enabling the use of FT-LP when only pre-trained weights are available. In summary, we offer a new fine-tuning method for DG algorithms to utilize pre-trained models throughout the fine-tuning process. Through experiments on various datasets and DG models, our proposed method exhibits significant improvements, indicating its effectiveness.
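FT-LP derives its regularizer from a generalization bound and uses an encoder to simulate the model distribution; neither is reproduced here. The simplest instance of "referring to the pre-trained model at every optimization step" is a proximal penalty toward the pre-trained weights, as in the earlier L2-SP method, sketched below as a stand-in. The task loss, learning rate `lr`, and strength `lam` are assumptions for illustration.

```python
import numpy as np

def sgd_step_with_prior(theta, grad_task, theta_pre, lr=0.1, lam=0.5):
    """One SGD step on  task_loss(theta) + (lam / 2) * ||theta - theta_pre||^2.

    The penalty gradient lam * (theta - theta_pre) pulls the weights back toward
    the pre-trained weights at every step, not only at initialization.
    """
    return theta - lr * (grad_task + lam * (theta - theta_pre))

# Toy task loss 0.5 * ||theta - theta_star||^2, whose gradient is theta - theta_star.
theta_pre = np.zeros(2)               # "pre-trained" weights
theta_star = np.array([2.0, 0.0])     # task optimum
theta = theta_pre.copy()
for _ in range(500):
    theta = sgd_step_with_prior(theta, theta - theta_star, theta_pre, lr=0.1, lam=1.0)
```

The iterates converge to (theta_star + lam * theta_pre) / (1 + lam) = [1, 0]: a compromise between fitting the task and staying close to the pre-trained model.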

Updated: 2024-06-09 03:32:32

标题: 领域泛化：基于大规模预训练先验指导

摘要: Domain generalization (DG)旨在从有限的源域训练模型，使其能够泛化到未知的目标域。通常，DG模型在微调初始化阶段只使用大规模预训练模型。然而，大规模预训练模型已经具备抵抗领域偏移的能力。如果我们在微调过程中持续引用预训练模型以保持这种能力，可以进一步增强DG模型的泛化能力。为此，我们引入了一种称为Fine-Tune with Large-scale pre-trained Priors (FT-LP)的新方法，将预训练模型作为先验融入到DG微调过程中，确保模型在每个优化步骤中参考其预训练模型。FT-LP包括一个理论框架和一个简单的实现策略。在理论上，我们通过引入与预训练先验相关的泛化误差界，验证了FT-LP的合理性。在实现上，我们利用编码器模拟模型分布，使得在只有预训练权重可用时可以使用FT-LP。总之，我们提供了一种新的微调方法，供DG算法在整个微调过程中利用预训练模型。通过在各种数据集和DG模型上进行实验，我们提出的方法表现出显著的改进，表明其有效性。

更新时间: 2024-06-09 03:32:32

领域: cs.LG

下载: http://arxiv.org/abs/2406.05628v1

Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections

Large Language Models (LLMs) have advanced significantly in recent years. To safeguard against malicious exploitation, a body of research has concentrated on aligning LLMs with human preferences and inhibiting their generation of inappropriate content. Unfortunately, such alignments are often vulnerable: fine-tuning with a minimal amount of harmful data can easily unalign the target LLM. While effective, such fine-tuning-based unalignment approaches also have their own limitations: (1) non-stealthiness: after fine-tuning, safety audits or red-teaming can easily expose the potential weaknesses of the unaligned models, thereby precluding their release/use; (2) non-persistence: the unaligned LLMs can be easily repaired through re-alignment, i.e., fine-tuning again with aligned data points. In this work, we show that it is possible to conduct stealthy and persistent unalignment on large language models via backdoor injections. We also provide a novel understanding of the relationship between backdoor persistence and the activation pattern, and further provide guidelines for potential trigger design. Through extensive experiments, we demonstrate that our proposed stealthy and persistent unalignment can successfully pass the safety evaluation while maintaining strong persistence against re-alignment defense.

Updated: 2024-06-09 03:27:40

标题: 大规模语言模型中通过后门注入实现隐蔽持久的不对齐

摘要: 近期对大型语言模型（LLMs）的发展表现出显著的进步。为了促进对恶意利用的防范，一系列研究已经集中在将LLMs与人类偏好对齐，并抑制它们生成不当内容。不幸的是，这种对齐往往是脆弱的：用少量有害数据进行微调很容易使目标LLM失去对齐。虽然这种基于微调的失对齐方法是有效的，但它们也有自己的局限性：（1）非隐蔽性，微调后，安全审核或红队行动很容易暴露失对齐模型的潜在弱点，从而排除其发布/使用。（2）非持久性，失对齐的LLMs可以通过重新对齐来轻松修复，即再次用对齐的数据点进行微调。在这项工作中，我们展示了通过后门注入可以对大型语言模型进行隐蔽和持久的失对齐。我们还提供了对后门持久性与激活模式之间关系的新理解，并进一步提供了潜在触发器设计的指导方针。通过大量实验，我们证明了我们提出的隐蔽和持久的失对齐可以成功通过安全评估，同时保持对重新对齐防御的强大持久性。

更新时间: 2024-06-09 03:27:40

领域: cs.CR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2312.00027v2

Observation Denoising in CYRUS Soccer Simulation 2D Team For RoboCup 2024

In the Soccer Simulation 2D environment, accurate observation is crucial for effective decision making. However, challenges such as partial observation and noisy data can hinder performance. To address these issues, we propose a denoising algorithm that leverages predictive modeling and intersection analysis to enhance the accuracy of observations. Our approach aims to mitigate the impact of noise and partial data, leading to improved gameplay performance. This paper presents the framework, implementation, and preliminary results of our algorithm, demonstrating its potential in refining observations in Soccer Simulation 2D. The Cyrus 2D team uses a combination of the Helios, Gliders, and Cyrus base codes.
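The abstract combines predictive modeling with intersection analysis. One way to read that combination, sketched below as a 1-D toy under our own assumptions (the team's algorithm works on 2-D observations and its details are not given in the abstract), is: each noisy observation only tells us that a quantity lies in some interval; the running estimate is shifted by the predicted motion and intersected with each new interval, and we fall back to the newest observation when the two conflict.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]

def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def denoise_1d(observations: List[Interval], velocities: List[float]) -> Interval:
    """Narrow a 1-D position estimate by intersecting motion-shifted observation intervals.

    observations[t]: interval of positions consistent with the noisy observation at cycle t.
    velocities[t]: predicted displacement from cycle t to cycle t + 1.
    """
    estimate = observations[0]
    for t in range(1, len(observations)):
        v = velocities[t - 1]
        predicted = (estimate[0] + v, estimate[1] + v)   # predictive model step
        merged = intersect(predicted, observations[t])   # intersection step
        estimate = merged if merged is not None else observations[t]
    return estimate

narrowed = denoise_1d([(9.0, 11.0), (10.5, 12.5), (11.0, 13.0)], [1.0, 1.0])
conflict = denoise_1d([(0.0, 1.0), (5.0, 6.0)], [0.0])
```

Each observation alone has width 2.0, while the combined estimate `narrowed` is (11.5, 13.0) with width 1.5; in the conflicting case the estimate resets to (5.0, 6.0).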

Updated: 2024-06-09 03:15:29

标题: 《CYRUS足球模拟2D团队中的观测去噪技术应用于RoboCup 2024》

摘要: 在足球模拟2D环境中，准确的观察对于有效的决策至关重要。然而，诸如部分观察和嘈杂数据等挑战可能影响表现。为了解决这些问题，我们提出了一种去噪算法，利用预测建模和交集分析来提高观察的准确性。我们的方法旨在减轻噪声和部分数据的影响，从而提高游戏表现。本文介绍了我们算法的框架、实现和初步结果，展示了其在细化足球模拟2D中观察的潜力。Cyrus 2D团队正在使用Helios、Gliders和Cyrus基础代码的组合。

更新时间: 2024-06-09 03:15:29

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2406.05623v1

Cross Language Soccer Framework: An Open Source Framework for the RoboCup 2D Soccer Simulation

RoboCup Soccer Simulation 2D (SS2D) research is hampered by the complexity of existing Cpp-based codes like Helios, Cyrus, and Gliders, which also suffer from limited integration with modern machine learning frameworks. This development paper introduces a transformative solution: a gRPC-based, language-agnostic framework that seamlessly integrates with the high-performance Helios base code. This approach not only facilitates the use of diverse programming languages, including CSharp, JavaScript, and Python, but also maintains the computational efficiency critical for real-time decision making in SS2D. By breaking down language barriers, our framework significantly enhances collaborative potential and flexibility, empowering researchers to innovate without the overhead of mastering or developing extensive base codes. We invite the global research community to leverage and contribute to the Cross Language Soccer (CLS) framework, which is openly available under the MIT License, to drive forward the capabilities of multi-agent systems in soccer simulations.

Updated: 2024-06-09 03:11:40

标题: 跨语言足球框架：一个用于RoboCup 2D足球模拟的开源框架

摘要: RoboCup Soccer Simulation 2D（SS2D）研究受到现有基于Cpp的代码（如Helios、Cyrus和Gliders）复杂性的阻碍，这些代码也存在与现代机器学习框架的有限集成。本发展论文介绍了一种革命性的解决方案，即基于gRPC的、与语言无关的框架，与高性能的Helios基础代码无缝集成。这种方法不仅便于使用包括CSharp、JavaScript和Python在内的各种编程语言，而且保持了SS2D中实时决策所必需的计算效率。通过打破语言障碍，我们的框架显著增强了协作潜力和灵活性，使研究人员能够在不需要掌握或开发大量基础代码的情况下进行创新。我们邀请全球研究社区利用和贡献Cross Language Soccer（CLS）框架，该框架在MIT许可证下公开提供，以推动多智能体系统在足球模拟中的能力。

更新时间: 2024-06-09 03:11:40

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2406.05621v1

Domain Agnostic Conditional Invariant Predictions for Domain Generalization

Domain generalization aims to develop a model that can perform well on unseen target domains by learning from multiple source domains. However, recently proposed domain generalization models usually rely on domain labels, which may not be available in many real-world scenarios. To address this challenge, we propose a Discriminant Risk Minimization (DRM) theory and a corresponding algorithm that capture invariant features without domain labels. In DRM theory, we prove that reducing the discrepancy in prediction distribution between the overall source domain and any subset of it contributes to obtaining invariant features. To apply the DRM theory, we develop an algorithm composed of Bayesian inference and a new penalty termed Categorical Discriminant Risk (CDR). In Bayesian inference, we transform the output of the model into a probability distribution to align with our theoretical assumptions. We adopt a sliding update approach to approximate the overall prediction distribution of the model, which enables us to obtain the CDR penalty. We also show the effectiveness of these components in finding invariant features. We evaluate our algorithm against various domain generalization methods on multiple real-world datasets, providing empirical support for our theory.
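The abstract describes approximating the overall prediction distribution with a sliding update and penalizing how far a subset of the data deviates from it. A hypothetical sketch of that mechanism follows; it is not the paper's CDR formula. An exponential moving average stores, for each class, the mean predicted distribution over all data seen so far, and each mini-batch (a random subset of the pooled source domains) is penalized by the squared distance between its class-conditional mean predictions and that running estimate. The momentum value and the squared-distance choice are assumptions.

```python
import numpy as np

class SlidingPredictionPenalty:
    """Hypothetical subset-vs-overall prediction discrepancy with a sliding (EMA) estimate."""

    def __init__(self, n_classes, momentum=0.9):
        self.momentum = momentum
        # Row c: running mean of the predicted distribution over samples of class c.
        self.overall = np.full((n_classes, n_classes), 1.0 / n_classes)

    def __call__(self, probs, labels):
        penalty = 0.0
        for c in np.unique(labels):
            batch_mean = probs[labels == c].mean(axis=0)
            penalty += float(np.sum((batch_mean - self.overall[c]) ** 2))
            # Sliding update of the overall estimate (treated as a constant in training).
            self.overall[c] = self.momentum * self.overall[c] + (1 - self.momentum) * batch_mean
        return penalty

pen = SlidingPredictionPenalty(n_classes=2, momentum=0.5)
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
labels = np.array([0, 1])
p1 = pen(probs, labels)  # 0.32 + 0.18 = 0.5
p2 = pen(probs, labels)  # estimate moved halfway toward the batch: 0.08 + 0.045 = 0.125
```

Repeating the same batch shrinks the penalty as the running estimate catches up, while a subset whose predictions differ from the pooled data keeps a large penalty.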

Updated: 2024-06-09 02:38:52

标题: 领域无关的条件不变预测用于领域泛化

摘要: 域泛化旨在通过从多个源域学习，得到一个能在未见目标域上表现良好的模型。然而，最近提出的域泛化模型通常依赖于域标签，而域标签在许多实际场景中可能无法获得。为了解决这一挑战，我们提出了一种判别风险最小化（DRM）理论和相应的算法，以在无需域标签的情况下捕捉不变特征。在DRM理论中，我们证明减少整体源域与其任何子集之间预测分布的差异有助于获得不变特征。为了应用DRM理论，我们开发了一个算法，由贝叶斯推断和一种称为分类判别风险（CDR）的新惩罚组成。在贝叶斯推断中，我们将模型的输出转换为概率分布，以与我们的理论假设相一致。我们采用滑动更新方法来近似模型的整体预测分布，从而使我们能够获得CDR惩罚。我们还展示了这些组件在发现不变特征方面的有效性。我们在多个真实世界数据集上将我们的算法与各种域泛化方法进行了对比评估，为我们的理论提供了经验支持。

更新时间: 2024-06-09 02:38:52

领域: cs.LG

下载: http://arxiv.org/abs/2406.05616v1

Quantum Mixed-State Self-Attention Network

The rapid advancement of quantum computing has increasingly highlighted its potential in the realm of machine learning, particularly in the context of natural language processing (NLP) tasks. Quantum machine learning (QML) leverages the unique capabilities of quantum computing to offer novel perspectives and methodologies for complex data processing and pattern recognition challenges. This paper introduces a novel Quantum Mixed-State Attention Network (QMSAN), which integrates the principles of quantum computing with classical machine learning algorithms, especially self-attention networks, to enhance the efficiency and effectiveness in handling NLP tasks. The QMSAN model employs a quantum attention mechanism based on mixed states, enabling efficient direct estimation of the similarity between queries and keys within the quantum domain, which leads to more effective attention weight acquisition. Additionally, we propose an innovative quantum positional encoding scheme, implemented through fixed quantum gates within the quantum circuit, to enhance the model's accuracy. Experimental validation on various datasets demonstrates that the QMSAN model outperforms existing quantum and classical models in text classification, achieving significant performance improvements. The QMSAN model not only significantly reduces the number of parameters but also exceeds classical self-attention networks in performance, showcasing its strong capability in data representation and information extraction. Furthermore, our study investigates the model's robustness in different quantum noise environments, showing that QMSAN possesses commendable robustness to low noise.
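A natural similarity between two mixed states rho and sigma is the Hilbert-Schmidt overlap Tr(rho sigma), which reduces to |<psi|phi>|^2 for pure states and can be estimated on hardware with a swap test (the probability of measuring 0 on the ancilla is (1 + Tr(rho sigma)) / 2). The classical simulation below assumes this overlap as the query-key similarity and a classical softmax on top; whether QMSAN uses exactly this quantity and normalization is an assumption.

```python
import numpy as np

def density_matrix(states, weights):
    """Mixed state rho = sum_i w_i |psi_i><psi_i| from (unnormalized) state vectors."""
    dim = len(states[0])
    rho = np.zeros((dim, dim), dtype=complex)
    for psi, w in zip(states, weights):
        psi = np.asarray(psi, dtype=complex)
        psi = psi / np.linalg.norm(psi)
        rho += w * np.outer(psi, psi.conj())
    return rho

def mixed_state_similarity(rho, sigma):
    """Hilbert-Schmidt overlap Tr(rho sigma); equals |<psi|phi>|^2 for pure states."""
    return float(np.real(np.trace(rho @ sigma)))

def attention_weights(query_rhos, key_rhos):
    """Softmax over keys of the pairwise Tr(rho_q rho_k) similarities."""
    S = np.array([[mixed_state_similarity(q, k) for k in key_rhos] for q in query_rhos])
    E = np.exp(S - S.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

rho0 = density_matrix([[1, 0]], [1.0])                   # |0><0|
rho_plus = density_matrix([[1, 1]], [1.0])               # |+><+|
mixed = density_matrix([[1, 0], [0, 1]], [0.5, 0.5])     # maximally mixed I/2
A = attention_weights([rho0, rho_plus], [rho0, rho_plus, mixed])
```

For example, Tr(|0><0| |+><+|) = |<0|+>|^2 = 0.5, and the purity Tr(rho^2) of the maximally mixed qubit is also 0.5.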

Updated: 2024-06-09 02:26:13

标题: 量子混合态自注意网络

摘要: 量子计算的快速发展越来越突显了其在机器学习领域的潜力，特别是在自然语言处理（NLP）任务的背景下。量子机器学习（QML）利用量子计算的独特能力提供了处理复杂数据处理和模式识别挑战的新视角和方法论。本文介绍了一种新颖的量子混合态注意力网络（QMSAN），将量子计算的原理与经典机器学习算法结合，特别是自注意网络，以提高处理NLP任务的效率和效果。QMSAN模型采用基于混合态的量子注意力机制，能够在量子领域内有效地直接估计查询和键之间的相似性，从而更有效地获取注意力权重。此外，我们提出了一种创新的量子位置编码方案，通过量子电路中的固定量子门实现，以提高模型的准确性。在各种数据集上的实验证实表明，QMSAN模型在文本分类方面优于现有的量子和经典模型，实现了显著的性能改进。QMSAN模型不仅大大减少了参数数量，而且在性能上超越了经典自注意网络，展示了其在数据表示和信息提取方面的强大能力。此外，我们的研究还探讨了模型在不同量子噪声环境下的稳健性，表明QMSAN在低噪声下具有可靠的稳健性。

更新时间: 2024-06-09 02:26:13

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2403.02871v2

Which Backbone to Use: A Resource-efficient Domain Specific Comparison for Computer Vision

In contemporary computer vision applications, particularly image classification, architectural backbones pre-trained on large datasets like ImageNet are commonly employed as feature extractors. Despite the widespread use of these pre-trained convolutional neural networks (CNNs), there remains a gap in understanding the performance of various resource-efficient backbones across diverse domains and dataset sizes. Our study systematically evaluates multiple lightweight, pre-trained CNN backbones under consistent training settings across a variety of datasets, including natural images, medical images, galaxy images, and remote sensing images. This comprehensive analysis aims to aid machine learning practitioners in selecting the most suitable backbone for their specific problem, especially in scenarios involving small datasets where fine-tuning a pre-trained network is crucial. Even though attention-based architectures are gaining popularity, we observed that they tend to perform poorly compared to CNNs in low-data fine-tuning tasks. We also observed that some CNN architectures, such as ConvNeXt, RegNet, and EfficientNet, consistently perform well compared to others across a diverse set of domains. Our findings provide actionable insights into the performance trade-offs and effectiveness of different backbones, facilitating informed decision-making in model selection for a broad spectrum of computer vision domains. Our code is available here: https://github.com/pranavphoenix/Backbones

Updated: 2024-06-09 02:01:25

标题: 使用哪种骨干网络：面向计算机视觉的资源高效、领域特定比较

摘要: 在当代计算机视觉应用中，特别是图像分类中，通常会使用在大型数据集（如ImageNet）上预训练的骨干网络作为特征提取器。尽管这些预训练的卷积神经网络（CNNs）被广泛使用，但对于各种资源高效的骨干网络在不同领域和数据集规模下的性能，人们的理解仍存在空白。我们的研究在一致的训练设置下，系统地评估了多个轻量级预训练CNN骨干网络在多种数据集（包括自然图像、医学图像、星系图像和遥感图像）上的性能。这项全面的分析旨在帮助机器学习从业者为其特定问题选择最合适的骨干网络，尤其是在涉及小数据集、微调预训练网络至关重要的场景中。尽管基于注意力的架构越来越受欢迎，我们观察到它们在低数据微调任务中的表现往往不如CNNs。我们还观察到，ConvNeXt、RegNet和EfficientNet等一些CNN架构在多种领域上始终比其他架构表现更好。我们的研究结果为不同骨干网络的性能权衡和有效性提供了可操作的见解，有助于在广泛的计算机视觉领域中做出明智的模型选择决策。我们的代码可在此处找到：https://github.com/pranavphoenix/Backbones

更新时间: 2024-06-09 02:01:25

领域: cs.CV,cs.AI,cs.LG,I.2.10; I.4.0; I.4.1; I.4.2; I.4.6; I.4.7; I.4.8; I.4.9; I.4.10; I.2.10; I.5.1; I.5.2; I.5.4; J.2

下载: http://arxiv.org/abs/2406.05612v1
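
One simple way to quantify a backbone that "consistently performs well across domains" is its mean rank over all datasets. The sketch below assumes a hypothetical dictionary of per-domain accuracies and is not necessarily the aggregation used in the study.

```python
def mean_rank(scores):
    """Average rank of each backbone across domains (1 = best).

    scores: {domain: {backbone: accuracy}}; every backbone must appear in
    every domain. Ties are broken by dictionary order (a simplification).
    """
    if not scores:
        return {}
    domains = list(scores.values())
    backbones = set(domains[0])
    if any(set(d) != backbones for d in domains):
        raise ValueError("every backbone must be evaluated on every domain")
    totals = dict.fromkeys(sorted(backbones), 0)
    for per_domain in domains:
        # Highest accuracy gets rank 1 within this domain
        ordered = sorted(per_domain, key=per_domain.get, reverse=True)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return {name: total / len(domains) for name, total in totals.items()}
```

A backbone that wins on one domain but ranks last elsewhere ends up with a middling mean rank, which is why rank averaging rewards consistency rather than a single best score.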

CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities

Recent large language models (LLMs) have shown indications of mathematical reasoning ability on challenging competition-level problems, especially with self-generated verbalizations of intermediate reasoning steps (i.e., chain-of-thought prompting). However, current evaluations mainly focus on the end-to-end final answer correctness, and it is unclear whether LLMs can make use of helpful side information such as problem-specific hints. In this paper, we propose a challenging benchmark dataset for enabling such analyses. The Concept and Hint-Annotated Math Problems (CHAMP) consists of high school math competition problems, annotated with concepts, or general math facts, and hints, or problem-specific tricks. These annotations allow us to explore the effects of additional information, such as relevant hints, misleading concepts, or related problems. This benchmark is difficult, with the best model only scoring 58.1% in standard settings. With concepts and hints, performance sometimes improves, indicating that some models can make use of such side information. Furthermore, we annotate model-generated solutions for their correctness. Using this corpus, we find that models often arrive at the correct final answer through wrong reasoning steps. In addition, we test whether models are able to verify these solutions, and find that most models struggle.

Updated: 2024-06-09 01:47:26

标题: CHAMP：一个用于LLMs数学推理能力细致分析的竞赛级数据集

摘要: 最近的大型语言模型(LLMs)在具有挑战性的竞赛级问题上展现出了数学推理能力的迹象，尤其是在自行生成中间推理步骤的文字表述时（即思维链提示）。然而，当前的评估主要集中在端到端最终答案的正确性上，尚不清楚LLMs能否利用有用的辅助信息，例如针对特定问题的提示。在本文中，我们提出了一个具有挑战性的基准数据集，以支持此类分析。概念和提示注释数学问题（CHAMP）由高中数学竞赛问题组成，并标注了概念（即一般数学事实）和提示（即针对特定问题的技巧）。这些注释使我们能够探索额外信息的影响，例如相关提示、误导性概念或相关问题。该基准难度很高，在标准设置下最佳模型的得分仅为58.1%。加入概念和提示后，性能有时会提高，表明一些模型能够利用此类辅助信息。此外，我们对模型生成的解答标注了其正确性。利用这一语料库，我们发现模型经常通过错误的推理步骤得出正确的最终答案。此外，我们测试了模型能否验证这些解答，发现大多数模型难以做到。

更新时间: 2024-06-09 01:47:26

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2401.06961v2
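
The finding that models reach correct answers through wrong reasoning can be quantified from step-level annotations. A minimal sketch, assuming each model solution carries two hypothetical boolean annotations (final answer correct, reasoning correct); the class and field names are illustrative, not the CHAMP data format.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedSolution:
    answer_correct: bool     # final answer matches the reference answer
    reasoning_correct: bool  # every reasoning step judged valid by an annotator


def right_answer_wrong_reasoning_rate(solutions):
    """Fraction of correct-answer solutions whose reasoning is flawed."""
    correct = [s for s in solutions if s.answer_correct]
    if not correct:
        return 0.0
    flawed = sum(not s.reasoning_correct for s in correct)
    return flawed / len(correct)
```

End-to-end accuracy counts every solution in `correct` as a success; this rate shows how much of that accuracy rests on invalid reasoning.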

Cooperative Graph Neural Networks

Graph neural networks are popular architectures for graph machine learning, based on iterative computation of node representations of an input graph through a series of invariant transformations. A large class of graph neural networks follows a standard message-passing paradigm: at every layer, each node state is updated based on an aggregate of messages from its neighborhood. In this work, we propose a novel framework for training graph neural networks, where every node is viewed as a player that can choose to either 'listen', 'broadcast', 'listen and broadcast', or to 'isolate'. The standard message propagation scheme can then be viewed as a special case of this framework where every node 'listens and broadcasts' to all neighbors. Our approach offers a more flexible and dynamic message-passing paradigm, where each node can determine its own strategy based on its state, effectively exploring the graph topology while learning. We provide a theoretical analysis of the new message-passing scheme, which is further supported by an extensive empirical analysis on a synthetic dataset and on real-world datasets.

Updated: 2024-06-09 01:42:46

标题: 合作式图神经网络

摘要: 图神经网络是用于图机器学习的流行架构，其基础是通过一系列不变变换迭代地计算输入图的节点表示。一大类图神经网络遵循标准的消息传递范式：在每一层，每个节点的状态都基于来自其邻域的消息聚合进行更新。在这项工作中，我们提出了一个训练图神经网络的新颖框架，其中每个节点被视为一个玩家，可以选择“听取”、“广播”、“听取和广播”或“孤立”。标准的消息传播方案可以被视为这个框架的一个特例，即每个节点都对所有邻居“听取和广播”。我们的方法提供了一种更灵活、更动态的消息传递范式，每个节点可以根据自身状态确定自己的策略，从而在学习的同时有效地探索图的拓扑结构。我们对新的消息传递方案进行了理论分析，并在一个合成数据集和多个真实数据集上进行了广泛的实证分析来进一步支持这一分析。

更新时间: 2024-06-09 01:42:46

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2310.01267v2
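
The gating idea in the Cooperative GNN abstract can be sketched as follows, assuming a fixed action per node. In the paper each node chooses its action based on its state; here the actions are passed in by hand. A message travels along an edge from `src` to `dst` only if `src` broadcasts and `dst` listens. Mean aggregation with a residual update is an arbitrary choice for illustration, and all names are hypothetical.

```python
import numpy as np

# Hypothetical labels for the four node strategies named in the abstract
LISTEN, BROADCAST, BOTH, ISOLATE = "listen", "broadcast", "both", "isolate"


def cooperative_layer(h, edges, actions):
    """One message-passing step in which node actions gate the edges.

    h: (n, d) array of node states; edges: undirected (u, v) pairs;
    actions: one of LISTEN / BROADCAST / BOTH / ISOLATE per node.
    """
    h = np.asarray(h, dtype=float)
    sends = {i for i, a in enumerate(actions) if a in (BROADCAST, BOTH)}
    hears = {i for i, a in enumerate(actions) if a in (LISTEN, BOTH)}
    incoming = [[] for _ in range(h.shape[0])]
    for u, v in edges:
        # An undirected edge can carry a message in either direction
        for src, dst in ((u, v), (v, u)):
            if src in sends and dst in hears:
                incoming[dst].append(h[src])
    out = h.copy()
    for node, msgs in enumerate(incoming):
        if msgs:  # nodes that hear nothing keep their state
            out[node] = h[node] + np.mean(msgs, axis=0)  # mean aggregation + residual
    return out
```

Setting every node to `BOTH` recovers ordinary mean-aggregation message passing, the special case named in the abstract, while an `ISOLATE` node neither sends nor receives, effectively removing its edges for that layer.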

Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space

We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods. Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution $\pi$ over $\mathbb{R}^d$ by a product measure $\pi^\star$. When $\pi$ is strongly log-concave and log-smooth, we provide (1) approximation rates certifying that $\pi^\star$ is close to the minimizer $\pi^\star_\diamond$ of the KL divergence over a polyhedral set $\mathcal{P}_\diamond$, and (2) an algorithm for minimizing $\text{KL}(\cdot\|\pi)$ over $\mathcal{P}_\diamond$ with accelerated complexity $O(\sqrt \kappa \log(\kappa d/\varepsilon^2))$, where $\kappa$ is the condition number of $\pi$.

Updated: 2024-06-09 01:19:10

标题: 在Wasserstein空间中通过多面体优化的均场变分推断算法

摘要: 我们发展了关于Wasserstein空间中有限维多面体子集的理论，以及通过一阶方法在其上优化泛函的理论。我们的主要应用是均场变分推断问题，该问题旨在用一个乘积测度$\pi^\star$来近似$\mathbb{R}^d$上的分布$\pi$。当$\pi$是强对数凹且对数光滑时，我们提供了：(1) 近似率，证明$\pi^\star$接近KL散度在一个多面体集合$\mathcal{P}_\diamond$上的最小化子$\pi^\star_\diamond$；以及(2) 一种在$\mathcal{P}_\diamond$上最小化$\text{KL}(\cdot\|\pi)$的算法，其加速复杂度为$O(\sqrt \kappa \log(\kappa d/\varepsilon^2))$，其中$\kappa$是$\pi$的条件数。

更新时间: 2024-06-09 01:19:10

领域: math.ST,cs.LG,math.OC,stat.TH

下载: http://arxiv.org/abs/2312.02849v2
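
To make the notion of a product-measure approximation $\pi^\star$ concrete, the sketch below works through the simplest strongly log-concave, log-smooth case, a Gaussian target $\pi = N(\mu, \Sigma)$. It is not the paper's polyhedral algorithm. It implements the classical coordinate-ascent (CAVI) solution: with precision $\Lambda = \Sigma^{-1}$, the optimal product of one-dimensional Gaussians has means equal to $\mu$ and variances $1/\Lambda_{ii}$. The function and variable names are hypothetical.

```python
import numpy as np


def gaussian_mean_field(mu, sigma, sweeps=100):
    """Coordinate-ascent mean-field VI for a Gaussian target N(mu, sigma).

    Returns the means and variances of the optimal product of 1-D Gaussians.
    """
    mu = np.asarray(mu, dtype=float)
    lam = np.linalg.inv(np.asarray(sigma, dtype=float))  # precision matrix
    m = np.zeros_like(mu)
    for _ in range(sweeps):
        for i in range(mu.size):
            # sum over j != i of Lambda_ij * (m_j - mu_j)
            coupling = lam[i] @ (m - mu) - lam[i, i] * (m[i] - mu[i])
            m[i] = mu[i] - coupling / lam[i, i]
    variances = 1.0 / np.diag(lam)  # factor variances do not depend on the means
    return m, variances
```

For $\Sigma = [[1, 0.5], [0.5, 1]]$ the returned variances are 0.75 rather than the true marginal variances of 1.0, illustrating the known tendency of mean-field approximations to underestimate marginal variance when coordinates are correlated.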

Deep Learning to Predict Glaucoma Progression using Structural Changes in the Eye

Glaucoma is a chronic eye disease characterized by optic neuropathy, leading to irreversible vision loss. It progresses gradually, often remaining undiagnosed until advanced stages. Early detection is crucial to monitor atrophy and develop treatment strategies to prevent further vision impairment. Data-centric methods have enabled computer-aided algorithms for precise glaucoma diagnosis. In this study, we use deep learning models to identify complex disease traits and progression criteria, detecting subtle changes indicative of glaucoma. We explore the structure-function relationship in glaucoma progression and predict functional impairment from structural eye deterioration. We analyze statistical and machine learning methods, including deep learning techniques applied to optical coherence tomography (OCT) scans, for accurate progression prediction. Addressing challenges like age variability, data imbalances, and noisy labels, we develop novel semi-supervised time-series algorithms: 1. Weakly-Supervised Time-Series Learning: We create a CNN-LSTM model to encode spatiotemporal features from OCT scans. This approach uses age-related progression and positive-unlabeled data to establish robust pseudo-progression criteria, bypassing the need for gold-standard labels. 2. Semi-Supervised Time-Series Learning: Using labels from Guided Progression Analysis (GPA) in a contrastive learning scheme, the CNN-LSTM architecture learns from potentially mislabeled data to improve prediction accuracy. Our methods outperform conventional and state-of-the-art techniques.

Updated: 2024-06-09 01:12:41

标题: 利用眼部结构变化，通过深度学习预测青光眼进展

摘要: 青光眼是一种以视神经病变为特征的慢性眼病，会导致不可逆的视力丧失。它逐渐发展，往往直到晚期才被诊断出来。早期检测对于监测萎缩并制定治疗策略以防止进一步的视力损害至关重要。以数据为中心的方法使计算机辅助算法能够精确诊断青光眼。在这项研究中，我们使用深度学习模型来识别复杂的疾病特征和进展标准，检测提示青光眼的细微变化。我们探索了青光眼进展中的结构-功能关系，并根据眼部结构的恶化预测功能损伤。我们分析了统计学和机器学习方法，包括应用于光学相干断层扫描（OCT）图像以准确预测进展的深度学习技术。为应对年龄差异、数据不平衡和噪声标签等挑战，我们开发了新颖的半监督时间序列算法： 1. 弱监督时间序列学习：我们构建了一个CNN-LSTM模型，从OCT扫描中编码时空特征。该方法利用与年龄相关的进展和正样本-未标记数据来建立稳健的伪进展标准，从而无需金标准标签。 2. 半监督时间序列学习：在对比学习方案中使用引导进展分析（GPA）的标签，CNN-LSTM架构从可能存在错误标记的数据中学习，以提高预测准确性。我们的方法优于传统技术和最先进的技术。

更新时间: 2024-06-09 01:12:41

领域: cs.CV,cs.AI,cs.LG,68T07,I.2.1

下载: http://arxiv.org/abs/2406.05605v1
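
The abstract does not give the exact contrastive objective. As an assumption, the sketch below uses a generic supervised-contrastive (SupCon-style) loss in NumPy, in which embeddings that share a GPA-style progression label are treated as positives. The CNN-LSTM encoder that would produce the embeddings `z` is not shown, and the function name and temperature value are hypothetical.

```python
import numpy as np


def supervised_contrastive_loss(z, labels, temperature=0.1):
    """SupCon-style loss: pull same-label embeddings together, push others apart."""
    z = np.asarray(z, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity via unit vectors
    labels = np.asarray(labels)
    n = z.shape[0]
    logits = z @ z.T / temperature
    np.fill_diagonal(logits, -np.inf)  # an anchor is never its own positive
    # Numerically stable row-wise log-softmax
    row_max = np.max(logits, axis=1, keepdims=True)
    log_norm = np.log(np.sum(np.exp(logits - row_max), axis=1, keepdims=True))
    log_prob = logits - row_max - log_norm
    losses = []
    for i in range(n):
        positives = (labels == labels[i]) & (np.arange(n) != i)
        if positives.any():
            losses.append(-log_prob[i, positives].mean())
    return float(np.mean(losses)) if losses else 0.0
```

Because positives are defined entirely by labels, a mislabeled scan pulls its embedding toward the wrong group; the abstract says the authors' scheme learns from potentially mislabeled data but does not describe the mechanism, so this sketch makes no attempt to model it.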

A Knowledge-Component-Based Methodology for Evaluating AI Assistants

We evaluate an automatic hint generator for CS1 programming assignments powered by GPT-4, a large language model. This system provides natural language guidance about how students can improve their incorrect solutions to short programming exercises. A hint can be requested each time a student fails a test case. Our evaluation addresses three Research Questions: RQ1: Do the hints help students improve their code? RQ2: How effectively do the hints capture problems in student code? RQ3: Are the issues that students resolve the same as the issues addressed in the hints? To address these research questions quantitatively, we identified a set of fine-grained knowledge components and determined which ones apply to each exercise, incorrect solution, and generated hint. Comparing data from two large CS1 offerings, we found that access to the hints helps students to address problems with their code more quickly, that hints are able to consistently capture the most pressing errors in students' code, and that hints that address a few issues at once rather than a single bug are more likely to lead to direct student progress.

Updated: 2024-06-09 00:58:39

标题: 基于知识组件的方法论用于评估人工智能助手

摘要: 我们评估了一个由大型语言模型GPT-4驱动、面向CS1编程作业的自动提示生成器。该系统以自然语言指导学生如何改进他们在简短编程练习中的错误解答。学生每次未通过测试用例时都可以请求一个提示。我们的评估涵盖三个研究问题： RQ1：提示是否帮助学生改进他们的代码？RQ2：提示能多有效地捕捉学生代码中的问题？RQ3：学生解决的问题是否与提示中涉及的问题相同？为了定量地回答这些研究问题，我们确定了一组细粒度的知识组件，并确定了哪些知识组件适用于每个练习、错误解答和生成的提示。通过比较两个大型CS1课程的数据，我们发现：获得提示有助于学生更快地解决代码中的问题；提示能够始终如一地捕捉到学生代码中最紧迫的错误；并且一次针对几个问题（而非单个错误）的提示更有可能直接推动学生取得进展。

更新时间: 2024-06-09 00:58:39

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2406.05603v1
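
RQ3 can be operationalized with plain set operations once every submission and every hint has been tagged with knowledge components (KCs). A minimal sketch under that assumption; the KC identifiers, the dictionary keys, and the `hint_coverage` ratio are hypothetical and are not the authors' analysis code.

```python
def kc_resolution_report(before, after, hint):
    """Compare the KCs a student resolved with the KCs a hint addressed.

    before / after: sets of KC ids flagged as problems in consecutive
    submissions; hint: set of KC ids the generated hint addresses.
    """
    resolved = before - after
    return {
        "resolved": resolved,                               # problems that disappeared
        "resolved_and_hinted": resolved & hint,             # fixes the hint pointed to
        "hinted_but_unresolved": (hint & before) - resolved,
        "hint_coverage": len(hint & before) / len(before) if before else 0.0,
    }
```

A large `resolved_and_hinted` set relative to `resolved` would suggest that students fix the issues the hint names, which is the question RQ3 asks.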

Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models

Accurate and interpretable user satisfaction estimation (USE) is critical for understanding, evaluating, and continuously improving conversational systems. Users express their satisfaction or dissatisfaction with diverse conversational patterns in both general-purpose (ChatGPT and Bing Copilot) and task-oriented (customer service chatbot) conversational systems. Existing approaches based on featurized ML models or text embeddings fall short in extracting generalizable patterns and are hard to interpret. In this work, we show that LLMs can extract interpretable signals of user satisfaction from their natural language utterances more effectively than embedding-based approaches. Moreover, an LLM can be tailored for USE via an iterative prompting framework using supervision from labeled examples. The resulting method, Supervised Prompting for User satisfaction Rubrics (SPUR), not only has higher accuracy but is more interpretable as it scores user satisfaction via learned rubrics with a detailed breakdown.

Updated: 2024-06-09 00:58:25

标题: 基于大型语言模型的对话系统可解释用户满意度估计

摘要: 准确且可解释的用户满意度估计（USE）对于理解、评估和持续改进对话系统至关重要。在通用型（ChatGPT和Bing Copilot）和面向任务的（客服聊天机器人）对话系统中，用户通过多样的对话模式表达满意或不满意。现有基于特征化ML模型或文本嵌入的方法难以提取可泛化的模式，并且难以解释。在这项工作中，我们表明，与基于嵌入的方法相比，LLMs能够更有效地从用户的自然语言话语中提取可解释的用户满意度信号。此外，借助来自标注示例的监督，可以通过迭代提示框架将LLM定制用于USE。由此得到的方法，即用户满意度评分细则监督提示（SPUR），不仅准确性更高，而且更具可解释性，因为它通过学习到的评分细则对用户满意度打分，并给出详细的分项说明。

更新时间: 2024-06-09 00:58:25

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2403.12388v2
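
The "learned rubrics with a detailed breakdown" can be illustrated by a minimal rubric-scoring sketch, assuming the rubric has already been learned. In SPUR the rubric comes from iterative supervised prompting and an LLM judges which items apply. Here `applies` is a hypothetical stand-in for that LLM call, and the class, function, and weight convention are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RubricItem:
    description: str  # a satisfaction or dissatisfaction pattern, in natural language
    weight: float     # positive = satisfaction signal, negative = dissatisfaction


def score_with_rubric(
    conversation: str,
    rubric: List[RubricItem],
    applies: Callable[[str, str], bool],
) -> Tuple[float, List[Tuple[str, float]]]:
    """Sum the weights of rubric items judged to apply, keeping a per-item breakdown."""
    breakdown = [
        (item.description, item.weight)
        for item in rubric
        if applies(conversation, item.description)
    ]
    return sum(w for _, w in breakdown), breakdown
```

Any callable with the signature `(conversation, description) -> bool` can be passed as `applies`, for example a wrapper around an LLM prompt; the returned breakdown is what makes the final score interpretable.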